This benchmark compares thinking_sphinx with acts_as_xapian. We need a search engine that gives us the IDs of matching documents from a fulltext index; basic text search is all we require.
Data
- one table with 200k entries with 5k of text (avg) in one column
- one table with 500k entries with 7k of text (avg) in 6 columns
- one table with 500k entries with 7k of text (avg) in 4 columns
Indexing
Initial indexing took 10 mins with thinking_sphinx and 75 mins(!!) with acts_as_xapian.
Search performance
The search performance on queries that return only a few items is nearly identical.
The search performance on queries that return many items (~10000) is also nearly
identical; 90% of the time is spent in ActiveRecord.
In our case (we only need IDs and not the entire documents) sphinx runs
at 0.6 secs for a particular query (with 10000 results),
where acts_as_xapian needs 4.5 secs. This is because thinking_sphinx allows
you to fetch only the IDs, whereas acts_as_xapian insists on pulling the
models from the database. After patching acts_as_xapian to allow pulling
IDs only, we land at 0.6 secs (sphinx) vs 0.4 secs (acts_as_xapian).
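To illustrate the kind of measurement behind these numbers, here is a minimal sketch of timing an IDs-only lookup against a lookup that also loads every record. It is not the code used for this benchmark: INDEX, TABLE, Record, search_ids, search_records and best_time are all hypothetical stand-ins that run in memory, so the absolute timings mean nothing. In a real Rails app the two blocks would call the search plugin instead (with thinking_sphinx, presumably something like search_for_ids on the model for the IDs-only case).

```ruby
require 'benchmark'

# Runs the given block several times and returns the best (lowest)
# wall-clock time in seconds.
def best_time(runs = 3)
  Array.new(runs) { Benchmark.realtime { yield } }.min
end

# Hypothetical stand-in for the fulltext index: term => matching document IDs.
INDEX = { 'ruby' => (1..10_000).to_a }

# Hypothetical stand-in for the database table: id => record.
Record = Struct.new(:id, :body)
TABLE = (1..10_000).map { |i| [i, Record.new(i, "text #{i}")] }.to_h

# IDs only: ask the index and stop there.
def search_ids(term)
  INDEX.fetch(term, [])
end

# Full records: ask the index, then load every matching record.
def search_records(term)
  search_ids(term).map { |id| TABLE.fetch(id) }
end

ids_time     = best_time { search_ids('ruby') }
records_time = best_time { search_records('ruby') }
puts format('ids only: %.4fs, full records: %.4fs', ids_time, records_time)
```

Taking the best of several runs is just one way to reduce noise from caching and garbage collection; averaging would work as well.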
Results
We will choose sphinx because
- it is similarly fast to xapian
- it runs over the network by default
- indexing is way faster (I guess because acts_as_xapian pulls all data to be indexed from the database to hand it over, while sphinx can do that itself)
- acts_as_xapian would need to be patched for performance reasons.