Search is a big deal here at XING. Each day, we process a large number of search queries, spread across events, jobs, company profiles and user search. For such a high search volume, you need rock-solid technology that you can trust. For the last two and a half years, this technology has been Ferret.
Why we chose Ferret initially
Ferret was introduced when we needed a search service for XING Jobs, which was the first part of our application built on the Ruby on Rails framework. At that time, Ferret was just right. Because its core is written in very fast C code, it was our first choice in terms of performance. But it is also partly written in Ruby, which allowed us to write the analyzer, query parser and tokenizer in our favorite language (oh the joy!). Ferret has served us very well until now.
But things have changed. Over the years, we have extended the Ferret search to XING Events and Company Profiles, increasing our initial search volume by orders of magnitude. Furthermore, the search queries created by our application became far more complex, for example by using several languages in a single indexed field. This led to scaling problems. The DRb-based (think RMI for Ruby) remote interface we used for the search service turned out to be brittle and crash-prone under certain circumstances, especially when multiple clients caused high load on the search servers. Another problem we faced was that it takes a new developer quite some time to become acquainted with Ferret and our implementations around it.
On one of those occasions where we had to go into debugging mode deep within the DRb world, we really wished for a simpler remote interface, maybe something like HTTP that could easily be scaled by standard means (think load balancers). On another occasion we encountered some serious memory leaks that had gone unnoticed under moderate load with simple queries. We had to fix them ourselves because the few official maintainers were busy with other projects or generally unavailable when we tried to reach them. Patches in their bug tracker were left untouched for months, and the official website is frequently unavailable.
We were left with a few choices: we could maintain Ferret ourselves, find someone who would do it for us, or find an alternative and switch. As you can imagine, this led to a lot of heated discussions within our engineering department, since in the end Ferret was “kind of working”. As the frequency of problems increased, we chose to investigate other possibilities and technologies.
Ferret-ing out alternatives
One of the main lessons we learned while staying with Ferret, and we learned it the hard way, was that a good search engine needs a good community of users and developers contributing. There was no such thing for Ferret. Of all the arguments mentioned, this is the most important one. In general, we think that when using an open source product, you had better choose one that has a lively community around it. This makes it far more likely that fixes from third parties flow back into the main product, that you can find experts around the world to advise you, and that you can exchange knowledge on forums, wikis and at conferences. So we started looking for a search service that:
- Was easy to scale
- Had a “developer-friendly” interface we could hook into
- Was actively developed and supported
So we evaluated SOLR, which is based on the Lucene information retrieval library, just like Ferret. This means the syntax for our analyzers, query parsers and tokenizers does not differ much from what we are used to. But unlike Ferret, SOLR is a real industry standard. There are lots of maintainers, bug reporters, wiki authors, consulting companies and conference panels. In short: there is a substantial ecosystem surrounding the SOLR search engine.
We carefully watched how long it takes until a SOLR maintainer reacts to an issue created in their JIRA bug tracker, and these guys are really fast and willing to solve incoming bugs. They include patches written by third parties in the official sources, and release new versions of their software regularly. Furthermore, as SOLR is an industry standard, we already had developers on our team who have worked with it before. And, last but not least, SOLR works over a standard HTTP interface. This liberates us from awkward DRb code and allows painless scaling with standard HTTP load balancers. On top of that, the acts_as_solr Rails plugin makes it feel like any other Ruby library.
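To illustrate what "a standard HTTP interface" means in practice, here is a minimal sketch that sends a query to SOLR's select handler using only Ruby's standard library. The host, port and method names are assumptions for illustration (8983 is the port of SOLR's bundled example server); this is not our production client.

```ruby
require "uri"
require "net/http"

# Hypothetical location of a SOLR server; adjust to your own setup.
SOLR_BASE_URL = "http://localhost:8983/solr"

# Build the URL for SOLR's select handler.
# q = the query, rows = maximum number of hits, wt = response format.
def solr_select_uri(query, rows: 10, base_url: SOLR_BASE_URL)
  uri = URI.parse("#{base_url}/select")
  uri.query = URI.encode_www_form(q: query, rows: rows, wt: "json")
  uri
end

# Send the query as a plain HTTP GET and return the raw response body.
def solr_search(query, rows: 10)
  response = Net::HTTP.get_response(solr_select_uri(query, rows: rows))
  raise "SOLR returned HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  response.body
end
```

Because every search is just a GET request, any off-the-shelf HTTP load balancer can distribute the queries across several search servers.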
While Ferret works with Ruby only, SOLR can be used with a plethora of programming languages. So it is no wonder that our fellow Perl programmers made the switch to SOLR about a year ago. Employing only one search engine for all parts of our application will reduce the maintenance effort in the future. The shorter training period will ensure that all of our developers (even the ones trying to avoid DRb) become familiar with our search technology. And, beyond that, SOLR offers far more features than Ferret, such as faceted search. This allows us to implement a more powerful and user-friendly search, for example by letting users freely filter and order search results by all available dimensions.
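As a rough sketch of what a faceted query looks like, the helper below builds the query string using SOLR's facet, facet.field and fq parameters. The helper name and the field names (city, industry) are made up for illustration and do not reflect our actual schema.

```ruby
require "uri"

# Build a query string that asks SOLR for facet counts on the given fields
# and narrows the result set with filter queries (fq).
# Note: filter values are wrapped in quotes but not escaped any further.
def faceted_query_string(query, facet_fields:, filters: {})
  params = [["q", query], ["facet", "true"]]
  facet_fields.each { |field| params << ["facet.field", field] }
  filters.each { |field, value| params << ["fq", %(#{field}:"#{value}")] }
  URI.encode_www_form(params)
end

# Hypothetical usage: count hits per city and industry, restricted to Hamburg.
query_string = faceted_query_string(
  "engineer",
  facet_fields: ["city", "industry"],
  filters: { "city" => "Hamburg" }
)
```

The response then contains, next to the hits themselves, a count per value of each facet field, which is what makes "filter by any dimension" navigation possible.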
The upcoming migration
The introduction of SOLR follows a smooth migration path. First, we cleaned up our own search interface for the Rails application, which is now search engine agnostic. Then we configured SOLR to mimic the behavior of our proven and tested Ferret search engine, so that it works as a drop-in replacement. Next, we will pick a beta group out of our 7 million members, so you might be among them.
Search queries from this beta group will be run by SOLR instead of Ferret so that we can compare the quality of the results. As soon as we are sure everything works as expected, we will gradually extend the beta group to all users. At that point, we will be ready to develop new and awesome search features with SOLR technology under the hood.
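The rollout described above can be sketched roughly as follows. All class names and the bucketing rule are hypothetical and only illustrate the idea of an engine-agnostic interface with a gradually growing beta group; they are not our actual implementation.

```ruby
# Stand-ins for the real back ends. Both expose the same #search method,
# which is what makes the calling code search engine agnostic.
class FerretBackend
  def search(query)
    "ferret results for #{query}"
  end
end

class SolrBackend
  def search(query)
    "solr results for #{query}"
  end
end

# Routes each member's queries to the stable or the beta back end.
class SearchRouter
  def initialize(beta_percent:, stable: FerretBackend.new, beta: SolrBackend.new)
    @beta_percent = beta_percent
    @stable = stable
    @beta = beta
  end

  # Assumption: members are split into 100 buckets by id, and the first
  # beta_percent buckets form the beta group. The same member therefore
  # always gets the same back end.
  def beta_member?(user_id)
    user_id % 100 < @beta_percent
  end

  def search(user_id, query)
    backend = beta_member?(user_id) ? @beta : @stable
    backend.search(query)
  end
end

router = SearchRouter.new(beta_percent: 10)
```

Raising beta_percent step by step to 100 extends the beta group to all users without touching the code that issues the searches.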