Techniques To Improve Your Solr Search Results
Chris Johnson | VP, Engineering
August 20, 2013
Solr is a tremendously popular option for providing search functionality in Drupal. While Solr provides pretty good default results, making search results great requires analysis of what your users search for, consideration of which data is sent to Solr, and tuning of Solr's 'boosting'.
In this post, I will show you a few techniques that can help you leverage Solr to produce great results. I will specifically be covering the Apache Solr Search module. Similar concepts exist in the Search API Solr Search module, but with different methods of configuring boosting and altering data sent to Solr. Boosting is the method by which you can adjust how Solr calculates a score for the items in its index for a particular query.
A simple example: you may want an item with a word in the title field to score higher than an item with that word in the body field. Boosting is how you tell Solr to treat the title field as more important, and by how much.
After you've installed the Apache Solr Search module and connected it to your Solr instance, the first page to check is the Bias tab for your search environment. Here you can set boosts for things like the "sticky" setting on a node, the content type of a node, the field in which the data appears, and even the HTML tags surrounding content. For many cases, these settings will be all you need.
When your tuning needs go beyond these settings, you have a few options. You can alter the query sent to Solr to add more refined boosting at search time, you can alter the document sent to Solr to include more data or boost information at index time, or you can combine the two.
Let's take an example in which taxonomy terms that appear higher in a hierarchy should score higher than those that are lower. This allows top-level term pages to show up higher in search results. First, we'll adjust the document sent to Solr to include a single integer field indicating the depth of the taxonomy term in the hierarchy. Note that this example uses the Apache Solr Term module to add taxonomy terms to the index directly.
/**
 * Implements hook_apachesolr_index_document_build().
 *
 * Adds the depth of the taxonomy term to the index.
 */
function MYMODULE_apachesolr_index_document_build(ApacheSolrDocument $document, $entity, $entity_type, $env_id) {
  switch ($document->bundle) {
    case 'vocabulary_machine_name':
      // Replace 1 with the ID of the vocabulary of interest. The tree is only
      // loaded for documents in that vocabulary.
      $section_tree = taxonomy_get_tree(1, 0);
      foreach ($section_tree as $term_data) {
        if ($term_data->tid == $entity->tid) {
          // taxonomy_get_tree() depths start at 0; store them starting at 1.
          $document->addField('is_taxonomy_depth', $term_data->depth + 1);
          break;
        }
      }
      break;

    default:
      break;
  }
}
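The hook above finds the term by scanning the whole tree each time a term is indexed. As a minimal sketch of one alternative, the tree could be turned into a lookup keyed by term ID. The helper name `mymodule_term_depths()` and the sample data are hypothetical, and the input is assumed to have the shape returned by `taxonomy_get_tree()` (objects with `tid` and a zero-based `depth`).

```php
<?php
/**
 * Builds a tid => depth lookup from a flat term tree.
 *
 * Hypothetical helper, not part of any module. $tree is assumed to look like
 * the result of taxonomy_get_tree(): objects with ->tid and a 0-based ->depth.
 */
function mymodule_term_depths(array $tree) {
  $depths = array();
  foreach ($tree as $term) {
    // Shift to 1-based so that 0 can mean "no depth" at query time.
    $depths[$term->tid] = $term->depth + 1;
  }
  return $depths;
}

// Stand-in data for illustration only.
$tree = array(
  (object) array('tid' => 3, 'depth' => 0),
  (object) array('tid' => 7, 'depth' => 1),
  (object) array('tid' => 9, 'depth' => 2),
);
$depths = mymodule_term_depths($tree);
echo $depths[7], "\n"; // prints 2
```

Inside the hook, the lookup could be kept for the rest of the request (for example with `drupal_static()`), so a large vocabulary is walked once rather than once per indexed term.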
Now, at query time, we'll use the value of the depth to affect the score. This could have been done at index time, but doing the calculation at query time allows us to change the value of the boost without the need to reindex.
/**
 * Implements hook_apachesolr_query_alter().
 *
 * Adds a boost based on the depth of a term in the hierarchy:
 * lower depth = higher score.
 */
function MYMODULE_apachesolr_query_alter($query) {
  /*
   * Boost by section depth so that higher sections win over lower ones.
   * Depth values start at 1 and have no upper limit, but we use 10 as the
   * maximum. Items that aren't terms have no depth field, which the function
   * query reads as 0.
   *
   * The boost is 5 * (10 - depth) for depths from 1 to 10. Items without a
   * depth (0) must not receive the maximum boost, so map() substitutes 10 for
   * them, giving a boost of 0. Depths above 10 are not clamped and would
   * produce a negative value, so this assumes the hierarchy is at most 10
   * levels deep.
   *
   * map(field,min,max,target,value): returns target if field is between min
   *   and max (inclusive), otherwise returns value
   * max(x,y): returns the larger of x and y
   * sub(x,y): returns x - y
   * product(x,y): returns x * y
   */
  $query->addParam('bf', 'product(5,sub(10,max(is_taxonomy_depth,map(is_taxonomy_depth,1,10,0,10))))');
}
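To see what the bf expression produces, it can help to work through the arithmetic by hand. The sketch below mimics the Solr functions in plain PHP; `sketch_map()` and `depth_boost()` are hypothetical names used only for illustration, not Solr or Drupal APIs, and a missing depth field is assumed to be read as 0.

```php
<?php
// Plain-PHP stand-in for Solr's map(x,min,max,target,value).
// It only mimics the arithmetic; it is not a Solr API.
function sketch_map($x, $min, $max, $target, $value) {
  return ($x >= $min && $x <= $max) ? $target : $value;
}

// Mirrors product(5,sub(10,max(depth,map(depth,1,10,0,10)))).
function depth_boost($depth) {
  // 0 for depths 1..10, otherwise 10 (this covers the "no depth" case).
  $fallback = sketch_map($depth, 1, 10, 0, 10);
  return 5 * (10 - max($depth, $fallback));
}

// 0 stands for an item with no depth field.
foreach (array(0, 1, 2, 10, 11) as $depth) {
  printf("depth %2d => boost %d\n", $depth, depth_boost($depth));
}
```

A top-level term (depth 1) comes out at 45 and a second-level term at 40, while items with no depth and terms at depth 10 both come out at 0. A depth of 11 comes out at -5, so the expression suits hierarchies of at most 10 levels.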
For more advanced reading, check out the following:
- Solr Wiki Relevancy FAQ
- DisMax query parser wiki page - includes description of many of the parameters that can be used for boosting
- Solr Function Reference - describes the functions that can be used in calculating boost values.