Configuring Solr for Suggestive/Predictive Auto Complete Search

Solution 1

I would suggest a couple of blog posts:

  • This one, which shows a really nice, complete solution. It works well but requires some additional work, and it uses a dedicated Lucene index (Solr core) for that specific purpose

Solution 2

I used the Highlight approach because the facet.prefix one is too heavy for a big index, and the other ones had little or unclear documentation (I'm a stupid programmer).

So let's suppose the user has just typed "aaa bbb ccc"

Our autocomplete function (Java/JavaScript) will call Solr using the following params:

q="aaa bbb"~100 ...base query: all the typed words except the last
fq=ccc* ...suggestion filter built from the last typed word
hl=true
hl.q=ccc* ...the highlighted word will be the one to suggest
fl=NONE ...a field name that does not exist, so the docs in the result tag come back empty
hl.simple.pre=### ...marker used to locate the highlighted word in the response
hl.simple.post=### ...see above

You can also control the number of suggestions with the 'rows' and 'hl.fragsize' parameters.
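As a rough illustration, the parameter list above could be produced from the typed text like this. It is only a sketch: the class and method names are made up for the example, the marker parameters use the hl.simple.pre / hl.simple.post names that appear in the JSP further down, and the values are returned unencoded, so they still need URL encoding before being sent to Solr.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of Solr): maps the typed text to request parameters. */
final class SuggestParams {

    /** Returns unencoded parameters; URL-encode each value before building the request. */
    static Map<String, String> build(String typed, int rows) {
        String text = typed.trim();
        int lastSpace = text.lastIndexOf(' ');
        // The last typed word, turned into a prefix query
        String prefix = text.substring(lastSpace + 1) + "*";

        Map<String, String> params = new LinkedHashMap<>();
        if (lastSpace < 0) {
            // Single word: the prefix is the whole query
            params.put("q", prefix);
        } else {
            // All words except the last form a sloppy phrase query
            params.put("q", "\"" + text.substring(0, lastSpace) + "\"~100");
            params.put("fq", prefix);
        }
        params.put("hl", "true");
        params.put("hl.q", prefix);
        params.put("fl", "NONE");             // nonexistent field, so docs come back empty
        params.put("hl.simple.pre", "###");   // marker before each highlighted word
        params.put("hl.simple.post", "###");  // marker after each highlighted word
        params.put("rows", String.valueOf(rows));
        return params;
    }

    public static void main(String[] args) {
        build("aaa bbb ccc", 10).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The map keeps insertion order only to make the printed request easy to compare with the list above; Solr itself does not care about parameter order.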

The highlighted words in each document will be the right candidates for completing the "aaa bbb" string.

Further candidate words are the ones before/after the highlighted words and, of course, you can implement more filters to extract valid words, avoid duplicates, and limit the suggestions.
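To make the extraction step concrete, here is a small sketch of pulling the highlighted words out of the returned fragments. The class name, method name and the idea of passing the fragments as a plain list are assumptions made for the example; in a real setup the fragments come from the highlighting section of the Solr response.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical helper: collects highlighted words from highlight fragments. */
final class HighlightSuggestions {

    static List<String> extract(List<String> fragments, String marker, int limit) {
        // LinkedHashSet keeps first-seen order and drops duplicates
        Set<String> unique = new LinkedHashSet<>();
        for (String fragment : fragments) {
            String[] parts = fragment.split(Pattern.quote(marker));
            // After splitting on the marker, odd positions hold the highlighted words
            for (int i = 1; i < parts.length; i += 2) {
                String word = parts[i].trim().toLowerCase(Locale.ROOT);
                if (!word.isEmpty()) {
                    unique.add(word);
                }
            }
        }
        List<String> all = new ArrayList<>(unique);
        // Cap the number of suggestions
        return all.size() > limit ? new ArrayList<>(all.subList(0, limit)) : all;
    }
}
```

The even positions of the split are the words before/after the highlight, so they are the natural place to look for extra suggestions when the highlighted words alone are not enough.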

If you are interested, I can send you some examples...

EDITED: Some further details about the approach

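The example below assumes the 'autocomplete' mechanism provided by jQuery: we invoke a JSP (or a servlet) inside a web application, passing the words the user has just typed as the request parameter 'q'.

This is the code of the JSP. It runs as a scriptlet and assumes the page imports java.io.ByteArrayInputStream, java.net.URLEncoder, java.util.Vector, javax.xml.parsers.*, javax.xml.xpath.* and org.w3c.dom.*; suggestPageSize and suggestFragSize are settings defined elsewhere.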

ByteArrayInputStream is=null; // Used to manage Solr response
try{

  StringBuffer queryUrl=new StringBuffer("putHereTheUrlOfSolrServer");
  queryUrl.append("/select?wt=xml");
  String typedWords=request.getParameter("q");
  String base="";
  if(typedWords.indexOf(" ")<=0) {
    // No space typed by user: the 'easy case'
    queryUrl.append("&q=text:");
    queryUrl.append(URLEncoder.encode(typedWords+"*", "UTF-8"));
    queryUrl.append("&hl.q=text:"+URLEncoder.encode(typedWords+"*", "UTF-8"));
   } else {
    // Space chars present
    // we split the search in base phrase and last typed word
    base=typedWords.substring(0,typedWords.lastIndexOf(" "));
    queryUrl.append("&q=text:");
    if(base.indexOf(" ")>0)
        queryUrl.append(URLEncoder.encode("\""+base+"\"~1000", "UTF-8")); // encode the quotes and the slop too
    else
        queryUrl.append(URLEncoder.encode(base, "UTF-8"));

    typedWords=typedWords.substring(typedWords.lastIndexOf(" ")+1);
    queryUrl.append("&fq=text:"+URLEncoder.encode(typedWords+"*", "UTF-8"));
    queryUrl.append("&hl.q=text:"+URLEncoder.encode(typedWords+"*", "UTF-8"));
}

  // The additional parameters to control the solr response
  queryUrl.append("&rows="+suggestPageSize); // Number of results returned, a parameter to control the number of suggestions
  queryUrl.append("&fl=A_FIELD_NAME_THAT_DOES_NOT_EXIST"); // Only the highlights section is of interest, so Solr returns a 'light' answer
  queryUrl.append("&start=0"); // Use only first page of results
  queryUrl.append("&hl=true"); // Enable highlights feature
  queryUrl.append("&hl.simple.pre=***"); // Use *** as 'highlight border'
  queryUrl.append("&hl.simple.post=***"); // Use *** as 'highlight border'
  queryUrl.append("&hl.fragsize="+suggestFragSize); // Another parameter to control the number of suggestions
  queryUrl.append("&hl.fl=content,title"); // Look for result only in some fields
  queryUrl.append("&facet=false"); // Disable facets

  /* Omitted section: use a new URL(queryUrl.toString()) to get the solr response inside a byte array */

  is=new ByteArrayInputStream(solrResponseByteArray);

  DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
  DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
  Document doc = dBuilder.parse(is);
  XPathFactory xPathfactory = XPathFactory.newInstance();
  XPath xpath = xPathfactory.newXPath();
  XPathExpression expr = xpath.compile("//response/lst[@name=\"highlighting\"]/lst/arr[@name=\"content\"]/str");
  NodeList valueList = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);

  Vector<String> suggestions=new Vector<String>();
  for (int j = 0; j < valueList.getLength(); ++j) {
     Element value = (Element) valueList.item(j);
     String[] result=value.getTextContent().split("\\*\\*\\*");
     for(int k=0;k<result.length;k++){
        String suggestedWord=result[k].toLowerCase();
        if((k%2)!=0){
             //Highlighted words management
             // Keep words at least as long as the typed prefix, skipping duplicates
             if(suggestedWord.length()>=typedWords.length() && !suggestions.contains(suggestedWord))
                 suggestions.add(suggestedWord);
        }else{
            /* Words before/after highlighted words
               we can put these words inside another vector
               and use them if not enough suggestions */
        }
     }
  }

  /* Finally we build a Json Answer to be managed by our jquery function */
  out.print(request.getParameter("json.wrf")+"({ \"suggestions\" : [");
  boolean firstSugg=true;       
  for(String suggestionW:suggestions) {
    out.print((firstSugg?" ":" ,"));
    out.print("{ \"suggest\" : \"");
    if(base.length()>0) {
        out.print(base);
        out.print(" ");
    }
    out.print(suggestionW+"\" }");
    firstSugg=false;
  }
  out.print(" ]})");
}catch (Exception x) {
  System.err.println("Exception during main process: " + x);
  x.printStackTrace();
}finally{
  //Gracefully close streams//
  try{ if(is!=null) is.close(); }catch(Exception x){ /* nothing else to do */ }
}
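The JSP above leaves out the step that downloads the Solr response into solrResponseByteArray. One way it might look, as a minimal sketch: the class name, method names and the two-second timeouts are assumptions for the example, not something the original code prescribes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

/** Hypothetical helper: downloads the Solr response as a byte array. */
final class SolrFetch {

    /** Copies everything from the stream into a byte array. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    /** Performs a GET on the given URL and returns the response body. */
    static byte[] fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(2000); // assumed value: autocomplete should fail fast
        conn.setReadTimeout(2000);    // assumed value
        try (InputStream in = conn.getInputStream()) {
            return readAll(in);
        } finally {
            conn.disconnect();
        }
    }
}
```

With something like this in place, the omitted section reduces to byte[] solrResponseByteArray = SolrFetch.fetch(queryUrl.toString());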

Hope this helps, Nik

Author by

Krunal

Updated on June 04, 2022

Comments

  • Krunal
    Krunal almost 2 years

    We are working on integrating Solr 3.6 into an eCommerce site. We have indexed the data and search is performing really well.

    We are having some difficulty figuring out how to implement predictive search / autocomplete search suggestions. We are also interested in learning the best practices for implementing this feature.

    Our goal is to offer predictive search similar to http://www.amazon.com/, but we don't know how to implement it with Solr. More specifically, I want to understand how to build those terms from Solr, or is that managed by something external to Solr? How should the dictionary be built to offer these kinds of suggestions? Moreover, for some fields, search should offer to search within a category. Try typing "xper" into the Amazon search box, and you will note that apart from xperia, xperia s, xperia p, it also lists xperia s in Cell phones & accessories, which is a category.

    Using a custom dictionary, this would be difficult to manage. Or maybe we don't know how to do it correctly. We are looking to you to guide us on how best to utilize Solr to achieve this kind of suggestive search.

  • Dharmik Bhandari
    Dharmik Bhandari over 11 years
    Can you help me figure this out for Solr, for a similar issue related to the Suggester? stackoverflow.com/questions/12453600/…
  • Krunal
    Krunal over 8 years
    Hi, it's a completely different approach. Yes, please share the examples, that will be very helpful. Thank you!!