How to configure a Solr server for spell check functionality


Solution 1

1. How do I connect to my database and search its content for words that could match?

You need to index the data from MySQL into Solr.
This can be done by building an application that reads the records from MySQL and feeds them to Solr.
Alternatively, as already answered, use the Data Import Handler (DIH), which connects to MySQL, loads the data, and indexes it into Solr. It also supports incremental updates.
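
As a rough illustration of how DIH is driven once it is configured, the sketch below only builds the command URLs; it sends nothing. It assumes the `/dataimport` handler and the `collection1` core from the question's setup, and the class and method names are made up for this example. A `delta-import` is only useful if the entity in data-config.xml also defines `deltaQuery` and `deltaImportQuery`, which the question's config does not yet have.

```java
import java.net.URI;

/** Builds DataImportHandler command URLs; nothing is sent over the network. */
final class DihCommandUrls {
    // Assumption: host, port and core name as in the question's setup.
    static final String CORE_URL = "http://localhost:8080/solr/collection1";

    /** Full rebuild: wipes the index and re-reads every row of the entity query. */
    static URI fullImport() {
        return URI.create(CORE_URL + "/dataimport?command=full-import&clean=true&commit=true");
    }

    /** Incremental update: needs deltaQuery/deltaImportQuery on the entity. */
    static URI deltaImport() {
        return URI.create(CORE_URL + "/dataimport?command=delta-import&clean=false&commit=true");
    }

    /** Reports whether an import is running and how many rows were processed. */
    static URI status() {
        return URI.create(CORE_URL + "/dataimport?command=status");
    }

    public static void main(String[] args) {
        System.out.println(fullImport());
        System.out.println(deltaImport());
        System.out.println(status());
    }
}
```

Opening one of these URLs in a browser, or requesting it from a scheduled job, is enough to trigger the corresponding command.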

2. How do I set up the configuration (solrconfig.xml, schema.xml, etc.)?

The field used for spell checking should have a field type with text analysis.
Because your field is declared as string, it is not tokenized.
schema.xml:

<field name="Cuvint" type="text" indexed="true" stored="true" required="true"/>

Also, in solrconfig.xml, set the field you want to be considered for spell suggestions:

<str name="field">examplew</str> <!-- Replace with the field name from your schema -->

See the example.

3. How do I send a string from my view (XHTML) so that the Solr server knows what to look for?

Usually this feature is implemented with search and spell suggestions combined in a single Solr request.
When Solr returns no results, we check whether spell check suggestions are available and display them as a "Did you mean" suggestion. In addition, instead of waiting for the spell suggestion, we provide type-ahead suggestions to the user, which avoids a round trip to the server.
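
The fallback described above can be sketched in plain Java. This is only an illustration of the decision logic, not SolrJ code: the class and method names are hypothetical, and the hit count and suggestion list are assumed to have been read from the Solr response already.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

/** Decides whether a "Did you mean" hint should be shown to the user. */
final class DidYouMean {
    /**
     * Returns a hint only when the search found nothing and the
     * spell checker offered at least one alternative.
     */
    static Optional<String> hint(long numFound, List<String> suggestions) {
        if (numFound > 0 || suggestions == null || suggestions.isEmpty()) {
            return Optional.empty();
        }
        // Assumption: suggestions are ordered best first.
        return Optional.of("Did you mean: " + suggestions.get(0) + "?");
    }

    public static void main(String[] args) {
        // No hits, one suggestion: a hint is produced.
        System.out.println(hint(0, Arrays.asList("show")).orElse("(no hint)"));
        // Hits found: the results are shown as they are.
        System.out.println(hint(12, Arrays.asList("show")).orElse("(no hint)"));
        // No hits and no suggestions: nothing to offer.
        System.out.println(hint(0, Collections.<String>emptyList()).orElse("(no hint)"));
    }
}
```

In a JSF page, the returned text could be bound to the output label next to the input field.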

4. How do I get the correct word from the Cuvint database column? For example, for wodr I want Solr to return word.

See the example for configuring spell check; once it is set up, it should provide the suggestion.
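
For reference, a request for the misspelled term could look like the URL built below. This is a sketch under assumptions: it uses the `/spell` handler and `collection1` core from the question, the class name is invented, and it only constructs the URL. In a real application SolrJ would normally build and send the request.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Builds the URL of a spell check request for a user-entered term. */
final class SpellRequestUrl {
    // Assumption: the /spell handler from the question's solrconfig.xml.
    static final String SPELL_HANDLER = "http://localhost:8080/solr/collection1/spell";

    static String forTerm(String userInput) {
        try {
            // URL-encode the input so spaces and special characters are safe.
            String q = URLEncoder.encode(userInput, "UTF-8");
            return SPELL_HANDLER + "?q=" + q
                    + "&spellcheck=true&spellcheck.collate=true&wt=json";
        } catch (UnsupportedEncodingException e) {
            // Every Java runtime is required to support UTF-8.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(forTerm("wodr"));
    }
}
```

The response to such a request carries a spellcheck section whose suggestions for wodr would include word, provided that word exists in the indexed field.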

Solution 2

Import your database into Solr using DataImportHandler so that spellings can be searched in Solr.

Author: MSA

Updated on June 05, 2022

Comments

  • MSA
    MSA almost 2 years

    I want to implement the spellcheck functionality offered by Solr using a MySQL database, but I don't understand how.
    Here is the basic flow of what I want to do.

    I have a simple inputText (in JSF), and if I type the word shwo, the OutputLabel should display show.

    First of all I'm using the following tools and frameworks:

    JBoss application server 6.1.
    Eclipse
    JPA
    JSF(Primefaces)

    Steps I've done until now:

    Step 1: Download the Solr server from http://lucene.apache.org/solr/downloads.html and extract the contents.

    Step 2: Add an environment variable (pointing to where you have the Solr server):

    solr.solr.home=D:\JBOSS\solr-4.4.0\solr-4.4.0\example\solr
    

    Step 3:

    Open the Solr WAR and add an env-entry to solr.war\WEB-INF\web.xml (the easy way):

    <env-entry>
        <env-entry-name>solr/home</env-entry-name>
        <env-entry-value>D:\JBOSS\solr-4.4.0\solr-4.4.0\example\solr</env-entry-value>
        <env-entry-type>java.lang.String</env-entry-type>
    </env-entry>
    

    Or import the project, make the change, and build the WAR.

    Step 4: Open localhost:8080/solr/ in a browser and the Solr console appears. Up to this point everything works well.

    I have found some code that I think is useful. It produces this log entry:

    [collection1] webapp=/solr path=/spell params={spellcheck=on&q=whatever&wt=javabin&qt=/spell&version=2&spellcheck.build=true} hits=0 status=0 QTime=16

    Here is the code that gives the result from above:

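    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.client.solrj.response.SpellCheckResponse;
    import org.apache.solr.client.solrj.response.SpellCheckResponse.Suggestion;
    import org.apache.solr.common.params.ModifiableSolrParams;

    // Fragment: place the statements below inside a method.
    SolrServer solr;
    try {
      // CommonsHttpSolrServer is the SolrJ 3.x client; SolrJ 4.x replaces it with HttpSolrServer.
      solr = new CommonsHttpSolrServer("http://localhost:8080/solr");
      ModifiableSolrParams params = new ModifiableSolrParams();
      params.set("qt", "/spell");              // route the request to the /spell handler
      params.set("q", "whatever");             // term to check
      params.set("spellcheck", "on");
      params.set("spellcheck.build", "true");  // rebuilds the spell index on every request

      QueryResponse response = solr.query(params);
      SpellCheckResponse spellCheckResponse = response.getSpellCheckResponse();
      // The response has no spellcheck section if the component did not run.
      if (spellCheckResponse != null && !spellCheckResponse.isCorrectlySpelled()) {
        for (Suggestion suggestion : spellCheckResponse.getSuggestions()) {
          System.out.println("original token: " + suggestion.getToken() + " - alternatives: " + suggestion.getAlternatives());
        }
      }
    } catch (Exception e) {
      // Covers MalformedURLException and SolrServerException.
      e.printStackTrace();
    }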
    

    I also added the following to data-config.xml:

    <?xml version="1.0" encoding="UTF-8" ?>
    <dataConfig>
      <dataSource type="JdbcDataSource" name="altadict"
        driver="com.mysql.jdbc.Driver" 
        url="jdbc:mysql://localhost:3306/myproject"
        user="root"
        password=""
      />
    
      <document name="myproject">
        <entity name="myproject" query="SELECT * FROM words">
          <field column="Id" name="Id" />
          <field column="Cuvint" name="Cuvint" />
          <field column="TradDiac" name="TradDiac" />
          <field column="Explicatie" name="Explicatie" />
          <field column="TipCuvint" name="TipCuvint" />
          <field column="ItalicParant" name="ItalicParant" />
        </entity>
      </document>
    </dataConfig>
    

    schema.xml

    <field name="Id" type="tlong" indexed="true" stored="true" required="true"/>
    <field name="Cuvint" type="string" indexed="true" stored="true" required="true"/>
    <field name="TradDiac" type="string" indexed="true" stored="true" required="true"/>
    <field name="Explicatie" type="string" indexed="true" stored="true"/>
    <field name="TipCuvint" type="string" indexed="true" stored="true" required="true"/>
    <field name="ItalicParant" type="string" indexed="true" stored="true"/>
    

    solrconfig.xml

    <!-- altadict Request Handler -->
    
    <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
      <lst name="defaults">
        <str name="config">data-config.xml</str>
      </lst>
    </requestHandler>
    
    <requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
      <lst name="defaults">
        <str name="df">Cuvint</str> 
        <str name="spellcheck.dictionary">default</str> 
        <str name="spellcheck">on</str>
        <str name="spellcheck.extendedResults">true</str> 
        <str name="spellcheck.count">10</str>
        <str name="spellcheck.maxResultsForSuggest">5</str> 
        <str name="spellcheck.collate">true</str>
        <str name="spellcheck.collateExtendedResults">true</str> 
        <str name="spellcheck.maxCollationTries">10</str>
        <str name="spellcheck.maxCollations">5</str> 
      </lst>
      <arr name="last-components">
        <str>spellcheck</str>
      </arr>
    </requestHandler>
    
    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <str name="queryAnalyzerFieldType">string</str> <!-- Replace with Field Type of your schema -->
      <lst name="spellchecker">
        <str name="name">default</str>
        <str name="field">examplew</str> <!-- Replace with field name as per your scheme -->
        <str name="spellcheckIndexDir">./spellchecker</str>
        <str name="buildOnOptimize">true</str>
        <str name="buildOnCommit">true</str>
      </lst>
    
      <!-- a spellchecker that uses a different distance measure -->
      <lst name="spellchecker">
        <str name="name">jarowinkler</str> 
        <str name="field">spell</str>
        <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
        <str name="spellcheckIndexDir">./spellchecker2</str>
      </lst>
    </searchComponent>
    

    and libs

    Questions:

    1. How do I connect to my database and search its content for words that could match?
    2. How do I set up the configuration (solrconfig.xml, schema.xml, etc.)?
    3. How do I send a string from my view (XHTML) so that the Solr server knows what to look for?
    4. How do I get the correct word from the Cuvint database column? For example, for wodr I want Solr to return word.

    I have read all the information about Solr, but it is still unclear to me.

    Links:

    Main Page:
    http://lucene.apache.org/solr/

    Main Page tutorial: http://lucene.apache.org/solr/4_4_0/tutorial.html

    Solr Wiki:
    http://wiki.apache.org/solr/Solrj --- official solrj documentation
    http://wiki.apache.org/solr/SpellCheckComponent

    Solr config:
    http://wiki.apache.org/solr/SolrConfigXml
    http://www.installationpage.com/solr/solr-configuration-tutorial-schema-solrconfig-xml/
    http://wiki.apache.org/solr/SchemaXml

    StackOverflow proof: Solr Did you mean (Spell check component)

    Solr Database Integration:
    http://www.slideshare.net/th0masr/integrating-the-solr-search-engine
    http://www.cabotsolutions.com/2009/05/using-solr-lucene-for-full-text-search-with-mysql-db/

    Solr Spell Check:
    http://docs.lucidworks.com/display/solr/Spell+Checking
    http://searchhub.org/2010/08/31/getting-started-spell-checking-with-apache-lucene-and-solr/
    http://techiesinsight.blogspot.ro/2012/06/using-solr-spellchecker-from-java.html
    http://blog.websolr.com/post/2748574298/spellcheck-with-solr-spellcheckcomponent
    How to use SpellingResult class in SolrJ

    I really need your help. Regards.