Can a raw Lucene index be loaded by Solr?


Solution 1

I have never tried this, but you would have to adjust the schema.xml to include all the fields of the documents that are in your Lucene index, because Solr won't allow you to search for a field if it is not defined in schema.xml.

The adjustment to schema.xml should also define query-time analyzers that match how each field was indexed, especially if the fields were indexed using custom analyzers. Otherwise the query terms will not line up with the terms stored in the index.
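
To make the schema part concrete, here is a rough sketch that prints `<field>` declarations for a list of fields you know exist in the Lucene index. Everything specific in it is an assumption for illustration: the field names `id` and `manu`, the `text_ws` type (a whitespace-tokenized type that Solr's example schemas have shipped; check that your schema defines it), and the attribute values, which have to mirror how your index was actually written.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Prints schema.xml <field> declarations for fields of an existing Lucene index. */
final class SchemaFieldSketch {

    /**
     * Builds one <field> element. The Solr type and the flags are assumptions:
     * they must match the analyzer and field options used at index time.
     */
    static String fieldXml(String name, String solrType, boolean stored, boolean omitNorms) {
        return String.format(
                "<field name=\"%s\" type=\"%s\" indexed=\"true\" stored=\"%b\" omitNorms=\"%b\"/>",
                name, solrType, stored, omitNorms);
    }

    public static void main(String[] args) {
        // Hypothetical field list: replace with the fields in your own index.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", "text_ws");
        fields.put("manu", "text_ws");

        for (Map.Entry<String, String> e : fields.entrySet()) {
            System.out.println(fieldXml(e.getKey(), e.getValue(), true, true));
        }
    }
}
```

The printed lines would go inside the fields section of schema.xml. If a field was written with a custom analyzer, the type you name here needs an equivalent query-time analyzer.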

In solrconfig.xml you may have to change settings in the indexDefaults and the mainIndex sections.

But I'd be happy to read answers from people who actually did it.

Solution 2

Three steps in the end:

  1. Change schema.xml (or managed-schema)
  2. Change <dataDir> in solrconfig.xml
  3. Restart Solr
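
Before restarting, it can help to check that the directory you put in <dataDir> has the layout Solr will look for. The sketch below assumes Solr opens the `index` subdirectory of <dataDir> (its usual default) and that a Lucene index directory contains a `segments_N` commit file. The class name and the default path are made up for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

/** Sanity check for a directory that is about to be used as Solr's dataDir. */
final class DataDirCheck {

    /** True if dir directly contains a Lucene commit file (segments_N). */
    static boolean looksLikeLuceneIndex(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return false;
        }
        try (Stream<Path> files = Files.list(dir)) {
            return files.anyMatch(p -> p.getFileName().toString().startsWith("segments_"));
        }
    }

    /** Assumption: Solr reads the index from <dataDir>/index, not from <dataDir> itself. */
    static boolean dataDirIsUsable(Path dataDir) throws IOException {
        return looksLikeLuceneIndex(dataDir.resolve("index"));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default; pass your own dataDir as the first argument.
        Path dataDir = Paths.get(args.length > 0 ? args[0] : "/tmp/myindex");
        if (dataDirIsUsable(dataDir)) {
            System.out.println("Found a Lucene index under " + dataDir.resolve("index"));
        } else if (looksLikeLuceneIndex(dataDir)) {
            System.out.println("Index files sit directly in " + dataDir
                    + "; move them into an 'index' subdirectory or point dataDir one level up.");
        } else {
            System.out.println("No Lucene index found under " + dataDir);
        }
    }
}
```

If the index files sit directly in the directory named by <dataDir>, that mismatch is one plausible reason for Solr appearing to ignore an existing index.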

I have my study notes here for those who are new to Solr, like me :)
To generate a Lucene index yourself for testing, you can use my code here.

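import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class LuceneIndex {
    private static Directory directory;

    public static void main(String[] args) throws IOException {
        long startTime = System.currentTimeMillis();

        // open: the files go in an "index" subdirectory of what will become Solr's <dataDir>
        Path path = Paths.get("/tmp/myindex/index");
        // FSDirectory.open picks an implementation for the platform
        // (SimpleFSDirectory was removed in Lucene 9)
        directory = FSDirectory.open(path);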
        IndexWriter writer = getWriter();

        // index
        int documentCount = 10000000;
        List<String> fieldNames = Arrays.asList("id", "manu");

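        // one field type for all fields: tokenized, stored, no norms, no positions;
        // the schema.xml field definitions have to agree with these options
        FieldType myFieldType = new FieldType();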
        myFieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
        myFieldType.setOmitNorms(true);
        myFieldType.setStored(true);
        myFieldType.setTokenized(true);
        myFieldType.freeze();

        for (int i = 0; i < documentCount; i++) {
            Document doc = new Document();
            for (int j = 0; j < fieldNames.size(); j++) {
                doc.add(new Field(fieldNames.get(j), fieldNames.get(j) + Integer.toString(i), myFieldType));
            }
            writer.addDocument(doc);
        }
        // close
        writer.close();
        System.out.println("Finished Indexing");
        long estimatedTime = System.currentTimeMillis() - startTime;
        System.out.println(estimatedTime);
    }
    private static IndexWriter getWriter() throws IOException {
        return new IndexWriter(directory, new IndexWriterConfig(new WhitespaceAnalyzer()));
    }
}
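
Once Solr has loaded an index like the one generated above, documents can be fetched over HTTP through the core's select handler. As a small sketch, this builds such a request URL. The base URL, the core name `mycore`, and the query `manu:manu42` are assumptions for illustration; use whatever your installation and index actually contain.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Builds a URL for Solr's /select handler (needs Java 10+ for this encode overload). */
final class SolrSelectUrl {

    /** baseUrl is e.g. "http://localhost:8983/solr"; core and query are caller-supplied. */
    static String selectUrl(String baseUrl, String core, String query) {
        // URL-encode the query so characters such as ':' and spaces survive transport.
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return baseUrl + "/" + core + "/select?q=" + q + "&wt=json";
    }

    public static void main(String[] args) {
        // Hypothetical host, core, and query.
        System.out.println(selectUrl("http://localhost:8983/solr", "mycore", "manu:manu42"));
    }
}
```

Opening the printed URL in a browser or with any HTTP client should return the matching documents as JSON, provided the `manu` field is declared in the schema with a whitespace-tokenized type.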
Author by

Admin

Updated on June 03, 2022

Comments

  • Admin
    Admin almost 2 years

    Some colleagues of mine have a large Java web app that uses a search system built with Lucene Java. What I'd like to do is have a nice HTTP-based API to access those existing search indexes. I've used Nutch before and really liked how simple the OpenSearch implementation made it to grab results as RSS.

    I've tried setting Solr's dataDir in solrconfig.xml, hoping it would happily pick up the existing index files, but it seems to just ignore them.

    My main question is:

    Can Solr be used to access Lucene indexes created elsewhere? Or might there be a better solution?