Training your own model in OpenNLP

18,779

Solution 1

https://opennlp.apache.org/docs/1.5.3/manual/opennlp.html

This page is very useful: it shows how to train models of all the different types (entity extraction, part-of-speech tagging, etc.), both in code and with the OpenNLP command-line application.

I could give you some code examples here, but the page is very clear.

Theory-wise:

Essentially, you create a file that lists the training examples, e.g. for document categorization (one document per line, category first):

Sport [whitespace] this is a page about football, rugby and stuff

Politics [whitespace] this is a page about tony blair being prime minister.

The exact format is described on the page above (each model type expects a different format). Once you have created this file, you run it through either the API or the OpenNLP command-line application, and it generates a .bin model file. You can then load this .bin file into a model and start using it (as described in the API sections of the page above).
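As a concrete illustration of that file format, here is a minimal, stdlib-only Java sketch that turns labelled examples into document-categorizer training lines (the category, one space, then the document text, one document per line). DoccatTrainingFile and its methods are hypothetical helpers, not part of OpenNLP, and the sketch assumes a category is a single token without whitespace.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of OpenNLP) for building a document-categorizer training file.
public class DoccatTrainingFile {

    /** Formats one training line: the category, a single space, then the document text. */
    public static String toLine(String category, String text) {
        if (category.isEmpty() || category.chars().anyMatch(Character::isWhitespace)) {
            throw new IllegalArgumentException("category must be a single token: '" + category + "'");
        }
        // One document per line: collapse newlines, tabs and runs of spaces into single spaces.
        String flat = text.trim().replaceAll("\\s+", " ");
        if (flat.isEmpty()) {
            throw new IllegalArgumentException("document text must not be empty");
        }
        return category + " " + flat;
    }

    /** Writes (category, text) pairs to a UTF-8 file, one document per line. */
    public static void write(Path out, List<String[]> examples) throws IOException {
        List<String> lines = new ArrayList<>();
        for (String[] example : examples) {
            lines.add(toLine(example[0], example[1]));
        }
        Files.write(out, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<String[]> examples = Arrays.asList(
                new String[] {"Sport", "this is a page about football, rugby and stuff"},
                new String[] {"Politics", "this is a page about\ntony blair being prime minister."});
        for (String[] example : examples) {
            System.out.println(toLine(example[0], example[1]));
        }
        // To produce the actual training file (the file name is only an example):
        // write(Paths.get("doccat.train"), examples);
    }
}
```

The resulting file is what you would then pass to the document categorizer trainer (API or command line). Check the manual for the OpenNLP version you use, since the training APIs changed between releases.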

Solution 2

First, you need training data annotated with the required entity.

Put one sentence per line (sentences separated by the newline character \n), and separate every token, including the <START:...> and <END> tags, from its neighbours with a space character.
Let's say you want to create a medicine entity model; the data should then look something like this:

<START:medicine> Augmentin-Duo <END> is a penicillin antibiotic that contains two medicines - <START:medicine> amoxicillin trihydrate <END> and <START:medicine> potassium clavulanate <END> .
They work together to kill certain types of bacteria and are used to treat certain types of bacterial infections .

You can refer to a sample dataset as an example. The training data should contain at least 15,000 sentences to get good results.

Then you can train the model with the OpenNLP TokenNameFinderTrainer tool. The output model file will be in .bin format.

Here is an example: Writing a custom NameFinder model in OpenNLP

For more details, refer to the OpenNLP documentation.

Solution 3

Perhaps this article will help you out. It describes how to do TokenNameFinder training from data extracted from Wikipedia...
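If you extract entity mentions yourself (for example from Wikipedia, as in the article above), you still have to write them out in the name-finder training format. Here is a minimal stdlib-only Java sketch, assuming the text is already tokenized and each mention is given as a token range [start, end) with a type. NameSampleLineWriter and Mention are hypothetical names, not OpenNLP classes, and overlapping mentions are rejected because the format cannot express them.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper (not part of OpenNLP) that writes one name-finder training line.
public class NameSampleLineWriter {

    /** A hypothetical entity mention covering tokens [start, end). */
    public static final class Mention {
        final int start;
        final int end;
        final String type;

        public Mention(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    /** Joins tokens with single spaces, wrapping each mention in <START:type> ... <END>. */
    public static String toLine(List<String> tokens, List<Mention> mentions) {
        for (String token : tokens) {
            if (token.isEmpty() || token.chars().anyMatch(Character::isWhitespace)) {
                throw new IllegalArgumentException("tokens must be non-empty and contain no whitespace: '" + token + "'");
            }
        }
        List<Mention> sorted = new ArrayList<>(mentions);
        sorted.sort(Comparator.comparingInt((Mention m) -> m.start));
        int prevEnd = 0;
        for (Mention m : sorted) {
            if (m.start < prevEnd || m.start >= m.end || m.end > tokens.size()) {
                throw new IllegalArgumentException("overlapping or out-of-range mention [" + m.start + ", " + m.end + ")");
            }
            if (m.type.isEmpty() || m.type.chars().anyMatch(Character::isWhitespace)) {
                throw new IllegalArgumentException("type must be a single token: '" + m.type + "'");
            }
            prevEnd = m.end;
        }
        StringBuilder sb = new StringBuilder();
        int k = 0; // index of the next mention (in sorted order) to open or close
        for (int i = 0; i < tokens.size(); i++) {
            if (k < sorted.size() && sorted.get(k).start == i) {
                append(sb, "<START:" + sorted.get(k).type + ">");
            }
            append(sb, tokens.get(i));
            if (k < sorted.size() && sorted.get(k).end == i + 1) {
                append(sb, "<END>");
                k++;
            }
        }
        return sb.toString();
    }

    // Appends a piece, separated from the previous one by a single space.
    private static void append(StringBuilder sb, String piece) {
        if (sb.length() > 0) sb.append(' ');
        sb.append(piece);
    }
}
```

Writing one such line per sentence gives a file in the same format as the medicine example in Solution 2.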

Solution 4

Copy your training data into data.txt and run the code below to get your own mymodel.bin.

For sample data, you can refer to: https://github.com/mccraigmccraig/opennlp/blob/master/src/test/resources/opennlp/tools/namefind/AnnotatedSentencesWithTypes.txt

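import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.util.Collections;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;

// Uses the OpenNLP 1.5.x training API; later releases replaced these train(...) overloads.
public class Training {
    // where the trained model will be written
    static String onlpModelPath = "mymodel.bin";
    // training data set: one tokenized sentence per line, names marked with <START:type> ... <END>
    static String trainingDataFilePath = "data.txt";

    public static void main(String[] args) throws IOException {
        Charset charset = Charset.forName("UTF-8");
        ObjectStream<String> lineStream = new PlainTextByLineStream(
                new FileInputStream(trainingDataFilePath), charset);
        ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);

        TokenNameFinderModel model;
        try {
            // alternative with explicit iterations (100) and cutoff (4):
            // model = NameFinderME.train("en", "drugs", sampleStream,
            //         Collections.<String, Object>emptyMap(), 100, 4);
            model = NameFinderME.train("en", "drugs", sampleStream,
                    Collections.<String, Object>emptyMap());
        } finally {
            sampleStream.close(); // closes the underlying line stream as well
        }

        // serialize the trained model to the .bin file
        OutputStream modelOut = new BufferedOutputStream(new FileOutputStream(onlpModelPath));
        try {
            model.serialize(modelOut);
        } finally {
            modelOut.close();
        }
    }
}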

Author: user1482228

Updated on December 28, 2020

Comments

  • user1482228
    user1482228 over 3 years

    I am finding it difficult to create my own model in OpenNLP. Can anyone tell me how to create my own model and how the training should be done?

    What should the input be, and where will the output model file be stored?

    • wcolen
      wcolen almost 12 years
      For which tool are you creating a model?
  • Toby Speight
    Toby Speight over 7 years
    Welcome to Stack Overflow! Although this code may help to solve the problem, it doesn't explain why and/or how it answers the question. Providing this additional context would significantly improve its long-term educational value. Please edit your answer to add explanation, including what limitations and assumptions apply.
  • demongolem
    demongolem over 7 years
    Or could have said RTFM to save yourself some typing.
  • Suneel Marthi
    Suneel Marthi almost 7 years
    Let me point you to the latest documentation at opennlp.apache.org/docs/1.8.1/manual/opennlp.html