What is a good Java library for Parts-Of-Speech tagging?


Solution 1

Are you looking to tag POS in a specific domain? Most general-purpose taggers are trained on newswire text, and they typically perform poorly on domain-specific text (such as biomedical text). Other taggers are trained specifically for such domains, for example dTagger (Java) for biomedical text.

For newswire text, Adwait Ratnaparkhi's MXPOST is very good and is the one I would recommend.

Other Java implementations include:

  1. MontyLingua
  2. Berkeley Parser (not really a POS tagger, but full-blown parsers typically include one. Google for Java syntactic parsers and you will find many.)
  3. QTag
  4. LBJ

OpenNLP and LingPipe, as mentioned by the other posters, are also pretty decent.

Info on the state of the art in POS tagging can be found here. As you can see, LTAG-spinal (also mentioned by another poster) ranks best as of now, but the variation across the taggers is small. I have not used LTAG-spinal myself.

Also note that the baseline performance for POS tagging is about 90%. The baseline here means: (a) tag every known word with its most frequent POS tag from a lexicon, and (b) tag every unknown word as a noun.
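To make the baseline concrete, here is a minimal sketch of it in plain Java. It is not taken from any of the libraries mentioned above. The class name `BaselineTagger`, the toy training data, and the use of the Penn Treebank tag `NN` for unknown words are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Most-frequent-tag baseline. A sketch, not any library's implementation. */
final class BaselineTagger {
    // Assumption: Penn Treebank-style tags, where "NN" is a singular noun.
    private static final String UNKNOWN_WORD_TAG = "NN";

    // Maps each word seen in training to its most frequent tag.
    private final Map<String, String> lexicon = new HashMap<>();

    /** Builds the lexicon from parallel arrays of words and their gold tags. */
    public BaselineTagger(String[] words, String[] tags) {
        if (words.length != tags.length) {
            throw new IllegalArgumentException("words and tags must have the same length");
        }
        // Count how often each tag occurs with each word.
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (int i = 0; i < words.length; i++) {
            counts.computeIfAbsent(words[i], w -> new HashMap<>())
                  .merge(tags[i], 1, Integer::sum);
        }
        // Keep only the most frequent tag per word (ties are broken arbitrarily).
        for (Map.Entry<String, Map<String, Integer>> entry : counts.entrySet()) {
            String bestTag = null;
            int bestCount = 0;
            for (Map.Entry<String, Integer> tagCount : entry.getValue().entrySet()) {
                if (tagCount.getValue() > bestCount) {
                    bestTag = tagCount.getKey();
                    bestCount = tagCount.getValue();
                }
            }
            lexicon.put(entry.getKey(), bestTag);
        }
    }

    /** Step (a): look up known words. Step (b): tag unknown words as nouns. */
    public String[] tag(String[] tokens) {
        String[] result = new String[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            result[i] = lexicon.getOrDefault(tokens[i], UNKNOWN_WORD_TAG);
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy training data, invented for this example.
        String[] words = {"the", "dog", "can", "run", "the", "can", "can", "leak"};
        String[] tags  = {"DT",  "NN",  "MD",  "VB",  "DT",  "NN",  "MD",  "VB"};
        BaselineTagger tagger = new BaselineTagger(words, tags);
        // "can" was seen twice as MD and once as NN; "zebra" was never seen.
        System.out.println(String.join(" ", tagger.tag(new String[] {"the", "can", "zebra"})));
        // prints "DT MD NN"
    }
}
```

Any tagger you evaluate should beat this by a clear margin on your own data. If it does not, the tagger's training domain is probably a poor match for your text.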

Solution 2

I have used OpenNLP with good results. You can also check out MorphAdorner.

Solution 3

I've used both LingPipe and Stanford's POS Tagger. The latter is a state-of-the-art POS tagger but, in my experience, it is too slow (although they do provide less accurate models that are reasonably fast). Of course, it always depends on what you are trying to achieve, and there will always be a trade-off between speed and accuracy.

I've also once used LBJ-based NER software and, although it was pretty accurate, the source code was a complete mess. The source code of both LingPipe and Stanford's tagger is very clean and well documented.

You can also take a look at LTAG-spinal. I haven't used it yet, but from the algorithm description, and from the listed accuracy, it sure seems better than the alternatives you have so far.

Hope it helps.

Author: Glenn

My focus is working with engineers in order to deliver scalable, server side, 12 factor compliant application architectures. I was a breakout speaker at Adobe's internal Advertising Cloud developer's conference in 2018 and 2017 and at the 2012 Lucene Revolution conference in Boston. I specialize in breaking monolithic applications up into microservices and in deep integration with Real-Time Communications infrastructure.

Updated on January 14, 2020
