Python's NLTK vs. related Java Libraries?

Solution 1

NLTK is good for natural language processing. I've used it for my data-mining project. You can train your own analyzer. The learning curve is not steep.

NLTK ships with a large collection of corpora for training your analyzer. You can also provide your own data set, for example, a journal that has been part-of-speech tagged.
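As a rough sketch of what training on your own tagged data might look like, here is a tiny unigram tagger built from hand-tagged sentences. The training sentences and the choice of `NN` as the fallback tag are made up for illustration; a real project would use far more data.

```python
from nltk.tag import DefaultTagger, UnigramTagger

# Hypothetical hand-tagged training data: a list of sentences,
# each a list of (word, part-of-speech tag) pairs.
train_sents = [
    [("the", "DT"), ("cat", "NN"), ("sat", "VBD")],
    [("the", "DT"), ("dog", "NN"), ("barked", "VBD")],
]

# Fall back to tagging unknown words as nouns (an assumption for this sketch).
backoff = DefaultTagger("NN")

# Learn the most frequent tag for each word seen in the training data.
tagger = UnigramTagger(train_sents, backoff=backoff)

tagged = tagger.tag(["the", "dog", "sat", "quietly"])
print(tagged)
```

Words the tagger has never seen (such as "quietly" here) get the fallback tag, so the quality depends heavily on how much tagged text you feed it.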

Because Python is very good for text processing, you may want to give it a try. Plus, NLTK has an online tutorial.
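To give a feel for how little code basic text processing takes in Python, here is a small word-frequency count using only the standard library. The sample text and the simple regex tokenizer are just illustrative assumptions, not a recommendation for real tokenization.

```python
import re
from collections import Counter

# Made-up sample text for illustration.
text = "The cat sat on the mat. The dog sat too."

# Naive tokenizer: lowercase the text and keep runs of letters/apostrophes.
tokens = re.findall(r"[a-z']+", text.lower())

# Count how often each token occurs and take the two most frequent.
counts = Counter(tokens)
top_two = counts.most_common(2)
print(top_two)
```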

Note that older NLTK releases targeted Python 2.x (for example, Python 2.6) and did not work well with Python 3.x. NLTK 3.0 and later support Python 3.

Solution 2

If you already understand the basics of NLP, I think NLTK should be pretty easy to pick up. It's got a bunch of documentation, 2 books, and I've written a number of articles & tutorials on streamhacker.com. And if there's anything from the Java packages you don't want to lose, you could theoretically combine it with NLTK using Jython (and perhaps execnet).

You also may want to take a look at the Pattern library.

Author: wnewport

Updated on June 04, 2022

Comments

  • wnewport
    wnewport almost 2 years

    I've used LingPipe, Stanford's NER, RiTa and various sentence similarity libraries for my previous Java projects that focused on text (pre)processing (indexing, XML tagging, topic detection, etc.) of large amounts of English text (around 10,000 documents totaling more than 1 GB of text). Maybe I'm a bad Java programmer, but I find myself typing a lot of code and using a lot of libraries when I switch to a different corpus. Overall, I feel like there might be a better tool for the job.

    I guess my question is, will I benefit from switching to Python and NLTK for information retrieval / language processing? Or are there enough pros and cons to make it very subjective? Is NLTK intuitive enough to be learned quickly?

    I'd get my hands dirty, but I won't have access to a personal machine for the next few days.

  • wnewport
    wnewport about 13 years
    Thank you for the advice. It seems like most people in the field prefer Python and NLTK, but the sheer number of Java solutions made me think I was abandoning a more buoyant ship.
  • lamwaiman1988
    lamwaiman1988 about 13 years
    I think I know you, Jacob... I quoted one of your articles in my graduation paper!
  • Jacob
    Jacob about 13 years
    Cool, always nice to be quoted :)
  • Andy Chase
    Andy Chase over 11 years
    You can't use Jython because there's no NumPy on Jython :(