Extracting person names with named entity recognition in NLP using Python


In Long

Please read these carefully:

Understand the solution, don't just copy and paste.


TL;DR

In terminal:

pip install -U nltk

wget http://nlp.stanford.edu/software/stanford-corenlp-full-2016-10-31.zip
unzip stanford-corenlp-full-2016-10-31.zip && cd stanford-corenlp-full-2016-10-31

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-preload tokenize,ssplit,pos,lemma,ner,parse,depparse \
-status_port 9000 -port 9000 -timeout 15000
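If you would rather not depend on NLTK's wrapper, the server started above can also be queried directly over HTTP: the CoreNLP server accepts the raw text as a POST body and the pipeline configuration as a JSON "properties" query parameter. The sketch below uses only the standard library. The helper names annotate and ner_pairs are hypothetical, and it assumes the server is listening on port 9000 as in the command above and returns CoreNLP's JSON output, where each token carries "word" and "ner" fields.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the server was started with the java command above on port 9000.
CORENLP_URL = "http://localhost:9000"


def annotate(text, url=CORENLP_URL, timeout=30):
    """POST raw text to a CoreNLP server and return its parsed JSON annotation."""
    # ner depends on pos and lemma, so request them explicitly.
    properties = {"annotators": "tokenize,ssplit,pos,lemma,ner", "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(properties)})
    request = urllib.request.Request(
        f"{url}/?{query}",
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def ner_pairs(annotation):
    """Flatten CoreNLP JSON output into (word, ner_tag) pairs across all sentences."""
    return [
        (token["word"], token["ner"])
        for sentence in annotation["sentences"]
        for token in sentence["tokens"]
    ]
```

With the server running, ner_pairs(annotate("Rami Eid is studying at Stony Brook University in NY")) should yield (word, tag) pairs in the same shape that get_continuous_chunks expects.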

In Python:

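# CoreNLPNERTagger only existed in a development version of NLTK; released
# NLTK versions expose the CoreNLP server client as CoreNLPParser.
from nltk.parse.corenlp import CoreNLPParser

def get_continuous_chunks(tagged_sent):
    """Group consecutive non-"O" tokens into chunks of (token, tag) pairs."""
    continuous_chunk = []
    current_chunk = []

    for token, tag in tagged_sent:
        if tag != "O":
            current_chunk.append((token, tag))
        else:
            if current_chunk:  # if the current chunk is not empty
                continuous_chunk.append(current_chunk)
                current_chunk = []
    # Flush the final current_chunk into the continuous_chunk, if any.
    if current_chunk:
        continuous_chunk.append(current_chunk)
    return continuous_chunk


# Connect to the CoreNLP server started above and ask for NER tags.
stner = CoreNLPParser(url='http://localhost:9000', tagtype='ner')
tagged_sent = stner.tag('Rami Eid is studying at Stony Brook University in NY'.split())

named_entities = get_continuous_chunks(tagged_sent)
# Join each chunk's tokens into one string, labelled with the first token's tag.
named_entities_str_tag = [(" ".join([token for token, tag in ne]), ne[0][1]) for ne in named_entities]

print(named_entities_str_tag)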

[out]:

[('Rami Eid', 'PERSON'), ('Stony Brook University', 'ORGANIZATION'), ('NY', 'LOCATION')]
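One limitation of get_continuous_chunks is that it only splits on "O" tokens, so two adjacent entities with different tags (for example a PERSON immediately followed by an ORGANIZATION) would be merged into one chunk labelled with the first tag. A minimal alternative, sketched below with itertools.groupby, also starts a new chunk whenever the tag changes, and a small filter then keeps only the PERSON chunks, which is what the question asks for. The helper names are hypothetical and the tagged list is written by hand to mirror the output above, not produced by a real tagger.

```python
from itertools import groupby


def entity_chunks(tagged_sent):
    """Group consecutive tokens sharing the same non-"O" tag into (text, tag) pairs.

    A change of tag (e.g. PERSON -> ORGANIZATION) also starts a new chunk,
    so adjacent entities of different types stay separate.
    """
    return [
        (" ".join(token for token, _ in group), tag)
        for tag, group in groupby(tagged_sent, key=lambda pair: pair[1])
        if tag != "O"
    ]


def person_names(tagged_sent):
    """Keep only the chunk strings tagged PERSON."""
    return [text for text, tag in entity_chunks(tagged_sent) if tag == "PERSON"]


# Hypothetical tagger output, written by hand for illustration.
tagged = [
    ("Rami", "PERSON"), ("Eid", "PERSON"), ("is", "O"), ("studying", "O"),
    ("at", "O"), ("Stony", "ORGANIZATION"), ("Brook", "ORGANIZATION"),
    ("University", "ORGANIZATION"), ("in", "O"), ("NY", "LOCATION"),
]
print(entity_chunks(tagged))
print(person_names(tagged))
```

Two different people with no token between them would still be merged, since they share the PERSON tag; separating them would need BIO-style tags from the tagger.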

You might find this helpful too: Unpacking a list / tuple of pairs into two lists / tuples
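Applied to tagger output, that unpacking looks like this: zip(*pairs) transposes a list of (token, tag) pairs into one tuple of tokens and one tuple of tags. The input list here is written by hand for illustration.

```python
# Hand-written (token, tag) pairs in the shape returned by a tagger's tag() method.
tagged_sent = [("Rami", "PERSON"), ("Eid", "PERSON"), ("is", "O")]

# zip(*pairs) transposes the list of pairs into two tuples.
tokens, tags = zip(*tagged_sent)
print(tokens)  # ('Rami', 'Eid', 'is')
print(tags)    # ('PERSON', 'PERSON', 'O')

# Map list() over the result if lists are needed instead of tuples.
token_list, tag_list = map(list, zip(*tagged_sent))
```

Note that the unpacking raises a ValueError on an empty list, because zip(*[]) produces nothing to unpack into two names.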

Author: Doubt Dhanabalu

Updated on June 12, 2022

Comments

  • Doubt Dhanabalu, almost 2 years

    I have a sentence from which I need to identify only the person names:

    For example:

    sentence = "Larry Page is an American business magnate and computer scientist who is the co-founder of Google, alongside Sergey Brin"
    

    I used the code below to identify the named entities:

    from nltk import word_tokenize, pos_tag, ne_chunk
    print(ne_chunk(pos_tag(word_tokenize(sentence))))
    

    The output I received was:

    (S
      (PERSON Larry/NNP)
      (ORGANIZATION Page/NNP)
      is/VBZ
      an/DT
      (GPE American/JJ)
      business/NN
      magnate/NN
      and/CC
      computer/NN
      scientist/NN
      who/WP
      is/VBZ
      the/DT
      co-founder/NN
      of/IN
      (GPE Google/NNP)
      ,/,
      alongside/RB
      (PERSON Sergey/NNP Brin/NNP))
    

    I want to extract all the person names, such as

    Larry Page
    Sergey Brin
    

    To achieve this, I referred to this link and tried the following:

    from nltk.tag.stanford import StanfordNERTagger
    st = StanfordNERTagger('/usr/share/stanford-ner/classifiers/english.all.3class.distsim.crf.ser.gz','/usr/share/stanford-ner/stanford-ner.jar')
    

    However, I keep getting this error:

    LookupError: Could not find stanford-ner.jar jar file at /usr/share/stanford-ner/stanford-ner.jar
    

    Where can I download this file?
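The LookupError means that no file exists at the path given to StanfordNERTagger: NLTK does not bundle stanford-ner.jar or the classifier models, which come with the separate Stanford NER download from the Stanford NLP Group. Before constructing the tagger, a quick standard-library check like the sketch below can confirm which of the two paths is wrong. The helper missing_files is hypothetical, and the paths are the ones from the question; they will need adjusting to wherever the archive was unpacked.

```python
import os


def missing_files(*paths):
    """Return the subset of paths that do not exist as regular files."""
    return [path for path in paths if not os.path.isfile(path)]


# Paths taken from the question; adjust to where Stanford NER was unzipped.
classifier = "/usr/share/stanford-ner/classifiers/english.all.3class.distsim.crf.ser.gz"
jar = "/usr/share/stanford-ner/stanford-ner.jar"

missing = missing_files(classifier, jar)
if missing:
    print("Not found:", missing)
```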

    As mentioned above, the result I expect, as a list or dictionary, is:

    Larry Page
    Sergey Brin
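Since a dictionary is one of the acceptable result shapes, here is a minimal sketch that groups entity strings by tag, assuming the tagger returns (token, tag) pairs with "O" for non-entities, as the CoreNLP-based answer above does. The tags below are written by hand for a shortened version of the question's sentence, not produced by a real model, and entities_by_type is a hypothetical helper name.

```python
from collections import defaultdict
from itertools import groupby


def entities_by_type(tagged_sent):
    """Map each entity tag to the list of entity strings found with that tag."""
    entities = defaultdict(list)
    # Consecutive tokens with the same tag form one entity string.
    for tag, group in groupby(tagged_sent, key=lambda pair: pair[1]):
        if tag != "O":
            entities[tag].append(" ".join(token for token, _ in group))
    return dict(entities)


# Hypothetical tags for a shortened form of the question's sentence.
tagged = [
    ("Larry", "PERSON"), ("Page", "PERSON"), ("is", "O"), ("the", "O"),
    ("co-founder", "O"), ("of", "O"), ("Google", "ORGANIZATION"), (",", "O"),
    ("alongside", "O"), ("Sergey", "PERSON"), ("Brin", "PERSON"),
]
result = entities_by_type(tagged)
print(result["PERSON"])
```

The PERSON entry is the list form asked for; the full dictionary keeps the other entity types available if they are needed later.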
    
  • Akshat Zala, almost 4 years
    Which version is this?
  • Akshat Zala, almost 4 years
    ImportError: cannot import name 'CoreNLPNERTagger' from 'nltk.tag.stanford' (/home/akshatz/.local/lib/python3.8/site-packages/nltk/tag/stanford.py)