Extracting the person names in the named entity recognition in NLP using Python
In Long
Please read these carefully:
- https://stackoverflow.com/a/49345866/610569
- Extract list of Persons and Organizations using Stanford NER Tagger in NLTK
Understand the solution, don't just copy and paste.
TL;DR
In terminal:
pip install -U nltk
wget http://nlp.stanford.edu/software/stanford-corenlp-full-2016-10-31.zip
unzip stanford-corenlp-full-2016-10-31.zip && cd stanford-corenlp-full-2016-10-31
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-preload tokenize,ssplit,pos,lemma,parse,depparse \
-status_port 9000 -port 9000 -timeout 15000
In Python:
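# Note: CoreNLPNERTagger exists only in older NLTK releases. Newer releases replace it
# with nltk.parse.CoreNLPParser(url='http://localhost:9000', tagtype='ner').
from nltk.tag.stanford import CoreNLPNERTagger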
def get_continuous_chunks(tagged_sent):
    continuous_chunk = []
    current_chunk = []
    for token, tag in tagged_sent:
        if tag != "O":
            current_chunk.append((token, tag))
        else:
            if current_chunk:  # if the current chunk is not empty
                continuous_chunk.append(current_chunk)
                current_chunk = []
    # Flush the final current_chunk into the continuous_chunk, if any.
    if current_chunk:
        continuous_chunk.append(current_chunk)
    return continuous_chunk
stner = CoreNLPNERTagger()
tagged_sent = stner.tag('Rami Eid is studying at Stony Brook University in NY'.split())
named_entities = get_continuous_chunks(tagged_sent)
named_entities_str_tag = [(" ".join([token for token, tag in ne]), ne[0][1]) for ne in named_entities]
print(named_entities_str_tag)
[out]:
[('Rami Eid', 'PERSON'), ('Stony Brook University', 'ORGANIZATION'), ('NY', 'LOCATION')]
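Since the question asks for person names only, one way to narrow the result down is to filter the (text, tag) pairs on the PERSON tag. This is a minimal sketch: the list is hard-coded from the output above so that it runs without the CoreNLP server, and the helper name `extract_by_tag` is made up for illustration.

```python
# Hard-coded copy of the [out] above, so no CoreNLP server is needed here.
named_entities_str_tag = [
    ("Rami Eid", "PERSON"),
    ("Stony Brook University", "ORGANIZATION"),
    ("NY", "LOCATION"),
]


def extract_by_tag(entities, wanted_tag="PERSON"):
    """Return the entity strings whose tag equals wanted_tag (hypothetical helper)."""
    return [text for text, tag in entities if tag == wanted_tag]


persons = extract_by_tag(named_entities_str_tag)
print(persons)  # ['Rami Eid']
```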
You might find this helpful too: Unpacking a list / tuple of pairs into two lists / tuples
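As a small illustration of that unpacking idiom, the sketch below splits the (text, tag) pairs into one tuple of entity strings and one tuple of tags. The pairs are hard-coded from the output printed earlier; the variable names `names` and `tags` are only illustrative.

```python
# (text, tag) pairs copied from the printed output of the NER example.
pairs = [
    ("Rami Eid", "PERSON"),
    ("Stony Brook University", "ORGANIZATION"),
    ("NY", "LOCATION"),
]

# zip(*pairs) transposes the list of pairs into two parallel tuples.
names, tags = zip(*pairs)

print(names)  # ('Rami Eid', 'Stony Brook University', 'NY')
print(tags)   # ('PERSON', 'ORGANIZATION', 'LOCATION')
```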
Author by
Doubt Dhanabalu
Updated on June 12, 2022
Comments
-
Doubt Dhanabalu almost 2 years
I have a sentence for which I need to identify the person names alone:
For example:
sentence = "Larry Page is an American business magnate and computer scientist who is the co-founder of Google, alongside Sergey Brin"
I have used the below code to identify the NERs.
from nltk import word_tokenize, pos_tag, ne_chunk
print(ne_chunk(pos_tag(word_tokenize(sentence))))
The output I received was:
(S (PERSON Larry/NNP) (ORGANIZATION Page/NNP) is/VBZ an/DT (GPE American/JJ) business/NN magnate/NN and/CC computer/NN scientist/NN who/WP is/VBZ the/DT co-founder/NN of/IN (GPE Google/NNP) ,/, alongside/RB (PERSON Sergey/NNP Brin/NNP))
I want to extract all the person names, such as
Larry Page Sergey Brin
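For reference, here is a rough sketch of pulling the PERSON chunks out of the printed tree shown above using only the standard library. It works on the bracketed text, not on a live NLTK object, and the function name `persons_from_tree_string` is hypothetical.

```python
import re

# The printed ne_chunk tree from the question, copied verbatim.
tree_str = (
    "(S (PERSON Larry/NNP) (ORGANIZATION Page/NNP) is/VBZ an/DT "
    "(GPE American/JJ) business/NN magnate/NN and/CC computer/NN "
    "scientist/NN who/WP is/VBZ the/DT co-founder/NN of/IN "
    "(GPE Google/NNP) ,/, alongside/RB (PERSON Sergey/NNP Brin/NNP))"
)


def persons_from_tree_string(text):
    """Return the words of each (PERSON ...) chunk, with POS tags stripped."""
    names = []
    for chunk in re.findall(r"\(PERSON ([^()]+)\)", text):
        # Each token looks like word/POS; keep only the word part.
        words = [token.rsplit("/", 1)[0] for token in chunk.split()]
        names.append(" ".join(words))
    return names


print(persons_from_tree_string(tree_str))  # ['Larry', 'Sergey Brin']
```

Note that this yields 'Larry' rather than 'Larry Page', because ne_chunk tagged 'Page' as an ORGANIZATION in the output above. With a live tree object, iterating over tree.subtrees() and checking each subtree's label() would be the more usual route, but it would hit the same mis-tagging.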
In order to achieve this, I referred to this link and tried this:
from nltk.tag.stanford import StanfordNERTagger
st = StanfordNERTagger('/usr/share/stanford-ner/classifiers/english.all.3class.distsim.crf.ser.gz', '/usr/share/stanford-ner/stanford-ner.jar')
However, I continue to get this error:
LookupError: Could not find stanford-ner.jar jar file at /usr/share/stanford-ner/stanford-ner.jar
Where can I download this file?
As stated above, the result I am expecting, in the form of a list or dictionary, is:
Larry Page Sergey Brin
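The LookupError quoted above is raised when the jar is not present at the given path, so before constructing the tagger it can help to check which of the two files is actually missing. This is a minimal sketch using only the standard library; the helper name `missing_files` is hypothetical, and the paths are the ones from the question.

```python
import os


def missing_files(*paths):
    """Return the subset of paths that do not point at an existing file."""
    return [path for path in paths if not os.path.isfile(path)]


# Paths taken from the question; adjust them to wherever Stanford NER was unpacked.
model = "/usr/share/stanford-ner/classifiers/english.all.3class.distsim.crf.ser.gz"
jar = "/usr/share/stanford-ner/stanford-ner.jar"

for path in missing_files(model, jar):
    print("Not found:", path)
```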
-
Akshat Zala almost 4 years
Which version is this?
-
Akshat Zala almost 4 years
ImportError: cannot import name 'CoreNLPNERTagger' from 'nltk.tag.stanford' (/home/akshatz/.local/lib/python3.8/site-packages/nltk/tag/stanford.py)