Named Entity Recognition for NLTK in Python. Identifying the NE
Solution 1
This answer may be off base, in which case I'll delete it, since I don't have NLTK installed here to try it, but I think you can just do:
>>> sent3[2].node
'NE'
Note that sent3[2][0] returns the first child of the tree, not the node itself.
Edit: I tried this when I got home, and it does indeed work.
Solution 2
Below is my code:
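from nltk import ne_chunk
myNE = []
chunks = ne_chunk(postags, binary=True)
for c in chunks:
    # Only named-entity subtrees have label(); plain tokens are (word, tag) tuples.
    if hasattr(c, 'label'):
        myNE.append(' '.join(i[0] for i in c.leaves()))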
Solution 3
This will work
for sent in chunked_sentences:
    for chunk in sent:
        if hasattr(chunk, "label"):
            print(chunk.label())
Solution 4
sent3[2].node is now outdated.
Use sent3[2].label() instead.
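If the same code has to run against both old and new NLTK releases, a small helper can hide the difference. This is only a sketch: the name tree_label is made up for illustration, and it assumes old trees expose the value as a node attribute while newer ones expose a label() method, as described above.

```python
def tree_label(tree):
    """Return the node value of an NLTK tree on old or new releases."""
    # Check for label() first, so the outdated .node attribute is only
    # touched on releases that do not have label().
    if hasattr(tree, "label"):
        return tree.label()
    return tree.node
```

With the tree from the question, tree_label(sent3[2]) should give 'NE'.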
Solution 5
I agree with bdk
sent3[2].node
Output: 'NE'
I think there is no dedicated function in NLTK to do it. The above solution will work, but for reference you can check here
For the looping problem, you can do:
for i in range(len(sent3)):
    if "NE" in str(sent3[i]):
        # On NLTK 3, use sent3[i].label() instead of .node
        print(sent3[i].node)
I have executed this in NLTK and it works fine.
Asl506
Updated on June 29, 2022
Comments
-
Asl506 almost 2 years
I need to classify words into their parts of speech, such as verb, noun, adverb, etc. I used the following:
nltk.word_tokenize()  # to identify the words in a sentence
nltk.pos_tag()  # to identify the parts of speech
nltk.ne_chunk()  # to identify named entities
The output of this is a tree. For example:
>>> sentence = "I am Jhon from America"
>>> sent1 = nltk.word_tokenize(sentence)
>>> sent2 = nltk.pos_tag(sent1)
>>> sent3 = nltk.ne_chunk(sent2, binary=True)
>>> sent3
Tree('S', [('I', 'PRP'), ('am', 'VBP'), Tree('NE', [('Jhon', 'NNP')]), ('from', 'IN'), Tree('NE', [('America', 'NNP')])])
When accessing an element in this tree, I did it as follows:
>>> sent3[0]
('I', 'PRP')
>>> sent3[0][0]
'I'
>>> sent3[0][1]
'PRP'
But when accessing a named entity:
>>> sent3[2]
Tree('NE', [('Jhon', 'NNP')])
>>> sent3[2][0]
('Jhon', 'NNP')
>>> sent3[2][1]
Traceback (most recent call last):
  File "<pyshell#121>", line 1, in <module>
    sent3[2][1]
  File "C:\Python26\lib\site-packages\nltk\tree.py", line 139, in __getitem__
    return list.__getitem__(self, index)
IndexError: list index out of range
I got the above error.
What I want is to get the output 'NE', similar to the earlier 'PRP', so I can identify which word is a named entity. Is there any way of doing this with NLTK in Python? If so, please post the command. Or is there a function in the tree library to do this? I need the node value 'NE'.
-
Jacob about 13 years
Before looking at the node attribute, you'll want to check if isinstance(sent3[2], Tree) (after doing from nltk.tree import Tree).
-
Asl506 about 13 years
@Jacob Thanks mate, really helpful. The next problem I faced was how to know whether an element is a tree or not, as I needed to iterate through the elements using a for loop. The if isinstance(sent3[2], Tree) check is what I have been looking for all this while. Thanks again.
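The loop discussed in these comments might look roughly like the sketch below. It assumes NLTK 3, where the node value is read with label(), and builds the tree by hand to match sent3 from the question, so no tagger or chunker models are needed. The helper name tag_entities is made up for illustration, and the stand-in Tree class is only there so the sketch still runs where NLTK is not installed.

```python
try:
    from nltk.tree import Tree
except ImportError:
    # Hypothetical stand-in, for illustration only: it mimics just the
    # parts of nltk.tree.Tree that this sketch uses.
    class Tree(list):
        def __init__(self, label, children):
            super().__init__(children)
            self._label = label

        def label(self):
            return self._label

        def leaves(self):
            # Enough for a flat chunk whose children are (word, tag) tuples.
            return list(self)


def tag_entities(chunked):
    """Return (word, tag) pairs, using the chunk label for named entities."""
    pairs = []
    for item in chunked:
        if isinstance(item, Tree):
            # A subtree: every word inside it gets the chunk label, e.g. 'NE'.
            for word, _pos in item.leaves():
                pairs.append((word, item.label()))
        else:
            # A plain (word, pos) tuple outside any named entity.
            pairs.append(item)
    return pairs


sent3 = Tree('S', [('I', 'PRP'), ('am', 'VBP'), Tree('NE', [('Jhon', 'NNP')]),
                   ('from', 'IN'), Tree('NE', [('America', 'NNP')])])
print(tag_entities(sent3))
# [('I', 'PRP'), ('am', 'VBP'), ('Jhon', 'NE'), ('from', 'IN'), ('America', 'NE')]
```

Checking isinstance(item, Tree) before calling label() avoids touching tree-only attributes on the plain tuples, which is the point Jacob makes above.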
-
Vladimir over 8 years
In the current version (3.1), node is replaced by label().