Tagging a single word with the nltk pos tagger tags each letter instead of the word
Solution 1
nltk.tag.pos_tag accepts a list of tokens and tags each element separately. You therefore need to put your word in an iterable such as a list:
>>> nltk.tag.pos_tag(['going'])
[('going', 'VBG')]
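To see why a bare string is tagged letter by letter, it may help to look at what any function that loops over its argument receives. The sketch below uses plain Python only; `items_seen_by_tagger` and `ensure_token_list` are hypothetical helper names introduced for illustration and are not part of NLTK.

```python
def items_seen_by_tagger(tokens):
    """Return the items a function gets when it iterates over `tokens`."""
    return list(tokens)


def ensure_token_list(word_or_tokens):
    """Wrap a bare string in a list so it is treated as a single token."""
    if isinstance(word_or_tokens, str):
        return [word_or_tokens]
    return list(word_or_tokens)


# Iterating over a string yields its characters, one at a time.
print(items_seen_by_tagger("going"))                     # ['g', 'o', 'i', 'n', 'g']

# Iterating over a one-element list yields the whole word.
print(items_seen_by_tagger(ensure_token_list("going")))  # ['going']
```

Under that assumption, passing `ensure_token_list(word)` to the tagger instead of `word` should give one tag for the word rather than one per letter.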
Solution 2
>>> word = 'going'
>>> word = nltk.word_tokenize(word)
>>> l1 = nltk.pos_tag(word)
>>> l1
[('going', 'VBG')]
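This works because the string is turned into a list of tokens before it is tagged. A minimal sketch of the difference between tokenizing a string and merely wrapping it in a list follows. It uses `str.split` as a simplified stand-in for `nltk.word_tokenize` (an assumption made so the example runs without NLTK; `split` only breaks on whitespace and does not separate punctuation), and `to_tokens` is a hypothetical name.

```python
def to_tokens(text):
    """Split text on whitespace; a simplified stand-in for a real tokenizer."""
    return text.split()


sentence = "a sentence"

# Tokenized: two items, so a tagger would produce two tags.
print(to_tokens(sentence))   # ['a', 'sentence']

# Wrapped only: one item, so a tagger would produce a single tag
# for the whole string 'a sentence'.
print([sentence])            # ['a sentence']

# For a single word, both approaches give the same one-element list.
print(to_tokens("going") == ["going"])   # True
```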
Solution 3
Return the POS tag of one word:
>>> nltk.pos_tag(["going"])
[('going', 'VBG')]
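If only the tag string is wanted, rather than the list of `(word, tag)` pairs, it can be picked out of the first pair. The following is a sketch under stated assumptions: `tag_of` and its `tagger` parameter are hypothetical names, and the default branch assumes NLTK is installed and that its tagger model has already been fetched with `nltk.download`.

```python
def tag_of(word, tagger=None):
    """Return just the POS tag for a single word.

    `tagger` is any callable that maps a list of tokens to a list of
    (token, tag) pairs. It defaults to nltk.pos_tag, imported lazily.
    """
    if tagger is None:
        import nltk  # assumes NLTK and its tagger model are available
        tagger = nltk.pos_tag
    tagged = tagger([word])   # wrap the word so it is one token
    return tagged[0][1]       # first pair, second field is the tag


# Usage with a stand-in tagger, so the example runs without NLTK.
def fake_tagger(tokens):
    return [(token, "VBG") for token in tokens]


print(tag_of("going", tagger=fake_tagger))   # VBG
```

Making the tagger a parameter is a design choice for this sketch only; calling `tag_of("going")` with no second argument would use `nltk.pos_tag` directly.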
Author by
jksnw
Updated on June 16, 2022
Comments
-
jksnw almost 2 years
I'm trying to tag a single word with the nltk pos tagger:
word = "going"
pos = nltk.pos_tag(word)
print(pos)
But the output is this:
[('g', 'NN'), ('o', 'VBD'), ('i', 'PRP'), ('n', 'VBP'), ('g', 'JJ')]
It's tagging each letter rather than just the one word.
What can I do to make it tag the word?
-
jksnw about 9 years
I know it's meant to work on a list, but can it work on a single word?
-
Alaa M. over 6 years
Note that this tags the sentence as a whole (I know the OP asked about one word, but this might be confusing).
-
Mazdak over 6 years
@AlaaM. What do you mean by tagging a sentence as a whole? POS tagging aims to tag the words based on their initial character and their position in the sentence. That's why the tag is composed of multiple characters.
-
Alaa M. over 6 years
I'm just saying that if you have more than one word, do nltk.tag.pos_tag('a sentence'.split()) and not nltk.tag.pos_tag(['a sentence']), because the latter would produce a single tag.
-
Mazdak over 6 years
@AlaaM. Definitely, that's why I linked to the documentation. I also updated the answer, since it was written a long while ago and was full of confusion ;)).
-
JoeF over 5 years
This is just a technical (and probably overly pedantic) clarification. The problem is that pos_tag accepts any iterable, not just lists. It iterates over the items in that iterable (characters in the case of strings, items in the case of lists) and attempts to tag those items. I'm sure you are aware of this, but I thought I would just provide more clarification for those who are wondering why the output is the way it is.
-
Jonathan over 2 years
This solution has already been provided in this answer.