How to identify the subject of a sentence?
Solution 1
As the NLTK book (exercise 29) says, "One common way of defining the subject of a sentence S in English is as the noun phrase that is the child of S and the sibling of VP."
Look at the tree example: indeed, "I" is the noun phrase that is the child of S and the sibling of VP, while "elephant" is not.
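A minimal sketch of that definition, assuming you already have a constituency parse. The bracketed tree below is written by hand for illustration (in practice it would come from a parser), and find_subjects is a hypothetical helper name, not part of NLTK.

```python
from nltk import Tree


def find_subjects(tree):
    """Return the text of every NP that is a child of S and a sibling of VP."""
    subjects = []
    # Visit every S node in the tree, including nested clauses.
    for s_node in tree.subtrees(filter=lambda t: t.label() == "S"):
        # Direct children only; skip bare string leaves.
        children = [child for child in s_node if isinstance(child, Tree)]
        has_vp = any(child.label() == "VP" for child in children)
        if has_vp:
            for child in children:
                if child.label() == "NP":
                    subjects.append(" ".join(child.leaves()))
    return subjects


# Hand-written parse of "I shot an elephant" (an assumption, not parser output).
parse = Tree.fromstring(
    "(S (NP (PRP I)) (VP (VBD shot) (NP (DT an) (NN elephant))))"
)
print(find_subjects(parse))
```

The NP "an elephant" is skipped because its parent is VP, not S.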
Solution 2
You can use spaCy and pick out the token whose dependency label is nsubj (nominal subject).
Code
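import spacy
# The 'en' shortcut only works in older spaCy releases; newer ones need the full model name.
nlp = spacy.load('en_core_web_sm')
sent = "I shot an elephant"
doc = nlp(sent)
# Keep the tokens that the dependency parser labels as the nominal subject.
sub_toks = [tok for tok in doc if tok.dep_ == "nsubj"]
print(sub_toks)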
Solution 3
English has two voices: active and passive. Let's take the more commonly used one, active voice.
It follows the subject-verb-object model. To mark the subject, write a rule set over POS tags. Tag the sentence: I[NOUN] shot[VERB] an elephant[NOUN]. As you can see, the first noun is the subject, followed by a verb and then an object.
To make it more complicated, take the sentence "I shot an elephant with a gun". Here prepositions or subordinating conjunctions such as with, at, and in can be given roles. The sentence will be tagged as I[NOUN] shot[VERB] an elephant[NOUN] with[IN] a gun[NOUN]. You can easily see that the word "with" gets the instrumental role. You can build a rule-based system to get the role of every word in the sentence.
Also look at the patterns in passive voice and write rules for those as well.
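The active-voice rules can be sketched roughly as follows. This is only an illustration: the input is tagged by hand with the same coarse tags used in this answer, assign_roles and PREP_ROLES are hypothetical names, and only the "with" rule comes from the answer. Passive sentences are not handled.

```python
# Hypothetical preposition-to-role table; only "with" is taken from the answer.
PREP_ROLES = {"with": "instrument"}


def assign_roles(tagged):
    """Assign roles to (word, tag) pairs of a simple active-voice sentence."""
    roles = {}
    seen_verb = False
    pending_prep = None  # preposition waiting for its noun
    for word, tag in tagged:
        if tag == "VERB":
            roles.setdefault("verb", word)
            seen_verb = True
        elif tag == "IN":
            pending_prep = word.lower()
        elif tag == "NOUN":
            if pending_prep is not None:
                # Noun governed by a preposition: look up the role.
                roles[PREP_ROLES.get(pending_prep, "other")] = word
                pending_prep = None
            elif not seen_verb:
                # First noun before the verb is the subject.
                roles.setdefault("subject", word)
            else:
                # First bare noun after the verb is the object.
                roles.setdefault("object", word)
    return roles


tagged = [
    ("I", "NOUN"), ("shot", "VERB"), ("an", "DET"), ("elephant", "NOUN"),
    ("with", "IN"), ("a", "DET"), ("gun", "NOUN"),
]
print(assign_roles(tagged))
```

A real tagger would label "I" as a pronoun rather than a noun, so the rules would need to accept pronoun tags as well.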
singhalc
Updated on August 15, 2021
Comments
-
singhalc, over 2 years ago:
Can Python + NLTK be used to identify the subject of a sentence? From what I have learned so far, a sentence can be broken into a head and its dependents. For example, take "I shot an elephant". In this sentence, "I" and "elephant" are dependents of "shot". But how do I discern that the subject in this sentence is "I"?
-
singhalc, about 9 years ago: I wanted to link my question to a similar one asked previously - link and an answer given by @mjv. Perhaps the author of the question and/or the responder can shed more light. Thanks.
-
singhalc, about 9 years ago: Thanks for pointing me to the appropriate section. I was able to identify the NP using the examples in the book, but I understand now that identifying the subject requires a combination of two criteria: child of S and sibling of VP. Can you also point me to a code example that identifies the subject in a sentence? Thanks.
-
John Sly, almost 7 years ago: This is an old post, but how do you generate the tree without manually defining it? I haven't seen that yet.
-
rayryeng, over 5 years ago: ? I don't understand this answer.
-
Darren Cook, about 5 years ago: That issue is about giving non-Unicode data to spaCy. Nothing to do with this question.
-
Sundeep Pidugu, almost 4 years ago: How to find the active verb and passive verb?
-
Chitrang Sharma, almost 3 years ago: Is there any way to do this in JS?