How to identify the subject of a sentence?

Solution 1

As the NLTK book (exercise 29) says, "One common way of defining the subject of a sentence S in English is as the noun phrase that is the child of S and the sibling of VP."

Consider the parse tree for "I shot an elephant": "I" is the noun phrase that is a child of S and a sibling of VP, while "elephant" is not, because its noun phrase sits inside the VP.
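The definition above can be sketched in a few lines. This is only an illustration, assuming the parse tree is already available as a bracketed string (written by hand here, not produced by a parser); `find_subjects` is a hypothetical helper name, not part of NLTK.

```python
from nltk import Tree


def find_subjects(tree):
    """Return every NP that is a direct child of an S node which also has a VP child."""
    subjects = []
    for s_node in tree.subtrees(lambda t: t.label() == "S"):
        # Only direct children count; leaves (plain strings) are skipped.
        children = [child for child in s_node if isinstance(child, Tree)]
        if any(child.label() == "VP" for child in children):
            subjects.extend(child for child in children if child.label() == "NP")
    return subjects


# Hand-written parse of the example sentence (an assumption for this sketch).
parse = Tree.fromstring(
    "(S (NP (PRP I)) (VP (VBD shot) (NP (DT an) (NN elephant))))"
)
subject_words = [" ".join(np.leaves()) for np in find_subjects(parse)]
print(subject_words)  # ['I']
```

The NP "an elephant" is skipped because its parent is VP, not S.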

Solution 2

You can use spaCy.

Code

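import spacy

# Requires the small English model: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
sent = "I shot an elephant"
doc = nlp(sent)

# Keep the tokens whose dependency label marks them as a nominal subject
sub_toks = [tok for tok in doc if tok.dep_ == "nsubj"]

print(sub_toks)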

Solution 3

English has two voices: active and passive. Let's start with the more commonly used one, the active voice.

An active sentence follows the subject-verb-object model. To mark the subject, write a set of rules over POS tags. Tag the sentence (treating the pronoun "I" as a noun for simplicity): I[NOUN] shot[VERB] an elephant[NOUN]. The first noun is the subject, then comes the verb, and the noun after the verb is the object.
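A minimal sketch of such a rule, assuming the sentence has already been tagged (the tags below are written by hand, and `find_svo` is a hypothetical name used only for this illustration):

```python
def find_svo(tagged):
    """Apply a naive noun-verb-noun rule to a list of (word, tag) pairs.

    Returns (subject, verb, object), or None if the sentence has no verb.
    """
    verb_idx = next(
        (i for i, (_, tag) in enumerate(tagged) if tag == "VERB"), None
    )
    if verb_idx is None:
        return None
    # First noun before the verb is taken as the subject.
    subject = next((w for w, t in tagged[:verb_idx] if t == "NOUN"), None)
    # First noun after the verb is taken as the object.
    obj = next((w for w, t in tagged[verb_idx + 1:] if t == "NOUN"), None)
    return subject, tagged[verb_idx][0], obj


tagged = [("I", "NOUN"), ("shot", "VERB"), ("an", "DET"), ("elephant", "NOUN")]
svo = find_svo(tagged)
print(svo)  # ('I', 'shot', 'elephant')
```

The rule only covers simple active clauses; questions, passives, and sentences with several clauses would need further rules.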

To handle more complicated sentences, such as "I shot an elephant with a gun", prepositions and subordinating conjunctions such as "with", "at", and "in" can be given roles. Here the sentence is tagged as I[NOUN] shot[VERB] an elephant[NOUN] with[IN] a gun[NOUN]. The word "with" signals an instrumental role. In this way you can build a rule-based system that assigns a role to every word in the sentence.
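One way to sketch the preposition rule is a lookup table from preposition to role. The table and the role names below are assumptions made for illustration (a real system would need a much larger and more careful mapping), and `prepositional_roles` is a hypothetical helper:

```python
# Hypothetical preposition-to-role table; extend it for real use.
PREPOSITION_ROLES = {"with": "instrument", "in": "location", "at": "location"}


def prepositional_roles(tagged):
    """Give the first noun after each known preposition that preposition's role."""
    roles = []
    pending = None  # role waiting for the next noun
    for word, tag in tagged:
        if tag == "IN":
            pending = PREPOSITION_ROLES.get(word.lower())
        elif tag == "NOUN" and pending is not None:
            roles.append((word, pending))
            pending = None
    return roles


tagged = [
    ("I", "NOUN"), ("shot", "VERB"), ("an", "DET"), ("elephant", "NOUN"),
    ("with", "IN"), ("a", "DET"), ("gun", "NOUN"),
]
roles = prepositional_roles(tagged)
print(roles)  # [('gun', 'instrument')]
```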

Also look at the patterns in the passive voice and write rules for them.
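As a rough sketch of one passive rule: a form of "be" directly followed by a past participle suggests a passive clause, where the noun before the auxiliary is the grammatical subject and the noun after "by" is the agent. This example assumes Penn Treebank tags (VBN for past participle), written by hand; `analyze_passive` and the two constant sets are hypothetical names for this illustration.

```python
BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}
NOMINAL_TAGS = {"NN", "NNS", "NNP", "NNPS", "PRP"}


def analyze_passive(tagged):
    """Return subject, verb and agent for a 'be + VBN' clause, or None if none is found."""
    for i in range(len(tagged) - 1):
        word = tagged[i][0]
        next_tag = tagged[i + 1][1]
        if word.lower() in BE_FORMS and next_tag == "VBN":
            # Grammatical subject: nearest noun or pronoun before the auxiliary.
            subject = next(
                (w for w, t in reversed(tagged[:i]) if t in NOMINAL_TAGS), None
            )
            # Agent: first noun or pronoun after a following "by", if any.
            agent = None
            for j in range(i + 2, len(tagged) - 1):
                if tagged[j][0].lower() == "by":
                    agent = next(
                        (w for w, t in tagged[j + 1:] if t in NOMINAL_TAGS), None
                    )
                    break
            return {"subject": subject, "verb": tagged[i + 1][0], "agent": agent}
    return None


passive = [
    ("The", "DT"), ("elephant", "NN"), ("was", "VBD"),
    ("shot", "VBN"), ("by", "IN"), ("me", "PRP"),
]
result = analyze_passive(passive)
print(result)
```

Note that the grammatical subject of the passive sentence is "elephant", even though "me" names the one doing the shooting.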

Author: singhalc

Updated on August 15, 2021

Comments

  • singhalc over 2 years

    Can Python + NLTK be used to identify the subject of a sentence? From what I have learned so far, a sentence can be broken into a head and its dependents. For example, in "I shot an elephant", "I" and "elephant" are dependents of "shot". But how do I discern that the subject of this sentence is "I"?

    • singhalc about 9 years
      I wanted to link my question to a similar one asked previously - link and an answer given by @mjv. Perhaps the author of the question and/or the responder can shed more light. Thanks.
  • singhalc about 9 years
    Thanks for pointing me to the appropriate section. I was able to identify the NP using the examples in the book, but I understand now that identifying the subject requires a combination of two criteria: child of S and sibling of VP. Can you also point me to a code example that identifies the subject in a sentence? Thanks.
  • John Sly almost 7 years
    This is an old post, but how do you generate the tree without manually defining it? I haven't seen that yet.
  • rayryeng over 5 years
    ? I don't understand this answer.
  • Darren Cook about 5 years
    That issue is about giving non-unicode data to Spacy. Nothing to do with this question.
  • Sundeep Pidugu almost 4 years
    How do you find the active verb and the passive verb?
  • Chitrang Sharma almost 3 years
    Is there any way to do this in JS?