Python: Find a list of words in a text and return its index

python text word nltk

15,670

Solution 1

Try this, where log is the text as a list of words (for example, log = text.split()) and word_search is the term whose positions you want to find in log:

 [i for i, item in enumerate(log) if item == word_search]
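Since the question is about a list of words rather than a single term, the same idea can be extended as in the sketch below. The helper name find_word_indices and the sample sentence are made up for illustration; it assumes the text has already been split into tokens.

```python
def find_word_indices(tokens, targets):
    """Return {word: [positions]} for every target word that occurs in tokens."""
    wanted = set(targets)  # set membership keeps the scan a single pass
    found = {}
    for i, token in enumerate(tokens):
        if token in wanted:
            found.setdefault(token, []).append(i)
    return found

# Hypothetical sample data
log = "the cat sat on the mat near the door".split()
result = find_word_indices(log, ["the", "mat", "dog"])
print(result)  # {'the': [0, 4, 7], 'mat': [5]}
```

Words that never occur (here "dog") are simply absent from the result. Matching is exact and case-sensitive, so lowercase the tokens first if that matters.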

Solution 2

I know it's been a while since you asked the question, but since you're already using nltk, I would suggest using its word_tokenize tool:

import nltk  # word_tokenize needs NLTK's Punkt tokenizer data to be downloaded

text = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.'
for index, word in enumerate(nltk.word_tokenize(text)):
    print(index, word)

The result would be:

0 Lorem
1 ipsum
2 dolor
3 sit
4 amet
5 ,
6 consectetur
7 adipiscing
8 elit
9 .

Hope it helps :)

Solution 3

If I understand correctly, what you want is to build a positional index:

from collections import defaultdict

text = "your text goes here"
pos_index = defaultdict(list)
for pos, term in enumerate(text.split()):
    pos_index[term].append(pos)

Now you have an index that maps each word to the list of positions where it occurs. Just query it by term.
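As a rough sketch of how such an index could be queried to get the text window the question asks for: the helper names build_pos_index and windows, the size parameter, and the sample sentence are all assumptions made for illustration.

```python
from collections import defaultdict

def build_pos_index(tokens):
    """Map each term to the list of positions where it occurs."""
    pos_index = defaultdict(list)
    for pos, term in enumerate(tokens):
        pos_index[term].append(pos)
    return pos_index

def windows(tokens, pos_index, word, size=2):
    """Return, for each occurrence of word, the tokens within `size` positions of it."""
    out = []
    # .get() avoids creating an empty entry for words that are not in the index
    for pos in pos_index.get(word, []):
        start = max(0, pos - size)  # clamp at the start of the text
        out.append(tokens[start:pos + size + 1])
    return out

# Hypothetical sample data
tokens = "the quick brown fox jumps over the lazy dog".split()
pos_index = build_pos_index(tokens)

fox_windows = windows(tokens, pos_index, "fox")
the_windows = windows(tokens, pos_index, "the")
print(fox_windows)  # [['quick', 'brown', 'fox', 'jumps', 'over']]
print(the_windows)  # [['the', 'quick', 'brown'], ['jumps', 'over', 'the', 'lazy', 'dog']]
```

To look up only a few words, loop over your word list and call windows for each one. The index is built once, so each lookup is cheap.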


Author by

adrisons


Updated on June 05, 2022

Comments

adrisons almost 2 years

I have to process a plain-text document, looking for the words in a given list and returning a window of text around each word found. I'm using NLTK.

I found posts on Stack Overflow that use regular expressions to find words, but they only print the matches without getting their indices. I don't think using regular expressions is right, because I have to find specific words.

adrisons over 11 years

Thank you. I will look into this option, but I am trying to find only a few words in a text. I may not have explained it well: I could build a dictionary as you do, but holding the occurrences of certain words only, rather than of every word in the text.