Python: Find a list of words in a text and return its index
Solution 1
Try this, where log is the text (already split into a list of words) and word_search is the term whose positions you want to find in log:
[i for i, item in enumerate(log) if item == word_search]
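The same idea can be extended to a list of words. This is only a minimal sketch, assuming log has already been split into tokens (iterating over a raw string would yield characters, not words). The names find_indices and targets are hypothetical, chosen for illustration.

```python
def find_indices(tokens, targets):
    """Map each target word to the list of positions where it occurs in tokens."""
    wanted = set(targets)
    # Start every target with an empty list so missing words are reported too.
    found = {word: [] for word in targets}
    for i, item in enumerate(tokens):
        if item in wanted:
            found[item].append(i)
    return found


log = "the cat sat on the mat".split()
result = find_indices(log, ["the", "mat", "dog"])
print(result)  # {'the': [0, 4], 'mat': [5], 'dog': []}
```

The text is scanned once regardless of how many words are searched for, which is usually preferable to running the list comprehension once per word.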
Solution 2
I know it's been a while since you asked the question, but since you're already using nltk I would suggest using its word_tokenize tool:
import nltk
# word_tokenize needs NLTK's tokenizer models, downloaded beforehand via nltk.download
text = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.'
for index, word in enumerate(nltk.word_tokenize(text)):
    print(index, word)
The result would be:
0 Lorem
1 ipsum
2 dolor
3 sit
4 amet
5 ,
6 consectetur
7 adipiscing
8 elit
9 .
Hope it helps :)
Solution 3
If I understand correctly, what you want is to build a positional index:
from collections import defaultdict
text = "your text goes here"
pos_index = defaultdict(list)
for pos, term in enumerate(text.split()):
    pos_index[term].append(pos)
Now you have an index of each word's positions. Just query it by term.
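To get from the positional index to the text window the question asks about, something along these lines might work. It is a sketch under assumptions: the helper names build_index and windows and the size parameter are made up for illustration, and tokens come from a plain split() rather than from NLTK.

```python
from collections import defaultdict


def build_index(tokens):
    """Build a positional index: term -> list of positions in tokens."""
    pos_index = defaultdict(list)
    for pos, term in enumerate(tokens):
        pos_index[term].append(pos)
    return pos_index


def windows(tokens, pos_index, words, size=2):
    """Yield (word, position, window) for every occurrence of each word.

    The window holds up to `size` tokens on each side of the match.
    """
    for word in words:
        # .get avoids adding empty entries to the defaultdict for absent words.
        for pos in pos_index.get(word, []):
            start = max(0, pos - size)
            yield word, pos, tokens[start:pos + size + 1]


tokens = "the quick brown fox jumps over the lazy dog".split()
pos_index = build_index(tokens)
hits = list(windows(tokens, pos_index, ["fox", "the"], size=1))
for word, pos, window in hits:
    print(word, pos, " ".join(window))
```

Only the words in the search list are looked up, so the index can cover the whole text while the output stays limited to the terms of interest.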
adrisons
Updated on June 05, 2022
Comments
-
adrisons almost 2 years
I have to process a plain-text document, looking for a list of words and returning a window of text around each word found. I'm using NLTK.
I found posts on Stack Overflow that use regular expressions to find words, but they only print the matches without getting their index. I don't think using regular expressions is right, because I have to find specific words.
-
adrisons over 11 years
Thank you. I will look into this option, but I am trying to find only a few words in a text. I may not have explained it well: I could build a dictionary as you do, but holding the occurrences of certain words only, rather than of all the words in the text.