Finding topics of an unseen document via Gensim


The vector returned by [] on an LSI model is actually a list of (topic, weight) pairs. You can inspect an individual topic with the method LsiModel.show_topic.

Author: Peter Kirby

Updated on June 05, 2022

Comments

  • Peter Kirby (over 1 year ago)

    I am using Gensim to do some large-scale topic modeling, and I am having difficulty understanding how to determine the predicted topics for an unseen (non-indexed) document. For example: I have 25 million documents which I have converted to vectors in LSA (and LDA) space. I now want to figure out the topics of a new document; let's call it x.

    According to the Gensim documentation, I can use:

    topics = lsi[doc(x)]
    

    where doc(x) is a function that converts x into a vector.

    The problem, however, is that the variable topics above is a vector. The vector is useful if I am comparing x to other documents, because it lets me compute the cosine similarity between them, but I am unable to retrieve the specific words that are associated with x itself.

    Am I missing something, or does Gensim not have this capability?

    Thank you,

    EDIT

    Larsmans has the answer.

    I was able to show the topics by using:

    for topic_id, weight in topics:
        # show_topic takes a topic index; show_topics takes a number of topics
        print(lsi.show_topic(topic_id))
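
The question mentions using the topic vector to compute cosine similarity between documents. A minimal pure-Python sketch of that comparison is given below, assuming both vectors are lists of (topic, weight) pairs as returned by lsi[...]. The function names and the sample vectors are hypothetical and chosen only for illustration. For a large corpus, Gensim's own similarity indexes in gensim.similarities are likely the better tool.

```python
import math


def to_dense(topic_vector, num_topics):
    """Expand sparse (topic_id, weight) pairs into a dense list of weights."""
    dense = [0.0] * num_topics
    for topic_id, weight in topic_vector:
        dense[topic_id] = weight
    return dense


def cosine_similarity(vec_a, vec_b, num_topics):
    """Cosine similarity between two sparse topic vectors."""
    a = to_dense(vec_a, num_topics)
    b = to_dense(vec_b, num_topics)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # An all-zero vector has no direction; treat similarity as 0.
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up topic vectors standing in for lsi[doc(x)], lsi[doc(y)], lsi[doc(z)].
x_topics = [(0, 0.6), (1, -0.8)]
y_topics = [(0, 0.6), (1, -0.8)]
z_topics = [(0, 0.8), (1, 0.6)]

print(cosine_similarity(x_topics, y_topics, num_topics=2))
print(cosine_similarity(x_topics, z_topics, num_topics=2))
```

Here x and y point in the same direction in topic space, while x and z are orthogonal, so the first score is close to 1 and the second is 0.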