Pattern recognition for data mining and text-based analysis

12,437

Solution 1

I strongly encourage you to watch the Stanford NLP lectures, in particular:

  • Week 3 - Sentiment Analysis (which is what you want to achieve)
  • Week 4 - Relation Extraction (Hearst patterns, etc.)

I think you'll find them a very valuable resource.

    Solution 2

    In case you understand the basics of pattern recognition:

    1. Manually create two sets (positive and negative) of Twitter postings for your product.
    2. Define a metric, a kernel, or a similarity measure for the postings. You may use high-dimensional binary vectors, where each component represents a word: 1 if the word is present in the posting and 0 if it is absent. You might also add special weighting for negation words such as "not".
    3. Use a machine learning algorithm to train your classifier on your manually created sets (classes). You can use SVMs, neural networks, a nearest-neighbor classifier, and so on.
    4. Use the trained classifier to classify new Twitter postings.

    That's the basic idea on a high level. There are, of course, many fine details to take care of, but explaining them is beyond the scope of an SO answer.
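As a rough illustration of the four steps, here is a minimal sketch using only the Python standard library. Everything specific in it is an assumption made for the example: the six hand-labeled postings, the `NEGATIONS` word list, the `NOT_` prefix (one simple way to give "not" special treatment), Jaccard similarity as the similarity measure, and a nearest-neighbor vote as the classifier. A real system would need far more training data and probably a proper machine learning library.

```python
import re
from collections import Counter

# Assumption: a tiny, hand-picked list of negation words.
NEGATIONS = {"not", "no", "never"}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def features(text):
    """Binary bag of words: the set of words present in the posting.

    A word that directly follows a negation is stored as NOT_<word>,
    so "not great" and "great" become different features.
    """
    feats = set()
    negate = False
    for tok in tokenize(text):
        if tok in NEGATIONS:
            negate = True
            continue
        feats.add("NOT_" + tok if negate else tok)
        negate = False
    return feats

def jaccard(a, b):
    """Similarity of two feature sets: |intersection| / |union|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def classify(text, training, k=1):
    """Majority label among the k most similar training postings."""
    query = features(text)
    ranked = sorted(training, key=lambda item: jaccard(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Step 1: manually labeled postings (made up for this example).
TRAINING = [(features(text), label) for text, label in [
    ("I love XYZ, my skin feels great", "positive"),
    ("XYZ is great and smells lovely", "positive"),
    ("Really happy with XYZ", "positive"),
    ("XYZ is not great, it gave me a rash", "negative"),
    ("I hate XYZ, total waste of money", "negative"),
    ("Not happy with XYZ at all", "negative"),
]]

# Step 4: classify new postings.
print(classify("XYZ is great", TRAINING))
print(classify("XYZ is not great", TRAINING))
```

With a training set this small, `k=1` is used so that a single close match decides the label; with more data a larger `k` would usually be more robust.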

    Solution 3

    This subdomain is called sentiment analysis. There are tons of lectures and articles available on this topic. The real results I've seen so far have not been that convincing to me, unfortunately.

    The key to this challenge is good training data. Build yourself a tool that lets you go through the data quickly and manually tag each posting as positive, neutral, or negative, so that you can assemble a substantial training set fast.
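Such a tagging tool does not need to be elaborate. The sketch below is one possible command-line version; the function name `label_postings`, the key bindings, and the `ask` parameter (which defaults to the built-in `input` and exists only so the loop can be driven by a script) are all assumptions made for illustration.

```python
# Assumption: single-key shortcuts for the three classes.
LABELS = {"p": "positive", "u": "neutral", "n": "negative"}

def label_postings(postings, ask=input):
    """Show each posting, read one key, and collect (text, label) pairs.

    Unknown keys repeat the prompt; "s" skips the posting without a label.
    """
    labeled = []
    for text in postings:
        while True:
            prompt = f"{text}\n[p]ositive / ne[u]tral / [n]egative / [s]kip: "
            key = ask(prompt).strip().lower()
            if key == "s":
                break
            if key in LABELS:
                labeled.append((text, LABELS[key]))
                break
    return labeled
```

Calling `label_postings(rows)` on a list of posting texts loaded from the database would then return pairs that can be saved as the training set.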

    See the Stanford NLP lectures, in particular week 3, for details on the overall process and some state-of-the-art approaches and tricks.

    Author: sunny_dev

    Updated on June 23, 2022

    Comments

    • sunny_dev
      sunny_dev almost 2 years

      I am creating software that dumps the plain text of whatever users have commented and posted on their Twitter profiles about a cosmetic product, "XYZ". I have parsed the JSON objects received from the Twitter API and stored the raw data in a MySQL database.

      Now I have to analyze that plain text to find patterns that indicate whether a comment about "XYZ" is good or bad, and feed this information into a separate API that creates dynamic visual charts in HTML.

      I am totally new to data mining and text-based pattern recognition. I would really appreciate it if anyone could suggest how to approach pattern recognition on the plain text in my database, so that I can feed the results to my separate visual charts API.

    • HW-Scientist
      HW-Scientist about 6 years
      Hi, @user278064, could you please update the 'NLP Stanford lectures' link if possible? The current link no longer seems to work. Thank you.
    • Jonathan Scholbach
      Jonathan Scholbach almost 5 years
      This is not an answer to the question, especially since the link is no longer available. That's why I am downvoting this answer.