How to train a classifier with only positive and neutral data?


Solution 1

The Spy EM algorithm solves exactly this problem.

S-EM is a text learning/classification system that learns from a set of positive and unlabeled examples (no negative examples). It is based on a "spy" technique, naive Bayes, and the EM algorithm.

The basic idea is to combine your positive set with a whole bunch of random documents, holding out some of your positives as "spies". You initially treat all the random documents as the negative class and learn a naive Bayes classifier on that set. Some of those random documents will actually be positive, and you can conservatively relabel as positive any document that scores higher than the lowest-scoring held-out spy. Then you iterate this process until it stabilizes.
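As a rough illustration (not the original S-EM implementation, which uses soft EM reassignments), here is a hard-label sketch of the spy step and the iteration, assuming bag-of-words count features and scikit-learn's `MultinomialNB`:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def spy_em(X_pos, X_unlabeled, spy_frac=0.15, n_iter=10, rng=None):
    """Hard-label Spy-EM sketch: find reliable negatives among the
    unlabeled documents, then retrain until the split stabilizes."""
    rng = np.random.default_rng(rng)
    n_pos = X_pos.shape[0]
    # Step 1: hold out a fraction of the positives as "spies" and mix
    # them into the unlabeled pool.
    n_spies = max(1, int(spy_frac * n_pos))
    idx = rng.permutation(n_pos)
    spies, rest_pos = X_pos[idx[:n_spies]], X_pos[idx[n_spies:]]
    X_mix = np.vstack([X_unlabeled, spies])
    # Step 2: train naive Bayes with positives vs. (unlabeled + spies)
    # treated as the negative class.
    X = np.vstack([rest_pos, X_mix])
    y = np.array([1] * rest_pos.shape[0] + [0] * X_mix.shape[0])
    clf = MultinomialNB().fit(X, y)
    # Step 3: the threshold is the lowest positive-class score among the
    # spies; unlabeled docs scoring below it are "reliable negatives".
    t = clf.predict_proba(spies)[:, 1].min()
    u_scores = clf.predict_proba(X_unlabeled)[:, 1]
    reliable_neg = X_unlabeled[u_scores < t]
    if reliable_neg.shape[0] == 0:
        reliable_neg = X_unlabeled  # fallback: keep everything as negative
    # Step 4: iterate — retrain on positives vs. reliable negatives,
    # re-score the unlabeled pool, and repeat until the split stops moving.
    for _ in range(n_iter):
        X2 = np.vstack([X_pos, reliable_neg])
        y2 = np.array([1] * X_pos.shape[0] + [0] * reliable_neg.shape[0])
        clf = MultinomialNB().fit(X2, y2)
        u_scores = clf.predict_proba(X_unlabeled)[:, 1]
        new_neg = X_unlabeled[u_scores < t]
        if new_neg.shape[0] in (0, reliable_neg.shape[0]):
            break
        reliable_neg = new_neg
    return clf
```

The spy threshold is what makes the relabeling conservative: since the spies are known positives, anything the model ranks below the worst spy is very unlikely to be positive.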

Solution 2

If you have a lot of positive feedback from different users, you have a rather typical collaborative filtering scenario.

Here are some CF solutions:

There exist publicly available implementations of those algorithms, e.g.

By the way, if you use a classifier for such problems, have a look at the literature on positive-only learning, e.g. http://users.csc.tntech.edu/~weberle/Fall2008/CSC6910/Papers/posonly.pdf
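For intuition, here is a toy matrix-factorization sketch for positive-only feedback, treating the unobserved entries as weak negatives. This is the common implicit-feedback shortcut, not one of the published CF algorithms referred to above; real implementations (e.g. BPR or weighted ALS) handle the missing entries more carefully:

```python
import numpy as np

def implicit_mf(R, k=8, epochs=50, lr=0.05, reg=0.01, seed=0):
    """Tiny SGD matrix factorization on binary (positive-only) feedback.
    R[u, i] = 1 if user u saved/liked item i, else 0; the zeros are
    treated as weak negative signal."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.normal(scale=0.1, size=(n_users, k))  # user factors
    Q = rng.normal(scale=0.1, size=(n_items, k))  # item factors
    for _ in range(epochs):
        for u in range(n_users):
            for i in range(n_items):
                err = R[u, i] - P[u] @ Q[i]
                P[u] += lr * (err * Q[i] - reg * P[u])
                Q[i] += lr * (err * P[u] - reg * Q[i])
    return P, Q

# To recommend for user u, rank the items u has not seen by P[u] @ Q.T.
```

The key point for the question: this family of methods never needs explicit negative labels, only the positive interactions.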

Solution 3

As explained here, you could use LibSVM, specifically its one-class SVM option.
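If you go the scikit-learn route (its `OneClassSVM` is backed by LibSVM), a minimal sketch might look like the following; the feature matrix here is a random placeholder, and the `nu` value is an illustrative choice, not a recommendation:

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Placeholder feature vectors for "liked" articles (rows = documents).
# In practice these would be tf-idf or embedding features.
rng = np.random.default_rng(0)
X_liked = rng.normal(loc=1.0, scale=0.3, size=(100, 5))

# Train on positives only; nu bounds the fraction of training points
# the model is allowed to treat as outliers.
clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(X_liked)

# predict() returns +1 for points that look like the training class
# and -1 for outliers.
preds = clf.predict(np.array([[1.0] * 5, [-3.0] * 5]))
```

The appeal for this problem is that the model only ever sees the positive class; new articles are then scored by how much they resemble it.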

Hope it helps!

Solution 4

This is obviously an old post, but I had a similar problem, and hopefully you can save some time with the information I found using the following techniques:



Author: log0 (updated on September 15, 2022)

Comments

  • log0
    log0 over 1 year

    My question: How to train a classifier with only positive and neutral data?

    I am building a personalized article recommendation system for education purposes. The data I use is from Instapaper.

    Datasets

    I only have positive data:

    - Articles that I have read and "liked", regardless of read/unread status

    And neutral data (because I have expressed interest in it, but I may not like it later anyway):

    - Articles that are unread
    - Articles that I have read and marked as read, but did not "like"

    The data I do not have is negative data:

    - Articles that I did not send to Instapaper to read later (I am not interested, although I have browsed that page/article)
    - Articles that I might not even have clicked into, and might or might not have archived

    My problem

    In such a problem, negative data is basically missing. I have thought of the following solutions, but have not settled on either yet:

    1) Feed a number of negative examples to the classifier.
    Pros: Immediate negative data to teach the classifier.
    Cons: As the number of articles I like increases, the effect of that negative data on the classifier fades out.

    2) Turn the "neutral" data into negative data.
    Pros: Now I have all the positive and (new) negative data I need.
    Cons: Although the neutral data is only of mild interest to me, I would still like to get recommendations on such articles, perhaps as a lower-valued class.

    • ThiS
      ThiS over 11 years
      What are your features for classification?
  • tysonjh
    tysonjh over 11 years
    you are correct that a recommender system is well suited to this problem, but you did not answer the original question
  • ThiS
    ThiS over 11 years
    I can't answer his question correctly if he doesn't tell me what features his classifier is trying to learn. You can't just "make" two binary classifiers if there are no features to learn.
  • tysonjh
    tysonjh over 11 years
    you assumed the words were a feature for the "bag of words", I was just trying to help you improve your answer so I could remove my down vote. Please post your comments about my answer in the correct place.
  • user3001
    user3001 almost 11 years
    Hi, can you explain how I have to interpret the derivative of x_uij for matrix factorization in the BPR paper? Thanks :)
  • zenog
    zenog over 10 years
    It is the derivative of the score difference between 2 items.