Filter data in Twitter Streaming API

Take a look at the filter endpoint of the Streaming API:

https://dev.twitter.com/docs/api/1.1/post/statuses/filter

You can pass a set of keywords to this endpoint as a filter (the track parameter). Under the current limits, you can track up to 400 keywords.
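
As a rough illustration, here is a minimal sketch of building the form-encoded POST body for the filter endpoint. It assumes the keywords go into the track parameter as a comma-separated list (comma means OR, space-separated words within one phrase mean AND) and enforces the 400-keyword limit mentioned above. The helper name build_filter_body is hypothetical, and the OAuth signing the real request needs is deliberately left out.

```python
from urllib.parse import urlencode

# Endpoint from the answer above (Streaming API v1.1).
FILTER_URL = "https://stream.twitter.com/1.1/statuses/filter.json"
MAX_TRACK_KEYWORDS = 400  # limit quoted in the answer


def build_filter_body(keywords):
    """Return the form-encoded POST body for statuses/filter (hypothetical helper).

    Keywords are stripped, empty ones dropped, and the rest joined with
    commas for the track parameter. Raises ValueError if none remain or
    if the keyword limit is exceeded.
    """
    cleaned = [k.strip() for k in keywords if k.strip()]
    if not cleaned:
        raise ValueError("at least one keyword is required")
    if len(cleaned) > MAX_TRACK_KEYWORDS:
        raise ValueError(
            f"{len(cleaned)} keywords given; the limit is {MAX_TRACK_KEYWORDS}"
        )
    # urlencode percent-encodes the comma (%2C) and turns spaces into '+'.
    return urlencode({"track": ",".join(cleaned)})


body = build_filter_body(["XYZ", "xyz brand"])
print(body)  # POST this to FILTER_URL with OAuth 1.0a signing (not shown)
```

Narrowing the keyword set here is the only server-side filtering this sketch relies on; everything else has to happen after the tweets arrive.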

After receiving the tweets, you still have to filter them yourself to remove noisy data.

So if you can describe what you are looking for with a set of keywords, you will get what you want. However, there will always be some noise in your data, because it is almost impossible to define a topic that precisely through simple keyword filtering.

For example, let's assume you want to track all tweets related to a brand named XYZ. To get tweets about XYZ, you might use a one-word keyword set containing only "XYZ". The API will give you all the tweets containing XYZ. But suppose "XYZ" is an ordinary word in some language: speakers of that language will tweet it, and you will receive those tweets too. Also suppose there is a city called XYZ, and people there send check-in messages. At that point you need to filter out the tweets that are not related to your topic, either by language detection or by using contextual information. Still, the key is to choose a keyword set that covers the topic you want to track.
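
The second, client-side filtering step for the XYZ example might look like the sketch below. It assumes each tweet is a dict parsed from the stream's JSON with "text" and "lang" fields (v1.1 tweet objects carry a machine-detected lang code). The allowed languages and the check-in patterns are hypothetical placeholders you would tune for your own topic.

```python
import re

# Hypothetical settings for the "XYZ" brand example; tune these for your topic.
ALLOWED_LANGS = {"en"}  # drop tweets in the language where "XYZ" is an ordinary word
NOISE_PATTERNS = [
    re.compile(r"\bI'm at\b", re.IGNORECASE),                 # assumed check-in wording
    re.compile(r"\bcheck(?:ed)?[- ]in\b", re.IGNORECASE),     # "check-in", "checked in"
]


def is_relevant(tweet):
    """Keep a tweet only if its language is allowed and no noise pattern matches."""
    if tweet.get("lang") not in ALLOWED_LANGS:
        return False
    text = tweet.get("text", "")
    return not any(p.search(text) for p in NOISE_PATTERNS)


tweets = [
    {"text": "Loving my new XYZ headphones", "lang": "en"},
    {"text": "I'm at XYZ Central Station", "lang": "en"},
    {"text": "xyz es una palabra comun", "lang": "es"},
]
kept = [t["text"] for t in tweets if is_relevant(t)]
print(kept)  # ['Loving my new XYZ headphones']
```

Simple rules like these will still let some noise through; they only cut down on the obvious cases described in the example.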

Cheers.

Author: Lukas

Updated on October 09, 2022

Comments

  • Lukas
    Lukas over 1 year

    I'm currently experimenting with the Twitter Streaming API. Everything works like a charm, but the API sends me tons of data that I don't need. Is there a way to filter the data the API sends me?

    I'm using the following stream: https://stream.twitter.com/1.1/statuses/filter.json

  • Lukas
    Lukas about 11 years
    Hi, thanks for that, but the problem is that I don't want to receive the "noisy" data at all, because I want to process lots of tweets in less time :) Maybe it isn't even possible to get a "short" version of the tweets from the API.
  • cubbuk
    cubbuk about 11 years
    @LucèBrùlè I edited my answer to clarify what the noisy data is.
  • user1599964
    user1599964 almost 11 years
    @cubbuk: Suppose I specified three keywords in the filter. When I receive data from the Streaming API, is there a way (other than searching manually on my own) to detect WHICH of the three keywords a tweet corresponds to?
  • cubbuk
    cubbuk almost 11 years
    @user1599964 As far as I know, Twitter doesn't provide any information about that; you have to figure it out yourself.
  • user1599964
    user1599964 almost 11 years
    @cubbuk: Yes, I figured that out. Can you have a look at this question and let me know your views: stackoverflow.com/questions/16602483/…
  • S Gaber
    S Gaber over 10 years
    Is there any tool that can help me with language detection?
  • Krishna Kalyan
    Krishna Kalyan about 8 years
    @cubbuk: Will the Streaming API also include tweets like abXYZcd or XYZmn? Does it give me tweets that contain the filter as a substring? For example, if I filter for "fast", will it give me tweets like "breakfast"?
  • cubbuk
    cubbuk about 8 years
    @KrishnaKalyan I just don't know the current status, sorry.
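
Since, as noted in the comments, the stream does not say which tracked keyword a tweet matched, one way to recover it is to re-match the tweet text locally. The sketch below is an assumption-based approximation, not Twitter's exact matching rules: it does case-insensitive whole-word matching (so "fast" does not match "breakfast" here, which is this sketch's choice, not a statement about the API), and treats a multi-word phrase as matching only when all its words appear. The helper names are hypothetical, and words starting with symbols such as "#" or "@" would need extra handling.

```python
import re


def compile_keywords(keywords):
    """Compile one case-insensitive, whole-word pattern per word of each keyword.

    A multi-word phrase matches only if every word appears somewhere in the
    text, roughly mirroring how space-separated terms combine in track.
    """
    compiled = {}
    for kw in keywords:
        compiled[kw] = [
            re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
            for word in kw.split()
        ]
    return compiled


def matched_keywords(text, compiled):
    """Return the tracked keywords whose words all occur in the text."""
    return [kw for kw, pats in compiled.items() if all(p.search(text) for p in pats)]


tracked = compile_keywords(["fast", "XYZ", "red shoes"])
print(matched_keywords("Breakfast with my XYZ team", tracked))    # ['XYZ']
print(matched_keywords("These shoes are red and fast", tracked))  # ['fast', 'red shoes']
```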