Reading only specific messages from a Kafka topic

In Kafka that's not possible. A consumer reads messages one by one, in order, starting from the last committed offset (or from the beginning, or from a specific offset it seeks to). Depending on your use case, you could restructure the flow: the message describing the work to do goes into one topic, and the application that processes it writes the result to one of two other topics, one for completed and one for failed. That way, completed results are already separated from failed ones. Another option is a Kafka Streams application that does the filtering, but keep in mind that this is just sugar: under the hood the Streams application still reads every message; it only makes filtering easier to express.
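A minimal sketch of the two-result-topic approach, assuming kafka-python as in the question; the topic names, broker address, and `route_topic` helper are illustrative, not part of any existing API:

```python
# Hypothetical topic names for the two-result-topic approach described above.
COMPLETED_TOPIC = "flows.completed"
FAILED_TOPIC = "flows.failed"


def route_topic(message: dict) -> str:
    """Pick the destination topic from the message's flow_status field."""
    return COMPLETED_TOPIC if message.get("flow_status") == "completed" else FAILED_TOPIC


# With kafka-python, the processing application would then write each result
# to the chosen topic (broker address is an assumption):
#
#   import json
#   from kafka import KafkaProducer
#
#   producer = KafkaProducer(
#       bootstrap_servers="localhost:9092",
#       value_serializer=lambda v: json.dumps(v).encode("utf-8"),
#   )
#   result = {"flow_status": "completed", "value": 1, "active": "yes"}
#   producer.send(route_topic(result), result)
```

A consumer that only cares about completed flows then simply subscribes to `flows.completed` and never sees failed messages at all.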


Author: Prabhanj

Updated on June 04, 2022

Comments

  • Prabhanj
    Prabhanj almost 2 years

    Scenario:

    I am writing JSON objects into a Kafka topic, and while reading I want to read only a specific set of messages, based on a value present in the message. I am using the kafka-python library.

    sample messages:

    {"flow_status": "completed", "value": 1, "active": "yes"}
    {"flow_status": "failure", "value": 2, "active": "yes"}
    

    Here I want to read only the messages having flow_status as completed.

    • Sivaram Rasathurai
      Sivaram Rasathurai about 2 years
      I have created a Kafka consumer that can do that with the help of Spring Kafka. You may get some ideas from this blog: rcvaram.medium.com/…
  • Prabhanj
    Prabhanj about 5 years
    so I can have 3 topics: 1 for the whole log, 1 for completed status, 1 for failure status... the job will write to topic 1, then filter data based on status into the other topics.
  • ppatierno
    ppatierno about 5 years
    exactly; in your case the status is effectively the kind of the message, and each kind deserves its own topic in this use case (one for completed and one for failure)
  • Prabhanj
    Prabhanj about 5 years
    is it a good approach to have a single topic with two partitions (one for completed, one for failure)? On the sending side, the producer would keep the logic to route data to the respective partition... on the consumer side, I would create separate consumer groups, one group to read from the failed partition and the other to read from the completed partition
  • ppatierno
    ppatierno about 5 years
    the producer side could work, yes, but you need to implement a custom partitioner for that. The consumer side is quite the opposite: two consumers need to be in the same consumer group in order to have one partition assigned to each. If they are in different consumer groups, each will get all messages from both partitions. Even then it doesn't work well, because if one consumer crashes, the other will be assigned the remaining partition too (receiving both completed and failed messages). You could avoid consumer groups and use direct partition assignment instead.
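The direct partition assignment mentioned in the last comment can be sketched with kafka-python as follows; the topic name, broker address, and the status-to-partition mapping are assumptions for illustration (this layout is what a custom partitioner on the producer side would have to enforce):

```python
# Assumed layout from the discussion above: one topic, two partitions,
# partition 0 for completed flows and partition 1 for failed flows.
STATUS_PARTITION = {"completed": 0, "failure": 1}


def partition_for(status: str) -> int:
    """Map a flow_status value to its partition (custom-partitioner logic)."""
    return STATUS_PARTITION[status]


# With kafka-python, a consumer can bypass consumer groups entirely and
# assign the "completed" partition to itself directly (topic name and
# broker address are assumptions):
#
#   from kafka import KafkaConsumer, TopicPartition
#
#   consumer = KafkaConsumer(bootstrap_servers="localhost:9092")
#   consumer.assign([TopicPartition("flows", partition_for("completed"))])
#   for record in consumer:
#       ...  # only records from the completed partition arrive here
```

Because the partition is assigned explicitly rather than via a group, a crash of the "failed" reader never causes the "completed" reader to be handed the other partition.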