Kafka producer difference between flush and poll

23,974

The difference between flush() and poll() is explained in the client's documentation.

For flush(), it states:

Wait for all messages in the Producer queue to be delivered. This is a convenience method that calls poll() until len() is zero or the optional timeout elapses.

For poll():

Polls the producer for events and calls the corresponding callbacks (if registered).

Calling poll() just after a send() does not make the producer synchronous as it's unlikely the message just sent will already have reached the broker and a delivery report was already sent back to the client.

Instead flush() will block until the previously sent messages have been delivered (or errored), effectively making the producer synchronous.

Share:
23,974
shakeel
Author by

shakeel

My name is Shakeel. interested in learning

Updated on October 02, 2020

Comments

  • shakeel
    shakeel over 3 years

    We have a Kafka consumer which will read messages and do so stuff and again publish to Kafka topic using below script

    producer config :

    {
      "bootstrap.servers": "localhost:9092"
    }
    

    I haven't configured any other configuration like queue.buffering.max.messages queue.buffering.max.ms batch.num.messages

    I am assuming these all will be going to be default values from configuration

    queue.buffering.max.messages : 100000
    queue.buffering.max.ms : 0
    batch.num.messages : 10000
    

    my understanding : When internal queue reaches either of queue.buffering.max.ms or batch.num.messages messages will get published to Kafka in separate thread. in my configuration queue.buffering.max.ms is 0, so every message will be published as soon as when I call produce(). correct me if I am wrong.

    My producer snippet:

    def send(topic, message):
        p.produce(topic, json.dumps(message), callback=delivery_callback(err, msg))
        p.flush()
    

    from this post i understand that using flush after every message, producer is going to be sync producer . if I use above script it is taking ~ 45ms to publish to Kafka

    If I change above snippet to

    def send(topic, message):
        p.produce(topic, json.dumps(message), callback=delivery_callback(err, msg))
        p.poll(0)
    

    Is there any performance will be improved ? Can you clarify my understanding.

    Thanks

  • shakeel
    shakeel over 5 years
    Thanks for info. it really worked for me. time come from 45ms to ~0.2ms
  • dtc
    dtc over 2 years
    This post doesn't explain OP's questions about the configuration though. Regarding that here's my understanding: 1.(Have already tested this) You end up running produce past the queue.buffering.max.messages limit. You get an exception. 2.(I'm uncertain and this is just my assumption) You end up running produce past batch.num.messages this setting. It's possible that messages are delivered even before you flush.
  • Louis Go
    Louis Go about 2 years
    flush() will wait until delivery report is acknowledged which is much longer than poll(0) which only tries to get the delivery reports of previous sent messages. If one is interested in details, this issue worth a read.