Does the Kafka Python API support stream processing?

Solution 1

Kafka Streams is only available as a JVM library, but there are at least two Python implementations of it, Faust being one of them.

In theory, you could try using Jython or Py4J to work with the JVM implementation, but that would probably require more work than necessary.

Outside of those options, you can also try Apache Beam, Flink or Spark, but they each require an external cluster scheduler to scale out (and also require a Java installation).

If you are okay with HTTP methods, then running a ksqlDB instance (again, requiring Java for that server) and invoking its REST interface from Python with the built-in SQL functions can work. However, building your own functions there will require writing Java code, last I checked.
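As a rough illustration of the REST approach, the sketch below builds and sends a statement to a ksqlDB server using only the Python standard library. It assumes the server listens on its default port (8088) and accepts statements on the `/ksql` endpoint; the helper names `build_ksql_request` and `run_statement` are made up for this example.

```python
import json
import urllib.request

# Media type used by the ksqlDB REST API.
KSQL_CONTENT_TYPE = "application/vnd.ksql.v1+json"


def build_ksql_request(server_url, statement):
    """Build a POST request for the /ksql endpoint (hypothetical helper)."""
    statement = statement.rstrip()
    if not statement.endswith(";"):
        statement += ";"  # ksqlDB statements must be terminated with a semicolon
    body = json.dumps({"ksql": statement, "streamsProperties": {}}).encode("utf-8")
    return urllib.request.Request(
        url=server_url.rstrip("/") + "/ksql",
        data=body,
        headers={"Content-Type": KSQL_CONTENT_TYPE, "Accept": KSQL_CONTENT_TYPE},
        method="POST",
    )


def run_statement(server_url, statement):
    """Send the statement and return the decoded JSON response."""
    request = build_ksql_request(server_url, statement)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running locally, `run_statement("http://localhost:8088", "SHOW STREAMS")` should return the server's JSON description of the defined streams. Long-running queries use a separate endpoint and are not covered by this sketch.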

If none of those options are suitable, then you're stuck with the basic consumer/producer methods.
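For reference, a bare consume-transform-produce loop might look like the sketch below. It assumes the `confluent-kafka` client is installed and a broker is reachable at `localhost:9092`; the topic names, the group id, and the `transform` function are placeholders chosen for this example. Unlike Kafka Streams, this gives you no state stores, windowing, or exactly-once guarantees out of the box.

```python
def transform(value: bytes) -> bytes:
    """The per-record processing step; here it just upper-cases UTF-8 text."""
    return value.decode("utf-8").upper().encode("utf-8")


def run(bootstrap_servers="localhost:9092", source="input-topic", sink="output-topic"):
    # Imported lazily so transform() can be used and tested without the client.
    from confluent_kafka import Consumer, Producer

    consumer = Consumer({
        "bootstrap.servers": bootstrap_servers,
        "group.id": "python-stream-sketch",  # placeholder consumer group
        "auto.offset.reset": "earliest",
    })
    producer = Producer({"bootstrap.servers": bootstrap_servers})
    consumer.subscribe([source])
    try:
        while True:
            msg = consumer.poll(1.0)  # wait up to one second for a record
            if msg is None:
                continue
            if msg.error():
                print(f"Consumer error: {msg.error()}")
                continue
            if msg.value() is None:  # skip tombstone records
                continue
            producer.produce(sink, key=msg.key(), value=transform(msg.value()))
            producer.poll(0)  # serve delivery callbacks without blocking
    except KeyboardInterrupt:
        pass
    finally:
        producer.flush()  # make sure buffered records are sent
        consumer.close()
```

Calling `run()` copies every record from the source topic to the sink topic with its value upper-cased, until interrupted with Ctrl+C.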

Solution 2

If you are using Apache Spark, you can use it as both producer and consumer. There is no need to rely on third-party libraries like Faust, but you will need a Spark cluster manager (Standalone, YARN, or Kubernetes) to scale it out.

To consume Kafka data streams in Spark, follow the Structured Streaming + Kafka Integration Guide.

Keep in mind that you will have to add the spark-sql-kafka package (matching your Spark and Scala versions) with --packages when using spark-submit:

spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1 StructuredStreaming.py

This solution has been tested with Spark 3.0.1 and Kafka 2.7.0 with PySpark.
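The original StructuredStreaming.py is not shown, so the following is only a minimal sketch of what such a script could contain. It assumes a broker at `localhost:9092` and a topic named `input-topic`; both values, and the `kafka_source_options` helper, are placeholders for this example.

```python
def kafka_source_options(bootstrap_servers, topic):
    """Options for Spark's Kafka source (hypothetical helper)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "earliest",
    }


def main():
    # Imported here so the helper above works without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("StructuredStreaming").getOrCreate()

    # Kafka keys and values arrive as binary columns, so cast them to strings.
    records = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options("localhost:9092", "input-topic"))
        .load()
        .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
    )

    # Print each micro-batch to the console; swap in format("kafka") to produce.
    query = records.writeStream.format("console").outputMode("append").start()
    query.awaitTermination()
```

To run it through spark-submit, call `main()` at the bottom of the file under an `if __name__ == "__main__":` guard.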

This resource can also be useful.

Author by

user3126637

Updated on July 09, 2022

Comments

  • user3126637
    user3126637 almost 2 years

    I have used Kafka Streams in Java. I could not find a similar API in Python. Does Apache Kafka support stream processing in Python?

  • Mahamutha M
    Mahamutha M about 5 years
    Are there any examples or tutorials for using docs.confluent.io/current/ksql/docs/tutorials/… with Faust streaming?
  • OneCricketeer
    OneCricketeer about 5 years
    KSQL is implemented in Java, so I'm not sure I understand the question
  • Mahamutha M
    Mahamutha M about 5 years
    @circket_007, so you mean that KSQL is not available in Python. Am I right?
  • OneCricketeer
    OneCricketeer about 5 years
    @Maha KSQL server has a REST API, so you can submit queries from any language