What does commit-log mean in Kafka?


Conceptually there's no difference between the "commit log" that Kafka provides and the commit log/transaction log/write-ahead log that a DBMS uses: both record the changes made to something so that those changes can be replayed later.

In the case of a DBMS this replay will happen if the DB was not shut down cleanly and is necessary to ensure the DB resumes service in a consistent state. Importantly, in a DB this commit log is an implementation detail of the database and is not a concern of the database clients.

In a Kafka application this commit log is a first class concept. Subscribers to a topic can reconstruct the state of the application for themselves, if they want to (in effect, "replaying the log"). They can also react to particular events in the topic, and understand how a particular state was arrived at, neither of which is easy with a traditional DBMS.
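To make "replaying the log" concrete, here is a minimal in-memory sketch in Python. It does not use Kafka or any Kafka client; `CommitLog`, `read_from`, `replay`, and the account-balance records are all hypothetical names invented for illustration. The log is just an ordered, append-only list, and a reader rebuilds state by applying each record in order.

```python
from dataclasses import dataclass, field


@dataclass
class CommitLog:
    """Append-only, ordered list of records (a toy stand-in for one partition)."""
    records: list = field(default_factory=list)

    def append(self, key, value):
        """Append a record and return its offset (its position in the log)."""
        self.records.append((key, value))
        return len(self.records) - 1

    def read_from(self, offset=0):
        """Yield (offset, key, value) for every record at or after `offset`."""
        for i in range(offset, len(self.records)):
            key, value = self.records[i]
            yield i, key, value


def replay(log, upto=None):
    """Rebuild a key -> balance table by applying records in log order.

    If `upto` is given, stop after that offset to see an earlier state.
    """
    state = {}
    for offset, key, delta in log.read_from(0):
        if upto is not None and offset > upto:
            break
        state[key] = state.get(key, 0) + delta
    return state


log = CommitLog()
log.append("alice", 100)   # offset 0
log.append("bob", 50)      # offset 1
log.append("alice", -30)   # offset 2

current = replay(log)          # state after every record
earlier = replay(log, upto=1)  # state as it was after offset 1
```

Because the records themselves are kept, a reader can recompute the current state, inspect any earlier state, or react to individual records as they arrive. A DBMS keeps the same kind of log internally but normally exposes only the resulting tables.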


Author: Joe.wang

An ASP.NET developer. Sometimes in Java

Updated on September 14, 2022

Comments

  • Joe.wang over 1 year ago

    Forgive me, I am just learning Kafka. I have come across the term "commit log" many times while reading Kafka material, but I still have no idea what exactly it is. It is mentioned in the links below.

    https://kafka.apache.org/documentation/#uses_commitlog

    Kafka can serve as a kind of external commit-log for a distributed system. The log helps replicate data between nodes and acts as a re-syncing mechanism for failed nodes to restore their data.

    https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying

    One of the most useful things I learned in all this was that many of the things we were building had a very simple concept at their heart: the log. Sometimes called write-ahead logs or commit logs or transaction logs,

    https://kafka.apache.org/protocol.html#protocol_partitioning

    Kafka is a partitioned system so not all servers have the complete data set. Instead recall that topics are split into a pre-defined number of partitions, P, and each partition is replicated with some replication factor, N. Topic partitions themselves are just ordered "commit logs" numbered 0, 1, ..., P.

    What does commit-log mean? Is it any different from the concept in a DBMS? How should I understand it? Thanks.