Version: 1.0.1

What is Kafka?

Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

  • Message broker
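  • Sequential, append-only writes and offset-based (indexed) reads, one of the main reasons Kafka is fast
  • Written in Scala and Java
  • Named after the writer Franz Kafka
  • Originally developed at LinkedIn, later donated to the Apache Software Foundation
  • Development led by Jay Kreps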
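Offset-based reads and writes on an append-only log can be sketched in a few lines of Python. `PartitionLog` is a hypothetical, in-memory class for illustration only. Real Kafka persists each partition as on-disk log segments, each with a sparse offset index, but the access pattern is the same: writes always go to the end, and reads jump straight to a numeric offset instead of searching.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Toy model of one Kafka partition: an append-only list of records.

    Hypothetical class; real Kafka stores segments on disk with an offset index.
    """

    records: list[bytes] = field(default_factory=list)

    def append(self, value: bytes) -> int:
        """Append a record at the end of the log and return its offset."""
        self.records.append(value)
        return len(self.records) - 1

    def read(self, offset: int, max_records: int = 10) -> list[tuple[int, bytes]]:
        """Return up to max_records (offset, value) pairs starting at offset.

        Reading by offset is a direct position lookup, not a search.
        """
        end = min(offset + max_records, len(self.records))
        return [(i, self.records[i]) for i in range(offset, end)]


log = PartitionLog()
for msg in (b"signup", b"login", b"purchase"):
    log.append(msg)

print(log.read(1))  # [(1, b'login'), (2, b'purchase')]
```

Existing records are never modified in place, which is what the "immutable log" wording refers to.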

Features

  • Fast ( high throughput and low latency )
  • Scalable ( horizontally scalable by adding brokers and partitions )
  • Reliable ( fault tolerant and distributed )
  • Durable ( messages are persisted to disk in an immutable, replicated log; no data loss with appropriate replication and acks settings )
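Partition-based scaling works because a topic is split into partitions that can be spread across brokers, and records with the same key always land in the same partition, which keeps per-key ordering. The sketch below assumes a hypothetical `choose_partition` helper that uses CRC32 only because it ships with the Python standard library; Kafka's default partitioner hashes keys with murmur2, so the actual partition numbers will differ.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition (hypothetical helper).

    Simplified stand-in: Kafka's default partitioner uses murmur2, not CRC32.
    Any deterministic hash keeps the property that matters here: the same key
    always maps to the same partition.
    """
    return zlib.crc32(key) % num_partitions


NUM_PARTITIONS = 3
partitions: dict[int, list[bytes]] = {p: [] for p in range(NUM_PARTITIONS)}

events = [(b"user-1", b"login"), (b"user-2", b"login"), (b"user-1", b"logout")]
for key, value in events:
    partitions[choose_partition(key, NUM_PARTITIONS)].append(key + b":" + value)

for p, records in partitions.items():
    print(p, records)
```

Adding partitions (and brokers to host them) increases parallelism, at the cost that ordering is only guaranteed within a single partition.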

Use Cases

  • Application analytics
  • Monitoring/Metrics
  • Log collection
  • Stream processing
  • Recommendation engine
  • Fraud and anomaly detection
  • System integration

Companies Using Kafka

  • Uber
  • Netflix
  • Spotify
  • Activision
  • Slack
  • Pinterest
  • LinkedIn
  • Shopify

Concepts

  • Producer
    • Producer acknowledgments ( acks setting )
      • acks = 0: fastest but riskiest; message loss is likely. The producer sends the message to Kafka and keeps going without waiting for a response.
      • acks = 1: moderately fast and safe; message loss is possible but less likely. The producer waits until the leader has written the message, but not until the followers have replicated it.
      • acks = all ( or -1 ): slowest but safest; the message is not lost as long as at least one in-sync replica survives. The producer waits until the leader and all in-sync followers have the message.
  • Consumer ( within a consumer group, each partition is read by only one consumer; one consumer per partition is a common best practice )
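    • Read Strategies ( delivery semantics )
      • At Most Once ( a message may be lost but is never redelivered )
      • At Least Once ( most used; a message may be redelivered but is not lost )
      • Exactly Once ( transactional, performance impact )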
  • Partition ( ordered, append-only holder of events/messages/records )
  • Record/Event/Message ( each item in partition )
  • Offset ( message position/index in partition )
  • Topic ( partition holder )
  • Kafka Broker ( topics holder )
  • Consumer Group ( consumers in one group split a topic's partitions for parallel processing; separate groups each receive every message, like the pub-sub pattern )
  • Distributed Systems
    • Leader ( Master )
    • Follower ( Slave )
    • Topic Based Scaling
    • Partition Based Scaling
  • Kafka Connect
  • Kafka Streams
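  • Apache ZooKeeper ( cluster metadata and leader election ⇒ Who is leader? Who is follower? OK, you are leader, take this message. Uses the ZAB consensus protocol, not gossip; newer Kafka versions replace ZooKeeper with the built-in KRaft mode )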
  • Confluent Cloud
  • Apache Flink ( Stateful Computations over Data Streams )
  • Apache Hadoop
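The trade-off between the three acks levels can be sketched as a question: if the leader crashes right after acknowledging a record, does any other replica already have it? This is a simplified model, not the producer API. `Acks`, `replicas_written_before_ack`, and `survives_leader_crash` are hypothetical names; only the values "0", "1", and "all" come from Kafka's real `acks` producer setting.

```python
from enum import Enum


class Acks(Enum):
    """Producer acknowledgment levels, mirroring the values of Kafka's `acks` setting."""
    NONE = "0"      # don't wait for any broker response
    LEADER = "1"    # wait for the leader's write only
    ALL = "all"     # wait for the leader and every in-sync follower


def replicas_written_before_ack(acks: Acks, in_sync_replicas: int) -> int:
    """Minimum number of replicas guaranteed to hold a record once the send counts as done."""
    if acks is Acks.NONE:
        return 0                  # the producer moves on without any confirmation
    if acks is Acks.LEADER:
        return 1                  # only the leader is guaranteed to have it
    return in_sync_replicas       # leader plus all in-sync followers


def survives_leader_crash(acks: Acks, in_sync_replicas: int) -> bool:
    """Is an acknowledged record safe if the leader dies right after acking it?

    Simplifying assumption: no extra replication happens after the ack, and a
    surviving in-sync follower becomes the new leader. The record survives only
    if at least one replica other than the crashed leader already has it.
    """
    return replicas_written_before_ack(acks, in_sync_replicas) >= 2


for level in Acks:
    print(level.value, survives_leader_crash(level, in_sync_replicas=3))
# 0 False
# 1 False
# all True
```

Note that `survives_leader_crash(Acks.ALL, in_sync_replicas=1)` returns `False`: if the in-sync set has shrunk to the leader alone, `acks=all` gives no more protection than `acks=1`. Kafka's `min.insync.replicas` setting exists to reject such writes instead of silently accepting them.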
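The difference between At Most Once and At Least Once comes down to when the consumer commits its offset relative to processing a record. The sketch below is a hypothetical simulation (the `run` and `simulate` functions are illustrative names, not Kafka APIs): a crash strikes between the two steps for one record, and the consumer then restarts from its last committed offset. Exactly Once is not modeled here; in Kafka it relies on idempotent producers and transactions.

```python
from typing import Optional


def run(records: list[str], committed: int, commit_first: bool,
        crash_at: Optional[int] = None) -> tuple[list[str], int]:
    """Consume records starting at the committed offset.

    Each record takes two steps: process it and commit its offset. A simulated
    crash at offset `crash_at` happens between those two steps.
    commit_first=True  -> at-most-once: a crash can skip records
    commit_first=False -> at-least-once: a crash can repeat records
    Returns (records processed in this run, committed offset after the run).
    """
    processed = []
    for offset in range(committed, len(records)):
        if commit_first:
            committed = offset + 1              # commit offset
            if offset == crash_at:
                return processed, committed     # crash before processing
            processed.append(records[offset])   # process
        else:
            processed.append(records[offset])   # process
            if offset == crash_at:
                return processed, committed     # crash before committing
            committed = offset + 1              # commit offset
    return processed, committed


def simulate(commit_first: bool) -> list[str]:
    records = ["a", "b", "c"]
    first, committed = run(records, 0, commit_first, crash_at=1)
    second, _ = run(records, committed, commit_first)  # restart after the crash
    return first + second


print(simulate(commit_first=True))   # ['a', 'c']            ('b' was lost)
print(simulate(commit_first=False))  # ['a', 'b', 'b', 'c']  ('b' was repeated)
```

At Least Once is the common default because duplicates can usually be handled by making processing idempotent, while lost records cannot be recovered.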

Key Differences With Other Messaging Systems

  • Kafka differs from traditional message queues in several ways. Kafka retains a message after it has been consumed, until its retention period or size limit is reached. By contrast, its competitor RabbitMQ deletes a message as soon as it has been consumed and acknowledged.
  • RabbitMQ pushes messages to consumers, whereas Kafka consumers pull ( poll ) messages from the brokers.
  • Kafka scales horizontally by adding brokers and partitions, while traditional message queues mostly scale vertically ( bigger machines ).
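The first two differences, retention after consumption and the pull model, can be sketched together as a toy model. `Topic`, `produce`, and `poll` are hypothetical names for a single-partition, in-memory illustration, not the Kafka client API. Each consumer group pulls from its own offset, and reading never deletes anything from the log.

```python
from collections import defaultdict


class Topic:
    """Hypothetical single-partition topic that keeps records after they are read.

    In Kafka, records are removed only by retention (not modeled here),
    never by consumption.
    """

    def __init__(self) -> None:
        self.log: list[str] = []
        self.committed: dict[str, int] = defaultdict(int)  # group id -> next offset

    def produce(self, value: str) -> None:
        self.log.append(value)

    def poll(self, group: str, max_records: int = 100) -> list[str]:
        """Pull model: the consumer asks for records starting at its group's offset."""
        start = self.committed[group]
        batch = self.log[start:start + max_records]
        self.committed[group] = start + len(batch)  # simplified: commit right after reading
        return batch


topic = Topic()
for event in ("order-1", "order-2"):
    topic.produce(event)

print(topic.poll("billing"))    # ['order-1', 'order-2']
print(topic.poll("analytics"))  # ['order-1', 'order-2']  (same records, own offset)
print(topic.poll("billing"))    # []                        (billing is caught up)
print(topic.log)                # ['order-1', 'order-2']    (nothing was deleted)
```

Because the broker only stores data and offsets, consumers control their own pace; a new group can even start from offset 0 and replay the whole retained history.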