Version: 1.0.1

Kafka Connect

Kafka Connect is a framework for connecting Kafka with external systems such as databases, key-value stores, search indexes, and file systems, using so-called Connectors.

Concepts

  • Connectors (sources and sinks: reusable pieces of code, packaged as Java JARs, that move data into or out of Kafka)
  • Transforms (simple logic that alters each message produced by or sent to a connector)
  • Tasks (the units of work: a connector plus its user configuration, split into one or more tasks for parallelism)
  • Workers (the running processes that execute connectors and their tasks)
  • Dead Letter Queue (how Connect handles connector errors, by routing records that cannot be processed to a separate topic)
  • Converters (the code used to translate data between Connect and the system sending or receiving data)
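To make these pieces concrete, here is a sketch of a source connector configuration, as it might be submitted to a Connect worker's REST API. It uses the FileStreamSource connector that ships with Apache Kafka; the connector name, file path, and topic are illustrative placeholders. The HoistField transform wraps each line read from the file into a single-field structure, which the JsonConverter then serializes as JSON:

```json
{
  "name": "local-file-source",
  "config": {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "/tmp/input.txt",
    "topic": "file-lines",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "transforms": "wrap",
    "transforms.wrap.type": "org.apache.kafka.connect.transforms.HoistField$Value",
    "transforms.wrap.field": "line"
  }
}
```

Note how each concept above appears in the configuration: the connector class and task count define the connector and its tasks, the `transforms.*` entries apply per-message logic, and the converter settings control how records are translated on their way into Kafka.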

Sample Connectors

  • Elasticsearch Service Sink
  • HDFS 2 Sink
  • Amazon S3 Sink
  • Replicator (replicates topics easily and reliably from one Apache Kafka cluster to another)
  • Jira
  • MySQL Source
  • …and many more
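As an illustration of how a sink from the list above is configured, here is a sketch of an Amazon S3 sink configuration. The bucket, region, and topic names are placeholders, and the class and property names follow Confluent's S3 sink connector, so check that connector's documentation for the version you are running:

```json
{
  "name": "s3-sink-example",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "tasks.max": "2",
    "topics": "orders",
    "s3.bucket.name": "example-bucket",
    "s3.region": "us-east-1",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "flush.size": "1000"
  }
}
```

The same pattern holds for the other connectors listed: you supply a connector class and its configuration, and Connect handles running the tasks, retrying on failure, and scaling across workers.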

Why Not Write Your Own Integrations?

All of this sounds great, but you’re probably asking, “Why Kafka Connect? Why not write our own integrations?”

Apache Kafka has its own very capable producer and consumer APIs and client libraries available in many languages, including C/C++, Java, Python, and Go. So it is natural to wonder why you wouldn't just write your own code to move data from a system and write it to Kafka: doesn't it make sense to write a quick bit of consumer code to read from a topic and push it to a target system?

The problem is that if you are going to do this properly, then you need to be able to account for and handle failures, restarts, logging, scaling out and back down again elastically, and also running across multiple nodes. And that’s all before you’ve thought about serialization and data formats. Of course, once you’ve done all of these things, you’ve written something that is probably similar to Kafka Connect, but without the many years of development, testing, production validation, and community that exists around Kafka Connect. Even if you have built a better mousetrap, is all the time that you’ve spent writing that code to solve this problem worth it? Would your effort result in something that significantly differentiates your business from anyone else doing similar integration?

The bottom line is that integrating external data systems with Kafka is a solved problem. There may be a few edge cases where a bespoke solution is appropriate, but by and large, you’ll find that Kafka Connect will become the first thing you think of when you need to integrate a data system with Kafka.