Apache Kafka – Definition and Use Cases
- Posted by Adham Jan
- On December 15, 2019
If you’ve been working in or with IT for any length of time, you’ve probably heard of Kafka. You may also have heard that over a third of Fortune 500 companies use Kafka in their applications. Heavy hitters such as Microsoft, LinkedIn, and Netflix use Kafka to handle their transactions, as do the top ten travel companies and nine of the top ten telecoms. But what is Kafka, and why is it used?
What is Kafka?
Kafka is a real-time event streaming platform that collects and processes data. Originally designed as a message queue, it has grown into a distributed, publish-subscribe messaging system that can handle more than a trillion events per day: producers publish records to named topics, and consumers subscribe to those topics and read the records as they arrive. This makes Kafka flexible and robust, as well as very powerful. And since 2011, Kafka has been open source, which makes it an affordable option as well. So, how did something so useful come into being?
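To make the publish-subscribe model concrete, here is a minimal sketch of the publishing side using the third-party kafka-python client. The broker address, topic name, and event contents are assumptions made up for the illustration, not part of any particular deployment.

```python
# Publish-side sketch using the third-party kafka-python client.
# Broker address, topic name, and event contents are illustrative assumptions.
import json

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",                      # assumed local broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Publish an event to a topic; any number of subscribed consumers will receive it.
producer.send("user-events", {"user": "alice", "action": "login"})
producer.flush()   # block until buffered events have been sent to the broker
producer.close()
```

On the other side, any consumer subscribed to this (hypothetical) user-events topic receives the record; a sketch of that consuming side appears under “When to Use Kafka?” below.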
Brief History of Apache Kafka
In 2010, LinkedIn developed Kafka using Java and Scala. The idea behind Kafka was to design a highly scalable messaging system that could keep up with LinkedIn’s growth and transaction volume. Once LinkedIn open sourced Kafka in 2011, the Apache Software Foundation took over its main development and source code, making it accessible to all when it graduated from the Apache Incubator in 2012. Since then, Kafka’s original developers have founded their own company, Confluent, which builds its Confluent Platform around Kafka.
Within a few years, even slow-to-adopt companies were taking notice of Kafka’s popularity for handling events and messaging. That popularity was due in large part to its simplicity, its robustness, and its distributed nature: because it is distributed, large amounts of data and events can be processed with ease, and because it blends the queue and publish-subscribe messaging models, it gets the best of both worlds.
When to Use Kafka?
Kafka is best used to process large numbers of events, and big data, in real time. Because of its simple, distributed design, it leans heavily on the operating system kernel (the page cache and zero-copy transfers) to do the heavy work of moving data around fast. This lets Kafka move data in batches while keeping track of every record in the topic log. In short, Kafka shines when you need to stream data between systems in real time.
As a message broker, Kafka lets you both publish data to its topics and read that data back off them. This means you can process data as it comes in, whether it’s from an Internet of Things device, user actions on a website, or transactions arriving from banks. Because Kafka is built for real-time solutions, a company has a better chance of catching a problem early, whether it’s debit card fraud or a faulty device. It’s also great at processing and responding to user input in real time.
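As a rough sketch of what the reading side of such a real-time pipeline might look like, the example below subscribes to a hypothetical card-transactions topic and reacts to each event as it arrives, again using the third-party kafka-python client. The topic name, message fields, and the toy threshold are all invented for illustration; real fraud detection is far more involved.

```python
# Read-side sketch with kafka-python: react to events as they arrive.
# Topic name, message fields, and the flagging rule are hypothetical.
import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "card-transactions",                        # hypothetical topic of bank events
    bootstrap_servers="localhost:9092",         # assumed local broker
    group_id="fraud-checker",                   # consumers in one group share the load
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

for message in consumer:                        # blocks, yielding events in real time
    txn = message.value
    if txn.get("amount", 0) > 10_000:           # toy stand-in for real fraud logic
        print(f"Flagging transaction {txn.get('id')} for review")
```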
Who Uses Kafka?
As mentioned before, many companies now use Kafka to handle their event streaming and real-time messaging. LinkedIn designed Kafka to meet its growing need for a flexible message queue, and it quickly evolved into a full event processor, but plenty of other companies use Kafka as well. Here are some of them:
- Yahoo.com — To process metrics at peak times. Plus they developed Kafka Manager, an open source tool to manage Kafka.
- Twitter — As part of the Storm stream processing infrastructure.
- Netflix — For real-time event processing and real-time monitoring.
- AirBnB — For event streaming and exception handling.
- Mozilla — To collect performance and usage data from end-users’ browsers for projects.
- AddThis — To collect events and send that data to their analytics clusters and web analytics platform.
- Box — For the production analytics pipeline and real-time monitoring infrastructure.
Why Is Kafka Terrific for Agile Integration?
At this point, you may be wondering whether Kafka is good for agile integration. The answer is, yes, it most certainly is, and the reasons come down to how it was designed. Here are some of the qualities that make Kafka terrific for agile integration.
- It has a distributed design, meaning it doesn’t rely on a monolithic structure.
- It is an open source solution, which means there is no licensing fee for using or modifying it.
- It is simple in design. Unlike other messaging solutions, it is designed for one thing and does it well.
- It communicates over a simple binary protocol on TCP, and client libraries exist for many programming languages, including C, C#, Java, Python, and Ruby, among others (see the short sketch after this list).
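As a small illustration of that last point, here is a sketch that connects to a broker from Python, one of the non-JVM languages with a client library, over that TCP protocol and lists the topics it can see. The broker address is an assumption, and kafka-python is a third-party client library rather than part of Kafka itself.

```python
# Connect from a non-JVM client over TCP and list the broker's topics.
# Uses the third-party kafka-python package; the broker address is assumed.
from kafka import KafkaConsumer

consumer = KafkaConsumer(bootstrap_servers="localhost:9092")
print(sorted(consumer.topics()))   # e.g. ['card-transactions', 'user-events']
consumer.close()
```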