Introduction to Apache Kafka

Ivan Gavlik, Software Developer


17.09.2020.


This short article gives you an overview of Apache Kafka. After reading it, you’ll have a notion of what Kafka is, why it was created and how you can integrate it into a microservice architecture.

History

In the very beginning, LinkedIn was a monolithic application. As its complexity and the number of users increased, it became clear that this architecture was not ideal, so LinkedIn’s engineering team started migrating it to microservices. However, as fate would have it, as soon as you solve one problem, a new one arises. During the migration, the team noticed issues with tracking, metrics and messaging, and their analytics and search services had trouble working in real time. To overcome these obstacles, they started building custom data pipelines for these services. Instead of maintaining each pipeline individually, they then decided to develop a single, distributed pub-sub system, and thus Kafka was born.

Over time, streaming capabilities were added, which is what Kafka is best known for today.

Kafka is a platform

I have just recently subscribed to a culinary channel on YouTube and now every time a new video is posted, I get notified. In Kafka’s world, I’m a consumer, meaning I consume information. The cook who published the video on YouTube is a producer, meaning he produces information.

The cook publishes his videos on YouTube, and a Kafka producer publishes its content to Kafka. Simply put, Kafka is a platform that makes sure that the content published by the producer reaches the consumer. Here’s a LinkedIn example: the user tracking service (producer) publishes the information that the user John Doe liked a certain post, and Kafka makes sure that all the interested services (consumers) are informed about it.
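To make the producer and consumer roles a bit more concrete, here is a minimal in-memory sketch of the LinkedIn example. It is a toy model and not the Kafka client API: `ToyBroker`, the topic name `post-likes`, the service names and the `post_id` value are all made up for illustration. It only shows the idea that events are appended to a topic and that each interested service reads them at its own pace.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a broker. Hypothetical, NOT the Kafka API."""

    def __init__(self):
        # topic name -> append-only list of events
        self._logs = defaultdict(list)
        # (consumer group, topic) -> index of the next event to read
        self._offsets = defaultdict(int)

    def publish(self, topic, event):
        """Producer side: append an event to the end of the topic."""
        self._logs[topic].append(event)

    def poll(self, group, topic):
        """Consumer side: return the events this group has not read yet."""
        start = self._offsets[(group, topic)]
        events = self._logs[topic][start:]
        self._offsets[(group, topic)] = len(self._logs[topic])
        return events


broker = ToyBroker()

# The user tracking service (producer) publishes one event.
like = {"user": "John Doe", "action": "liked", "post_id": 42}
broker.publish("post-likes", like)

# Two interested services (consumers) each receive the same event.
notifications = broker.poll("notification-service", "post-likes")
analytics = broker.poll("analytics-service", "post-likes")

# Polling again returns nothing new for a group that is already caught up.
again = broker.poll("notification-service", "post-likes")
```

Note that the producer never calls the consumers directly. It only knows the topic, which is where the decoupling comes from.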

The information about something happening is called an event. I won’t be going into the technical details of what constitutes an event, but it’s important to note that it should contain information on who, when, where and most importantly, why the event was created (of course, this varies depending on what you need).
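As a rough illustration, an event carrying the who, when, where and why could look like the sketch below. The `Event` class, its field names and the example values are my own assumptions for this article; Kafka itself does not prescribe this structure and simply transports the serialized payload.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    """Hypothetical event shape; adjust the fields to what you need."""
    who: str    # the actor that caused the event
    when: str   # ISO 8601 timestamp
    where: str  # the service that recorded the event
    why: str    # the reason the event was created


def to_json(event):
    """Serialize the event so it can be sent as a message payload."""
    return json.dumps(asdict(event), sort_keys=True)


event = Event(
    who="John Doe",
    when=datetime(2020, 9, 17, 12, 0, tzinfo=timezone.utc).isoformat(),
    where="user-tracking-service",
    why="liked a post",
)
payload = to_json(event)
```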

Kafka is powerful because we can combine events in order to create new ones.
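Here is a small sketch of what combining events could mean in practice: several "like" events are folded into one new "post trending" event. This is plain Python and not a Kafka API; the function name `derive_trending`, the event fields and the threshold of 3 are assumptions chosen for the example.

```python
from collections import Counter


def derive_trending(like_events, threshold=3):
    """Create a new event once a post has collected `threshold` likes."""
    counts = Counter()
    derived = []
    for event in like_events:
        post_id = event["post_id"]
        counts[post_id] += 1
        # Emit exactly once, at the moment the threshold is reached.
        if counts[post_id] == threshold:
            derived.append(
                {"type": "post_trending", "post_id": post_id, "likes": threshold}
            )
    return derived


likes = [{"user": name, "post_id": 42} for name in ("ann", "bob", "cy")]
likes.append({"user": "dee", "post_id": 7})

trending = derive_trending(likes)
```

In a real setup, the derived events would be published to their own topic so that other services can consume them like any other event.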

[Figure: Kafka event]

Integration

In most microservice architectures, Kafka is integrated in one of two possible ways:

1. Broker

[Figure: Kafka Broker]

Pros:
               high performance
               scalability
Cons:
               harder error handling
               harder coordination

2. Orchestrator

[Figure: Kafka Orchestrator]

Pros:
                coordination and flow control, all in one place
                better error handling
Cons:
                tight coupling

The broker approach is used more frequently because it has less coupling and better performance than the orchestrator approach.
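The difference between the two approaches can be sketched in a few lines. This is my own simplified reading of the two patterns, using an in-memory `EventBus` instead of Kafka; the order flow and all names (`payment_service`, `shipping_service`, `orchestrate`) are hypothetical. In the broker style, each service reacts to events on its own. In the orchestrator style, one central function drives the steps and handles errors in one place.

```python
from collections import defaultdict


class EventBus:
    """In-memory stand-in for Kafka topics. Hypothetical, NOT the Kafka API."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in self._handlers[event_type]:
            handler(payload)


# Broker style: no central coordinator, services only know event types.
bus = EventBus()
broker_log = []


def payment_service(order):
    broker_log.append(f"charged {order}")
    bus.publish("payment_done", order)


def shipping_service(order):
    broker_log.append(f"shipped {order}")


bus.subscribe("order_placed", payment_service)
bus.subscribe("payment_done", shipping_service)
bus.publish("order_placed", "order-1")


# Orchestrator style: one place knows the whole flow and handles errors.
def charge(order):
    return f"charged {order}"


def ship(order):
    return f"shipped {order}"


def orchestrate(order, steps):
    done = []
    for step in steps:
        try:
            done.append(step(order))
        except Exception as exc:
            # Error handling lives in one place, at the cost of coupling.
            done.append(f"failed at {step.__name__}: {exc}")
            break
    return done


orchestrator_log = orchestrate("order-1", [charge, ship])
```

Both variants produce the same result here, but notice where the knowledge of the flow lives: spread across the subscriptions in the broker style, and inside `orchestrate` in the orchestrator style. In a real system the orchestrator would typically exchange its commands and replies through Kafka topics rather than direct function calls.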

You should now have a good overview of Kafka to build on. I’ve left out the technical details to focus on the questions every beginner has: What is Kafka? Why was it developed? How do you integrate it?

Kafka is used by an increasingly large number of organizations because it simplifies working with data in real-time. We can use it to achieve a high level of decoupling between services in the ever-growing microservice architecture.
