· 7 min read
Javier Montón

Why should you name your state stores? Because otherwise, you'll lose data.

But first, a bit of context.

Kafka Streams

Kafka Streams is a client library for building applications and microservices where the input and output data are stored in Kafka clusters. It combines the simplicity of writing and deploying standard Java and Scala applications on the client side with the benefits of Kafka's server-side cluster technology.

Unlike other stream processing frameworks, Kafka Streams is not a separate processing cluster but a library that runs within your application. This means you don't need to set up and manage a separate cluster: your application becomes the stream processing engine.

· 7 min read
Javier Montón

Kafka Connect is a tool that allows you to stream data between Apache Kafka and other systems. Sometimes the data is converted from Protobuf to a different format; other times, it is converted to Protobuf.

Protobuf Converters

Confluent has a Protobuf Converter that can be used with any Kafka Connect Source or Sink, but it isn't as simple as it seems.

If you enable:

value.converter=io.confluent.connect.protobuf.ProtobufConverter
value.converter.schema.registry.url=http://localhost:8081

· 4 min read
Javier Montón

When working with Kafka, increasing or decreasing the number of brokers isn't as trivial as it seems. If you add a new broker, it will sit idle until you manually reassign partitions of your topics to it. But you don't want to just move some topics completely to the new broker; you want to spread your partitions and their replicas evenly across all your brokers. You also want the number of leader partitions to be balanced across all your brokers.
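To make the goal concrete, here is a minimal sketch of a round-robin replica plan. It is a simplified illustration and not the algorithm used by Kafka or its reassignment tool; the class name `RoundRobinAssignment` and the method `assign` are hypothetical. The first broker in each replica list is treated as the preferred leader.

```java
import java.util.ArrayList;
import java.util.List;

public class RoundRobinAssignment {

    /**
     * Returns, for each partition, the ordered list of broker ids that hold its replicas.
     * The first broker in each list is treated as the preferred leader.
     * Hypothetical helper for illustration only, not a Kafka API.
     */
    static List<List<Integer>> assign(List<Integer> brokers, int partitions, int replicationFactor) {
        if (replicationFactor > brokers.size()) {
            throw new IllegalArgumentException("replication factor exceeds broker count");
        }
        List<List<Integer>> assignment = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            List<Integer> replicas = new ArrayList<>();
            for (int r = 0; r < replicationFactor; r++) {
                // Move one broker forward per replica so replicas of a partition never share a broker.
                replicas.add(brokers.get((p + r) % brokers.size()));
            }
            assignment.add(replicas);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Four brokers, eight partitions, three replicas each.
        List<List<Integer>> plan = assign(List.of(1, 2, 3, 4), 8, 3);
        for (int p = 0; p < plan.size(); p++) {
            System.out.println("partition " + p + " -> replicas " + plan.get(p));
        }
    }
}
```

Because the starting broker rotates with the partition number, both the replicas and the preferred leaders end up spread across all brokers whenever the partition count is a multiple of the broker count.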

Reassign partitions

To reassign partitions to different brokers, you can use the Kafka binaries (bin/kafka-reassign-partitions.sh), but it isn't trivial if you have to reassign thousands of topics.

The script supports three operations:

· 7 min read
Javier Montón

This post is about Kafka Connect, Mirror Maker 2, how they manage offsets, and how to deal with them.

Kafka Offsets

When a consumer starts consuming messages from Kafka, it will usually belong to a consumer group, and Kafka will store the offset of the last message consumed by that consumer group. This offset is stored in a Kafka topic called __consumer_offsets.

· 20 min read
Javier Montón

A guide to move data from Kafka to an AWS RDS using Kafka Connect and the JDBC Sink Connector with IAM Auth.

Kafka Connect

For these examples, we are using Confluent's Kafka Connect Docker image, as we are going to deploy it in a Kubernetes cluster.

Standalone and distributed modes

Kafka Connect comes with two modes of execution, standalone (single) and distributed. In standalone mode, a single worker process runs all the connectors and their tasks. In distributed mode, several workers form a cluster, and the connectors and their tasks are balanced across those workers. The distributed mode is the recommended one for production environments, as it provides better scalability and fault tolerance: if a worker fails, its work is redistributed to the remaining ones. In the case of K8s, it means we will be using more than one pod to run Kafka Connect.

· 9 min read
Javier Montón

Big Data Types is a library that can safely convert types between different Big Data systems.

The power of the library

The library implements a few abstract types that can hold any kind of structure. Using type-class derivation, it can convert between multiple types without any code relating them directly. In other words, there is no need to implement a transformation from type A to type B; the library will do it for you.

As an example, let's say we have a generic type called Generic. Now we want to convert from type A to type B. If we implement the conversion from A to Generic and the conversion from Generic to B, automatically we can convert from A to B although there is no single line of code mixing A and B.
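The idea can be sketched in plain Java as function composition through an intermediate type. This is only an analogy under stated assumptions: the library itself relies on type-class derivation, and every name below (`GenericField`, `SqlColumn`, `JsonProperty`, `convert`) is hypothetical and not part of its API.

```java
import java.util.function.Function;

public class GenericConversion {

    // Hypothetical intermediate representation: a field name plus a simple type tag.
    record GenericField(String name, String type) {}

    // Hypothetical source type A: a SQL-like column.
    record SqlColumn(String name, String sqlType) {}

    // Hypothetical target type B: a JSON-schema-like property.
    record JsonProperty(String name, String jsonType) {}

    // A -> Generic: this conversion knows nothing about JsonProperty.
    static final Function<SqlColumn, GenericField> SQL_TO_GENERIC =
        c -> new GenericField(c.name(), c.sqlType().equals("VARCHAR") ? "string" : "number");

    // Generic -> B: this conversion knows nothing about SqlColumn.
    static final Function<GenericField, JsonProperty> GENERIC_TO_JSON =
        g -> new JsonProperty(g.name(), g.type());

    // Chains any "to generic" conversion with any "from generic" conversion.
    static <A, B> Function<A, B> convert(Function<A, GenericField> toGeneric,
                                         Function<GenericField, B> fromGeneric) {
        return toGeneric.andThen(fromGeneric);
    }

    public static void main(String[] args) {
        // No code mentions SqlColumn and JsonProperty together except this call site.
        Function<SqlColumn, JsonProperty> sqlToJson = convert(SQL_TO_GENERIC, GENERIC_TO_JSON);
        System.out.println(sqlToJson.apply(new SqlColumn("id", "INTEGER")));
    }
}
```

With N types, this approach needs only one conversion to and one conversion from the generic type per type, instead of a dedicated conversion for every pair.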