3 posts tagged with "kafka"

· 7 min read
Javier Montón

Why should you name your state stores? Because otherwise, you'll lose data.

But first, a bit of context.

Kafka Streams

Kafka Streams is a client library for building applications and microservices where the input and output data are stored in Kafka clusters. It combines the simplicity of writing and deploying standard Java and Scala applications on the client side with the benefits of Kafka's server-side cluster technology.

Unlike other stream processing frameworks, Kafka Streams is not a separate processing cluster but a library that runs within your application. This means you don't need to set up and manage a separate cluster - your application becomes the stream processing engine.
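To make the opening claim about naming state stores more concrete, here is a toy model in Python. It is not Kafka Streams code. It only mimics, in simplified form, how auto-generated store names carry a counter that depends on the operator's position in the topology, and how the changelog topic name is derived from the application id and the store name. The helpers `store_names` and `changelog_topic`, the `AUTO` marker, and the names `my-app` and `counts-store` are all hypothetical.

```python
from itertools import count

AUTO = object()  # marker: stateful operator whose store has no explicit name


def store_names(operators):
    """Toy model of store naming (a simplification, not the library's code).

    operators: list of (kind, store) tuples in topology order, where store is
    None (stateless), AUTO (auto-named store) or a string (explicit name).
    """
    index = count()  # one counter shared by the whole topology
    names = []
    for kind, store in operators:
        if store is AUTO:
            # Auto-generated names embed the current counter value.
            names.append(f"KSTREAM-{kind}-STATE-STORE-{next(index):010d}")
        elif store is not None:
            # An explicit name does not depend on the counter at all.
            names.append(store)
        next(index)  # the operator itself also consumes a counter value
    return names


def changelog_topic(application_id, store_name):
    # Changelog topics follow <application.id>-<store name>-changelog.
    return f"{application_id}-{store_name}-changelog"


# Version 1 of a hypothetical topology, and version 2 with a filter added upstream.
v1 = [("SOURCE", None), ("AGGREGATE", AUTO)]
v2 = [("SOURCE", None), ("FILTER", None), ("AGGREGATE", AUTO)]

# The same two versions, but with an explicitly named store.
v1_named = [("SOURCE", None), ("AGGREGATE", "counts-store")]
v2_named = [("SOURCE", None), ("FILTER", None), ("AGGREGATE", "counts-store")]
```

In this model, adding the filter shifts the auto-generated store name from `...0000000001` to `...0000000002`, so the application would look for a different changelog topic and start with an empty store. With an explicit name, both versions resolve to the same changelog topic.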

· 7 min read
Javier Montón

This post is about Kafka Connect, Mirror Maker 2, how they manage offsets, and how to deal with them.

Kafka Offsets

When a consumer reads messages from Kafka, it usually does so as part of a consumer group, and Kafka stores the offset that the group has committed for each partition (by convention, the offset of the next message to read). These offsets are stored in an internal Kafka topic called __consumer_offsets.
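As a rough illustration of the bookkeeping described above, the following sketch keeps committed offsets in memory, keyed by consumer group, topic and partition. It is a stand-in for the idea behind __consumer_offsets, not a client for it. The class names `GroupTopicPartition` and `OffsetStore`, their methods, and the group and topic names are hypothetical.

```python
from typing import Dict, NamedTuple, Optional


class GroupTopicPartition(NamedTuple):
    """Committed offsets are tracked per group, topic and partition."""
    group: str
    topic: str
    partition: int


class OffsetStore:
    """In-memory stand-in for the offsets kept in __consumer_offsets."""

    def __init__(self) -> None:
        self._committed: Dict[GroupTopicPartition, int] = {}

    def commit_after(self, key: GroupTopicPartition, last_consumed: int) -> None:
        # By convention, the committed value is the next offset to read,
        # i.e. the offset of the last consumed message plus one.
        self._committed[key] = last_consumed + 1

    def resume_position(self, key: GroupTopicPartition) -> Optional[int]:
        # None means the group has no committed offset for this partition;
        # a real consumer would then fall back to its auto.offset.reset setting.
        return self._committed.get(key)


store = OffsetStore()
billing = GroupTopicPartition("billing", "orders", 0)
store.commit_after(billing, last_consumed=41)
```

After the commit, the `billing` group would resume from offset 42 on that partition, while any other group reading the same topic has no committed position yet.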

· 20 min read
Javier Montón

A guide to moving data from Kafka to an AWS RDS database using Kafka Connect and the JDBC Sink Connector with IAM authentication.

Kafka Connect

For these examples, we use Confluent's Kafka Connect Docker image, since we are going to deploy it in a Kubernetes cluster.

Standalone and distributed modes

Kafka Connect comes with two modes of execution, standalone and distributed. The main difference between them is that standalone mode runs all connectors and tasks in a single worker process, while distributed mode runs a group of workers that share the connectors and tasks among themselves and rebalance them when a worker joins or fails. Distributed mode is the recommended one for production environments, as it provides better scalability and fault tolerance. In the case of K8s, it means we will be using more than one pod to run Kafka Connect.
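In distributed mode, connectors are typically created by sending a request to the REST API of any worker, instead of passing properties files at startup. The sketch below only builds such a request for a JDBC sink. The worker URL, connector name, topic and connection URL are placeholders we made up for illustration, the helper `connector_request` is hypothetical, and the IAM authentication settings are deliberately left out.

```python
import json
import urllib.request


def connector_request(worker_url, name, config):
    """Build (but do not send) a POST /connectors request for a Connect worker."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{worker_url}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = connector_request(
    "http://kafka-connect:8083",  # hypothetical K8s service name, default REST port
    "rds-sink",  # hypothetical connector name
    {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "topics": "orders",  # hypothetical topic
        "connection.url": "jdbc:postgresql://example-host:5432/exampledb",  # placeholder
        "tasks.max": "2",
    },
)

# urllib.request.urlopen(request) would submit it; that call is omitted here
# because it needs a running Kafka Connect worker.
```

Because every worker in the group exposes the same REST API, the request can go to the Kubernetes service in front of the pods, and the resulting tasks are spread across the workers.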