Software Engineering

Kafka Backfill Playbook: Accessing Historical Data

Event-driven architectures with Kafka have become a standard way of building modern microservices. At first, everything works smoothly - services communicate via events, state is rebuilt from event streams, and the system scales well. But as your data grows, you face an inevitable challenge: what happens when you need to access historical events that are no longer in Kafka? 1. The Problem: Finite Retention & The Need for Backfills In a perfect world, we would keep every event log in Kafka forever. In the real world, however, storing an ever-growing history on high-performance broker disks is prohibitively expensive. ...

Data Oriented Programming (DOP) in Java

What is Data Oriented Programming? Data Oriented Programming (DOP) is gaining momentum in the Java ecosystem due to recent language features streamlining its adoption. While conceptually straightforward, DOP offers significant advantages. But what is it? How do we build our objects? Where does the state go? Where does the behavior go? OOO encourages us to bundle state and behavior together. But what if we separated this? What if data became the primary focus, with logic completely separated? This is the central idea of Data Oriented Programming (DOP), simple. ...

Idempotent Processing with Kafka

Duplicate Messages are Inevitable Duplicate messages are an inherent aspect of message-based systems and can occur for various reasons. In the context of Kafka, it is essential to ensure that your application is able to handle these duplicates effectively. As a Kafka consumer, there are several scenarios that can lead to the consumption of duplicate messages: There can be an actual duplicate message in the kafka topic you are consuming from. The consumer is reading 2 different messages that should be treated as duplicates. You consume the same message more than once due to various error scenarios that can happen, either in your application, or in the communication with a Kafka broker. To ensure the idempotent processing and handle these scenarios, it’s important to have a proper strategy to detect and handle duplicate messages. ...

Avoid Tight Coupling of Tests to Implementation Details

Building backend systems today will likely involve building many small, independent services that communicate and coordinate with one another to form a distributed system. While there are many resources available discussing the pros and cons of microservices, the architecture, and when it is appropriate to use, I want to focus on the functional testing of microservices and how it differs from traditional approaches. In my experience, the “best testing practices” have evolved with the introduction of microservices, and traditional testing pyramids may not be the most effective or even potentially harmful in this context. In my work on various projects and companies, including the development of new digital banks and the migration of older systems to microservices as they scale, I have often encountered disagreements about the most appropriate testing strategies for microservices. ...

Stream unzip files in S3 with Java

I’ve been spending a lot of time with AWS S3 recently building data pipelines and have encountered a surprisingly non-trivial challenge of unzipping files in an S3 bucket. A few minutes with Google and StackOverflow made it clear many others have faced the same issue. I’ll explain a few options to handle the unzipping as well as the end solution which has led me to build nejckorasa/s3-stream-unzip. To sum up: there is no support to unzip files in S3 in-line, there also is no unzip built-in api available in AWS SDK. In order to unzip you therefore need to download the files from S3, unzip and upload decompressed files back. ...