Recently at BookingSync, we performed a migration from Karafka 1.4 to 2.0, the framework we use for communication with Kafka. One of the great features available in version 1.4 was a custom partition assignment strategy for consumers. It was particularly useful for us because some of our topics had a much higher throughput than the others, so a plain round-robin strategy with an even distribution of topics across consumers was not a suitable choice; we needed dedicated consumers for these specific topics/partitions. Unfortunately, a custom partition assignment strategy for consumers is no longer available in Karafka 2.0. Nevertheless, we managed to perform the migration and replaced the custom partition assignment strategy with a more straightforward and robust solution.

Anatomy of the problem

Most of our Kafka topics didn't have a very high message throughput, so the default round-robin strategy for partition assignment was perfectly fine for them. However, two topics (each with several partitions) had a significantly higher throughput than the rest. To avoid issues with a huge lag, we took advantage of the custom partition assignment strategy available in Karafka 1.4 and implemented the following strategy (a rough sketch of the logic follows the list):

  1. Use a fixed number of consumers for high-throughput topic A so that each partition has its own consumer. Effectively, it's still a round-robin strategy, just applied to a smaller scope, so we could assign ten consumers to ten partitions.
  2. Use a fixed number of consumers for high-throughput topic B so that each partition has its own consumer, the same as in point 1.
  3. Assign the remaining consumers to the rest of the topics/partitions using a round-robin strategy.
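
To make these three scopes more concrete, here is a minimal, library-agnostic sketch of the assignment logic in plain Ruby. The topic names, the dedicated consumer counts, and the data shapes are made up for illustration; this is not the actual Karafka 1.4 strategy interface:

# Illustrative sketch only: not tied to any specific Karafka/ruby-kafka API.
# member_ids is an array of consumer identifiers, partitions is an array of
# hashes such as { topic: "topic_a", partition: 0 }.
DEDICATED_CONSUMERS = {
  "topic_a" => 10, # hypothetical high-throughput topic with 10 partitions
  "topic_b" => 5   # hypothetical high-throughput topic with 5 partitions
}.freeze

def assign(member_ids, partitions)
  assignment = Hash.new { |hash, key| hash[key] = [] }
  remaining_members = member_ids.dup

  # Scopes 1 and 2: reserve a dedicated pool of consumers for each
  # high-throughput topic and round-robin its partitions within that pool.
  DEDICATED_CONSUMERS.each do |topic, pool_size|
    pool = remaining_members.shift(pool_size)

    partitions.select { |part| part[:topic] == topic }.each_with_index do |part, index|
      assignment[pool[index % pool.size]] << part
    end
  end

  # Scope 3: round-robin all the remaining topics/partitions across
  # the consumers that are left.
  rest = partitions.reject { |part| DEDICATED_CONSUMERS.key?(part[:topic]) }

  rest.each_with_index do |part, index|
    assignment[remaining_members[index % remaining_members.size]] << part
  end

  assignment
end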

This solution was exactly what we needed to solve our problem with a huge lag. Unfortunately, a custom partition assignment strategy is no longer supported in Karafka 2.0 (internally, Karafka 2.0 uses librdkafka, which doesn't support it, while Karafka 1.4 used ruby-kafka under the hood), so we had to find an alternative solution.

It turned out that there was an even better, more flexible, and more straightforward solution available in Karafka 2.0.

The solution

One thing that is easy to spot in the original solution is that we were still using a round-robin strategy; it was just "partitioned" into three independent scopes. That implies we might not need a custom partition assignment strategy at all; we just need an alternative way to achieve this kind of partitioning. And we can do that by running three different processes that consume from different topics.

It turns out that there is an out-of-the-box way to limit a process to specific topics, as documented in the Karafka documentation.

Here is what our example routing might look like:

class KarafkaApp < Karafka::App
  setup do |config|
    # ...
  end

  routes.draw do
    consumer_group :group_name do
      topic :example do
        consumer ExampleConsumer
      end

      topic :example2 do
        consumer ExampleConsumer2
      end

      topic :example3 do
        consumer ExampleConsumer3
      end
    end
  end
end

And then we could run three different processes in the following way:

  1. bundle exec karafka server --topics example
  2. bundle exec karafka server --topics example2
  3. bundle exec karafka server --topics example3

As a bonus, it turns out that this solution provides more flexibility in at least two aspects:

  1. More flexibility for deployments - since there are three separate types of processes/pods, we can apply different configurations to each of them, such as CPU/memory limits.
  2. We can fine-tune the config of the consumers themselves, e.g., by setting different values for session.timeout.ms, heartbeat.interval.ms, or max.partition.fetch.bytes (via ENV variables as well) if we experience too frequent rebalancing of the consumers due to the increased time of processing batches for the high-throughput topics (see the sketch below).
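
For example, here is a minimal sketch of how such settings could be driven by ENV variables in the Karafka 2.0 setup block. The ENV variable names and the default values are made up for illustration:

class KarafkaApp < Karafka::App
  setup do |config|
    # ...
    # Karafka 2.0 passes this hash to librdkafka, so the keys are
    # regular librdkafka settings.
    config.kafka = {
      'bootstrap.servers': ENV.fetch('KAFKA_BROKERS', 'localhost:9092'),
      'session.timeout.ms': Integer(ENV.fetch('KAFKA_SESSION_TIMEOUT_MS', '30000')),
      'heartbeat.interval.ms': Integer(ENV.fetch('KAFKA_HEARTBEAT_INTERVAL_MS', '5000')),
      'max.partition.fetch.bytes': Integer(ENV.fetch('KAFKA_MAX_PARTITION_FETCH_BYTES', '1048576'))
    }
  end
end

Each of the three deployments can then export different values for these variables, so the consumers of the high-throughput topics can get, for example, a longer session timeout without touching the code of the other processes.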

Conclusions

Thanks to some new additions, the lack of a custom partition assignment strategy for consumers should not be a blocker for migrating to Karafka 2.0. The fact that the replacement is both more flexible and easier to implement is a significant advantage over the custom partition assignment strategy.
