Karol Galanciak - Ruby on Rails and Ember.js consultant

Durable Sidekiq Jobs: How to Maximize Reliability of Sidekiq and Redis

Sidekiq is one of the most popular (if not the most popular) background job frameworks in the Ruby world, which is not a big surprise: it achieves decent throughput, is stable and well-maintained, has some great features (including all the gems extending its built-in functionality) and is easy to get started with. It seems like you could simply install Redis, add Sidekiq to your application and be good to go!

That would work if you didn’t have any business-critical background jobs and reliability didn’t matter that much. However, if you cannot afford to lose jobs every now and then, there are some things in your configuration and infrastructure that are absolutely critical to review.

How can you lose a Sidekiq job?

Without preventive measures, there are multiple scenarios in which you can lose a Sidekiq job. Some of them are quite obvious; some of them are not.

One obvious way would be having Redis down and trying to enqueue a job. This one is pretty intuitive, as there is no way of pushing anything to Redis if there is no connection to Redis in the first place. Among the less obvious scenarios: what is going to happen when a Sidekiq process gets killed while processing a job, due to, e.g., an Out Of Memory error? In “standard” Sidekiq (although not in Sidekiq Pro), processing a job means that the data is dequeued from Redis, so if the worker is interrupted and killed, the job will be lost.

To make matters worse, there are even more ways to lose jobs. Imagine that Redis has been working just fine so far, there are some jobs enqueued to be processed in the future, nothing is being processed at the moment and… the server gets restarted. Depending on your setup and how much data is kept in memory before it gets “persisted”, it might turn out that all (or at least some) of the enqueued jobs are lost!

As terrible as it sounds, there are some relatively quick fixes to these problems. Even though it might still be possible to lose jobs under extreme circumstances, these solutions will significantly minimize the likelihood of critical problems.

Sidekiq Pro and its reliability features

The very first thing worth doing is buying a license for Sidekiq Pro – it is a low-hanging fruit, and you can quickly add it to your applications.

Sidekiq Pro, unlike “standard” Sidekiq, offers extra server reliability by using Redis’s RPOPLPUSH. Thanks to that, the job is not entirely removed from Redis once it starts being processed. Rather, it is atomically moved to a “processing” list and is removed from there only after it has been processed. In that sense, we can be confident that if a job ended up in Redis, it will almost certainly be processed by Sidekiq workers, even if there are multiple Out Of Memory errors and the Sidekiq processes get killed multiple times. Enabling it is as easy as enabling super_fetch in the Sidekiq config. If you want to learn more about Sidekiq Pro Reliability Server, you can check the docs.
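For reference, here is a minimal sketch of what that configuration could look like, assuming a Sidekiq Pro version that ships super_fetch and a Rails-style initializer (the file path is just an example):

# config/initializers/sidekiq.rb – requires a Sidekiq Pro license
Sidekiq.configure_server do |config|
  # Replace the default fetch strategy with super_fetch, which keeps
  # in-progress jobs in a private queue until they are finished.
  config.super_fetch!
end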

What is more, Sidekiq Pro also brings some client reliability features, although they are more limited. The scenario where it’s supposed to help is when Redis is down. When calling MyJob.perform_async, we usually assume that things will just work. Sadly, it might sometimes happen that Redis is down or there is some network issue, and we will see a nasty 500 error. Sidekiq Pro tries to mitigate this problem by keeping the jobs locally and enqueuing them once the connection with Redis is reestablished. However, the queue where the jobs are stored is an in-memory one, and there is a limit on how many jobs can be stored there, so this solution is not perfect. If you don’t find it reliable enough for your needs, you can establish a process for recovering from Redis outages, like storing the jobs in the database when rescuing from connection errors and enqueuing them later once Redis is back.
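A rough sketch of such a fallback could look like the one below – StashedJob is a hypothetical ActiveRecord model, and the exact exception class depends on the redis client version you use:

class DurableEnqueuer
  # Try to enqueue the job; if Redis is unreachable, stash the payload
  # in the database so it can be re-enqueued once Redis is back.
  def self.enqueue(worker_class, *args)
    worker_class.perform_async(*args)
  rescue Redis::CannotConnectError
    StashedJob.create!(worker_class: worker_class.to_s, arguments: args)
  end

  # Meant to be run (e.g., from a scheduled task) after an outage.
  def self.flush_stashed_jobs
    StashedJob.find_each do |stashed_job|
      stashed_job.worker_class.constantize.perform_async(*stashed_job.arguments)
      stashed_job.destroy!
    end
  end
end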

The only issue with Sidekiq Pro is that it’s not that cheap. If you don’t require strong guarantees and reliability, “standard” Sidekiq will probably be enough, but for processing business-critical jobs, Sidekiq Pro will be a great addition.

Redis and its persistence modes

Another scenario that can cause jobs to be lost is a crash or a restart of the Redis process. To understand the potential consequences of these scenarios, we need to take a look at the persistence modes that are available.

Redis offers two persistence strategies that can be used standalone or combined:

  1. RDB (Redis Database Backup) – in this mode, Redis will periodically create snapshots allowing point-in-time recovery. What it means is that if the snapshots are created every 5 minutes, you can lose at most the last 5 minutes of jobs. And in case of recovery, you might get back some jobs that have already been processed within the last 5 minutes.
  2. AOF (Append Only File) – in this mode, Redis will log every write operation to an append-only file and fsync it to disk. The fsync can happen on every write, every second, or not at all (a sketch of a config combining both modes follows the list).
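For illustration, here is a sketch of redis.conf directives combining both modes – the exact values are only examples, not recommendations:

# Append-only file for durability
appendonly yes
# fsync policy: "always" (every write), "everysec" or "no"
appendfsync everysec

# Keep RDB snapshots as an additional safety net,
# e.g., snapshot if at least one key changed within 900 seconds
save 900 1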

As you might have guessed, using AOF is necessary for reasonable durability. However, the exact config (i.e., whether fsync should happen every second or on every write) depends heavily on the throughput and the acceptable performance of your applications, since those operations add extra overhead. If the performance is good enough with fsync on every write, by all means go for that. But if the throughput suffers significantly, you might consider fsyncing every second and establishing a process for what to do if Redis goes down and some jobs are lost. This is again highly dependent on your applications and needs, but logging what exactly gets enqueued is usually a good idea, including the name of the worker class, its arguments, and timestamps. That way, you can work out which jobs might have been lost and enqueue them manually. You risk that some jobs might be processed more than once, so idempotency of the jobs would help greatly.
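One way to get such a log is a Sidekiq client middleware – below is a rough sketch; where the log lines go and what format they use is up to you:

class EnqueueLoggingMiddleware
  # Sidekiq client middleware interface: invoked for every pushed job.
  def call(worker_class, job, queue, redis_pool)
    Sidekiq.logger.info(
      "enqueued worker=#{job['class']} args=#{job['args'].inspect} " \
      "jid=#{job['jid']} queue=#{queue} at=#{Time.now.utc}"
    )
    yield
  end
end

Sidekiq.configure_client do |config|
  config.client_middleware do |chain|
    chain.add EnqueueLoggingMiddleware
  end
end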

What about RDB? It would still be a good idea to have it enabled – Redis’s docs recommend it in case there is a bug in the AOF engine, which sounds scary, but it’s worth keeping in mind that durability is not the primary focus of Redis.

Wrapping Up

Even though getting started with Sidekiq is not rocket science, and it might initially seem like everything is fine, it won’t be enough for an application processing business-critical background jobs. If you want to make sure you won’t be losing jobs, you should consider buying a Sidekiq Pro license and make sure Redis persistence is configured optimally (ideally, AOF persistence with fsync on every write).

Messages on Rails Part 3: RabbitMQ

In the first part of this series, we explored some potential options for communication between services – what their advantages and disadvantages are, why an HTTP API is not necessarily the best possible choice and why asynchronous messaging might be a better solution, using, e.g., RabbitMQ or Kafka. We’ve already covered Kafka in part 2; now it’s time for RabbitMQ.

Messages on Rails Part 2: Kafka

In the first part of this series, we explored some potential options for communication between services – what their advantages and disadvantages are, why an HTTP API is not necessarily the best possible choice and why asynchronous messaging might be a better solution, using, e.g., RabbitMQ or Kafka. Let’s focus this time entirely on the latter.

Messages on Rails Part 1 - Introduction to Kafka and RabbitMQ

Microservices, Service-Oriented Architecture (SOA) and, in general, distributed ecosystems have been all the hype in the last several years. And that’s for a good reason! At a certain point, The Majestic Monolith “pattern” might start causing issues, both for purely technical reasons – scalability, tight coupling of the code if you don’t follow Domain-Driven Design or some other practices improving modularity, maintenance overhead – and from an organizational perspective, since working in smaller teams on smaller apps is more efficient than working with a huge team on an even bigger monolith which suffers from tight coupling and low cohesion. However, this is only true if the overall architecture addresses the potential problems that are common in the micro/macro-services world. One of these problems I would like to focus on is communication between apps and how the data flows between them.

How to Tell the Difference Between a Default and a Provided Value for Optional Arguments in Ruby?

It is sometimes required for methods with optional arguments to be able to differentiate between the default value and a value explicitly passed by the caller. Passing nil might initially sound like a good idea since it represents “nothingness”. However, it might turn out that nil is a legit value and there might be cases where it is desirable for the caller to pass nil. In such a case, we cannot use it as a default value if we want to implement special logic for the case of not providing that value.

Fortunately, there is an easy way to deal with it – use a special constant:

class SomeClass
  NO_VALUE_PROVIDED = Object.new
  private_constant :NO_VALUE_PROVIDED

  def call(argument: NO_VALUE_PROVIDED)
    if argument == NO_VALUE_PROVIDED
      handle_scenario_for_no_value_provided
    else
      do_something_with_argument(argument)
    end
  end
end

In the call method, we allow passing an optional argument with a default of NO_VALUE_PROVIDED, which is a private constant defined in that class that is an instance of Object.

By depending on an instance of Object that is initialized inside that class, we avoid cases where the equality check returns true even though this is not the expected outcome, which could happen if we used strings or symbols. We could use some symbol that would be very unlikely to be passed by the caller, like :__no_value_provided__, but that arguably looks more like a workaround than a dedicated solution to the problem.

Also, a private constant ensures it is not used anywhere outside the class, which minimizes even further the chances that a passed argument would be the same as our placeholder for the no-value-provided scenario.
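To illustrate the difference, calling the method without the argument behaves differently from explicitly passing nil (handle_scenario_for_no_value_provided and do_something_with_argument are just placeholders from the snippet above):

some_object = SomeClass.new
some_object.call                 # handle_scenario_for_no_value_provided is executed
some_object.call(argument: nil)  # do_something_with_argument(nil) is executed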

Inheritance and Define_method - How to Make Them Work Together

Imagine that you are implementing some form object because you are fed up with treating ActiveRecord models as such, and you need some extra flexibility. You start with a straightforward implementation of a base class for form objects where you can simply whitelist attributes. That could look like this:

class FormObject
  def self.attributes_registry
    @attributes_registry ||= []
  end

  def self.attribute(attribute_name)
    attributes_registry << attribute_name

    define_method(attribute_name) do
      instance_variable_get("@#{attribute_name}")
    end

    define_method("#{attribute_name}=") do |value|
      instance_variable_set("@#{attribute_name}", value)
    end
  end
end

Since the base class is ready, you can create a first form object that would inherit from this class:

class MyForm < FormObject
  attribute :some_attribute
end

Initially, it does the job, but then it turns out that you might need a default value if some_attribute turns out to be nil. So you try something like that:

class MyFormWithDefaultValue < FormObject
  attribute :some_attribute

  def some_attribute
    super || "Default"
  end
end

After checking if the default value works, this is what you get:

> MyFormWithDefaultValue.new.some_attribute
=> NoMethodError: super: no superclass method `some_attribute' for #<MyFormWithDefaultValue:0x007f84a50ae8e0>

Whoops! How did it happen? The method was defined in the superclass so it should be inheritable, right?

Well, this is not really true. However, the problem is easy to fix.
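The full post goes into the details; as a teaser, here is a sketch of one possible fix (not necessarily the exact one from the article) – generating the accessors inside an anonymous module and including it, so that they end up in the ancestor chain and super can reach them:

class FormObject
  def self.attributes_registry
    @attributes_registry ||= []
  end

  def self.attribute(attribute_name)
    attributes_registry << attribute_name

    # Define the accessors in an anonymous module and include it:
    # the module ends up right after the class in the ancestor chain,
    # so a method defined directly in the class can call `super` and reach it.
    accessors = Module.new do
      define_method(attribute_name) do
        instance_variable_get("@#{attribute_name}")
      end

      define_method("#{attribute_name}=") do |value|
        instance_variable_set("@#{attribute_name}", value)
      end
    end
    include accessors
  end
end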

The Problems With Validating ActiveRecord Models and Why State Validation Is a Bad Idea

In the typical Rails application, you can find most of the validations in the ActiveRecord models, which is not surprising – ActiveRecord models are used for multiple things. Whether that is a good thing or a bad thing (in most cases it’s the latter) deserves a separate book or at least a blog post series, as it’s not a simple problem. But there is one specific thing that can cause a lot of issues that are difficult to solve and go beyond design decisions and ease of maintenance of the application, something that impacts the behavior of the model – the validations.

Just to give you a real-world example of what validation in an ActiveRecord model can lead to (as impossible as it seems, it really did happen): when updating the check-in time of a reservation, which is a simple attribute on the Reservation model, the record turned out to be invalid because… the format of the guest’s phone number didn’t match some regexp.

There are multiple ways to bypass this problem: pass the validate: false flag to the save method (save(validate: false)) or use the update_columns method, but this is definitely not something that can be applied in a “normal” use case. In a typical scenario, this will be the error message displayed in the UI or returned to the API consumer, and it will be confusing.
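For completeness, this is roughly what those escape hatches look like (reservation standing for a hypothetical Reservation record):

# Skips validations but still runs callbacks
reservation.check_in_time = new_check_in_time
reservation.save(validate: false)

# Skips both validations and callbacks, writing directly to the database
reservation.update_columns(check_in_time: new_check_in_time)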

However, this is the expected behavior of ActiveRecord (or, in general, ActiveModel-style) validations, which validate the state of the model. And judging from this example, it’s evident that it leads to problematic scenarios. What kind of design, then, would be the most appropriate to prevent such issues?

Indexes on Rails: How to Make the Most of Your Postgres Database

Optimizing database queries is arguably one of the fastest ways to improve the performance of Rails applications. There are multiple ways to approach it, depending on the kind of problem. N+1 queries seem to be a pretty common issue, which is, fortunately, easy to address. However, sometimes you have some relatively simple-looking queries that take way longer than they should, indicating that they might require some optimization. The best way to improve such queries is adding a proper index.

But what does “proper index” mean? How do you figure out what kind of index is needed for a given query? Here are some essential facts and tips that should cover the majority of the queries you may encounter and make your database no longer a bottleneck.
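To give a taste of what adding such an index looks like in practice, here is a minimal migration sketch – the table, column and Rails version are purely illustrative:

class AddIndexToReservationsOnStartDate < ActiveRecord::Migration[5.2]
  # Speeds up queries filtering or ordering by start_date,
  # e.g. Reservation.where("start_date >= ?", Date.current)
  def change
    add_index :reservations, :start_date
  end
end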

Trolling in Ruby - Implementing JavaScript-like Maths With Implicit Conversion Hijacking

If you’ve ever worked with JavaScript, especially in the pre-SPA/pre-frameworks era with just jQuery, you probably had a chance to see some “exotic” maths in action that looks similar to this:

"3" + 4
// => "34"

That kind of behavior usually comes as a big surprise, and due to that fact, JavaScript has gotten a bad reputation (even though there is a rationale behind it). If we tried that in Ruby, we would get an obvious TypeError:

"3" + 4
# => TypeError (no implicit conversion of Integer into String)

Would it be possible though to obtain the same result somehow in Ruby?
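As a teaser, here is a sketch of the implicit conversion hijacking the title hints at – a global (and very much not production-grade) monkey patch:

class Integer
  # String#+ calls to_str on its argument for implicit conversion;
  # defining it on Integer makes the "JavaScript-like" result possible.
  def to_str
    to_s
  end
end

"3" + 4
# => "34"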

Rails and Conditional Validations in Models

Adding consents for accepting Terms of Service/Privacy Policies must have been one of the most popular features in the majority of applications due to the enforcement of GDPR in May ;). Among the technical aspects that GDPR requires is proof of consent for processing personal information. In that case, you need to have some actual attributes in the database that confirm the fact that a given user has indeed accepted the Terms of Service/Privacy Policy.

That makes a significant impact on how we approach this kind of feature. However, in the past, such things were quite often not stored in a database at all – it just took some acceptance validation in the UI or maybe a validation of a virtual attribute on the backend to be on the safe side.

Let’s focus on the latter case, where we don’t need to store anything in the DB, and see what the possible solutions to that problem are. As trivial as this problem initially sounds, it will get quite interesting ;).
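As a starting point, the virtual-attribute approach mentioned above could look more or less like this (SignupForm is a hypothetical form object):

class SignupForm
  include ActiveModel::Model

  # Virtual attribute – nothing is persisted in the database
  attr_accessor :terms_of_service

  validates :terms_of_service, acceptance: true
end

SignupForm.new(terms_of_service: "1").valid? # => true
SignupForm.new(terms_of_service: "0").valid? # => false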