In the previous blog post we learned some basics about table partitioning: how it works and what kind of problems it solves. So far we've been discussing mostly basic concepts with raw SQL examples. But the essential question in our case would be: how to make it work inside Rails application then? Let's see what we can do about it.

Partitioning tables with partitioned gem

It turns out there's no built-in support for table partitioning in ActiveRecord. Fortunately, there's a gem that makes it pretty straight-forward to apply this concept to your models: partitioned. Not only does it have several strategies for partitioning (e.g. by foreign key or by yearly / weekly / monthly and you can easily create custom ones by subclassing base class and defining proper methods) making it easy to perform CRUD operations when dealing with multiple tables, but it also provides some methods to create and destroy infrastructure (separate schema for partitioned tables) and some helper methods for generating tables based on partitioning criteria, even with indexes and constraints! Let's get back to example from previous the blog post with orders. Firstly, add partitioned gem to the Gemfile. Unfortunately, there are some issues with compatibility with Rails 4.2 at the time I was experimenting with it, so it might be necessary to use some forks. The following combination should work with Rails 4.2.6:

gem 'activerecord-redshift-adapter',  git: "git@github.com:arp/activerecord-redshift-adapter.git", branch: "rails4-compatibility"
gem 'partitioned', git: "git@github.com:dkhofer/partitioned.git", branch: "rails-4-2"

and of course run bundle install. Now we can generate model:

rails generate model Order

Firstly, let's set up the partitioned Order model. To handle partitioning strategy for separate tables for every year based on created_at column, we could define the following base class:

# app/models/partitioned_by_created_at_yearly.rb
class PartitionedByCreatedAtYearly < Partitioned::ByYearlyTimeField
  self.abstract_class = true

  def self.partition_time_field
    :created_at
  end

  partitioned do |partition|
    partition.index :id, unique: true
  end
end

This class inherits from Partitioned::ByYearlyTimeField to handle exactly the strategy we need for orders. We set this class to be an abstract one to make it clear it's not related to any table in the database. We also need to provide partition_time_field, in our case it's created_at column. In partitioned block we can define some extra constraints and indexes that will be used when creating children tables. The next thing would be to make it a parent class for Order model:

# app/models/order.rb
class Order < PartitionedByCreatedAtYearly
end

Creating migration for partitioned tables

Let's get back to our migration. What we want to do is to create orders table, a schema for children partitioned tables of orders and the tables themselves for the next several years. We could do it the following way:

class CreateOrders < ActiveRecord::Migration
  def up
    create_table :orders do |t|
      t.timestamps null: false
    end

    Order.create_infrastructure
    dates = Order.partition_generate_range(Date.today, Date.today + 5.year)
    Order.create_new_partition_tables(dates)
  end

  def down
    Order.delete_infrastructure
    drop_table :orders
  end
end

The gem also provides excellent helper method partition_generate_range to help with setting up new partition tables. That way we will generate tables handling orders from 2016 to 2021. Now you can simply run rake db:migrate.

CRUD operations on partitioned tables

So far we've managed to set up the database for handling table partitioning. But the essential question is: can our app handle management of these tables? Will it insert / update / delete records to and from proper tables? Let's play with some operations to find out:

> Order.create
   (0.1ms)  BEGIN
  SQL (11.7ms)  INSERT INTO "orders_partitions"."p2016" ("created_at", "updated_at") VALUES ($1, $2) RETURNING "id"  [["created_at", "2016-06-03 17:21:36.221268"], ["updated_at", "2016-06-03 17:21:36.221268"]]
   (5.9ms)  COMMIT
> Order.create(created_at: 1.year.from_now)
   (0.1ms)  BEGIN
  SQL (0.8ms)  INSERT INTO "orders_partitions"."p2017" ("created_at", "updated_at") VALUES ($1, $2) RETURNING "id"  [["created_at", "2017-06-03 17:25:05.413114"], ["updated_at", "2016-06-03 17:25:05.414208"]]
   (121.4ms)  COMMIT
 => #<Order id: 2, created_at: "2017-06-03 17:25:05", updated_at: "2016-06-03 17:25:05">
> Order.create(created_at: 2.years.from_now)
   (0.1ms)  BEGIN
  SQL (0.3ms)  INSERT INTO "orders_partitions"."p2018" ("created_at", "updated_at") VALUES ($1, $2) RETURNING "id"  [["created_at", "2018-06-03 17:25:11.634532"], ["updated_at", "2016-06-03 17:25:11.635389"]]
   (2.1ms)  COMMIT
 => #<Order id: 3, created_at: "2018-06-03 17:25:11", updated_at: "2016-06-03 17:25:11">


> Order.all
  Order Load (0.6ms)  SELECT "orders".* FROM "orders"
 => #<ActiveRecord::Relation [#<Order id: 1, created_at: "2016-06-03 17:21:36", updated_at: "2016-06-03 17:21:36">, #<Order id: 2, created_at: "2017-06-03 17:25:05", updated_at: "2016-06-03 17:25:05">, #<Order id: 3, created_at: "2018-06-03 17:25:11", updated_at: "2016-06-03 17:25:11">]>
> Order.all.count
   (0.6ms)  SELECT COUNT(*) FROM "orders"
 => 3

> Order.find(1)
  Order Load (0.3ms)  SELECT  "orders".* FROM "orders" WHERE "orders"."id" = $1 LIMIT 1  [["id", 1]]
 => #<Order id: 1, created_at: "2016-06-03 17:21:36", updated_at: "2016-06-03 17:21:36">

> Order.from_partition(Date.new(2017, 1, 1)).find(2)
  Order Load (0.3ms)  SELECT  "orders".* FROM "orders_partitions"."p2017" "orders" WHERE "orders"."id" = $1 LIMIT 1  [["id", 2]]
 => #<Order id: 2, created_at: "2017-06-03 17:25:05", updated_at: "2016-06-03 17:25:05">

 > Order.from_partition(Date.new(2017, 1, 1)).find(2).update!(updated_at: 5.years.from_now)
  Order Load (0.3ms)  SELECT  "orders".* FROM "orders_partitions"."p2017" "orders" WHERE "orders"."id" = $1 LIMIT 1  [["id", 2]]
   (0.1ms)  BEGIN
  SQL (0.2ms)  UPDATE "orders_partitions"."p2017" SET "updated_at" = $1, "created_at" = $2 WHERE "orders_partitions"."p2017"."id" = 2  [["updated_at", "2021-06-03 17:29:07.077931"], ["created_at", "2017-06-03 17:25:05.413114"]]
   (1.7ms)  COMMIT

> Order.from_partition(Date.new(2018, 1, 1)).find(3).destroy
  Order Load (0.3ms)  SELECT  "orders".* FROM "orders_partitions"."p2018" "orders" WHERE "orders"."id" = $1 LIMIT 1  [["id", 3]]
   (0.1ms)  BEGIN
  SQL (0.3ms)  DELETE FROM "orders_partitions"."p2018" WHERE "orders_partitions"."p2018"."id" = $1  [["id", 3]]
   (1.7ms)  COMMIT

> Order.from_partition(Date.new(2017, 1, 1)).update_all(updated_at: Time.zone.now)
  SQL (1.6ms)  UPDATE "orders_partitions"."p2017" "orders" SET "updated_at" = '2016-06-03 17:30:59.926682'

> Order.from_partition(Date.new(2017, 1, 1)).destroy_all
  Order Load (0.3ms)  SELECT "orders".* FROM "orders_partitions"."p2017" "orders"
   (0.1ms)  BEGIN
  SQL (0.2ms)  DELETE FROM "orders_partitions"."p2017" WHERE "orders_partitions"."p2017"."id" = $1  [["id", 2]]
   (1.6ms)  COMMIT

Awesome! Looks like all the CRUD operations work without any problems! We even have extremely helpful query method from_partition to scope queries to the specific child table.

Wrapping up

Table partitioning might a great solution to solve database performance issues. Even though it's not supported out-of-the-box by Rails, you can easily integrate it with your app thanks to partitioned gem.

posted in: PostgreSQL, databases, Ruby, Ruby on Rails, Scaling, Performance, Architecture