You’ve probably heard many times that the database is the bottleneck of many web applications. This isn’t necessarily true: often, heavy queries can be substantially optimized, making them really efficient and fast. As time passes, however, the data can grow remarkably in size, especially over several years, which can indeed make the database a bottleneck - the tables are huge and no longer fit into memory, the indexes are even bigger, making queries much slower, and there is nothing left to optimize in the queries themselves. In many cases, deleting old records that are no longer used is not an option - they may still have some value, e.g. for data analysis or statistical purposes. Fortunately, it’s not a dead end. You can make your database performance great again with a new secret weapon: table partitioning.
Table partitioning and inheritance 101
To put it in simple words, table partitioning is about splitting one big table into several smaller units. In PostgreSQL it can be done by creating a master table that serves as a template for the children tables that will inherit from it. The master table contains no data, and you shouldn’t add any indexes or unique constraints to it. However, if you need some CHECK CONSTRAINTS in all children tables, it is a good idea to add them to the master table, as they will be inherited. Due to the inheritance mechanism, there is no point in adding any columns to children tables either. Creating tables that inherit from other tables is pretty straightforward - you just need to use the INHERITS clause, and that’s it: all columns and extra constraints will be defined on other_orders thanks to inheritance from the master table.
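A minimal sketch of what this could look like - the column names (id, client_id, created_at) are assumptions based on the identifiers used later in the post, and the master table is assumed to be called orders:

```sql
-- Master table: serves only as a template and holds no data,
-- so we deliberately add no indexes or unique constraints.
CREATE TABLE orders (
  id serial,
  client_id integer NOT NULL,
  created_at timestamp NOT NULL DEFAULT now()
);

-- Child table: gets all columns (and inheritable constraints)
-- from the master table.
CREATE TABLE other_orders () INHERITS (orders);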
Defining constraints for partitioning criteria
As we are going to split one big table into smaller ones, we need some criterion to decide which table the data should be put in. We can partition either by range (e.g. created_at between 01.01.2016 and 31.12.2016) or by value (e.g. client_id equal to 100). To ensure we only have valid data that always satisfies our partitioning criterion, we should add proper CHECK CONSTRAINTS as guard statements. To make sure the orders in a particular table were, e.g., created in 2016, we could add the following constraint:
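A sketch of such a guard, assuming a child table named orders_2016 (the table and constraint names are illustrative):

```sql
ALTER TABLE orders_2016
  ADD CONSTRAINT orders_2016_created_at_check
  CHECK (created_at >= '2016-01-01' AND created_at < '2017-01-01');
```

Using a half-open range (>= the first day of the year, < the first day of the next) keeps the boundaries unambiguous.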
If we were to create tables for orders for the upcoming few years (assuming that we want each of them to cover an entire year), we could do the following:
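A sketch covering the years 2016-2018 (the years and table names are assumptions matching the examples referenced later in the post):

```sql
CREATE TABLE orders_2016 (
  CHECK (created_at >= '2016-01-01' AND created_at < '2017-01-01')
) INHERITS (orders);

CREATE TABLE orders_2017 (
  CHECK (created_at >= '2017-01-01' AND created_at < '2018-01-01')
) INHERITS (orders);

CREATE TABLE orders_2018 (
  CHECK (created_at >= '2018-01-01' AND created_at < '2019-01-01')
) INHERITS (orders);
```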
Beware of a potential gotcha when defining CHECK CONSTRAINTS. Take a look at the following example:
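A sketch of the problematic case - partitioning by client_id with ranges that are both inclusive at the boundary (the table names are illustrative):

```sql
-- Both constraints include client_id = 1000, so the ranges overlap.
CREATE TABLE orders_clients_1_1000 (
  CHECK (client_id >= 1 AND client_id <= 1000)
) INHERITS (orders);

CREATE TABLE orders_clients_1000_2000 (
  CHECK (client_id >= 1000 AND client_id <= 2000)
) INHERITS (orders);
```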
In this case, orders with client_id equal to 1000 could be inserted into any of these tables. Make sure the constraints are not inclusive on the same value when using ranges, to avoid such problems.
To provide decent performance, it is also important to make sure the constraint_exclusion setting in postgresql.conf is enabled. You can set it either to on or partition.
That way we can avoid scanning the children tables whose CHECK CONSTRAINTS exclude them based on the query conditions. Compare the query plans for the same query with constraint_exclusion disabled and enabled:
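A sketch of how to reproduce the comparison, assuming the yearly tables from the previous section (the exact plan output depends on your data and PostgreSQL version, so it is only described in the comments):

```sql
SET constraint_exclusion = off;
EXPLAIN SELECT * FROM orders
WHERE created_at >= '2018-01-01' AND created_at < '2019-01-01';
-- With exclusion off, the Append node in the plan
-- scans the master table and every child table.

SET constraint_exclusion = partition;
EXPLAIN SELECT * FROM orders
WHERE created_at >= '2018-01-01' AND created_at < '2019-01-01';
-- With exclusion enabled, only the master table and orders_2018
-- appear in the plan - the other children are excluded
-- by their CHECK CONSTRAINTS.
```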
We wanted to select only the orders created in 2018, so there is no need to scan the other children tables, which don’t contain such data.
The difference between the on and partition settings is that the former checks constraints for all tables in all queries, while the latter does it only for children tables inheriting from a parent table (and also when using UNION ALL subqueries).
Create new records easily with triggers
Having multiple tables is certainly difficult to manage when creating new records. Always remembering that they should be inserted to a specific table can be cumbersome. Fortunately, there’s an excellent solution for this problem: PostgreSQL triggers! If you are not familiar with them, you can read my previous blog post about database triggers.
We can automate the insertion process with a trigger that checks the value of created_at and decides which table the record should be put in. Here’s an example of how we could approach it:
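A sketch of such a trigger, assuming the yearly orders_2016 - orders_2018 tables from the earlier examples (the function and trigger names are illustrative):

```sql
CREATE OR REPLACE FUNCTION orders_insert_trigger()
RETURNS trigger AS $$
BEGIN
  -- Route the new row to the child table matching its created_at.
  IF NEW.created_at >= '2016-01-01' AND NEW.created_at < '2017-01-01' THEN
    INSERT INTO orders_2016 VALUES (NEW.*);
  ELSIF NEW.created_at >= '2017-01-01' AND NEW.created_at < '2018-01-01' THEN
    INSERT INTO orders_2017 VALUES (NEW.*);
  ELSIF NEW.created_at >= '2018-01-01' AND NEW.created_at < '2019-01-01' THEN
    INSERT INTO orders_2018 VALUES (NEW.*);
  ELSE
    RAISE EXCEPTION 'created_at out of range: %', NEW.created_at;
  END IF;
  -- Returning NULL prevents the row from also being inserted
  -- into the master table.
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER insert_orders_trigger
  BEFORE INSERT ON orders
  FOR EACH ROW EXECUTE PROCEDURE orders_insert_trigger();
```

With this in place, a plain INSERT INTO orders (...) just works - the trigger transparently redirects the row to the right child table.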
There’s a potential gotcha though: as we are dealing with multiple tables, the id of the orders might not necessarily be unique across all of them. When creating new records, we can ensure that the next id is based on a “global” sequence shared by the partitioned tables, but that still leaves the possibility of duplicated ids in some tables, e.g. after an accidental update of the id. Probably the best way to make sure there are no duplicates across all partitioned tables is to use a UUID, which will most likely be unique.
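One way to sketch the UUID approach is to define the master table’s id as a uuid with a default, assuming the pgcrypto extension for gen_random_uuid() (built into PostgreSQL 13+):

```sql
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- Master table with a UUID primary-key-like column;
-- every child table inherits the same default.
CREATE TABLE orders (
  id uuid NOT NULL DEFAULT gen_random_uuid(),
  client_id integer NOT NULL,
  created_at timestamp NOT NULL DEFAULT now()
);
```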
Table partitioning might be a great solution to improve the performance of your application when the amount of data in the tables is huge (especially when they no longer fit into memory). In this first part, we learned some basics of table partitioning from a raw SQL perspective. In the next one, we will apply this concept to a real-world Rails application.