When validation is not enough: PostgreSQL triggers for data integrity
Is validation in your models or form objects enough to ensure integrity of the data? Well, seems like you can't really persist a record when the data is not valid unless you intentionally try to bypass validation using save: false
option when calling save
or using update_column
. What about uniqueness validation?
A classic example would be a unique email
per user. To make sure the email is truly unique we could add a unique index in database - not only would it prevent saving non-unique users when bypassing validation, but also it would raise extra error when 2 concurrent requests would attempt to save user with the same email. However, some validations are more complex that ensuring a value is unique and index won't really help much. Fortunately, PostgreSQL is powerful enough to provide a perfect solution to such problem. Time to meet your new friend: PostgreSQL triggers.
Anatomy of PostgreSQL triggers and procedures
PostgreSQL trigger is like a callback: it's a function that is called on specific event: before
or after
insert
, update
, delete
or truncate
in case of tables and views and for views you can also run a function instead of
those events. Triggers can be run either for each row
(tables only) and for each statement
(both tables and views). The difference between them is quite simple: for each row
is run for every modified row and for each statement
is run only once per statement. The important thing to keep in mind regarding for each row
is that you have a reference to the row being modified, which will be essential in upcoming example.
By running \h CREATE TRIGGER;
from psql we can get a generalized syntax for creating triggers:
Command: CREATE TRIGGER
Description: define a new trigger
Syntax:
CREATE [ CONSTRAINT ] TRIGGER name { BEFORE | AFTER | INSTEAD OF } { event [ OR ... ] }
ON table_name
[ FROM referenced_table_name ]
[ NOT DEFERRABLE | [ DEFERRABLE ] { INITIALLY IMMEDIATE | INITIALLY DEFERRED } ]
[ FOR [ EACH ] { ROW | STATEMENT } ]
[ WHEN ( condition ) ]
EXECUTE PROCEDURE function_name ( arguments )
where event can be one of:
INSERT
UPDATE [ OF column_name [, ... ] ]
DELETE
TRUNCATE
You can also add a condition for running triggers and timing option, which I'm not going to discuss in greater details, you can find more about them in official docs.
What about function_name
? It's a user-defined function returning trigger
. Here's a dummy example to give an idea about the syntax for defining functions:
CREATE FUNCTION dummy() RETURNS trigger AS $$
DECLARE
some_integer_variable int;
BEGIN
some_integer_variable := 1;
NEW.counter := some_integer_variable;
RETURN NEW;
END;
$$ language plpgsql;
We started with defining dummy
function taking no arguments returning type of trigger. Next, we have a DECLARE
block where we declare temporary variables, in our case it's some_integer_variable
of type int
. Within BEGIN / END
block we define the actual function body: we assign a dummy value to some_integer_variable
variable using :=
operator and then we do some assignment using implicit NEW
variable which is basically a row referenced by given statement. This variable is available only when running a trigger for each row
, otherwise it will return NULL
. Any trigger has to return either a row or NULL
- in this example we return NEW
row. At the end we have a declaration of writing function in plpgsql
language.
Let's take a look at some real world code to see triggers in action.
Using triggers for data integrity
A good example where we could use a trigger for more complex validation could be a calendar event: we need to ensure that no other event exists between some begins_at
time and finishes_at
time. We should also scope it only to a given calendar and exclude id of event being updated - when creating new events it wouldn't matter, but without excluding id we wouldn't be able to update any event. So what we actually want to achieve is to create a trigger that will be run before insert or update
on events table for each row
.
Let's start with generating Calendar
model and Event
model with reference to calendar
and begins_at
and finishes_at
attributes:
rails generate model Calendar
rails generate model Event calendar_id:integer begins_at:datetime finishes_at:datetime
and also generate extra migration adding trigger and procedure ensuring that only one event can be created for given period of time for a calendar:
rails generate migration add_validation_trigger_for_events_availability
add the SQL code :
class AddValidationTriggerForEventsAvailability < ActiveRecord::Migration
def change
execute <<-CODE
CREATE FUNCTION validate_event_availability() returns trigger as $$
DECLARE
events_count int;
BEGIN
events_count := (SELECT COUNT(*) FROM events WHERE (
events.calendar_id = NEW.calendar_id AND events.begins_at < NEW.finishes_at AND events.finishes_at > NEW.begins_at AND events.id != NEW.id
));
IF (events_count != 0) THEN
RAISE EXCEPTION 'Period between % and % is already taken', NEW.begins_at, NEW.finishes_at;
END IF;
RETURN NEW;
END;
$$ language plpgsql;
CREATE TRIGGER validate_event_availability_trigger BEFORE INSERT OR UPDATE ON events
FOR EACH ROW EXECUTE PROCEDURE validate_event_availability();
CODE
end
end
Our validate_event_availability
function performs query to count all events that are between given time period for specified calendar excluding own id (so that the row being updated is not considered here, which would prevent updating any event). If any other event is found, the exception is raised with an error message - %
characters are used for interpolation of begins_at
and finishes_at
attributes. If no other event is found, we simply return the row.
We want to define a trigger running this function before creating any new event or updating existing ones, so we need to run it BEFORE INSERT OR UPDATE
FOR EACH ROW
.
It might be a good idea to switch also to :sql
schema format - the standard :ruby
format can't handle triggers at this point. Add this line in config/application.rb
:
# config/application.rb
config.active_record.schema_format = :sql
Now we can run migrations:
rake db:migrate
After changing the schema format, new structure.sql
file should be created. It's not going to look that nice like schema.rb
, but at least it contains all the details. Let's try creating some events from rails console
:
calendar = Calendar.create
begins_at = "2016-05-02 12:00:00"
finishes_at = "2016-05-02 16:00:00"
Event.create(calendar_id: calendar.id, begins_at: begins_at, finishes_at: finishes_at)
Creating first event obviously works, but what's going to happen when we try to create an event with exactly the same dates?
Event.create(calendar_id: calendar.id, begins_at: begins_at, finishes_at: finishes_at)
ActiveRecord::StatementInvalid: PG::RaiseException: ERROR: Period between 2016-05-02 12:00:00 and 2016-05-02 16:00:00 is already taken
Awesome (haha, doesn't happen that often to be happy when an error occurs ;)), that's exactly what we wanted to achieve - the trigger keeps our data safe making sure that we won't have any duplicated events or events covering the same time period. The last thing we should do is to mimic such validation and add it to form object or model for better user experience, but it's beyond the scope of this article. It's going to duplicate some logic between the code in the application and the database, but in this case there's no way to DRY it up.
Wrapping up
PostgreSQL triggers and procedures are not something that you will often want to use in Rails applications, but sometimes there's no other solution, especially when you have more complex rules for data integrity. In such cases, triggers and procedures are the right tool for the job.