When validation is not enough: PostgreSQL triggers for data integrity

Is validation in your models or form objects enough to ensure integrity of the data? Well, seems like you can't really persist a record when the data is not valid unless you intentionally try to bypass validation using save: false option when calling save or using update_column. What about uniqueness validation? A classic example would be a unique email per user. To make sure the email is truly unique we could add a unique index in database - not only would it prevent saving non-unique users when bypassing validation, but also it would raise extra error when 2 concurrent requests would attempt to save user with the same email. However, some validations are more complex that ensuring a value is unique and index won't really help much. Fortunately, PostgreSQL is powerful enough to provide a perfect solution to such problem. Time to meet your new friend: PostgreSQL triggers.

Anatomy of PostgreSQL triggers and procedures

PostgreSQL trigger is like a callback: it's a function that is called on specific event: before or after insert, update, delete or truncate in case of tables and views and for views you can also run a function instead of those events. Triggers can be run either for each row (tables only) and for each statement (both tables and views). The difference between them is quite simple: for each row is run for every modified row and for each statement is run only once per statement. The important thing to keep in mind regarding for each row is that you have a reference to the row being modified, which will be essential in upcoming example.

By running \h CREATE TRIGGER; from psql we can get a generalized syntax for creating triggers:

Command:     CREATE TRIGGER
Description: define a new trigger
Syntax:
CREATE [ CONSTRAINT ] TRIGGER name { BEFORE | AFTER | INSTEAD OF } { event [ OR ... ] }
    ON table_name
    [ FROM referenced_table_name ]
    [ NOT DEFERRABLE | [ DEFERRABLE ] { INITIALLY IMMEDIATE | INITIALLY DEFERRED } ]
    [ FOR [ EACH ] { ROW | STATEMENT } ]
    [ WHEN ( condition ) ]
    EXECUTE PROCEDURE function_name ( arguments )

where event can be one of:

    INSERT
    UPDATE [ OF column_name [, ... ] ]
    DELETE
    TRUNCATE

You can also add a condition for running triggers and timing option, which I'm not going to discuss in greater details, you can find more about them in official docs.

What about function_name? It's a user-defined function returning trigger. Here's a dummy example to give an idea about the syntax for defining functions:

CREATE FUNCTION dummy() RETURNS trigger AS $$
DECLARE
    some_integer_variable int;
BEGIN
    some_integer_variable := 1;
    NEW.counter := some_integer_variable;
    RETURN NEW;
END;
$$ language plpgsql;

We started with defining dummy function taking no arguments returning type of trigger. Next, we have a DECLARE block where we declare temporary variables, in our case it's some_integer_variable of type int. Within BEGIN / END block we define the actual function body: we assign a dummy value to some_integer_variable variable using := operator and then we do some assignment using implicit NEW variable which is basically a row referenced by given statement. This variable is available only when running a trigger for each row, otherwise it will return NULL. Any trigger has to return either a row or NULL - in this example we return NEW row. At the end we have a declaration of writing function in plpgsql language.

Let's take a look at some real world code to see triggers in action.

Using triggers for data integrity

A good example where we could use a trigger for more complex validation could be a calendar event: we need to ensure that no other event exists between some begins_at time and finishes_at time. We should also scope it only to a given calendar and exclude id of event being updated - when creating new events it wouldn't matter, but without excluding id we wouldn't be able to update any event. So what we actually want to achieve is to create a trigger that will be run before insert or update on events table for each row.

Let's start with generating Calendar model and Event model with reference to calendar and begins_at and finishes_at attributes:

rails generate model Calendar
rails generate model Event calendar_id:integer begins_at:datetime finishes_at:datetime

and also generate extra migration adding trigger and procedure ensuring that only one event can be created for given period of time for a calendar:

rails generate migration add_validation_trigger_for_events_availability

add the SQL code :

class AddValidationTriggerForEventsAvailability < ActiveRecord::Migration
  def change
    execute <<-CODE
      CREATE FUNCTION validate_event_availability() returns trigger as $$
      DECLARE
        events_count int;
      BEGIN
        events_count := (SELECT COUNT(*) FROM events WHERE (
          events.calendar_id = NEW.calendar_id AND events.begins_at < NEW.finishes_at AND events.finishes_at > NEW.begins_at AND events.id != NEW.id
        ));
        IF (events_count != 0) THEN
          RAISE EXCEPTION 'Period between % and % is already taken', NEW.begins_at, NEW.finishes_at;
        END IF;
        RETURN NEW;
      END;
      $$ language plpgsql;

      CREATE TRIGGER validate_event_availability_trigger BEFORE INSERT OR UPDATE ON events
      FOR EACH ROW EXECUTE PROCEDURE validate_event_availability();
    CODE
  end
end

Our validate_event_availability function performs query to count all events that are between given time period for specified calendar excluding own id (so that the row being updated is not considered here, which would prevent updating any event). If any other event is found, the exception is raised with an error message - % characters are used for interpolation of begins_at and finishes_at attributes. If no other event is found, we simply return the row.

We want to define a trigger running this function before creating any new event or updating existing ones, so we need to run it BEFORE INSERT OR UPDATE FOR EACH ROW.

It might be a good idea to switch also to :sql schema format - the standard :ruby format can't handle triggers at this point. Add this line in config/application.rb:

# config/application.rb
config.active_record.schema_format = :sql

Now we can run migrations:

rake db:migrate

After changing the schema format, new structure.sql file should be created. It's not going to look that nice like schema.rb, but at least it contains all the details. Let's try creating some events from rails console:

calendar = Calendar.create
begins_at = "2016-05-02 12:00:00"
finishes_at =  "2016-05-02 16:00:00"
Event.create(calendar_id: calendar.id, begins_at: begins_at, finishes_at: finishes_at)

Creating first event obviously works, but what's going to happen when we try to create an event with exactly the same dates?

Event.create(calendar_id: calendar.id, begins_at: begins_at, finishes_at: finishes_at)

ActiveRecord::StatementInvalid: PG::RaiseException: ERROR:  Period between 2016-05-02 12:00:00 and 2016-05-02 16:00:00 is already taken

Awesome (haha, doesn't happen that often to be happy when an error occurs ;)), that's exactly what we wanted to achieve - the trigger keeps our data safe making sure that we won't have any duplicated events or events covering the same time period. The last thing we should do is to mimic such validation and add it to form object or model for better user experience, but it's beyond the scope of this article. It's going to duplicate some logic between the code in the application and the database, but in this case there's no way to DRY it up.

Wrapping up

PostgreSQL triggers and procedures are not something that you will often want to use in Rails applications, but sometimes there's no other solution, especially when you have more complex rules for data integrity. In such cases, triggers and procedures are the right tool for the job.

posted in: PostgreSQL, Database, ActiveRecord, Quick Tips, Patterns, Triggers

Anatomy of PostgreSQL triggers and procedures

Using triggers for data integrity

Wrapping up

Share this article if you liked it!