Keeping Data Integrity In Check: Conditional Unique Indexes For Soft Delete
Soft delete is a pretty common feature in most of the applications. It may increase complexity of the queries, nevertheless, not deleting anything might be a right default as the data might prove to be useful in the future: for restoring if a record was removed by mistake, to derive some conclusions based on statistics and plenty of other purposes. It may seem like it's a pretty trivial thing: just adding a column like deleted_at
and filtering out records that have this value present. But what happens when you need to do some proper uniqueness validation on both model layer and database level? Let's take a look what kind of problem can easily be overlooked and how it can be solved with a conditional index.
Case study: daily prices for vacation rentals
Let's imagine we are developing a vacation rental software. Most likely the pricing for each day will depend on some complex set of rules, but we may want to have some denormalized representation of base prices for each day to make things more obvious and have some possiblity of sharing this kind of data with other applications, which is quite common in this domain. We may start with adding a DailyPrice
model having a reference to a rental
, having price
value and of course date
for which the price is applicable.
Ensuring uniqueness
Obviously, we don't want to have any duplicated daily_prices
for any rental, so we need to add a uniqueness validation for rental_id
and date
attributes:
# app/models/daily_price.rb
validates :rental_id, presence: true, uniqueness: { scope: :date }
To ensure integrity of the data and that we are protected against race conditions and potental validation bypassing, we need to add a unique index on the database level:
# db/migrate/20161030120000_add_unique_index_for_daily_prices.rb
add_index :daily_prices, [:rental_id, :date], unique: true
Adding soft delete functionality
We have some nice setup already. But it turned out that for recalculating daily_prices
if some rules or values influencing the price change it's much more convenient to just remove them all and recalculate from scratch than checking if the price for given date needs to be recalculated. To be on the safe side, we may decide not to hard remove these rates, but do a soft delete instead.
To implement this feature we could add deleted_at
column, drop the previous index and a new one which will respect the new column. We should also update the validation in model in such case:
# app/models/daily_price.rb
validates :rental_id, presence: true, uniqueness: { scope: [:date, :deleted_at] }
And the migration part:
# db/migrate/20161030120000_add_deleted_at_to_daily_prices.rb
remove_index :daily_prices, [:rental_id, :date], unique: true
add_column :daily_prices, :deleted_at, :datetime
add_index :daily_prices, [:rental_id, :date, :deleted_at], unique: true
Everything should be fine with that, right? What could possibly go wrong here?
Adding index the right way
Let's play with Rails console and check it out:
> time_now = Time.current
=> Sun, 23 Oct 2016 09:44:46 UTC +00:00
> DailyPrice.create!(price: 100, date: Date.current, rental_id: 1, deleted_at: time_now)
COMMIT
> DailyPrice.create!(price: 100, date: Date.current, rental_id: 1)
COMMIT
> DailyPrice.create!(price: 100, date: Date.current, rental_id: 1)
ROLLBACK
ActiveRecord::RecordInvalid: Validation failed: Rental has already been taken
Nah, there can't be any problem, looks like we have the expected behaviour - we couldn't create a non-soft-deleted daily_price for given date and rental. But let's check one more thing:
> price = DailyPrice.new(price: 100, date: Date.current, rental_id: 1)
> price.save(validate: false)
COMMIT
Whoops! Something looks very wrong here. But how is it possible? The index looks exactly like the validation in our model, yet it didn't work like that when we bypassed the validation.
Let's consult PostgreSQL docs. There is something mentioned about null values:
Null values are not considered equal
That is our problem: ActiveRecord considered it as a unique value, but it doesn't work quite like that in PostgreSQL.
Our index should be in fact conditional and look the following way:
add_index :nightly_rates, [:rental_id, :date], unique: true, where: "deleted_at IS NULL"
We could optionally improve the validation in our model to make it look much closer to what we have in database and use conditions
option:
# app/models/daily_price.rb
validates :rental_id, presence: true, uniqueness: { scope: :date, conditions: -> { where(deleted_at: nil) } }
And that's it! There's no way we could compromise data integrity now!
Wrapping Up
Keeping data integrity in check is essential for most of the applications to not cause some serious problems, especially when implementing soft delete. Fortunately, simply by adding PostgreSQL conditional unique indexes we can protect ourselves from such issues.