The strange case of ActiveRecord ConcurrentMigrationError

A few weeks ago, after deploying some new code to production, I received a notification about a strange, new-to-me error:
`ActiveRecord::ConcurrentMigrationError: Cannot run migrations because another migration process is currently running`
The newly deployed code consisted only of Solidus and Rails minor version upgrades, so at first I suspected some unanticipated bug or incompatibility. But the error had not happened on CI or on the staging server, so I concluded that it was probably something specific to the production environment.
Let’s get back to the error message, which happens to be quite self-explanatory.
I started to reason about what could have gone wrong: while staging and CI run on single Amazon instances, the production environment is based on multiple AWS t2.large instances, and the Rails deploy procedure runs basically concurrently on each of them… that’s just how Opsworks does it.
So, during the deploy process, one instance was still running migrations when another one (uhura) tried to run them as well, which raised the error.
The error is raised by a Rails feature called advisory locking, added in 2015:
Attempting to run a migration while another one is in process will raise a ConcurrentMigrationError instead of attempting to run in parallel with undefined behavior. This could be rescued and the migration could exit cleanly instead. Perhaps as a configuration option?
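To make the mechanism concrete, here is a minimal sketch of the "try to take the lock, raise if it is busy" behaviour described above. It is not the Rails implementation: Rails asks the database for an advisory lock, while this sketch assumes a plain in-process `Mutex`, a stand-in `ConcurrentMigrationError` class and a hypothetical `with_migration_lock` helper, so that it can run without Rails or a database.

```ruby
# Stand-in for ActiveRecord::ConcurrentMigrationError (assumption: lets the sketch run without Rails).
class ConcurrentMigrationError < StandardError; end

# Hypothetical in-process lock; in Rails the lock lives in the database instead.
MIGRATION_LOCK = Mutex.new

# Runs the block only if the lock is free; otherwise raises instead of waiting.
def with_migration_lock
  # try_lock never blocks: it returns false when another thread holds the lock.
  unless MIGRATION_LOCK.try_lock
    raise ConcurrentMigrationError, 'another migration process is currently running'
  end

  begin
    yield
  ensure
    MIGRATION_LOCK.unlock
  end
end

results = []
ready = Queue.new

# First "instance": takes the lock and pretends to run a slow migration.
first = Thread.new do
  with_migration_lock do
    ready << :locked
    sleep 0.1
    results << :first_migrated
  end
end

ready.pop # wait until the first "instance" really holds the lock

# Second "instance": tries to migrate while the lock is still held.
begin
  with_migration_lock { results << :second_migrated }
rescue ConcurrentMigrationError
  results << :second_rejected
end

first.join
```

The second caller is rejected immediately rather than queued, which is what happens to the instance that loses the race during a concurrent deploy.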
This error can be quite insidious, as the deploy will fail on that instance, which will retain the previous application code. This time we were quite lucky as the new code was still functionally the same as before, but in other cases this could have easily caused a few headaches.
Imagine having one instance of the application serving outdated pages, not showing new Solidus features, or showing features that don’t exist anymore. And this is not the worst-case scenario… the instance could be throwing 500 server errors at every visitor if something more serious were wrong.
So, in order to fix the error (or, better said, ignore it and live happily) we added a custom migration rake task as follows:
    # lib/tasks/migrate_ignore_concurrent.rake
    namespace :db do
      namespace :migrate do
        desc 'Run db:migrate but ignore ActiveRecord::ConcurrentMigrationError errors'
        task ignore_concurrent: :environment do
          begin
            Rake::Task['db:migrate'].invoke
          rescue ActiveRecord::ConcurrentMigrationError
            # Do nothing: another instance holds the lock and is running the migrations
          end
        end
      end
    end
and run this instead of the usual one:
    bundle exec rake db:migrate:ignore_concurrent
This solution is, as often happens in life, not without caveats.
Consider the rare case when the first instance that got the lock is running a long DB migration (I’m not judging here, but you should not have long-running migrations) which eventually fails. All the other instances, having ignored the error when they could not acquire the lock, would by then already be running the new code, which probably requires the new DB schema… what a day! So, all in all, it’s just a matter of tradeoffs, and I chose mine :)
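For anyone who would pick a different tradeoff, one alternative is to retry a few times instead of ignoring the error. This is only a sketch and not what we run in production: `migrate_with_retry`, its `attempts` and `wait` parameters, and the stand-in error class are all assumptions made for illustration. The idea is that a retry either finds the migrations already applied by the other instance, or runs them itself and fails loudly if they are broken, so the deploy does not silently go ahead against the old schema.

```ruby
# Stand-in for ActiveRecord::ConcurrentMigrationError (assumption: lets the sketch run without Rails).
class ConcurrentMigrationError < StandardError; end

# Hypothetical helper: runs the block, retrying while the migration lock is busy.
# Re-raises the error once `attempts` tries have been used up.
def migrate_with_retry(attempts: 5, wait: 10)
  tries = 0
  begin
    tries += 1
    yield
  rescue ConcurrentMigrationError
    raise if tries >= attempts

    sleep wait # give the instance holding the lock some time to finish
    retry
  end
end

# Demo with a fake migration that finds the lock busy twice before succeeding.
calls = 0
outcome = migrate_with_retry(attempts: 3, wait: 0) do
  calls += 1
  raise ConcurrentMigrationError, 'lock busy' if calls < 3

  :migrated
end
```

Inside a rake task the block would invoke `db:migrate`; since Rake runs a task only once per process, the task would most likely need `Rake::Task['db:migrate'].reenable` before each new attempt. The price is a slower deploy on the instances that have to wait.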