
The strange case of ActiveRecord ConcurrentMigrationError


30 Jan 2019 - Development, Migrations, Ruby on Rails, Solidus

Andrea Longhi


A few weeks ago, after deploying some new code to production, I received a notification about an error I had never seen before:

`ActiveRecord::ConcurrentMigrationError: Cannot run migrations because another migration process is currently running`

The newly deployed code consisted only of Solidus and Rails minor version upgrades, so at first I suspected some unanticipated bug or incompatibility. But the error had not happened on CI or on the staging server, so I concluded that it was probably something specific to the production environment.

Let’s get back to the error message, which happens to be quite self-explanatory. I started to reason about what could have gone wrong: while staging and CI each run on a single Amazon instance, the production environment is based on multiple AWS t2.large instances, and the Rails deploy procedure runs concurrently on all of them… that’s just how OpsWorks does it.

[Image: AWS instances]

So, during the deploy process one instance was still running migrations when uhura also tried to run them, which eventually raised the error.

The error is raised by a Rails feature named advisory locking, added in 2015:

Attempting to run a migration while another one is in process will raise a ConcurrentMigrationError instead of attempting to run in parallel with undefined behavior. This could be rescued and the migration could exit cleanly instead. Perhaps as a configuration option?
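
The behavior described in that quote can be illustrated without Rails. What follows is only a minimal sketch, not the actual Rails implementation: `FakeMigrator` and `FakeConcurrentMigrationError` are hypothetical names, and a plain Ruby `Mutex` stands in for the database advisory lock. The point is just that the second caller fails fast instead of waiting or running in parallel.

```ruby
# Hypothetical stand-in for ActiveRecord::ConcurrentMigrationError.
class FakeConcurrentMigrationError < StandardError; end

# Hypothetical migrator: a Mutex plays the role of the database advisory lock.
class FakeMigrator
  def initialize
    @lock = Mutex.new
  end

  # Runs the block only if nobody else holds the lock; otherwise raises
  # immediately instead of waiting for the lock to be released.
  def migrate
    unless @lock.try_lock
      raise FakeConcurrentMigrationError, 'another migration process is currently running'
    end

    begin
      yield
    ensure
      @lock.unlock
    end
  end
end

migrator = FakeMigrator.new
results = []

migrator.migrate do
  # While the first "instance" is migrating, a second one tries as well.
  second = Thread.new do
    begin
      migrator.migrate { results << :second_ran }
    rescue FakeConcurrentMigrationError
      results << :second_rejected
    end
  end
  second.join
  results << :first_ran
end
```

Here the second attempt is rejected while the first is still in progress, which mirrors what happened between the production instances.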

This error can be quite insidious, as the deploy will fail on that instance, which will retain the previous application code. This time we were quite lucky as the new code was still functionally the same as before, but in other cases this could have easily caused a few headaches.

Imagine having one instance of the application serving outdated pages, not showing new Solidus features, or showing features that don’t exist anymore. And this is not the worst-case scenario… the instance could be throwing 500 server errors at each visitor if something more serious was wrong.

So, in order to fix the error (or, better said, ignore it and live happily) we wrapped the migration rake task as follows:

  # lib/tasks/migrate_ignore_concurrent.rake

  namespace :db do
    namespace :migrate do
      desc 'Run db:migrate but ignore ActiveRecord::ConcurrentMigrationError errors'
      task ignore_concurrent: :environment do
        begin
          Rake::Task['db:migrate'].invoke
        rescue ActiveRecord::ConcurrentMigrationError
          # Another instance holds the lock and is running the migrations: do nothing
        end
      end
    end
  end

and we now run this task instead of the usual `db:migrate`:

bundle exec rake db:migrate:ignore_concurrent

This solution is, as often happens in life, not without caveats.

Consider the rare case in which the first instance, the one that got the lock, is running a long DB migration (I’m not judging here, but you should not have long-running migrations) which eventually fails. All the other instances that tried to acquire the lock and skipped the migrations would by then already be running the new code, which probably also requires the new DB schema… what a day! So, all in all, it’s just a matter of tradeoffs, and I chose mine :)
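
If that tradeoff feels too risky, one possible mitigation is to make the instances that lost the race wait until the schema is up to date, and fail the deploy if it never gets there. This is only a sketch under stated assumptions, not what we run in production: `wait_for_schema` and the `pending` callable are hypothetical, and in a real task the callable would have to ask ActiveRecord whether migrations are still pending.

```ruby
# Hypothetical helper: polls `pending` until it reports that no migrations
# are left, or until `timeout` seconds have passed.
# Returns true when the schema is up to date, false on timeout.
def wait_for_schema(pending:, timeout: 300, interval: 5)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    return true unless pending.call
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline

    sleep interval
  end
end

# Simulated check: the "other instance" finishes by the third poll.
polls = 0
finishing = -> { (polls += 1) < 3 }
schema_ready = wait_for_schema(pending: finishing, timeout: 1, interval: 0.01)

# Simulated check: the migration on the other instance never completes.
stuck = -> { true }
schema_stuck = wait_for_schema(pending: stuck, timeout: 0.05, interval: 0.01)
```

In the rake task, a `false` result would be the cue to abort the deploy on that instance rather than start the new code against an old schema.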