The Perils of Parallel Testing in Ruby on Rails



Have you ever heard someone complain about their tests being too fast? Me neither.

Fast tests mean fast feedback. Whether you run them locally or in a continuous integration pipeline,
the earlier your tests finish, the earlier you can react to failures and improve your code. Besides the productivity
gains, it is well known that slow tests make developers grumpy. Nobody likes their developers grumpy.

With all that said, creating a lightning-fast test suite isn’t always as easy as you’d hope. Luckily, Rails 6 introduced
an exciting feature called parallel testing. It is effortless to get started with and can speed up your tests by a
lot. However, there are some pitfalls to watch out for.

What Is Parallel Testing?

What does parallel testing even mean?

When you run a test suite, your test runner will typically spawn a single process to execute your tests one
after the other — or serially. A single test process uses a single CPU core.
As you can probably imagine, this approach doesn’t take full advantage of modern hardware, which often sports many
CPU cores.

You may have a fancy MacBook with 10 CPU cores, but sadly, that won’t make your tests go any faster!

We can change this by distributing individual tests to multiple worker processes. Tests will no longer run after
each other, but next to each other, in parallel. Ignoring setup overhead, running your test suite on two workers is
roughly twice as fast as running the same test suite on a single worker.

The more cores your machine has, the more worker processes are feasible, and thus, the faster your test suite will
finish. Say your tests usually take eight minutes to run. In the ideal case, running those same tests using four
worker processes will take the runtime down to around two minutes!

Imagine doing the same on a nice, fat 16-core machine, where we can spawn sixteen workers. Very nice indeed!
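
To put rough numbers on this, here is a back-of-the-envelope sketch in plain Ruby. The estimated_runtime helper and the overhead figure are made up for illustration. Real suites rarely split perfectly evenly across workers, so treat the results as a best case rather than a prediction.

```ruby
# Rough model: total time = fixed setup overhead + serial time / workers.
# The overhead value is an assumption for illustration, not a measurement.
def estimated_runtime(serial_minutes, workers, overhead_minutes: 0.0)
  raise ArgumentError, "workers must be positive" unless workers.positive?

  overhead_minutes + serial_minutes.to_f / workers
end

estimated_runtime(8, 4)                        # => 2.0 (ideal case)
estimated_runtime(8, 4, overhead_minutes: 0.5) # => 2.5 (with setup cost)
estimated_runtime(8, 16)                       # => 0.5
```

The fixed overhead term is why very small test runs can end up slower in parallel than serially.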

Configuring Parallel Testing in Rails

So how do we get there? Until recently, you had to rely on third-party gems to parallelize your test suite, but starting
from Rails 6, parallel tests come standard. It’s as easy as adding parallelize to your test helper:

class ActiveSupport::TestCase
  parallelize(workers: :number_of_processors)
end

Using this configuration, Rails will automatically spawn worker processes based on the number of processors in your
machine. Rails will also create one database per worker, suffixed with the worker number (e.g. test-database-0,
test-database-1, etc.), to run your tests against.

That’s all it takes to get started! Of course, there are some additional configuration options if you need them.

Sometimes, you may have to perform a specific setup or cleanup for parallel tests. Rails provides two hooks for you
to use: parallelize_setup and parallelize_teardown. With process-based parallelization, parallelize_setup is called
right after each worker process is forked, and parallelize_teardown right before it is shut down:

class ActiveSupport::TestCase
  parallelize_setup do |worker|
    # setup
  end

  parallelize_teardown do |worker|
    # cleanup
  end
end
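
As a sketch of what these hooks might do, the example below gives every worker its own scratch directory. The worker_scratch_dir helper and the "myapp-test-worker" prefix are hypothetical names, not part of Rails. The hook registration is guarded so the snippet also loads outside a Rails test run.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: one scratch directory per worker number.
def worker_scratch_dir(worker)
  File.join(Dir.tmpdir, "myapp-test-worker-#{worker}")
end

# Only register the hooks when Rails' test case class is loaded.
if defined?(ActiveSupport::TestCase)
  class ActiveSupport::TestCase
    parallelize_setup do |worker|
      # Runs in each worker process right after it is forked.
      FileUtils.mkdir_p(worker_scratch_dir(worker))
    end

    parallelize_teardown do |worker|
      # Runs in each worker process right before it shuts down.
      FileUtils.rm_rf(worker_scratch_dir(worker))
    end
  end
end
```

The worker argument is the worker's number, which makes it a convenient suffix for anything that must not be shared between processes.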

You can also manually set the number of workers:

class ActiveSupport::TestCase
  parallelize(workers: 4) # Use 4 worker processes
end

Alternatively, use the PARALLEL_WORKERS environment variable to override an existing configuration:

PARALLEL_WORKERS=4 rails test
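
The precedence between the configured value and the environment variable can be sketched in plain Ruby. This is a simplified model, not Rails' own code: resolve_worker_count and parallel? are hypothetical names, and Etc.nprocessors stands in for however Rails counts processors.

```ruby
require "etc"

# Simplified model: the environment variable, when set, wins over the
# value passed to parallelize.
def resolve_worker_count(configured, env = ENV)
  count = configured == :number_of_processors ? Etc.nprocessors : configured
  count = env["PARALLEL_WORKERS"].to_i if env["PARALLEL_WORKERS"]
  count
end

# With zero or one worker there is nothing to parallelize.
def parallel?(worker_count)
  worker_count > 1
end

resolve_worker_count(4, {})                          # => 4
resolve_worker_count(4, { "PARALLEL_WORKERS" => "2" }) # => 2
parallel?(resolve_worker_count(4, { "PARALLEL_WORKERS" => "0" })) # => false
```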

There is also the option to use threads instead of processes to parallelize your test suite.

class ActiveSupport::TestCase
  parallelize(workers: :number_of_processors, with: :threads)
end

Rails applications generated with JRuby or TruffleRuby include with: :threads automatically. Using threads, in theory,
provides slightly better performance. Threads require less overhead than processes, after all. On MRI, though, the global
VM lock keeps threads from running Ruby code in parallel, so CPU-bound tests gain little. In practice, I never found using
threads all too useful, and you should be fine just sticking to process-based parallelization for the most part.

Beware the Pitfalls

So all you need to do is add parallelize to your existing tests to experience incredible speedup? It’s that easy!

If you are lucky, that really is true. It’s more likely that you will hit some unexpected snags when first adding parallelization to your existing test suite. This certainly was the case for me!

Let’s get one thing out of the way. If you use RSpec rather than Minitest, you are out of luck. RSpec does not
support Rails 6 built-in parallel testing. There is an ongoing discussion
about changing that, but there hasn’t been any significant progress for a while. If you want parallel tests with RSpec,
your best bet is still using third-party gems such as grosser/parallel_tests.

Another unexpected issue that you might face is that running a small number of tests in parallel ends up being slower
than running them serially. Setting up parallel tests comes with a significant overhead, such as creating multiple
databases, which can eliminate any gains you might get from parallelization.

You might be better off disabling parallel tests for a small number of tests. You can do so by using the
PARALLEL_WORKERS environment variable:

PARALLEL_WORKERS=0 rails test test/controllers/my_controller_test.rb

Rails 7 addressed this by enabling parallel execution only when you execute many tests. So if you’ve already
upgraded, you won’t experience this problem. By default, the parallelization threshold is set to 50, but you can
override it:

config.active_support.test_parallelization_threshold = 123
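
Rails 7 also accepts a threshold: argument to parallelize if you prefer to set the value next to the worker count. The decision itself can be pictured with a small model in plain Ruby. It is a simplification rather than Rails' actual implementation: run_in_parallel? is a hypothetical name, and the strict comparison is an assumption.

```ruby
# Simplified model of the threshold check, not Rails' actual code.
DEFAULT_THRESHOLD = 50 # the Rails 7 default mentioned above

def run_in_parallel?(test_count, workers:, threshold: DEFAULT_THRESHOLD)
  workers > 1 && test_count > threshold
end

run_in_parallel?(12, workers: 4)                # => false
run_in_parallel?(400, workers: 4)               # => true
run_in_parallel?(12, workers: 4, threshold: 10) # => true
```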

The last problem I want to highlight is the one you are most likely to face, and it is also the most insidious and hardest
to deal with. You may start seeing random failures when enabling parallel tests for your test suite. To understand how
parallelization can cause this, let’s look at a simple test case:

class FileTest < ActiveSupport::TestCase
  def teardown
    # Remove the file only if a test left it behind.
    File.delete('test.txt') if File.exist?('test.txt')
  end

  test 'create file' do
    File.write('test.txt', 'created')

    assert_path_exists('test.txt')
  end

  test 'delete file' do
    File.write('test.txt', 'deleted')

    File.delete('test.txt')

    assert_not(File.exist?('test.txt'))
  end
end

Besides being a bit useless, this test is perfectly fine. It will pass 100% of the time as long as it’s run serially.
Each test is isolated, and executing these tests in random order does not cause them to fail. That changes when you
add multiple processes or threads to the mix.

When running tests in parallel, statements from tests on different workers can interleave in any order, depending on
CPU scheduling. Looking at the example, you’ll sometimes see execution orders such as this one:

# Worker 1 executes
File.write('test.txt', 'created')

# Worker 2 executes
File.write('test.txt', 'deleted')
File.delete('test.txt')
assert_not(File.exist?('test.txt'))

# Worker 1 executes
assert_path_exists('test.txt')

Since Worker 2 deleted the file before Worker 1’s assertion ran, the first test fails, but only when the scheduler
happens to produce this interleaving. To add insult to injury, a different interleaving can make the second test fail
and the first one pass.
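
If you want to see the failure on demand, the sketch below replays that exact interleaving in plain Ruby. Two threads stand in for the two workers, and two queues force the ordering, so it is a demonstration of the race rather than a Rails test. The method name is made up for this example.

```ruby
require "tmpdir"

# Replays the problematic interleaving. The queues force the order:
# worker one writes, worker two writes and deletes, worker one checks.
def file_visible_to_worker_one?
  Dir.mktmpdir do |dir|
    path = File.join(dir, "test.txt")
    written = Queue.new
    deleted = Queue.new

    worker_one = Thread.new do
      File.write(path, "created")
      written << true   # let worker two proceed
      deleted.pop       # wait until worker two has deleted the file
      File.exist?(path) # what assert_path_exists would check
    end

    worker_two = Thread.new do
      written.pop
      File.write(path, "deleted")
      File.delete(path)
      deleted << true
    end

    worker_two.join
    worker_one.value
  end
end

puts file_visible_to_worker_one? # prints "false"
```

In a real test run nothing forces this order, which is exactly why the failure only shows up some of the time.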

This simple example illustrates an issue that extends not only to files but to any shared resource that your tests
access without isolating it per worker. Suppose you write to a Redis database or an Elasticsearch index. In that case,
you’ll likely experience similar complications. What’s worse, it may take you a while to uncover all tests that cause
random failures, and even more time to fix all of them.

There is no silver bullet to address flaky parallel tests. In general, you will need to ensure that multiple test processes
do not share resources. For files, use a Tempfile or a temporary directory so that every test works on its own path.
Use parallelize_setup to create namespaced resources (e.g. Redis databases). And so on.
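
Here is a minimal sketch of the file example rewritten so that nothing is shared. It uses only Ruby's standard library; the two method names are hypothetical, and in a real suite the bodies would live inside your test cases with assertions instead of return values.

```ruby
require "tempfile"
require "tmpdir"

# "create file": Tempfile.create picks a unique name and removes the file
# once the block finishes, so no other worker can touch it.
def create_file_check
  Tempfile.create("test") do |file|
    file.write("created")
    file.flush
    File.exist?(file.path)
  end
end

# "delete file": a private temporary directory gives the test its own
# test.txt, independent of every other test and worker.
def delete_file_check
  Dir.mktmpdir("parallel-test") do |dir|
    path = File.join(dir, "test.txt")
    File.write(path, "deleted")
    File.delete(path)
    !File.exist?(path)
  end
end

create_file_check # => true
delete_file_check # => true
```

Both helpers clean up after themselves, so the teardown step is no longer needed either.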

Adding Parallel Testing to Existing Rails Tests

Let’s say you struggle with random test failures due to parallelization and don’t have the time to fix them. However, you still
want to reap the benefits of parallel testing. You may prefer to enable it only for a subset of your tests.

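After all, in Rails 6, parallelize only opts in the class that calls it (and its subclasses). Later Rails versions may
behave differently, so verify this on the version you run. By using concerns or parent classes, you can add parallel
testing to your test suite one test class at a time. You could create a module like this: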

module Parallelize
  def self.included(base)
    base.class_eval do
      parallelize(workers: :number_of_processors)

      # ...
    end
  end
end

Any test class that includes this module will now run in parallel:

class MyTest < ActiveSupport::TestCase
  include Parallelize
end

Alternatively, you could create a new test class like ParallelTest:

class ParallelTest < ActiveSupport::TestCase
  parallelize(workers: :number_of_processors)
end

Then, inherit from that for tests that should run in parallel — and leave out those that prove problematic.

Parallel Testing as a Bandaid

Parallel testing provides impressive speed gains for little effort. Don’t be fooled, though: it is no substitute for other
approaches to improve your test suite’s speed, but rather an addition.

If you find your test suite is slow and can spare the effort, spend some time profiling it and addressing
the root causes of the slowness. A slow test suite with parallel testing added to it will get faster, but never as fast
as an already fast test suite that also runs in parallel!

Wrap Up

In this post, we looked at what
parallel testing is, how you can set it up and how to configure it. If you need a way to make your tests go faster, parallel testing provides just that.

You might face some obstacles when adding parallel testing to your test suite. Don’t be surprised when tests that have
run reliably for years suddenly start failing. You can work around the failures by parallelizing only a subset of your
test suite.

No matter which approach you choose, parallel testing is a fantastic tool to speed up your tests!

Happy coding!
