Testing Concurrent Code for Fun and Profit

Everyone knows that multi-threaded code is not a piece of cake. There are lots of publications on how to write concurrent code properly and also lots of well-known algorithms and data structures to choose from. Yet, authors often ignore another important topic and pretend as if it's not worth the discussion. The topic is how to test your concurrent code and that's what we're going to consider today. No way I'm an expert on this matter (and any other matter), so everything below is an attempt to share an opinionated approach that appears to work well for me and helps to find vast majority of the concurrency bugs.

To narrow down the topic, we're going to use Java and try to cover the SPSC queue we built recently with a minimal set of tests. While the observed tests are minimal, you should be able to write tests for your own concurrent code after reading this blog post. As for the programming language, everything we talk of should be applicable to any language with threads or green threads, so C/C++, Rust, Golang, Zig and many others apply.

The interface of our queue is very simple and consists of two methods:

public class SpscBoundedQueue<E> {

    // Some boring stuff, like fields and constructor.

    /**
     * Publishes an item to the tail of the queue, if it's not full.
     */
    public boolean offer(E e) { /* some code goes here */ }

    /**
     * Removes and returns the head of the queue, if it's not empty.
     */
    public E poll() { /* some code goes here */ }
}

So, where to start when testing it?

Where to start?

The best thing to do as the first step is to write good old single-threaded tests. Many aspects of your code can (and should) be covered with such tests. Those are input validation, boundary checks, basic invariants of your data structure, methods that aren't thread-safe and, hence, will be always called from a single thread - all of these should be covered with "cheap" (in terms of the execution time) tests. Keep in mind that the aforementioned list is not complete. You should always try to cover as much as possible with single-threaded tests.

In our case, we should cover the following things:

Input validation - our original code was minimal, so it lacked things like positive size validation in the constructor. Such validation is a perfect candidate for a single-threaded test.
Boundary checks - our queue has a limited size, so we expect offer() to return false when the queue is full, as well as poll() to return a null when the queue is empty.
Basic invariants - we should test the "First in, first out" (FIFO) property of our data structure.
Non thread-safe methods - again, we omitted many other methods that are handy, such as clear() method. Due to the queue design, those methods have to be called from a single thread in absence of any other queue mutation calls. Single-threaded tests are to the rescue.

Here is a test that illustrates items 2 and 3 from the above list:

@Test
public void testSerial() {
    SpscBoundedQueue<Integer> queue = new SpscBoundedQueue<>(10);

    Assert.assertNull(queue.poll());

    for (int i = 0; i < 10; i++) {
        Assert.assertTrue(queue.offer(i));
    }
    Assert.assertFalse(queue.offer(42));

    for (int i = 0; i < 10; i++) {
        Assert.assertEquals((Integer) i, queue.poll());
    }
    Assert.assertNull(queue.poll());
}

It's a bit dense and could be split into multiple, more focused tests, but it's not a big deal considering that it's an illustration of the concept. The test verifies both boundary checks, as well as the FIFO property.

Enough silly serial tests, let's run things in parallel!

How to break things?

Our main goal is to find any thread-safety violations. But what does it mean in practice? Such violations may be very infrequent and hard to reproduce. Sometimes race conditions, data races and other unpleasant things may even remain unnoticed until you hit a certain edge case. The sad truth is that testing concurrent code is hard and you can never be sure that your test suite is good enough. But that means that you should do your best at writing concurrent tests to eliminate most, if not all, thread-safety bugs.

Each concurrent test scenario has to be thought separately. It should reproduce a use case and involve a set of related methods of your data structure(s). In our example, things are simple: we need to test the offer() and poll() methods running on two different threads (remember, we deal with a Single Producer Single Consumer queue). But is it enough to call these methods like crazy from separate threads? Not really. Just like with single-threaded tests, we have to think of the invariants we have in the thread-safe part of the code.

To keep things practical, let's start with the skeleton of the test:

@Test
public void testHammer() throws InterruptedException {
    final int iterations = 1_000_000;

    // Prepare the data structure.
    SpscBoundedQueue<Integer> queue = new SpscBoundedQueue<>(10);

    // Prepare helper data structures (test infra).
    CyclicBarrier barrier = new CyclicBarrier(2);
    CountDownLatch latch = new CountDownLatch(2);
    AtomicInteger anomalies = new AtomicInteger();

    // Prepare and start the threads.
    ConsumerThread consumer = new ConsumerThread(queue, barrier, latch, anomalies, iterations);
    consumer.start();
    ProducerThread producer = new ProducerThread(queue, barrier, latch, anomalies, iterations);
    producer.start();

    // Wait for the threads to finish.
    latch.await();

    // Verify that there were no thread-safety violations.
    Assert.assertEquals(0, anomalies.get());
}

The above code is quite straightforward. The test runs two threads and involves a queue, as well as a number of helper synchronization primitives. Once the threads are done, it checks the anomalies counter to verify that there were no thread-safety violations. As the test name suggests, it's a hammer style test, i.e. it aims to "bash" the queue from multiple threads until it breaks (or not). Such tests sometimes called stress tests for concurrent code.

Now, let's see what our threads actually do. We start with the producer thread:

private static class ProducerThread extends Thread {

    // Boring stuff such as fields and constructor goes here...

    @Override
    public void run() {
        try {
            // Await for the consumer thread, so we start simultaneously.
            barrier.await();
            // Start publishing incrementing numbers to the queue.
            for (int i = 0; i < iterations; i++) {
                while (!queue.offer(i)) {
                    // Yes, we want to busy spin.
                }
            }
        } catch (Exception e) {
            // Any exception we get when producing is an anomaly.
            e.printStackTrace();
            anomalies.incrementAndGet();
        } finally {
            // Notify the main thread that we're done.
            latch.countDown();
        }
    }
}

Producer's code is simple and illustrative. Notice that we're publishing incrementing numbers to the queue. As we're going to see in the consumer's code, that's to be able to verify our main invariant - the FIFO property. One more important thing here is that we don't have any kind of back-off calls in the while loop. Instead, we prefer to busy spin. That's because calls like Thread#sleep() or LockSupport.parkNanos() or anything similar involve synchronization that might fix your otherwise broken code. Also, if you need to emulate some local work as the back-off or on successful operation, prefer using Blackhole#consumeCpu() from JMH or similar methods of your choice. Finally, due to the same consideration, it's definitely a bad idea to call System.out.println() or log anything is the main loop.

The consumer thread's code is also pretty simple:

private static class ConsumerThread extends Thread {

    // Boring stuff such as fields and constructor goes here...

    @Override
    public void run() {
        try {
            barrier.await();
            // Consume all items from the queue.
            int prev = -1;
            while (prev != iterations - 1) {
                Integer element = queue.poll();
                if (element == null) {
                    // Again, we busy spin.
                    continue;
                }
                // Check that we received the incremented number.
                if (element != prev + 1) {
                    anomalies.incrementAndGet();
                }
                prev = element;
            }
            // We expect the queue to be empty now.
            if (queue.poll() != null) {
                anomalies.incrementAndGet();
            }
        } catch (Exception e) {
            e.printStackTrace();
            anomalies.incrementAndGet();
        } finally {
            latch.countDown();
        }
    }
}

The above code completes the picture: our test aims to verify the FIFO property of the queue and nothing more than that.

Variations of the concurrent tests are important. If we would be testing a MPMC queue, it would be a good idea to have multiple tests with different number of producer and consumer threads: single producer - single consumer, single producer - multiple consumers, multiple producers - single consumer, multiple producers - multiple consumers. If we have some kind of local work emulation in the tests, it would be nice to test it with different CPU time too. Same applies to data structure capacity and any other things that may affect the flow of your code. Thread-safety violations are a question of unlucky (or lucky, if you want to find bugs) ordering and visibility, so the more variations of the scenario you run, the higher chances to find a violation.

The complete test source code may be found here. If you're proficient in Golang and fancy to see a more complex application of the above principles, see tests of xsync library. The library consists of a number of concurrent data structures that are certainly more complex than our SPSC queue.

How to run the tests?

Before we wrap up, let's discuss a few tips to squeeze everything from your multi-threaded tests.

First of all, it is a good habit to run the newly written concurrent tests on your dev machine for a few minutes. This might show failures early, without involving many CI runs.

Next, if your code runs on different CPU architectures, make sure to run tests on those. For instance, ARM CPUs have a weaker hardware memory model when compared with x86 ones.

Some language ecosystems have race detector tools, like ThreadSanitizer or Golang's Data Race Detector. If applicable, make sure to configure your CI to run the tests with enabled race detector. It's also worth mentioning jcstress and Lincheck frameworks available in JVM ecosystem. Unlike the aforementioned race detectors, these frameworks require writing dedicated tests, so, in case of concurrent data structure testing, they can be seen as an alternative to the hand-written tests we're discussing today.

Finally, if some of your concurrent tests appear to be flaky, i.e. infrequently fail due to an unknown reason, that may be an indication of an actual bug. Make sure to do your best to reproduce the failure, analyze it and fix the cause.

Let's recap?

Writing thread-safe concurrent code is hard. Writing sufficient tests for such code may be even harder. Here is the summary of what we discussed today:

Always try to cover as much as possible with single-threaded tests.
Write your concurrent tests to stress your code and verify a set of invariants.
Avoid calls that involve additional synchronization in the main loops of your tests.
Variations of the concurrent tests are important.
Test on various CPU architectures (wink-wink ARM).
If applicable, configure your CI server to run the concurrent tests with a race detector.
Flaky tests are your friends. Always do your best to reproduce and analyze them.

I hope you've learned something new today. Good luck with your concurrent tests and see you next time.

Random thoughts on concurrency, databases and distributed systems