A Field Guide to Unit Testing: Trustworthiness

How do you make your tests trustworthy?

THE UNIT TESTING SERIES

  1. A Field Guide to Unit Testing: Overview
  2. A good test is trustworthy
    • Test isolation (and what is dependency)
    • Definition of a unit
    • Testing scenarios
  3. A good test is maintainable
    • Testing one thing at a time
    • Use setup method, but use it wisely
  4. A good test is readable
    • Test description
    • Test pattern

In A Field Guide to Unit Testing: Overview, we defined what a SUT (subject under test) is, looked at three different testing methodologies on the spectrum of scope (the universe of testing) and functionality (the testing pyramid). We also defined three pillars of a good test: trustworthiness, maintainability, and readability. Below is a brief overview of the topics for each of those pillars:

  • A good test is trustworthy
    • Test isolation (and what is dependency)
    • Definition of a unit
    • Testing scenarios
  • A good test is maintainable
    • Testing one thing at a time
    • Encapsulation
    • The tricky setup method
  • A good test is readable
    • Structure
    • Naming

This article covers the first part: A good test is trustworthy. A warmup question: How do we define a trustworthy test? Have an answer before you read on.

Two criteria of a trustworthy test

We define a trustworthy test as one that has two attributes:

  1. We can be confident that it's the SUT that doesn't work when a test fails.
  2. We can be confident that the SUT works when the tests pass.

(refresher: SUT stands for Subject Under Test)

Hold on to that thought. We will come back to it.

The restaurant that serves French toast

The simple restaurant that we conjured up last time has three entities: Cook, Server and StorageRoom. This restaurant serves one thing: French toast. Below are the implementations of Cook and StorageRoom. Cook has a make_french_toast method, which does three things:

  1. Get egg and bread from StorageRoom.
  2. Make French toast with the ingredients.
  3. Return the plate of French toast as a string.

# Implementation of Restaurant::Cook

module Restaurant
  class Cook
    def make_french_toast
      # [Step 1] Get eggs and bread from StorageRoom
      egg = StorageRoom.get('egg')
      bread = StorageRoom.get('bread')

      # Some error cases handling
      return 'Oops running out of eggs!' unless egg
      return 'Oops running out of bread!' unless bread

      # [Step 2] Making French toast with the eggs and bread
      puts 'Making French toast'

      # [Step 3] Return the plate of French toast as a string
      'French toast is here. Ugh.'
    end
  end
end

StorageRoom has get and put class methods. They interact with a database.

# Implementation of Restaurant::StorageRoom

module Restaurant
  class StorageRoom
    def self.get(ingredient)
      # We don't really care about the implementation.
      # Just know that it's supposed to return the ingredient if the database has it.
    end

    def self.put(ingredient)
      # We don't really care about the implementation.
      # Just know that it's supposed to insert the ingredient into the database.
    end
  end
end

(Note: the full Restaurant code and its tests can be found here.)

The accidentally integrated unit test

This is the testing pyramid we looked at previously:

[Figure: the testing pyramid]

It is a perfectly valid pyramid, except that it rarely looks like this in reality. What we often see in the wild is the pyramid below:

[Figure: the testing pyramid in reality]

At the bottom, there are integration tests disguised as unit tests. Why?

Dependency, and why it is bad for unit tests

The first attribute of a trustworthy test is

We can be confident that it's the SUT that doesn't work when a test fails.

It seems almost redundant. How could it be otherwise? It turns out that a test can fail for many reasons other than the SUT itself. One of them, and a big one, is a malfunctioning dependency.

A dependency is something that a SUT calls that is out of its control.

Common dependencies include time, random number generators, databases, or simply a message sent to another class. For example, we say that Cook's make_french_toast depends on StorageRoom because it cannot control the result when it tries to get eggs and bread from StorageRoom. StorageRoom may return the eggs and bread, nothing at all, or a pair of unmatched socks for all we know.

Imagine we write a test for make_french_toast that calls the real StorageRoom. If it fails, the culprit may just as well be StorageRoom returning weird stuff rather than the make_french_toast method itself.

A real unit test requires careful attention to be fully isolated from all the dependencies. If not, it becomes an integration test.
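To make the idea of isolation concrete for a dependency other than StorageRoom, here is a minimal sketch in plain Ruby. Greeter and FixedClock are hypothetical names invented for this illustration and are not part of the restaurant code. The sketch assumes the class accepts its clock as an argument, so a test can swap the real Time for one it controls.

```ruby
# Hypothetical example: Greeter is not part of the restaurant code.
class Greeter
  # The clock is passed in, so a test can decide what "now" means.
  def initialize(clock = Time)
    @clock = clock
  end

  def greeting
    @clock.now.hour < 12 ? 'Good morning!' : 'Good afternoon!'
  end
end

# A fixed clock stands in for the real one, so the result no longer
# depends on when the test happens to run.
FixedClock = Struct.new(:now)

morning_greeter = Greeter.new(FixedClock.new(Time.new(2024, 1, 1, 9, 0, 0)))
afternoon_greeter = Greeter.new(FixedClock.new(Time.new(2024, 1, 1, 18, 0, 0)))

puts morning_greeter.greeting
puts afternoon_greeter.greeting
```

With the real clock, the same assertion would pass in the morning and fail in the afternoon even though the SUT had not changed.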

But why does it matter if we write integration instead of unit tests?

Getting rid of dependencies is hard. Isolation is hard. So why don't we just write integration tests and be done with it?

Integration tests have their own benefits, but there are some advantages that only unit tests possess:

  • Production code changes. All the time. If we need to change the signature of a method that many other classes depend on, it would be chaos to update every occurrence of it in the tests. With unit tests, we only have to change one place, the tests for the method itself, and the work is done.
  • Debugging is easier. When a test fails, it's easier to see what went wrong. We don't need to trace through 10 layers of dependencies to see which part fails to work.
  • They are fast. They get rid of all the calls to external dependencies (most notably the database), so running them can be lightning fast. This is very important: if it takes 20 minutes to run through the tests every time, let's face it, we won't run them as often as we should, which defeats the purpose of having tests in the first place.

Time to go back to our restaurant. Shall we?

The Cook's unit tests

Below are two tests for Restaurant::Cook's make_french_toast method that we just implemented. They're almost identical, but only one is a unit test.

# Test cases for `Restaurant::Cook`'s `make_french_toast` method, using RSpec.

RSpec.describe Restaurant::Cook do
  describe '#make_french_toast' do
    # First test
    it 'with everything prepared returns French toast as string' do
      # Prepare ingredients (same string keys that Cook passes to StorageRoom.get)
      Restaurant::StorageRoom.put('egg')
      Restaurant::StorageRoom.put('bread')

      result = Restaurant::Cook.new.make_french_toast

      expect(result).to eq 'French toast is here. Ugh.'
    end

    # Second test
    it 'with everything prepared returns French toast as string' do
      # Prepare ingredients
      Restaurant::StorageRoom.stub(:get).with('egg') { 'egg' }
      Restaurant::StorageRoom.stub(:get).with('bread') { 'bread' }

      result = Restaurant::Cook.new.make_french_toast

      expect(result).to eq 'French toast is here. Ugh.'
    end
  end
end

Can you see the difference? Look at the first two lines of each test, where we prepare the ingredients. The first test calls Restaurant::StorageRoom.put(ingredient) to put ingredients into the database, so that when Cook#make_french_toast later calls Restaurant::StorageRoom.get(ingredient), those ingredients can be returned.

The second test takes a different approach. It stubs the StorageRoom.get class method to return fixed values for the storage room items (egg and bread). If you don't already know what a stub is, it is a testing technique that lets us return a fixed response for a certain method call. In our example,

Restaurant::StorageRoom.stub(:get).with('egg') { 'egg' } 

means that the call to Restaurant::StorageRoom.get('egg') is intercepted. This message will not be sent to the actual Restaurant::StorageRoom. Instead, we pretend that it's sent, and that the value 'egg' is returned. In other words, we have isolated this test from its dependency on Restaurant::StorageRoom.

The second test, therefore, is a unit test, because it is isolated from its dependencies. The first test is an integration test, because to make it pass, we need to guarantee that StorageRoom works as well.
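For readers who want to see what a stub does mechanically, here is a rough hand-rolled equivalent in plain Ruby, without RSpec. It is only a sketch: the Cook below is a trimmed copy of the one above, and the StorageRoom.get that raises is an assumption standing in for the unimplemented, database-backed method. Unlike RSpec, this version does not restore the original method afterwards.

```ruby
module Restaurant
  class StorageRoom
    # Assumption for this sketch: the real method needs a database,
    # which is not available in a unit test.
    def self.get(_ingredient)
      raise 'the database is not available in a unit test'
    end
  end

  # Trimmed copy of the Cook from the article.
  class Cook
    def make_french_toast
      egg = StorageRoom.get('egg')
      bread = StorageRoom.get('bread')

      return 'Oops running out of eggs!' unless egg
      return 'Oops running out of bread!' unless bread

      'French toast is here. Ugh.'
    end
  end
end

# Hand-rolled stub: replace StorageRoom.get with a canned response,
# so the message never reaches the real implementation.
canned_responses = { 'egg' => 'egg', 'bread' => 'bread' }
Restaurant::StorageRoom.define_singleton_method(:get) do |ingredient|
  canned_responses[ingredient]
end

puts Restaurant::Cook.new.make_french_toast
```

Whatever happens inside the real StorageRoom can no longer influence the outcome, which is exactly the isolation the second test relies on.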

If it's still not clear, run the two tests now. The first test will fail, because we didn't (and will not) implement the StorageRoom functions, yet the first test depends on those functions. The second test will pass, because we use stubs to rid the SUT of its dependencies.

A brief recap

Wow, that was a bit long.

Now we know that having full isolation in tests allows us to achieve one of the two criteria of a trustworthy unit test: when a test fails, we can be confident that it's precisely the SUT, not anything else, that doesn't work.

That concludes the first part of our journey to a trustworthy test. Next, let's explore the second criterion:

We can be confident that the SUT works when the tests pass.

Restaurant now serves

Of the three entities in our Restaurant, we've implemented Cook and StorageRoom. There's only one left: Server.

Server has two functions. take_order receives a dish name and passes it to Cook if the dish name is "French toast," or else it returns an error message "We only make French toast. Take it or leave it." serve takes a dish (that Cook has made), decorates it, and serves it with gusto.

module Restaurant
  class Server
    def take_order(dish)
      if dish == 'French toast'
        Cook.new.make_french_toast
      else
        "We only make French toast. Take it or leave it."
      end
    end

    # This method is called when Cook is done cooking the dish.
    def serve(dish)
      decorated_dish = decorate(dish)

      "#{decorated_dish} yo!"
    end

    private

    def decorate(dish_with_murmur)
      dish_name = dish_with_murmur.gsub(' is here. Ugh.', '')
      "Delicious #{dish_name}"
    end
  end
end

(Note: the full Restaurant code and its tests can be found here.)

Now, how many unit tests should we write for Server? Take your time to think about it, reason it through, keep your answer in mind, and read on.

What is a unit???

To write a unit test, it is very important to know what a unit is. Or what it can be. We might already have some pre-conceived ideas. For example,

A unit is a function.

should come almost instantly to everyone's mind. Nod your head if your answer is 3, one for each method in Restaurant::Server (take_order, serve, decorate). It's a valid answer, but not ideal. The concept of a unit should not be defined by physical code boundaries, convenient as they may be, but by interfaces. That is, we can think of a unit as something that takes an input and produces an output.

A unit is a piece of code that takes certain inputs, and returns an observable output to the end user.

Often that "piece of code" is a function, but there are times when it spans several functions or even several classes. Our tendency might be to minimize the size of a unit of work being tested, but if making our SUT bigger will produce a more noticeable end result, the tests will be more maintainable.

Let's look at a curious question first.

Should we unit test private methods?

Should we have a unit test for the private decorate method in Server class?

First of all, let's look at the unit definition again: "[O]bservable output to the end user". A private method is private precisely because it does not produce a result that the end user can observe, so by our definition there is no need to test it directly.

Secondly, from a "test as documentation" point of view, we shouldn't test private methods either. Tests provide documentation about the SUT. They tell a story about how this SUT interacts with the world. If we include private methods, we distract, and possibly confuse, readers about the main purpose of this SUT.

Thirdly, private methods are implementation details.

We shouldn't test "how" the SUT works, but that it works.

Private methods are of no concern in the eyes of the public. So in our Restaurant::Server class, instead of testing decorate directly, we will treat it as part of the implementation of serve. If there really is a bug in this private method, we will catch it in our test for the public method that calls it.
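A small plain-Ruby sketch of that point, using a trimmed copy of Server that keeps only serve and decorate: the private method is exercised through the public one, and it cannot be called from outside at all.

```ruby
module Restaurant
  # Trimmed copy of the Server from the article (take_order omitted).
  class Server
    def serve(dish)
      "#{decorate(dish)} yo!"
    end

    private

    def decorate(dish_with_murmur)
      dish_name = dish_with_murmur.gsub(' is here. Ugh.', '')
      "Delicious #{dish_name}"
    end
  end
end

server = Restaurant::Server.new

# The public method is the observable behaviour; decorate runs as part of it,
# so a bug in decorate would change this output.
puts server.serve('French toast is here. Ugh.')

# Calling the private method from outside is not part of the interface.
begin
  server.decorate('French toast is here. Ugh.')
rescue NoMethodError => e
  puts "decorate is not callable from outside: #{e.class}"
end
```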

Testing scenarios

The answer is 2, then? Because there are two public methods in Restaurant::Server class?

It's not 2. Look at take_order again. Does it always follow the same logic for any input?

In #take_order, we have a conditional statement: if the input argument (dish) is "French toast," the server will pass this order to Restaurant::Cook; if not, the server will simply return a string "We only make French toast. Take it or leave it."

This method has two logic flows for handling the input, depending on its content. We can say there are two scenarios for this SUT, and we should write a unit test for each scenario. Why? Because unit tests should cover every logic flow of the SUT. If we only test this SUT with the input "French toast," the following sequence of things will happen:

  1. We do not have expectations against the code where the input is not "French toast."
  2. We are not sure what happens if the input is not French toast.
  3. We cannot be confident that the SUT works when the tests pass. 😱

Hmm. Doesn't look fun, does it?
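To see how an untested scenario can hide a bug, consider this sketch in plain Ruby. BuggyServer is a hypothetical variant written for this illustration only, and Cook is reduced to a one-line stand-in. The "French toast" scenario behaves correctly while the other branch is wrong.

```ruby
module Restaurant
  # Stand-in for the real Cook, reduced to the part this sketch needs.
  class Cook
    def make_french_toast
      'French toast is here. Ugh.'
    end
  end

  # Hypothetical buggy variant of Server, for illustration only.
  class BuggyServer
    def take_order(dish)
      if dish == 'French toast'
        Cook.new.make_french_toast
      else
        # Bug: any other dish is silently cooked as French toast too.
        Cook.new.make_french_toast
      end
    end
  end
end

server = Restaurant::BuggyServer.new
expected_refusal = 'We only make French toast. Take it or leave it.'

# The only scenario we tested.
french_toast_ok = server.take_order('French toast') == 'French toast is here. Ugh.'

# The scenario we never tested.
other_dish_ok = server.take_order('Pancakes') == expected_refusal

puts "French toast scenario behaves as expected: #{french_toast_ok}"
puts "Other-dish scenario behaves as expected: #{other_dish_ok}"
```

A suite containing only the first check would be green while the second scenario is broken.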

So we will write the following three unit test cases for Restaurant::Server#take_order:

  1. It passes the order to Cook if order is French toast.

  2. It returns an error message if order is not French toast.

  3. It does not pass the order to Cook if order is not French toast.

You may have questions: Why do we need the third test case? Its scenario is exactly the same as the second one. If we do need it, why can't we just add an assertion in the second test case?

Those questions will be answered in more depth when we get to the maintainable and readable pillars of unit tests in the following articles. Here's a concise version: the goal of the third case is both to make our intention clearer and to test its behavior, and the reason that we don't put it into the second test case is to follow the principle that we should test only one thing at a time.

Yet, in the end, you must be the arbiter of the tests you write. It is up to you to choose.

At last, the actual test code (for Restaurant::Server#take_order):

RSpec.describe Restaurant::Server do
  describe '#take_order' do
    # 1.
    it 'with French toast passes the order to Cook' do
      expect_any_instance_of(Restaurant::Cook).to receive(:make_french_toast)

      Restaurant::Server.new.take_order('French toast')
    end

    # 2.
    it 'with other orders returns an error message' do
      result = Restaurant::Server.new.take_order('Not French toast')

      expect(result).to eq 'We only make French toast. Take it or leave it.'
    end

    # 3.
    it 'with other orders does not pass order to Cook' do
      expect_any_instance_of(Restaurant::Cook).not_to receive(:make_french_toast)

      Restaurant::Server.new.take_order('Not French toast')
    end
  end
end
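For intuition about what the third case verifies, here is a hand-rolled version in plain Ruby, without RSpec. It is a sketch under assumptions: this Cook is a stand-in that counts its own calls (the `calls` counter is invented for this illustration), while Server#take_order is copied from the article.

```ruby
module Restaurant
  # Stand-in Cook that records how often it is asked to cook.
  class Cook
    @calls = 0

    class << self
      attr_accessor :calls
    end

    def make_french_toast
      self.class.calls += 1
      'French toast is here. Ugh.'
    end
  end

  class Server
    def take_order(dish)
      if dish == 'French toast'
        Cook.new.make_french_toast
      else
        'We only make French toast. Take it or leave it.'
      end
    end
  end
end

Restaurant::Server.new.take_order('Not French toast')

# If the order had been passed on, the counter would no longer be zero.
puts "Cook was asked to cook #{Restaurant::Cook.calls} time(s)"
```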

What about #serve then?

In our case, I'd write one test for #serve, because there is only one logic flow in it.

  • It returns the decorated dish as a string.

RSpec.describe Restaurant::Server do
  describe '#serve' do
    it 'returns decorated dish as a string' do
      dish_from_cook = 'French toast is here. Ugh.'
      result = Restaurant::Server.new.serve(dish_from_cook)

      expect(result).to eq 'Delicious French toast yo!'
    end
  end
end

And the tests pass.

[Figure: the Restaurant::Server#serve test passing]

There's more

So that concludes the first pillar of a good test: being trustworthy. In this article, we broke the definition of trustworthiness down into two criteria (when a test fails, it is the SUT that doesn't work; when the tests pass, the SUT works), and looked at three topics (test isolation, definition of a unit, testing scenarios) that help us achieve this goal.

In the next article, we will look into the other two pillars for a good test: being maintainable and being readable. See you there!

