Test Names That Enforce Architecture

You have a test suite. One feature breaks. Forty tests go red.

Not because forty things are wrong — because forty tests each validated the same contract they do not own.

This happens more often than most teams admit. A session management change breaks connection pool tests. A database schema migration fails authentication tests. Every test that ever touched a shared precondition lights up, and you spend the first twenty minutes of debugging just figuring out what actually broke.

The problem is not fragile code. The problem is tests that reach beyond their scope.

There is a naming convention that prevents this. Not by discipline — by structure.

The naming rule

Every test name encodes three things: the component under test, a severity verb, and the specific behavior being validated.

TEST(ConnectionPool, MustReturnIdleConnectionBeforeAllocatingNew)
TEST(ConnectionPool, ShouldRespectMaxPoolSizeUnderLoad)
TEST(ConnectionPool, MayTimeoutIfAllConnectionsBusy)

TEST(SessionManager, MustInvalidateSessionOnTokenExpiry)
TEST(SessionManager, ShouldReuseExistingSessionForSameUser)

The first argument is the component. The second argument starts with a verb — Must, Should, Shall, or May — followed by the behavior.
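The verb always sits at the front of the second argument, so the rule can be checked mechanically. Below is a minimal sketch of such a check. It assumes test names use the CamelCase form shown above. `parse_test_name` and `ParsedName` are hypothetical helpers, not part of Google Test.

```cpp
#include <array>
#include <cctype>
#include <optional>
#include <string>
#include <string_view>

// The four severity verbs allowed at the start of a test name.
constexpr std::array<std::string_view, 4> kSeverityVerbs = {"Must", "Should", "Shall", "May"};

struct ParsedName {
    std::string verb;      // e.g. "Must"
    std::string behavior;  // e.g. "ReturnIdleConnectionBeforeAllocatingNew"
};

// Hypothetical lint helper: splits a test name into verb and behavior.
// Returns std::nullopt when the name does not start with a severity verb
// followed by an upper-case letter, so "ReturnsConnection" and "Mustard" are rejected.
std::optional<ParsedName> parse_test_name(std::string_view name) {
    for (std::string_view verb : kSeverityVerbs) {
        if (name.size() > verb.size() &&
            name.substr(0, verb.size()) == verb &&
            std::isupper(static_cast<unsigned char>(name[verb.size()]))) {
            return ParsedName{std::string(verb), std::string(name.substr(verb.size()))};
        }
    }
    return std::nullopt;
}
```

A check like this could run over the names your runner lists and flag any test that is missing a severity verb before it is merged.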

This looks like a formatting preference. It is not. It is an architectural constraint.

The constraint: name scopes the body

The rule is simple: if a test is named ConnectionPool.MustReturnIdleConnectionBeforeAllocatingNew, the test body validates that contract and nothing else.

Everything needed to bring the system to the right state for the test is setup — constructed, configured, initialized. But it is not asserted. The only assertions in the body are about the behavior named in the test.

Here is what that looks like in practice:

TEST(ConnectionPool, MustReturnIdleConnectionBeforeAllocatingNew) {
    // -- Setup: bring the system to the state we need --
    ConnectionPool pool(/*max_size=*/5);
    auto conn = pool.acquire();
    pool.release(conn);

    // -- Act --
    auto reused = pool.acquire();

    // -- Assert: ONLY the named behavior --
    EXPECT_EQ(reused.id(), conn.id());
}

Notice what this test does not do. It does not assert that pool.size() is correct after setup. It does not check that release() returned successfully. It does not verify that the pool was initialized with the right capacity. Those are contracts that belong to other tests — MustInitializeWithConfiguredCapacity, MustAcceptReleasedConnection. They have their own names and their own bodies.

The name makes overreach visible. If you find yourself writing an assertion that does not match the test name, you are either testing the wrong thing or the test needs to be split.
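None of these examples define ConnectionPool itself. To make the contract concrete, here is a minimal, hypothetical pool. Its `acquire()`, `release()`, `id()`, and `active_count()` match the calls used throughout this article. The idle-list storage and the `std::runtime_error` thrown at capacity are assumptions for illustration, not a recommended design.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical connection handle. The tests only ever compare ids.
class Connection {
public:
    explicit Connection(int id) : id_(id) {}
    int id() const { return id_; }

private:
    int id_;
};

// Hypothetical pool. Method names mirror the calls in this article's examples;
// throwing std::runtime_error at capacity is an assumption.
class ConnectionPool {
public:
    explicit ConnectionPool(std::size_t max_size) : max_size_(max_size) {}

    Connection acquire() {
        if (!idle_.empty()) {          // the named contract: reuse an idle connection first
            Connection conn = idle_.back();
            idle_.pop_back();
            ++active_;
            return conn;
        }
        if (active_ >= max_size_) {    // no idle connections, so active_ is the total
            throw std::runtime_error("connection pool exhausted");
        }
        ++active_;
        return Connection(next_id_++);
    }

    void release(const Connection& conn) {
        assert(active_ > 0 && "release() without a matching acquire()");
        --active_;
        idle_.push_back(conn);
    }

    std::size_t active_count() const { return active_; }

private:
    std::size_t max_size_;
    std::size_t active_ = 0;
    int next_id_ = 1;
    std::vector<Connection> idle_;
};
```

Against this sketch, the body of MustReturnIdleConnectionBeforeAllocatingNew passes as written. Its single EXPECT_EQ is the only line that would notice if `acquire()` stopped preferring idle connections.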

First consequence: isolated failure surfaces

When each test validates exactly one contract, failures stop cascading.

If SessionManager breaks, only SessionManager tests fail. ConnectionPool tests stay green because they set up session state as a precondition but never assert on it. Unless the pool genuinely depends on the broken behavior, the failure lights up exactly the tests that name the broken contract, and nothing else.

Compare this to a test suite where every test starts with a block of “sanity check” assertions:

TEST(ConnectionPool, ReturnsConnection) {
    SessionManager sessions;
    ASSERT_TRUE(sessions.isInitialized());  // Why is this here?

    ConnectionPool pool(sessions);
    ASSERT_EQ(pool.size(), 0);              // This is a different contract

    auto conn = pool.acquire();
    ASSERT_NE(conn, nullptr);               // Now the actual test
    ASSERT_EQ(pool.size(), 1);              // And another contract
}

When SessionManager::isInitialized() breaks, this test fails. So does every other test that repeats that sanity check. One broken behavior, forty red tests, twenty minutes of triage.

The naming constraint eliminates this by making the scope violation obvious before you even look at the test body.
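The arithmetic behind the forty red tests is easy to model. The toy sketch below is not a test framework. `World`, `ToyTest`, `make_suite`, and `count_failures` are hypothetical names. It builds one SessionManager test plus thirty-nine ConnectionPool tests, breaks the session precondition, and counts failures with and without the repeated sanity check.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical shared state: the one precondition both components touch.
struct World {
    bool session_initialized = true;  // stands in for SessionManager::isInitialized()
};

struct ToyTest {
    std::string name;
    std::function<bool(const World&)> body;  // returns true when the test passes
};

// Runs every test against the same world and counts the red ones.
int count_failures(const std::vector<ToyTest>& suite, const World& world) {
    int failed = 0;
    for (const ToyTest& test : suite) {
        if (!test.body(world)) ++failed;
    }
    return failed;
}

// One SessionManager test that owns the contract, plus `pool_tests`
// ConnectionPool tests. With `sanity_checks`, every pool test also asserts
// the session precondition, like the ReturnsConnection example.
std::vector<ToyTest> make_suite(int pool_tests, bool sanity_checks) {
    std::vector<ToyTest> suite;
    suite.push_back({"SessionManager.MustReportInitialized",
                     [](const World& w) { return w.session_initialized; }});
    for (int i = 0; i < pool_tests; ++i) {
        suite.push_back({"ConnectionPool.Contract" + std::to_string(i),
                         [sanity_checks](const World& w) {
                             if (sanity_checks && !w.session_initialized) {
                                 return false;  // fails on a contract it does not own
                             }
                             return true;       // the pool's own behavior is fine
                         }});
    }
    return suite;
}
```

In the sanity-check version, one broken precondition turns all forty tests red. In the scoped version, only the test that owns the contract fails.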

Second consequence: severity as triage

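The verb in the test name is not decoration. It carries graduated severity, loosely borrowed from the requirement keywords of RFC 2119. (RFC 2119 itself treats SHALL as a synonym for MUST; this convention gives Shall a distinct, conditional meaning.)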

Verb	Severity	Meaning
`Must`	Hard contract	Violation means the component is broken. Ship-blocking.
`Should`	Expected behavior	Violation is a regression worth investigating.
`Shall`	Conditional requirement	Behavior expected under specific conditions.
`May`	Optional behavior	Behavioral change, not necessarily a defect.

When the test suite runs and failures appear, the verb tells you how to react before you open the test:

ConnectionPool
  ✓ MustReturnIdleConnectionBeforeAllocatingNew
  ✓ ShouldRespectMaxPoolSizeUnderLoad
  ✗ MayTimeoutIfAllConnectionsBusy

SessionManager
  ✗ MustInvalidateSessionOnTokenExpiry
  ✓ ShouldReuseExistingSessionForSameUser

The SessionManager failure is a Must — a hard contract is broken, this needs immediate attention. The ConnectionPool failure is a May — an optional behavior changed, worth investigating but not necessarily a defect.

The verb triages the failure before you read a single line of code.
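The verb is always the first word of the test name, so a tool can sort failures the same way a reader does. The following is a minimal sketch that assumes failures arrive as (suite, test name) pairs from whatever runner you use. `Failure`, `severity_rank`, and `triage` are hypothetical. Ranking the verbs in the table's row order is also an assumption.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct Failure {
    std::string suite;  // e.g. "SessionManager"
    std::string name;   // e.g. "MustInvalidateSessionOnTokenExpiry"
};

// Rank follows the row order of the severity table (assumption): lower is more urgent.
int severity_rank(const std::string& name) {
    static const std::array<std::string, 4> verbs = {"Must", "Should", "Shall", "May"};
    for (std::size_t i = 0; i < verbs.size(); ++i) {
        const std::string& verb = verbs[i];
        if (name.size() > verb.size() && name.compare(0, verb.size(), verb) == 0 &&
            std::isupper(static_cast<unsigned char>(name[verb.size()]))) {
            return static_cast<int>(i);
        }
    }
    return static_cast<int>(verbs.size());  // no severity verb: triage last
}

// Puts Must failures first; stable_sort keeps the runner's order within a verb.
void triage(std::vector<Failure>& failures) {
    std::stable_sort(failures.begin(), failures.end(),
                     [](const Failure& a, const Failure& b) {
                         return severity_rank(a.name) < severity_rank(b.name);
                     });
}
```

Given the two failures from the runner output above, it puts the SessionManager Must failure ahead of the ConnectionPool May failure. That matches the reading the verbs already suggested.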

Third consequence: generated documentation

Look at that test runner output again. Group it by component, split each name into words, indent the behaviors, and you have a requirements document. It is generated directly from your test names and readable by anyone on the team, including people who do not read C++.

ConnectionPool
  Must return idle connection before allocating new
  Should respect max pool size under load
  May timeout if all connections busy

SessionManager
  Must invalidate session on token expiry
  Should reuse existing session for same user

This is not a side effect. It is a design goal. When test names are written as behavioral contracts with severity verbs, the test runner output is the specification. It stays in sync with the code because it is the code. No separate requirements document to maintain, no wiki page that drifts out of date.

For teams where non-technical stakeholders need to understand what the system guarantees, this is a powerful tool. A product manager can read the test output and see exactly which contracts the test suite verifies, and at what severity level.
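Turning CamelCase test names into that indented listing takes only a few lines. This sketch assumes plain CamelCase names with no acronyms and takes (component, test name) pairs as input. `humanize` and `render_spec` are hypothetical helpers, not a feature of any test framework.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Turns "MustReturnIdleConnectionBeforeAllocatingNew" into
// "Must return idle connection before allocating new".
// Assumes plain CamelCase: an acronym such as "HTTPPool" would be split per letter.
std::string humanize(const std::string& camel) {
    std::string out;
    for (std::size_t i = 0; i < camel.size(); ++i) {
        const char c = camel[i];
        if (i > 0 && std::isupper(static_cast<unsigned char>(c))) {
            out += ' ';
            out += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        } else {
            out += c;
        }
    }
    return out;
}

// Groups (component, test name) pairs into an indented listing.
// std::map sorts components alphabetically; behaviors keep their input order.
std::string render_spec(const std::vector<std::pair<std::string, std::string>>& tests) {
    std::map<std::string, std::vector<std::string>> by_component;
    for (const auto& [component, name] : tests) {
        by_component[component].push_back(humanize(name));
    }
    std::string doc;
    for (const auto& [component, behaviors] : by_component) {
        doc += component + "\n";
        for (const std::string& behavior : behaviors) doc += "  " + behavior + "\n";
    }
    return doc;
}
```

Fed the five names from the examples above, it yields the same two groups and behavior lines, with components in alphabetical order because of `std::map`.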

Across frameworks

The naming convention is a principle, not a macro trick. It adapts to whatever framework you use — but some frameworks make it easier than others.

Google Test

Google Test’s TEST(TestSuite, TestName) macro maps directly to this convention. The first argument is your component, the second is Verb + Behavior:

// Component: ConnectionPool
// Verb: Must
// Behavior: ReturnIdleConnectionBeforeAllocatingNew
TEST(ConnectionPool, MustReturnIdleConnectionBeforeAllocatingNew)

For parameterized tests, the pattern extends naturally:

class ConnectionPoolStress : public ::testing::TestWithParam<int> {};

TEST_P(ConnectionPoolStress, ShouldHandleConcurrentAcquireUpToMaxSize) {
    ConnectionPool pool(/*max_size=*/GetParam());
    // ...
}

INSTANTIATE_TEST_SUITE_P(
    PoolSizes,
    ConnectionPoolStress,
    ::testing::Values(1, 10, 100, 1000)
);

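Google Test gives you the two-level structure, suite and test name, and the convention fills it with meaning. Parameterized instances keep the verb intact: Google Test reports the four instances above as PoolSizes/ConnectionPoolStress.ShouldHandleConcurrentAcquireUpToMaxSize/0 through /3.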

doctest — the structure that should be the standard

doctest takes this further, and in my view its structural model is what every testing framework should aspire to.

TEST_SUITE groups by component. TEST_CASE names the behavior. SUBCASE lets several scenarios share one block of setup without copying it:

TEST_SUITE("ConnectionPool") {
    TEST_CASE("Must return idle connection before allocating new") {
        ConnectionPool pool(/*max_size=*/5);
        auto conn = pool.acquire();
        pool.release(conn);

        auto reused = pool.acquire();
        CHECK(reused.id() == conn.id());
    }

    TEST_CASE("Should respect max pool size under load") {
        ConnectionPool pool(/*max_size=*/2);

        SUBCASE("when all connections are active") {
            auto c1 = pool.acquire();
            auto c2 = pool.acquire();
            CHECK(pool.active_count() == 2);
        }

        SUBCASE("when max is reached") {
            auto c1 = pool.acquire();
            auto c2 = pool.acquire();
            CHECK_THROWS(pool.acquire());
        }
    }

    TEST_CASE("May timeout if all connections busy") {
        // ...
    }
}

Notice what SUBCASE gives you: shared setup that runs fresh for each subcase — without fixtures, without inheritance, without a SetUp() method that lives in a different part of the file. The test case body above the subcases is the setup. Each subcase is a leaf scenario. doctest re-enters the test case from the top for each subcase, executing a different leaf on each pass — like a DFS traversal of a tree.
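The re-entry mechanism is easier to see in a stripped-down model. The sketch below handles only one level of subcases and is not doctest's actual implementation, which also tracks nested subcases. `SubcaseRunner`, `enter`, `run`, and `demo_setup_runs` are hypothetical names. Each pass runs the whole body from the top, and only the subcase whose position matches the current pass executes.

```cpp
#include <functional>
#include <string>
#include <vector>

// Simplified, single-level model of SUBCASE re-entry (not doctest's real code).
class SubcaseRunner {
public:
    // True only for the subcase selected on the current pass.
    bool enter(const std::string& name) {
        const int index = seen_++;
        if (index == target_) {
            executed_.push_back(name);
            return true;
        }
        return false;
    }

    // Re-runs the whole body once per subcase, always from the top.
    void run(const std::function<void(SubcaseRunner&)>& body) {
        target_ = 0;
        do {
            seen_ = 0;
            body(*this);  // setup above the subcases runs on every pass
            ++target_;
        } while (target_ < seen_);  // stop once every subcase has had its pass
    }

    const std::vector<std::string>& executed() const { return executed_; }

private:
    int target_ = 0;  // which subcase this pass should enter
    int seen_ = 0;    // subcases encountered so far in this pass
    std::vector<std::string> executed_;
};

// Usage modeled on "Should respect max pool size under load".
// Returns how many times the shared setup ran.
int demo_setup_runs(std::vector<std::string>& executed) {
    int setup_runs = 0;
    SubcaseRunner runner;
    runner.run([&](SubcaseRunner& sub) {
        ++setup_runs;  // stands in for: ConnectionPool pool(/*max_size=*/2);
        if (sub.enter("when all connections are active")) { /* leaf 1 */ }
        if (sub.enter("when max is reached")) { /* leaf 2 */ }
    });
    executed = runner.executed();
    return setup_runs;
}
```

`demo_setup_runs` reports two setup runs for two subcases. That is the property that lets the case body act as fresh setup for every leaf.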

The SUBCASE structure maps directly onto the naming discipline. The TEST_CASE names the contract. The SUBCASE names the condition. Laid out as a tree, the results reflect the hierarchy:

ConnectionPool
  Must return idle connection before allocating new      -- PASSED
  Should respect max pool size under load
    when all connections are active                      -- PASSED
    when max is reached                                  -- PASSED
  May timeout if all connections busy                    -- PASSED

That is a requirements tree with conditional branches — generated from your test structure, not from a separate document.

The free-form string names in doctest also mean you are not fighting CamelCase identifiers to express contract descriptions. The names read as natural language, which makes the generated documentation immediately useful to non-technical readers.

I want to be clear: this is not about the ability to embed tests in production source files — that is a separate doctest feature with its own tradeoffs. The structural model — suites as components, cases as contracts, subcases as conditions — is what I think should be the baseline every framework offers.

pytest

By default pytest only collects functions whose names start with test_ (the python_functions setting can change that), so you cannot lead with a severity verb. But pytest’s class-based organization gives you the component grouping, and the test_ prefix becomes a fixed preamble before the convention kicks in:

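import pytest

from myapp.pool import ConnectionPool      # hypothetical module path
from myapp.sessions import SessionManager  # hypothetical module path


@pytest.fixture
def pool():
    # Setup only: construct the precondition, never assert on it.
    return ConnectionPool(max_size=2)


@pytest.fixture
def session_mgr():
    return SessionManager()


class TestConnectionPool: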
    def test_must_return_idle_connection_before_allocating_new(self, pool):
        conn = pool.acquire()
        pool.release(conn)

        reused = pool.acquire()
        assert reused.id == conn.id

    def test_should_respect_max_pool_size_under_load(self, pool):
        c1 = pool.acquire()
        c2 = pool.acquire()
        assert pool.active_count == 2

    def test_may_timeout_if_all_connections_busy(self, pool):
        # ...
        pass


class TestSessionManager:
    def test_must_invalidate_session_on_token_expiry(self, session_mgr):
        session_mgr.create_session(token="abc")
        session_mgr.expire_token("abc")
        assert not session_mgr.is_valid("abc")

    def test_should_reuse_existing_session_for_same_user(self, session_mgr):
        s1 = session_mgr.get_or_create("alice")
        s2 = session_mgr.get_or_create("alice")
        assert s1.id == s2.id

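The test_ prefix is noise from a documentation perspective, but the class and method structure survives in pytest’s node IDs (TestConnectionPool::test_must_return_idle_connection_before_allocating_new), so a tree view of the results stays readable: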

TestConnectionPool
  ✓ test_must_return_idle_connection_before_allocating_new
  ✓ test_should_respect_max_pool_size_under_load
  ✗ test_may_timeout_if_all_connections_busy

TestSessionManager
  ✗ test_must_invalidate_session_on_token_expiry
  ✓ test_should_reuse_existing_session_for_same_user

The severity verb is still there — must, should, may — right after the test_ prefix. A Must failure is still a broken contract. A May failure is still worth investigating but not necessarily a defect. The triage works the same way.

Fixtures handle the setup separation cleanly. A @pytest.fixture for pool or session_mgr provides the preconditions without polluting the test body with assertions about initialization. The principle holds: setup is constructed, not asserted.

One rule, three consequences

One constraint on the name — component, severity verb, behavior. Three architectural consequences:

The test body cannot overreach, because the name scopes what it is allowed to verify. Failure surfaces stop overlapping, because each behavior is validated in exactly one place. And the test runner generates a requirements tree for free, with severity verbs that triage failures before you open the code.

This is not about naming style. It is about using a name as a structural constraint that shapes how tests are written, how failures are diagnosed, and how the system’s contracts are communicated.

Look at a test you wrote last month. Does its name describe the one behavior it validates? Does its body touch anything beyond that contract? Rename it. Scope it. Run the suite. The next time something breaks, count how many tests go red — and ask yourself how many of them should have.