You have a test suite. One feature breaks. Forty tests go red.
Not because forty things are wrong — because forty tests each validated the same contract they do not own.
This happens more often than most teams admit. A session management change breaks connection pool tests. A database schema migration fails authentication tests. Every test that ever touched a shared precondition lights up, and you spend the first twenty minutes of debugging just figuring out what actually broke.
The problem is not fragile code. The problem is tests that reach beyond their scope.
There is a naming convention that prevents this. Not by discipline — by structure.
The naming rule
Every test name encodes three things: the component under test, a severity verb, and the specific behavior being validated.
TEST(ConnectionPool, MustReturnIdleConnectionBeforeAllocatingNew)
TEST(ConnectionPool, ShouldRespectMaxPoolSizeUnderLoad)
TEST(ConnectionPool, MayTimeoutIfAllConnectionsBusy)
TEST(SessionManager, MustInvalidateSessionOnTokenExpiry)
TEST(SessionManager, ShouldReuseExistingSessionForSameUser)
The first argument is the component. The second argument starts with a verb — Must, Should, Shall, or May — followed by the behavior.
This looks like a formatting preference. It is not. It is an architectural constraint.
The constraint: name scopes the body
The rule is simple: if a test is named ConnectionPool_MustReturnIdleConnection, the test body validates that contract and nothing else.
Everything needed to bring the system to the right state for the test is setup — constructed, configured, initialized. But it is not asserted. The only assertions in the body are about the behavior named in the test.
Here is what that looks like in practice:
TEST(ConnectionPool, MustReturnIdleConnectionBeforeAllocatingNew) {
// -- Setup: bring the system to the state we need --
ConnectionPool pool(/*max_size=*/5);
auto conn = pool.acquire();
pool.release(conn);
// -- Act --
auto reused = pool.acquire();
// -- Assert: ONLY the named behavior --
EXPECT_EQ(reused.id(), conn.id());
}
Notice what this test does not do. It does not assert that pool.size() is correct after setup. It does not check that release() returned successfully. It does not verify that the pool was initialized with the right capacity. Those are contracts that belong to other tests — MustInitializeWithConfiguredCapacity, MustAcceptReleasedConnection. They have their own names and their own bodies.
The name makes overreach visible. If you find yourself writing an assertion that does not match the test name, you are either testing the wrong thing or the test needs to be split.
First consequence: isolated failure surfaces
When each test validates exactly one contract, failures stop cascading.
If SessionManager breaks, only SessionManager tests fail. ConnectionPool tests stay green — because they set up session state as a precondition but never assert on it. The broken contract lights up exactly the tests that name it, and nothing else.
Compare this to a test suite where every test starts with a block of “sanity check” assertions:
TEST(ConnectionPool, ReturnsConnection) {
SessionManager sessions;
ASSERT_TRUE(sessions.isInitialized()); // Why is this here?
ConnectionPool pool(sessions);
ASSERT_EQ(pool.size(), 0); // This is a different contract
auto conn = pool.acquire();
ASSERT_NE(conn, nullptr); // Now the actual test
ASSERT_EQ(pool.size(), 1); // And another contract
}
When SessionManager::isInitialized() breaks, this test fails. So does every other test that repeats that sanity check. One broken behavior, forty red tests, twenty minutes of triage.
The naming constraint eliminates this by making the scope violation obvious before you even look at the test body.
Second consequence: severity as triage
The verb in the test name is not decoration. It carries graduated severity, borrowed from the same hierarchy as RFC 2119:
| Verb | Severity | Meaning |
|---|---|---|
Must |
Hard contract | Violation means the component is broken. Ship-blocking. |
Should |
Expected behavior | Violation is a regression worth investigating. |
Shall |
Conditional requirement | Behavior expected under specific conditions. |
May |
Optional behavior | Behavioral change, not necessarily a defect. |
When the test suite runs and failures appear, the verb tells you how to react before you open the test:
ConnectionPool
✓ MustReturnIdleConnectionBeforeAllocatingNew
✓ ShouldRespectMaxPoolSizeUnderLoad
✗ MayTimeoutIfAllConnectionsBusy
SessionManager
✗ MustInvalidateSessionOnTokenExpiry
✓ ShouldReuseExistingSessionForSameUser
The SessionManager failure is a Must — a hard contract is broken, this needs immediate attention. The ConnectionPool failure is a May — an optional behavior changed, worth investigating but not necessarily a defect.
The verb triages the failure before you read a single line of code.
Third consequence: generated documentation
Look at that test runner output again. Group it by component, indent the behaviors, and you have a requirements document — generated directly from your test names, readable by anyone on the team, including people who do not read C++.
ConnectionPool
Must return idle connection before allocating new
Should respect max pool size under load
May timeout if all connections busy
SessionManager
Must invalidate session on token expiry
Should reuse existing session for same user
This is not a side effect. It is a design goal. When test names are written as behavioral contracts with severity verbs, the test runner output is the specification. It stays in sync with the code because it is the code. No separate requirements document to maintain, no wiki page that drifts out of date.
For teams where non-technical stakeholders need to understand what the system guarantees, this is a powerful tool. A product manager can read the test output and know exactly which contracts the system upholds — and at what severity level.
Across frameworks
The naming convention is a principle, not a macro trick. It adapts to whatever framework you use — but some frameworks make it easier than others.
Google Test
Google Test’s TEST(TestSuite, TestName) macro maps directly to this convention. The first argument is your component, the second is Verb + Behavior:
// Component: ConnectionPool
// Verb: Must
// Behavior: ReturnIdleConnectionBeforeAllocatingNew
TEST(ConnectionPool, MustReturnIdleConnectionBeforeAllocatingNew)
For parameterized tests, the pattern extends naturally:
class ConnectionPoolStress : public ::testing::TestWithParam<int> {};
TEST_P(ConnectionPoolStress, ShouldHandleConcurrentAcquireUpToMaxSize) {
ConnectionPool pool(/*max_size=*/GetParam());
// ...
}
INSTANTIATE_TEST_SUITE_P(
PoolSizes,
ConnectionPoolStress,
::testing::Values(1, 10, 100, 1000)
);
Google Test gives you the two-level structure — suite and test name — and the convention fills it with meaning.
doctest — the structure that should be the standard
doctest takes this further, and in my view its structural model is what every testing framework should aspire to.
TEST_SUITE groups by component. TEST_CASE names the behavior. SUBCASE shares setup without duplicating assertions:
TEST_SUITE("ConnectionPool") {
TEST_CASE("Must return idle connection before allocating new") {
ConnectionPool pool(/*max_size=*/5);
auto conn = pool.acquire();
pool.release(conn);
auto reused = pool.acquire();
CHECK(reused.id() == conn.id());
}
TEST_CASE("Should respect max pool size under load") {
ConnectionPool pool(/*max_size=*/2);
SUBCASE("when all connections are active") {
auto c1 = pool.acquire();
auto c2 = pool.acquire();
CHECK(pool.active_count() == 2);
}
SUBCASE("when max is reached") {
auto c1 = pool.acquire();
auto c2 = pool.acquire();
CHECK_THROWS(pool.acquire());
}
}
TEST_CASE("May timeout if all connections busy") {
// ...
}
}
Notice what SUBCASE gives you: shared setup that runs fresh for each subcase — without fixtures, without inheritance, without a SetUp() method that lives in a different part of the file. The test case body above the subcases is the setup. Each subcase is a leaf scenario. doctest re-enters the test case from the top for each subcase, executing a different leaf on each pass — like a DFS traversal of a tree.
This maps perfectly to the naming discipline. The TEST_CASE names the contract. The SUBCASE names the condition. The test runner output reflects the hierarchy:
ConnectionPool
Must return idle connection before allocating new -- PASSED
Should respect max pool size under load
when all connections are active -- PASSED
when max is reached -- PASSED
May timeout if all connections busy -- PASSED
That is a requirements tree with conditional branches — generated from your test structure, not from a separate document.
The free-form string names in doctest also mean you are not fighting CamelCase identifiers to express contract descriptions. The names read as natural language, which makes the generated documentation immediately useful to non-technical readers.
I want to be clear: this is not about the ability to embed tests in production source files — that is a separate doctest feature with its own tradeoffs. The structural model — suites as components, cases as contracts, subcases as conditions — is what I think should be the baseline every framework offers.
pytest
Python’s test_ prefix requirement means you cannot start a function name with a severity verb. But pytest’s class-based organization gives you the component grouping, and the test_ prefix becomes a fixed preamble before the convention kicks in:
class TestConnectionPool:
def test_must_return_idle_connection_before_allocating_new(self, pool):
conn = pool.acquire()
pool.release(conn)
reused = pool.acquire()
assert reused.id == conn.id
def test_should_respect_max_pool_size_under_load(self, pool):
c1 = pool.acquire()
c2 = pool.acquire()
assert pool.active_count == 2
def test_may_timeout_if_all_connections_busy(self, pool):
# ...
pass
class TestSessionManager:
def test_must_invalidate_session_on_token_expiry(self, session_mgr):
session_mgr.create_session(token="abc")
session_mgr.expire_token("abc")
assert not session_mgr.is_valid("abc")
def test_should_reuse_existing_session_for_same_user(self, session_mgr):
s1 = session_mgr.get_or_create("alice")
s2 = session_mgr.get_or_create("alice")
assert s1.id == s2.id
The test_ prefix is noise from a documentation perspective, but pytest’s output strips the class and method structure into something readable:
TestConnectionPool
✓ test_must_return_idle_connection_before_allocating_new
✓ test_should_respect_max_pool_size_under_load
✗ test_may_timeout_if_all_connections_busy
TestSessionManager
✗ test_must_invalidate_session_on_token_expiry
✓ test_should_reuse_existing_session_for_same_user
The severity verb is still there — must, should, may — right after the test_ prefix. A Must failure is still a broken contract. A May failure is still worth investigating but not necessarily a defect. The triage works the same way.
Fixtures handle the setup separation cleanly. A @pytest.fixture for pool or session_mgr provides the preconditions without polluting the test body with assertions about initialization. The principle holds: setup is constructed, not asserted.
One rule, three consequences
One constraint on the name — component, severity verb, behavior. Three architectural consequences:
The test body cannot overreach, because the name scopes what it is allowed to verify. Failure surfaces stop overlapping, because each behavior is validated in exactly one place. And the test runner generates a requirements tree for free, with severity verbs that triage failures before you open the code.
This is not about naming style. It is about using a name as a structural constraint that shapes how tests are written, how failures are diagnosed, and how the system’s contracts are communicated.
Look at a test you wrote last month. Does its name describe the one behavior it validates? Does its body touch anything beyond that contract? Rename it. Scope it. Run the suite. The next time something breaks, count how many tests go red — and ask yourself how many of them should have.