Building ESLint Rules to Prevent Tests That Lie
From Taxonomy to Tooling
Discovering that 60% of our agent-generated tests provided zero signal was sobering. Fixing them was labour-intensive but straightforward. The harder problem was preventing the patterns from returning — especially when AI agents write tests at high velocity across parallel workstreams.
We needed automated enforcement at two layers. Not all 8 false-confidence patterns (FC-A through FC-H) are detectable through static analysis — some require semantic reasoning about intent. But two of the most prevalent patterns, FC-A (multi-status acceptance) and FC-E (tautological/range assertions), have distinctive syntactic signatures that are visible in the AST. These are the pre-commit layer.
We built eslint-plugin-adt-testing, a three-rule ESLint plugin that runs at "error" severity on all test files. Here is how each rule works.
Rule 1: no-false-confidence-patterns
This is the workhorse rule, catching both FC-A and FC-E patterns across five distinct syntactic forms.
What It Catches
FC-A multi-status arrays:
// Flagged: FC-A multi-status array
expect([200, 404]).toContain(res.status)
FC-A Array.includes:
// Flagged: FC-A includes pattern
[200, 404].includes(res.status)
FC-A OR-chains:
// Flagged: FC-A OR-chain
if (res.status === 200 || res.status === 201) { /* ... */ }
FC-E range checks in expect:
// Flagged: FC-E range assertion
expect(res.status).toBeLessThan(500)
expect(res.status).toBeGreaterThanOrEqual(200)
FC-E raw binary comparisons:
// Flagged: FC-E binary comparison
if (res.status < 500) { /* ... */ }
How It Works: Context-Aware Status Detection
The key design decision was making the rule context-aware. We do not flag every toBeLessThan() call — only those where the subject looks like an HTTP status and the threshold is a numeric literal between 100 and 599.
The looksLikeStatusCode() function walks the AST node recursively:
function looksLikeStatusCode(node) {
  if (!node) return false
  if (node.type === "MemberExpression") {
    const prop = node.property.name || node.property.value
    if (typeof prop === "string" && prop.toLowerCase().includes("status")) {
      return true
    }
    return looksLikeStatusCode(node.object)
  }
  if (node.type === "Identifier") {
    return node.name.toLowerCase().includes("status")
  }
  return false
}
This catches res.status, response.statusCode, httpStatus, and any other identifier containing "status". It does not fire on expect(count).toBeLessThan(500) because count does not contain "status". Without this context-awareness, the rule would be too noisy to keep enabled.
The threshold check uses a simple range validation:
function isStatusLiteral(value) {
  return typeof value === "number" && value >= 100 && value <= 599
}
This means expect(res.status).toBeLessThan(1000) is not flagged (threshold outside HTTP range), while expect(res.status).toBeLessThan(500) is (threshold is a valid HTTP status code).
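To make the interaction between the two checks concrete, here is a minimal sketch of how they might be combined for a single expect() call. The helper name isStatusRangeAssertion, the RANGE_MATCHERS set, and the hand-built ESTree-style nodes are assumptions for illustration, not the plugin's actual source.

```javascript
// Assumed list of range matchers; the plugin's real list may differ.
const RANGE_MATCHERS = new Set([
  "toBeLessThan",
  "toBeLessThanOrEqual",
  "toBeGreaterThan",
  "toBeGreaterThanOrEqual",
])

function looksLikeStatusCode(node) {
  if (!node) return false
  if (node.type === "MemberExpression") {
    const prop = node.property.name || node.property.value
    if (typeof prop === "string" && prop.toLowerCase().includes("status")) {
      return true
    }
    return looksLikeStatusCode(node.object)
  }
  if (node.type === "Identifier") {
    return node.name.toLowerCase().includes("status")
  }
  return false
}

function isStatusLiteral(value) {
  return typeof value === "number" && value >= 100 && value <= 599
}

// Hypothetical helper: true for expect(<status>).<rangeMatcher>(<100..599>),
// with or without a .not between expect() and the matcher.
function isStatusRangeAssertion(call) {
  if (call.type !== "CallExpression") return false
  const callee = call.callee
  if (callee.type !== "MemberExpression") return false
  if (!RANGE_MATCHERS.has(callee.property.name)) return false

  // Step over an optional .not to reach the expect(...) call.
  let target = callee.object
  if (target.type === "MemberExpression" && target.property.name === "not") {
    target = target.object
  }
  if (target.type !== "CallExpression" || target.callee.name !== "expect") {
    return false
  }

  const subject = target.arguments[0]
  const threshold = call.arguments[0]
  return (
    looksLikeStatusCode(subject) &&
    Boolean(threshold) &&
    threshold.type === "Literal" &&
    isStatusLiteral(threshold.value)
  )
}

// Tiny builders for hand-made AST nodes (a real rule receives these from ESLint).
const id = (name) => ({ type: "Identifier", name })
const member = (object, prop) => ({ type: "MemberExpression", object, property: id(prop) })
const lit = (value) => ({ type: "Literal", value })
const call = (callee, ...args) => ({ type: "CallExpression", callee, arguments: args })
const expectOf = (subject) => call(id("expect"), subject)

const resStatus = () => member(id("res"), "status")
// expect(res.status).toBeLessThan(500)
const flagged = call(member(expectOf(resStatus()), "toBeLessThan"), lit(500))
// expect(count).toBeLessThan(500)
const notStatus = call(member(expectOf(id("count")), "toBeLessThan"), lit(500))
// expect(res.status).toBeLessThan(1000)
const outOfRange = call(member(expectOf(resStatus()), "toBeLessThan"), lit(1000))
// expect(res.status).not.toBeGreaterThanOrEqual(400)
const negated = call(member(member(expectOf(resStatus()), "not"), "toBeGreaterThanOrEqual"), lit(400))

console.log([flagged, notStatus, outOfRange, negated].map(isStatusRangeAssertion))
```

Only the first and last calls satisfy both conditions: the subject looks like a status and the threshold falls inside 100 to 599.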
OR-Chain Flattening
The FC-A OR-chain detection required flattening nested logical expressions. JavaScript's AST represents a || b || c as nested binary LogicalExpression nodes:
LogicalExpression(||)
  left: LogicalExpression(||)
    left: a === 200
    right: b === 201
  right: c === 204
We flatten this into a list of leaf nodes and check that each is a status equality comparison against the same subject:
function flattenOr(node) {
  if (node.type === "LogicalExpression" && node.operator === "||") {
    return [...flattenOr(node.left), ...flattenOr(node.right)]
  }
  return [node]
}
The rule only fires at the root of an OR chain (not intermediate nodes) to avoid duplicate reports. It also verifies that all comparisons reference the same status expression: res.status === 200 || res.status === 201 is flagged, but res.status === 200 || req.method === "POST" is not (different subjects).
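The root check and the same-subject check are described but not shown. A minimal sketch of how they could sit on top of flattenOr() follows; the helper names subjectKey, isStatusEquality, isRootOfOrChain, and isMultiStatusOrChain are hypothetical, and the dotted-path comparison is an assumption about how "same subject" might be decided.

```javascript
function flattenOr(node) {
  if (node.type === "LogicalExpression" && node.operator === "||") {
    return [...flattenOr(node.left), ...flattenOr(node.right)]
  }
  return [node]
}

// Hypothetical: render an Identifier / non-computed MemberExpression chain
// as a dotted path such as "res.status"; anything else yields null.
function subjectKey(node) {
  if (node.type === "Identifier") return node.name
  if (node.type === "MemberExpression" && !node.computed) {
    const base = subjectKey(node.object)
    return base === null ? null : `${base}.${node.property.name}`
  }
  return null
}

// Hypothetical: an equality comparison against a numeric literal.
function isStatusEquality(node) {
  return (
    node.type === "BinaryExpression" &&
    (node.operator === "===" || node.operator === "==") &&
    node.right.type === "Literal" &&
    typeof node.right.value === "number"
  )
}

// ESLint sets node.parent on every node; a chain root has no `||` parent.
function isRootOfOrChain(node) {
  const parent = node.parent
  return !(parent && parent.type === "LogicalExpression" && parent.operator === "||")
}

function isMultiStatusOrChain(root) {
  if (!isRootOfOrChain(root)) return false
  const leaves = flattenOr(root)
  if (leaves.length < 2 || !leaves.every(isStatusEquality)) return false
  const keys = leaves.map((leaf) => subjectKey(leaf.left))
  return (
    keys[0] !== null &&
    keys[0].toLowerCase().includes("status") &&
    keys.every((key) => key === keys[0])
  )
}

// Hand-built nodes standing in for what the parser would produce.
const id = (name) => ({ type: "Identifier", name })
const member = (object, prop) => ({ type: "MemberExpression", computed: false, object, property: id(prop) })
const eq = (left, value) => ({ type: "BinaryExpression", operator: "===", left, right: { type: "Literal", value } })
const or = (left, right) => ({ type: "LogicalExpression", operator: "||", left, right })
const resStatus = () => member(id("res"), "status")

// res.status === 200 || res.status === 201 || res.status === 204
const threeWay = or(or(eq(resStatus(), 200), eq(resStatus(), 201)), eq(resStatus(), 204))
// res.status === 200 || req.method === "POST"
const mixed = or(eq(resStatus(), 200), eq(member(id("req"), "method"), "POST"))

console.log(isMultiStatusOrChain(threeWay), isMultiStatusOrChain(mixed))
```

Comparing a rendered path keeps the sketch short; comparing sourceCode.getText() output for each left-hand side would be an equally plausible choice inside a real rule.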
Negation Handling
An interesting edge case: expect(res.status).not.toBeGreaterThanOrEqual(400). This is a negated range check that means "status is less than 400" — still a range check that passes for many codes. The rule detects .not. in the member expression chain and flags negated range assertions as well.
const isNegated =
  node.callee.object.type === "MemberExpression" &&
  node.callee.object.property.name === "not"
Rule 2: no-conditional-status-expect
This rule catches a pattern that overlaps FC-A and FC-F: expect() calls guarded by an HTTP status condition.
// Flagged: conditional expect guarded by status check
if (res.status === 200) {
  expect(res.body).toEqual({ id: 1 })
}
The problem: when the endpoint returns 500 instead of 200, the if branch does not execute, the expect() never runs, and the test passes with zero assertions. This is FC-F (silent skip) triggered by a status-based condition.
AST Walking Strategy
The rule visits IfStatement nodes and performs two recursive searches:
- Does the condition reference an HTTP status? The conditionHasStatusCheck() function recursively walks the test expression, checking for status-like identifiers.
- Does the consequent or alternate contain an expect() call? The nodeContainsExpectCall() function walks the entire subtree of the if-body.
IfStatement(node) {
  if (!conditionHasStatusCheck(node.test)) return
  const consequentHasExpect = nodeContainsExpectCall(node.consequent)
  const alternateHasExpect = node.alternate
    ? nodeContainsExpectCall(node.alternate)
    : false
  if (consequentHasExpect || alternateHasExpect) {
    context.report({
      node,
      messageId: "conditionalStatusExpect",
    })
  }
}
The fix is to split the conditional into separate tests: one for the happy path that seeds data producing 200, and one for the 404 case that seeds conditions producing 404. Each test asserts exactly one expected status.
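The two helpers called by the visitor, conditionHasStatusCheck() and nodeContainsExpectCall(), are named but not shown. A minimal sketch of how such recursive searches might be written is below; the generic traversal (childNodes, someNode) and the identifier-only status test are assumptions, not the plugin's actual implementation.

```javascript
// Yield every child AST node, skipping ESLint's `parent` back-reference.
function* childNodes(node) {
  for (const [key, value] of Object.entries(node)) {
    if (key === "parent") continue
    const items = Array.isArray(value) ? value : [value]
    for (const item of items) {
      if (item && typeof item.type === "string") yield item
    }
  }
}

// Depth-first search: true if the predicate holds for any node in the subtree.
function someNode(node, predicate) {
  if (predicate(node)) return true
  for (const child of childNodes(node)) {
    if (someNode(child, predicate)) return true
  }
  return false
}

// Assumption: only Identifier names are inspected, so res["status"] is missed.
const isStatusIdentifier = (node) =>
  node.type === "Identifier" && node.name.toLowerCase().includes("status")

const isExpectCall = (node) =>
  node.type === "CallExpression" &&
  node.callee.type === "Identifier" &&
  node.callee.name === "expect"

function conditionHasStatusCheck(test) {
  return someNode(test, isStatusIdentifier)
}

function nodeContainsExpectCall(node) {
  return someNode(node, isExpectCall)
}

// Hand-built AST for: if (res.status === 200) { expect(res.body).toEqual({}) }
const id = (name) => ({ type: "Identifier", name })
const member = (object, prop) => ({ type: "MemberExpression", computed: false, object, property: id(prop) })
const call = (callee, ...args) => ({ type: "CallExpression", callee, arguments: args })

const ifNode = {
  type: "IfStatement",
  test: {
    type: "BinaryExpression",
    operator: "===",
    left: member(id("res"), "status"),
    right: { type: "Literal", value: 200 },
  },
  consequent: {
    type: "BlockStatement",
    body: [
      {
        type: "ExpressionStatement",
        expression: call(
          member(call(id("expect"), member(id("res"), "body")), "toEqual"),
          { type: "ObjectExpression", properties: [] }
        ),
      },
    ],
  },
  alternate: null,
}

console.log(conditionHasStatusCheck(ifNode.test), nodeContainsExpectCall(ifNode.consequent))
```

Note that this traversal also descends into nested function bodies, so an expect() inside a callback within the if-body would count as well.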
Rule 3: no-tautological-expect
The simplest and most surgical rule. It catches assertions where the subject and expected value are textually identical:
// Flagged: subject and argument are the same expression
expect(404).toBe(404)
expect(foo).toBe(foo)
expect(result.value).toEqual(result.value)
Text Comparison Approach
We considered deep AST comparison but chose source text comparison instead. It is simpler, handles all expression types uniformly, and avoids the false negatives a structural comparison would produce for any node type it failed to handle:
const subjectText = sourceCode.getText(subject).trim()
const argumentText = sourceCode.getText(argument).trim()
if (subjectText === argumentText) {
  context.report({
    node,
    messageId: "tautologicalExpect",
    data: { text: subjectText, method },
  })
}
This does mean that expect(a).toBe(a) is flagged but expect(a).toBe(b) is not, even if a and b reference the same value at runtime. That is acceptable: the rule catches syntactic tautologies, not semantic ones.
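For readers who want to see where that fragment lives, here is a sketch of a complete rule module built around it. The message wording, the way subject and argument are extracted, and the fake context used to exercise the visitor are all assumptions for illustration; the sketch also ignores .not chains.

```javascript
// Sketch only: not the plugin's actual source.
const noTautologicalExpect = {
  meta: {
    type: "problem",
    schema: [],
    messages: {
      // {{text}} and {{method}} are filled from `data` in context.report().
      tautologicalExpect:
        "expect({{text}}).{{method}}({{text}}) compares an expression with itself.",
    },
  },
  create(context) {
    // context.sourceCode is the current API; getSourceCode() is the older form.
    const sourceCode = context.sourceCode ?? context.getSourceCode()
    return {
      CallExpression(node) {
        const callee = node.callee
        if (callee.type !== "MemberExpression") return
        const expectCall = callee.object
        if (
          expectCall.type !== "CallExpression" ||
          expectCall.callee.type !== "Identifier" ||
          expectCall.callee.name !== "expect"
        ) {
          return
        }
        const subject = expectCall.arguments[0]
        const argument = node.arguments[0]
        if (!subject || !argument) return

        const method = callee.property.name
        const subjectText = sourceCode.getText(subject).trim()
        const argumentText = sourceCode.getText(argument).trim()
        if (subjectText === argumentText) {
          context.report({
            node,
            messageId: "tautologicalExpect",
            data: { text: subjectText, method },
          })
        }
      },
    }
  },
}

// Stand-in for ESLint: each fake node carries its own source text.
const reports = []
const fakeContext = {
  sourceCode: { getText: (node) => node.text },
  report: (descriptor) => reports.push(descriptor),
}
const expr = (text) => ({ type: "Identifier", name: text, text })
const fakeExpect = (subject, method, argument) => ({
  type: "CallExpression",
  callee: {
    type: "MemberExpression",
    object: {
      type: "CallExpression",
      callee: { type: "Identifier", name: "expect" },
      arguments: [subject],
    },
    property: { type: "Identifier", name: method },
  },
  arguments: [argument],
})

const visitor = noTautologicalExpect.create(fakeContext)
visitor.CallExpression(fakeExpect(expr("foo"), "toBe", expr("foo"))) // reported
visitor.CallExpression(fakeExpect(expr("a"), "toBe", expr("b"))) // not reported
console.log(reports.map((report) => report.data))
```

In a real project, ESLint's RuleTester would take the place of the fake context and parse actual source strings.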
Plugin Architecture
The three rules are packaged as eslint-plugin-adt-testing with a recommended flat config:
// eslint.config.js
import adtTestingPlugin from "eslint-plugin-adt-testing"
export default [
  adtTestingPlugin.configs.recommended,
]
The recommended config applies all three rules at "error" severity to test files:
files: [
  "**/__tests__/**/*.{js,ts,jsx,tsx}",
  "**/*.test.{js,ts,jsx,tsx}",
  "**/*.spec.{js,ts,jsx,tsx}",
  "**/*.e2e.{js,ts,jsx,tsx}",
  "tests/**/*.{js,ts,jsx,tsx}",
]
All three rules use CommonJS (.cjs) for maximum ESLint compatibility. The plugin has zero dependencies beyond ESLint itself.
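For plugin authors, a sketch of how such an entry point might be laid out is shown here. The stub rule, the "adt-testing" namespace, and the overall object layout are assumptions based on ESLint's flat-config plugin conventions, not a copy of the real package.

```javascript
// Stub standing in for the three real rule modules.
const stubRule = { meta: { type: "problem", schema: [] }, create: () => ({}) }

const plugin = {
  meta: { name: "eslint-plugin-adt-testing" },
  rules: {
    "no-false-confidence-patterns": stubRule,
    "no-conditional-status-expect": stubRule,
    "no-tautological-expect": stubRule,
  },
  configs: {},
}

// Assigned after creation so the config can reference the plugin object itself.
plugin.configs.recommended = {
  files: [
    "**/__tests__/**/*.{js,ts,jsx,tsx}",
    "**/*.test.{js,ts,jsx,tsx}",
    "**/*.spec.{js,ts,jsx,tsx}",
    "**/*.e2e.{js,ts,jsx,tsx}",
    "tests/**/*.{js,ts,jsx,tsx}",
  ],
  plugins: { "adt-testing": plugin }, // namespace is an assumption
  rules: Object.fromEntries(
    Object.keys(plugin.rules).map((name) => [`adt-testing/${name}`, "error"])
  ),
}

// In a .cjs entry file this would end with: module.exports = plugin
console.log(plugin.configs.recommended.rules)
```

Deriving the rules map from plugin.rules means a newly added rule is enabled in the recommended config automatically, which may or may not be desirable.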
CI Gate Integration
The rules run as part of our pre-commit parallel gate (alongside typecheck and unit tests). Because they are set to "error", a single FC-A or FC-E violation blocks the commit:
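# Pre-commit hook runs these in parallel:
lint & lint_pid=$! # includes adt-testing rules
typecheck & typecheck_pid=$!
test & test_pid=$!
# A bare `wait` always exits 0, so wait on each job to propagate failures
wait "$lint_pid" && wait "$typecheck_pid" && wait "$test_pid"
The error messages include the FC code for traceability: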
FC-A: Multi-status array [200, 404] accepts failure as success.
Use an exact status assertion: expect(res.status).toBe(200).
FC-E: .toBeLessThan(500) is a range check — it passes for many status codes.
Use an exact assertion: expect(res.status).toBe(<code>).
What the Rules Cannot Catch — and How Agent Scaffolding Fills the Gap
Static analysis has limits. The three ESLint rules cover FC-A and FC-E at the pre-commit layer, but the remaining patterns require semantic understanding:
| Pattern | Why AST Cannot Catch It |
|---|---|
| FC-B (Shape-only) | typeof body === "object" is syntactically valid — the rule would need to know that body should have specific fields |
| FC-C (Mock-only) | Detecting that an assertion tests mock output requires understanding the mock setup and assertion relationship |
| FC-D (Route never reached) | Requires knowing which routes are mounted — a type-system question, not an AST question |
| FC-G (Graceful degrade) | Requires knowing whether a mock DB was injected — runtime context |
| FC-H (Wrong constant) | Requires understanding which security constant maps to which test — domain knowledge |
This is where our second enforcement layer comes in: agent scaffolding. We built structured testing rules and checklists that load into every AI agent's context window during development. When an agent writes or modifies a test, it has the full FC taxonomy in its working context and reasons about each pattern as it goes. The agent checks its own work against the FC-B through FC-H checklist before committing — catching semantic patterns that no linter can detect.
The combination is deliberate: ESLint rules (FC-A, FC-E) provide hard gates at commit time; agent scaffolding (FC-B through FC-H, COND) provides reasoning-based enforcement during development. Between the two layers, the full taxonomy is covered.
Lessons for Plugin Authors
Context-awareness prevents false positives
The looksLikeStatusCode() function is what makes no-false-confidence-patterns usable in practice. Without it, every toBeLessThan() call in every test would be flagged, making the rule too noisy to keep enabled.
Source text comparison beats deep AST comparison
For no-tautological-expect, comparing sourceCode.getText() output is simpler and more robust than recursively comparing AST node structures.
Error messages should include the taxonomy code
FC-A: and FC-E: prefixes in error messages let developers look up the pattern in documentation immediately. The message becomes a teaching moment, not just a lint error.
CommonJS for maximum compatibility
ESLint plugins need to work across projects with different module systems. The .cjs extension ensures the plugin loads correctly regardless of the consuming project's configuration.
Test the rules with fixtures
Each rule has a test file with valid and invalid code samples. Running the rules against known-good and known-bad patterns prevents the rules themselves from having false confidence — a meta-problem worth guarding against.
The plugin is open-source as part of our Agent Development Tools. If your team uses AI agents to write tests — or writes HTTP tests yourself and suspects your assertions are too loose — try dropping eslint-plugin-adt-testing into your config. The results may be uncomfortable, but that discomfort is the point — it means the rules are finding tests that were lying to you.
For the patterns ESLint cannot catch, consider building your own agent scaffolding: structured rules that load into your AI agents' context during development, giving them the vocabulary and checklists to reason about test quality as they work. Static analysis catches the syntactic patterns; agent scaffolding catches the semantic ones. Together, they close the gap.
Related Reading
- 60% of Our Tests Had Zero Signal: How We Discovered False Confidence covers the audit that produced the FC taxonomy in the first place.
- Why Agent-Native Teams Need Better Tests, Not More Tests explains why stricter evidence standards matter more as engineering throughput increases.
- Agent-Native Shift-Left CI for High-Velocity Solo Engineering shows how these linting rules fit into a broader local quality perimeter.