Test Data

How you manage test data has a direct impact on the reliability, maintainability, and speed of your test suite. CDS's approach is built around one core principle: push data complexity down the test pyramid.

The further down the pyramid you can validate a behaviour, the more control you have over the data — and the less the suite needs to rely on complex setup at higher layers.

Mocking and stubbing

At unit and integration level, mocking and stubbing are the preferred approach. Rather than relying on real data from a database or external service, mocks and stubs replace dependencies with controlled, predictable substitutes.

This makes tests faster, more reliable, and easier to reason about. A unit test that calls a real database is slower, more fragile, and harder to run in isolation. A unit test that uses a stub gets a predictable value back in milliseconds, with no external system involved.

The more scenarios that can be validated through mocking at lower pyramid layers, the less the suite needs to rely on complex data setup at end-to-end level. This is a deliberate design principle, not a convenience.
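As a rough illustration of stubbing at unit level, the sketch below uses Python's built-in unittest.mock. The PriceService class and its repository dependency are hypothetical names invented for this example; only Mock, return_value, and assert_called_once_with come from the standard library.

```python
from unittest.mock import Mock


class PriceService:
    """Hypothetical unit under test: discounts a price fetched from a repository."""

    def __init__(self, repository):
        self.repository = repository

    def discounted_price(self, product_id, discount):
        base = self.repository.get_price(product_id)
        return round(base * (1 - discount), 2)


def test_discounted_price_uses_stubbed_repository():
    # Replace the real repository (database, API, ...) with a controlled stub.
    repository = Mock()
    repository.get_price.return_value = 100.0

    service = PriceService(repository)

    # The result depends only on the stubbed value, not on stored data.
    assert service.discounted_price("sku-1", 0.2) == 80.0
    # The mock also records how the dependency was called.
    repository.get_price.assert_called_once_with("sku-1")
```

The test needs no database, no seed data, and no network access, so the pricing rule can be checked in isolation.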

Language	Preferred tool	Alternative
JavaScript / TypeScript	Jest mocks (built-in)	Sinon.js
.NET	Moq	NSubstitute
Python	unittest.mock (built-in)	—
Java	Mockito	—

For API-level mocking — where a real external service is unavailable or unreliable in the test environment — WireMock is the recommended choice across languages. It allows HTTP services to be stubbed and verified, and is well-suited to integration and API testing where third-party dependencies cannot be controlled.
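A WireMock stub is typically described as a JSON mapping that pairs a request matcher with a canned response. The sketch below builds such a mapping in Python and prints it. The build_stub_mapping helper, the /customers/42 path, and the response body are made up for illustration; the request and response field names follow WireMock's JSON stub format.

```python
import json


def build_stub_mapping(method, url_path, status, body):
    """Return a dict in the shape of a WireMock JSON stub mapping (hypothetical helper)."""
    return {
        "request": {"method": method, "urlPath": url_path},
        "response": {
            "status": status,
            "headers": {"Content-Type": "application/json"},
            "jsonBody": body,
        },
    }


# Example values are assumptions, not part of any real service contract.
mapping = build_stub_mapping(
    "GET", "/customers/42", 200, {"id": 42, "name": "Test Customer"}
)
print(json.dumps(mapping, indent=2))
```

The resulting JSON can be saved as a mapping file or registered with a running WireMock instance through its admin API (POST to /__admin/mappings). The host and port depend on the test environment.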

Fake data generation

Where test data needs to be created programmatically rather than mocked, we use language-appropriate Faker libraries. Faker generates realistic, varied data — names, addresses, emails, phone numbers — reducing the need for hardcoded values and making tests easier to maintain.

Language	Tool	Notes
JavaScript / TypeScript	Faker.js	—
.NET (C#)	Bogus	—
Python	Faker	—
Python	factory_boy	Approved alternative — useful for generating complex, related data objects rather than simple field values (e.g. a user with associated orders and addresses)
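To make the "complex, related data objects" idea concrete, here is a minimal sketch of a factory that builds a user together with its orders. It uses only the standard library as a stand-in and does not use factory_boy's API. The User and Order classes, the build_user function, and all field names are hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Order:
    order_id: int
    total: float


@dataclass
class User:
    user_id: int
    email: str
    orders: list = field(default_factory=list)


def build_user(rng, order_count=2, **overrides):
    """Build a User with related Orders; keyword overrides pin fields a test cares about."""
    user_id = rng.randint(1, 10_000)
    orders = [
        Order(order_id=rng.randint(1, 10_000), total=round(rng.uniform(5, 500), 2))
        for _ in range(order_count)
    ]
    values = {
        "user_id": user_id,
        "email": f"user{user_id}@example.test",
        "orders": orders,
    }
    values.update(overrides)
    return User(**values)


# A test states only what matters to it; everything else is generated.
user = build_user(random.Random(7), order_count=3, email="fixed@example.test")
```

A dedicated factory library covers the same ground with less hand-written code, which is where factory_boy is useful.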

Note

Faker generates different random data on each run unless it is seeded. Log or capture the generated values (or the seed, where one is used) whenever a test fails, so that the failure can be reproduced.
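One way to make random test data reproducible is to choose a seed per run, log it, and allow it to be overridden when replaying a failure. The sketch below shows the pattern with Python's standard random module as a stand-in for a Faker library. The TEST_SEED environment variable and the helper names are assumptions made for this example.

```python
import os
import random


def resolve_seed(env=os.environ):
    """Use TEST_SEED if set (to replay a failure), otherwise pick a fresh seed."""
    raw = env.get("TEST_SEED")
    if raw is not None:
        return int(raw)
    return random.SystemRandom().randrange(2**32)


def fake_email(rng):
    """Generate a throwaway email address from the given random generator."""
    return f"user{rng.randrange(1_000_000)}@example.test"


seed = resolve_seed()
# Logging the seed up front means a failing run can always be replayed.
print(f"test data seed: {seed}")
rng = random.Random(seed)
email = fake_email(rng)
```

Re-running with TEST_SEED set to the logged value regenerates the same data. Faker libraries generally offer their own seeding mechanism that can be wired in the same way.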

Data seeding

Data seeding — pre-populating a database or environment with a known set of data before tests run — should be treated as a last resort.

Seeding introduces environment dependency, makes tests harder to run in isolation, and creates maintenance overhead as the data model evolves. Before reaching for a seed script, the question to ask is: can this scenario be covered at a lower pyramid layer with mocks or generated data instead?

Where data seeding is genuinely necessary — for example, in end-to-end tests that require a realistic dataset in a test environment — seed scripts must be:

Version-controlled alongside the application code
Idempotent — safe to run multiple times without duplicating data
Minimal — scoped to the exact data the tests require, nothing more

Seed scripts that are not idempotent or that accumulate state over time become a source of test instability. They are a cost that must be actively managed, not ignored.
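The idempotency requirement can be sketched with an in-memory SQLite database from Python's standard library. The users table, its columns, and the two rows are hypothetical; a real seed script would target the project's own schema and database.

```python
import sqlite3

# Minimal seed data: only the rows the tests actually need (example values).
SEED_USERS = [
    (1, "alice@example.test"),
    (2, "bob@example.test"),
]


def seed(conn):
    """Apply the seed data; safe to run repeatedly because rows are keyed by id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)"
    )
    # INSERT OR REPLACE overwrites an existing row with the same primary key
    # instead of adding a duplicate.
    conn.executemany(
        "INSERT OR REPLACE INTO users (id, email) VALUES (?, ?)", SEED_USERS
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
seed(conn)
seed(conn)  # second run must not duplicate data
count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count)  # 2
```

Fixed primary keys combined with an upsert keep repeated runs from accumulating state. Other databases have their own upsert syntax for the same purpose.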