Test Data
How you manage test data has a direct impact on the reliability, maintainability, and speed of your test suite. CDS's approach is built around one core principle: push data complexity down the test pyramid.
The further down the pyramid you can validate a behaviour, the more control you have over the data — and the less the suite needs to rely on complex setup at higher layers.
Mocking and stubbing
At the unit and integration levels, mocking and stubbing are the preferred approach. Rather than relying on real data from a database or external service, mocks and stubs replace dependencies with controlled, predictable substitutes.
This makes tests faster, more reliable, and easier to reason about. A unit test that calls a real database is slower, more fragile, and harder to run in isolation. A unit test that calls a stub gets a predictable value back in milliseconds.
The more scenarios that can be validated through mocking at lower pyramid layers, the less the suite needs to rely on complex data setup at end-to-end level. This is a deliberate design principle, not a convenience.
| Language | Preferred tool | Alternative |
|---|---|---|
| JavaScript / TypeScript | Jest mocks (built-in) | Sinon.js |
| .NET | Moq | NSubstitute |
| Python | unittest.mock (built-in) | — |
| Java | Mockito | — |
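As a rough illustration of the stubbing approach in Python, the sketch below uses `unittest.mock` to replace a dependency with a controlled substitute. `PriceService`, its `rate_client` dependency, and the `get_rate` method are hypothetical names invented for this example and are not part of any CDS codebase.

```python
from unittest.mock import Mock


class PriceService:
    """Hypothetical service that depends on an external exchange-rate client."""

    def __init__(self, rate_client):
        self.rate_client = rate_client

    def price_in(self, amount, currency):
        # In production this call would reach a real service or database.
        rate = self.rate_client.get_rate(currency)
        return round(amount * rate, 2)


def test_price_uses_stubbed_rate():
    # Stub: the dependency returns a fixed, predictable value.
    rate_client = Mock()
    rate_client.get_rate.return_value = 1.25

    service = PriceService(rate_client)

    assert service.price_in(10.0, "CAD") == 12.5
    # Mock verification: the dependency was called exactly once, as expected.
    rate_client.get_rate.assert_called_once_with("CAD")
```

The test needs no database, network, or seeded data, so it can run in isolation and in any order.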
For API-level mocking — where a real external service is unavailable or unreliable in the test environment — WireMock is the recommended choice across languages. It allows HTTP services to be stubbed and verified, and is well-suited to integration and API testing where third-party dependencies cannot be controlled.
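To give a sense of what stubbing an HTTP service can look like, the sketch below builds a WireMock stub mapping and registers it through WireMock's admin endpoint (`/__admin/mappings`). It assumes a WireMock server is already running; the base URL, the `/rates/CAD` path, and both function names are hypothetical and chosen only for illustration.

```python
import json
import urllib.request


def build_stub_mapping(url_path, status, body):
    """Return a WireMock stub mapping (as a dict) for a GET request."""
    return {
        "request": {"method": "GET", "urlPath": url_path},
        "response": {
            "status": status,
            "headers": {"Content-Type": "application/json"},
            "jsonBody": body,
        },
    }


def register_stub(admin_base_url, mapping):
    """POST the mapping to a running WireMock server and return the HTTP status."""
    request = urllib.request.Request(
        f"{admin_base_url}/__admin/mappings",
        data=json.dumps(mapping).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

With a server listening on a hypothetical `http://localhost:8080`, calling `register_stub("http://localhost:8080", build_stub_mapping("/rates/CAD", 200, {"rate": 1.25}))` would make `GET /rates/CAD` return the canned JSON body for the duration of the test run.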
Fake data generation
Where test data needs to be created programmatically rather than mocked, we use language-appropriate Faker libraries. Faker generates realistic, varied data — names, addresses, emails, phone numbers — reducing the need for hardcoded values and making tests easier to maintain.
| Language | Tool | Notes |
|---|---|---|
| JavaScript / TypeScript | Faker.js | |
| .NET (C#) | Bogus | |
| Python | Faker | |
| Python | factory_boy | Approved alternative — useful for generating complex, related data objects rather than simple field values (e.g. a user with associated orders and addresses) |
Note
Faker generates different random data on each run unless it is seeded. When a test fails, always log or capture the seed or the generated values so the failure can be reproduced.
Data seeding
Data seeding — pre-populating a database or environment with a known set of data before tests run — should be treated as a last resort.
Seeding introduces environment dependency, makes tests harder to run in isolation, and creates maintenance overhead as the data model evolves. Before reaching for a seed script, the question to ask is: can this scenario be covered at a lower pyramid layer with mocks or generated data instead?
Where data seeding is genuinely necessary — for example, in end-to-end tests that require a realistic dataset in a test environment — seed scripts must be:
- Version-controlled alongside the application code
- Idempotent — safe to run multiple times without duplicating data
- Minimal — scoped to the exact data the tests require, nothing more
Seed scripts that are not idempotent or that accumulate state over time become a source of test instability. They are a cost that must be actively managed, not ignored.
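The idempotency and minimality requirements above can be sketched as follows. This example uses Python's built-in `sqlite3` module purely for illustration and assumes SQLite 3.24 or later for the `ON CONFLICT ... DO UPDATE` clause; the `users` table, its columns, and the seed rows are hypothetical.

```python
import sqlite3

# Minimal: only the rows the tests actually need, with stable primary keys.
SEED_USERS = [
    (1, "alice@example.com"),
    (2, "bob@example.com"),
]


def seed_users(conn):
    """Insert or update the seed rows; safe to run any number of times."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users "
        "(id INTEGER PRIMARY KEY, email TEXT NOT NULL)"
    )
    # Upsert keyed on the primary key: a re-run updates rows, never duplicates.
    conn.executemany(
        "INSERT INTO users (id, email) VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET email = excluded.email",
        SEED_USERS,
    )
    conn.commit()


def count_users(conn):
    """Return the number of rows in the users table."""
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

Running `seed_users` twice against the same database leaves exactly the rows listed in `SEED_USERS`, which is the property that keeps a seeded environment from accumulating state between runs.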