
End-to-End API Testing: The Complete Guide
Testing a single endpoint tells you that one endpoint works. It does not tell you that your login flow issues a token, your create endpoint accepts that token, and your read endpoint returns the thing you just created. That is the gap end-to-end API testing fills.
This guide covers the full picture: what end-to-end API testing is, why isolated tests miss real bugs, how to structure multi-step test flows, and how to run them in CI without flakiness.
What is end-to-end API testing?
End-to-end API testing validates a complete workflow by executing a sequence of dependent API calls, where the output of one request feeds into the next. The goal is to verify that the system works as a whole, not just one piece at a time.
A typical end-to-end API test looks like this:
- Authenticate (POST /auth/login) and capture the token
- Create a resource (POST /items) using the token, capture the ID
- Read the resource (GET /items/:id) and assert the body matches
- Update the resource (PUT /items/:id) and verify changes
- Delete the resource (DELETE /items/:id)
- Confirm deletion (GET /items/:id) and assert 404
Every step depends on data from a previous step: the token, the ID, the state of the resource. This chain of dependencies is exactly what isolated endpoint tests skip.
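To make the chaining concrete before the full walkthrough, here is a minimal sketch of just the last two steps (delete, then confirm the 404), written in the DevTools YAML flow syntax covered later in this guide. The /items paths and the Auth and Item step names are placeholders standing in for whatever your earlier login and create steps produce:

- request:
    name: DeleteItem
    method: DELETE
    url: {{BASE_URL}}/items/{{Item.id}}
    headers:
      Authorization: Bearer {{Auth.token}}
    depends_on: Item

- request:
    name: GetDeletedItem
    method: GET
    url: {{BASE_URL}}/items/{{Item.id}}
    headers:
      Authorization: Bearer {{Auth.token}}
    depends_on: DeleteItem

- js:
    name: ConfirmDeleted
    code: |
      export default function(ctx) {
        // The read after the delete must 404, or the resource was not really removed
        if (ctx.GetDeletedItem?.response?.status !== 404) throw new Error("Expected 404 after delete");
        return { deleted: true };
      }
    depends_on: GetDeletedItem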
How it differs from other API test types
| Test type | Scope | Dependencies | When to run |
|---|---|---|---|
| Unit / contract | Single endpoint shape | None | Every commit |
| Integration | Two services talking | Mocked or real, one hop | Every commit |
| End-to-end | Full multi-step workflow | Real services, real data flow | PRs and merge to main |
End-to-end tests are slower and more complex than unit tests. That is the point. They catch the bugs that only appear when real services interact across a real workflow.
Why isolated endpoint tests miss real bugs
An API can pass every single-endpoint test and still break in production. Here are the failure modes that only surface when you test the full chain.
Stale or invalid tokens
Your /login endpoint returns a valid token. Your /items endpoint accepts any well-formed token. Both pass individual tests. But in production, the token format changed last sprint and the items service rejects it. An end-to-end test that chains login into items catches this immediately.
Resource lifecycle bugs
Your create endpoint returns 201. Your read endpoint returns 200 for a known ID. Both pass. But the create endpoint writes to a cache that the read endpoint does not check, so freshly created resources return 404 for 30 seconds. Only the chained create-then-read test reveals this.
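As a sketch, the chained version is just a create step, an immediate read that reuses the new ID, and a check on that read. The /items paths and the Auth step are assumptions standing in for your own API and login step; the ID is referenced directly with the {{ NodeName.response.body.field }} syntax described later:

- request:
    name: CreateItem
    method: POST
    url: {{BASE_URL}}/items
    headers:
      Authorization: Bearer {{Auth.token}}
      Content-Type: application/json
    body:
      name: 'cache-consistency-check'
    depends_on: Auth

# Read back immediately, using the ID straight from the create response
- request:
    name: ReadItem
    method: GET
    url: {{BASE_URL}}/items/{{CreateItem.response.body.id}}
    headers:
      Authorization: Bearer {{Auth.token}}
    depends_on: CreateItem

- js:
    name: VerifyReadBack
    code: |
      export default function(ctx) {
        // A stale cache shows up here as a 404 on a resource that was just created
        if (ctx.ReadItem?.response?.status !== 200) throw new Error("Fresh resource not readable");
        return { ok: true };
      }
    depends_on: ReadItem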
Side effects across services
Your payment endpoint charges the card and returns 200. Your order endpoint marks orders as paid. Both pass individually. But the payment service emits an event that the order service mishandles, leaving orders in a stuck state. The end-to-end flow (create order, pay, verify order status) is the only test that catches this.
State assumptions
Endpoints that assume prior state, such as "user has completed onboarding" or "cart is non-empty", work fine when you set up the state manually in each test. End-to-end tests force you to create that state through the actual API, which is how real users do it.
Anatomy of a real end-to-end API test
Let us walk through a concrete example: testing a task management API.
The workflow
Login → Create Project → Create Task → Assign Task → Complete Task → Verify Project Stats
Six requests. Every one after the login depends on data from an earlier step.
The YAML flow
workspace_name: Task Management E2E

run:
  - flow: TaskLifecycle

flows:
  - name: TaskLifecycle
    variables:
      - name: run_id
        value: 'ci-001'
    steps:
      - request:
          name: Login
          method: POST
          url: {{BASE_URL}}/auth/login
          headers:
            Content-Type: application/json
          body:
            email: 'test@example.com'
            password: 'password123'

      - js:
          name: Auth
          code: |
            export default function(ctx) {
              if (ctx.Login?.response?.status !== 200) throw new Error("Login failed");
              return {
                token: ctx.Login.response.body.access_token,
                user_id: ctx.Login.response.body.user.id
              };
            }
          depends_on: Login

      - request:
          name: CreateProject
          method: POST
          url: {{BASE_URL}}/api/projects
          headers:
            Authorization: Bearer {{Auth.token}}
            Content-Type: application/json
          body:
            name: 'CI Test Project {{run_id}}'
          depends_on: Auth

      - js:
          name: Project
          code: |
            export default function(ctx) {
              if (ctx.CreateProject?.response?.status !== 201) throw new Error("Create project failed");
              return { id: ctx.CreateProject.response.body.id };
            }
          depends_on: CreateProject

      - request:
          name: CreateTask
          method: POST
          url: {{BASE_URL}}/api/projects/{{Project.id}}/tasks
          headers:
            Authorization: Bearer {{Auth.token}}
            Content-Type: application/json
          body:
            title: 'Verify deployment'
            priority: 'high'
          depends_on: Project

      - js:
          name: Task
          code: |
            export default function(ctx) {
              if (ctx.CreateTask?.response?.status !== 201) throw new Error("Create task failed");
              return { id: ctx.CreateTask.response.body.id };
            }
          depends_on: CreateTask

      - request:
          name: AssignTask
          method: PUT
          url: {{BASE_URL}}/api/tasks/{{Task.id}}/assign
          headers:
            Authorization: Bearer {{Auth.token}}
            Content-Type: application/json
          body:
            assignee_id: '{{Auth.user_id}}'
          depends_on: Task

      - js:
          name: VerifyAssignment
          code: |
            export default function(ctx) {
              const body = ctx.AssignTask?.response?.body;
              if (body?.assignee?.id !== ctx.Auth?.user_id) throw new Error("Assignee mismatch");
              return { verified: true };
            }
          depends_on: AssignTask

      - request:
          name: CompleteTask
          method: PUT
          url: {{BASE_URL}}/api/tasks/{{Task.id}}/status
          headers:
            Authorization: Bearer {{Auth.token}}
            Content-Type: application/json
          body:
            status: 'completed'
          depends_on: VerifyAssignment

      - request:
          name: VerifyProjectStats
          method: GET
          url: {{BASE_URL}}/api/projects/{{Project.id}}/stats
          headers:
            Authorization: Bearer {{Auth.token}}
          depends_on: CompleteTask

      - js:
          name: ValidateStats
          code: |
            export default function(ctx) {
              const stats = ctx.VerifyProjectStats?.response?.body;
              if (stats?.total_tasks !== 1) throw new Error("Expected 1 total task");
              if (stats?.completed_tasks !== 1) throw new Error("Expected 1 completed task");
              return { passed: true };
            }
          depends_on: VerifyProjectStats
Every JS node serves a dual purpose: it extracts values for downstream steps (referenced as {{Auth.token}}, {{Project.id}}, {{Task.id}}) and validates the response at that point in the workflow. If any step throws, the flow stops with a clear error message.
What to extract vs. what to validate
Extract (via JS nodes) the values you need downstream: tokens, IDs, cursors. Keep extraction minimal. If you extract 20 values but only use 3, the flow is harder to read.
Validate the contract at each step, not the full response body. Status codes, the presence of key fields, and business invariants (assignee matches, counts are correct). Avoid asserting on timestamps, generated IDs, or other non-deterministic values.
Variable passing between steps
Variable passing is the core mechanic of end-to-end API testing. Every chain depends on it, and most flakiness comes from getting it wrong.
Common variable types
| Variable | Source | Used by | Pitfall |
|---|---|---|---|
| Auth token | Login response body | Every subsequent request header | Token expiration during long test suites |
| Resource ID | Create response body or Location header | Read, update, delete requests | Nested IDs vs. top-level IDs |
| Pagination cursor | List response body | Next page request query param | Cursor depends on sort order |
| CSRF token | Bootstrap response header or body | Mutation request header | Token rotates per session |
| Upload URL | Presigned URL response | File upload request | URL expires quickly |
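For example, a pagination cursor chains the same way a token does: a JS node pulls the cursor out of the first page, and the next request passes it as a query parameter. This is a sketch; the /api/items path and the next_cursor field are assumptions about your API:

- request:
    name: ListPage1
    method: GET
    url: {{BASE_URL}}/api/items?limit=50
    headers:
      Authorization: Bearer {{Auth.token}}
    depends_on: Auth

- js:
    name: Cursor
    code: |
      export default function(ctx) {
        // Fail fast if the first page did not return a cursor at all
        const cursor = ctx.ListPage1?.response?.body?.next_cursor;
        if (!cursor) throw new Error("Missing next_cursor on first page");
        return { next: cursor };
      }
    depends_on: ListPage1

- request:
    name: ListPage2
    method: GET
    url: {{BASE_URL}}/api/items?limit=50&cursor={{Cursor.next}}
    headers:
      Authorization: Bearer {{Auth.token}}
    depends_on: Cursor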
How passing works in DevTools YAML flows
In tools like Postman, variable passing happens through pm.environment.set() in pre-request scripts, which is a side effect hidden inside a JavaScript tab. In DevTools YAML flows, the pattern is explicit: a JS node extracts values and subsequent steps reference them:
# JS node extracts the token from the login response
- js:
    name: Auth
    code: |
      export default function(ctx) {
        return { token: ctx.Login?.response?.body?.access_token };
      }
    depends_on: Login

# Subsequent request references the extracted value
- request:
    name: GetProfile
    method: GET
    url: {{BASE_URL}}/api/me
    headers:
      Authorization: Bearer {{Auth.token}}
    depends_on: Auth
The extraction (JS node) and consumption ({{Auth.token}} in the header) are both visible in the YAML. A reviewer can trace the data flow without opening a UI.
Auto-mapping with Flows
DevTools Flows auto-detects when a value from one response appears in a subsequent request. When you import a HAR or build a flow visually, the variable references are created automatically using the {{ NodeName.response.body.field }} syntax. You review and adjust them rather than wiring everything by hand.
This matters most for complex workflows with 10+ steps, where manual wiring is error-prone and time-consuming.
Validation in end-to-end flows
End-to-end tests need validation at multiple points, not just at the end.
Validate at every step
Do not wait until the last step to check if things worked. If step 3 of 6 silently returns an error, steps 4-6 will fail with confusing messages. Use JS nodes or if conditions to validate the basics at every step:
- js:
    name: ValidateCreate
    code: |
      export default function(ctx) {
        const resp = ctx.CreateTask?.response;
        if (resp?.status !== 201) throw new Error("Expected 201, got " + resp?.status);
        if (!resp?.body?.id) throw new Error("Missing task ID in response");
        if (resp?.body?.title !== "Verify deployment") throw new Error("Title mismatch");
        return { task_id: resp.body.id };
      }
    depends_on: CreateTask
Or use an if node for simple status checks:
- if:
    name: CheckCreate
    condition: CreateTask.response.status == 201
    then: AssignTask
    else: HandleError
    depends_on: CreateTask
Contract validation vs. snapshot validation
Contract validation checks structure and invariants: status code, field presence, types, business rules. It survives minor API changes (new optional fields, reordered keys).
Snapshot validation checks the exact response body. It breaks on every change, which is useful for detecting unintended regressions but noisy for evolving APIs.
For end-to-end tests, prefer contract validation. You want to catch broken workflows, not fail because someone added an optional updated_at field.
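A contract-style check in a JS node might look like the sketch below. The GetTask step and its fields are assumptions; the pattern is what matters: assert status, presence, and type, and deliberately ignore everything else:

- js:
    name: ValidateContract
    code: |
      export default function(ctx) {
        const resp = ctx.GetTask?.response;
        // Contract: correct status, required fields present with the right types
        if (resp?.status !== 200) throw new Error("Expected 200, got " + resp?.status);
        const body = resp.body;
        if (typeof body?.id !== 'string') throw new Error("id missing or not a string");
        if (typeof body?.title !== 'string') throw new Error("title missing or not a string");
        if (!['open', 'completed'].includes(body?.status)) throw new Error("Unexpected status: " + body?.status);
        // Deliberately no check on created_at's exact value or on fields we do not own
        return { ok: true };
      }
    depends_on: GetTask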
For a deeper treatment of assertion strategies, see: JSON Assertion Patterns for API Tests: A Practical Guide (with YAML Examples).
Timing validation
End-to-end flows are a natural place to add performance budgets via JS nodes:
- js:
    name: CheckSearchTiming
    code: |
      export default function(ctx) {
        const elapsed = ctx.Search?.response?.elapsed_ms;
        if (elapsed > 500) throw new Error("Search took " + elapsed + "ms, budget is 500ms");
        return { elapsed };
      }
    depends_on: Search
If the search step takes longer than 500ms, the flow fails. This catches performance regressions that single-endpoint tests might miss because the endpoint is fast in isolation but slow when preceded by auth and data setup.
Running end-to-end API tests in CI
End-to-end tests belong in CI, but they need more care than unit tests.
When to run them
- On every PR: Run the core end-to-end flows. Catch regressions before merge.
- On merge to main: Run the full suite. Gate deployments on passing results.
- On schedule (optional): Run against staging or production to catch environment drift.
Do not run end-to-end tests on every commit to a feature branch. They are slower than unit tests and you do not want them blocking rapid iteration.
Environment setup
End-to-end tests need a real (or realistic) backend. Options:
- Dedicated test environment: A staging instance that resets nightly. Simplest option.
- Ephemeral environment: Spun up per PR via Docker Compose or similar. Most isolated.
- Shared staging: Works if tests use unique data (run IDs, timestamps) to avoid conflicts.
Inject environment-specific values through environment references in your flow variables. Use {{#env:VAR_NAME}} to read from OS environment variables at runtime:
flows:
  - name: MyFlow
    variables:
      - name: api_key
        value: '{{#env:SECRET_API_KEY}}'
      - name: run_id
        value: 'ci-20260214'
      - name: test_email
        value: 'test@example.com'
GitHub Actions example
name: End-to-End API Tests

on:
  pull_request:
    branches: [main]

jobs:
  e2e-api-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install DevTools CLI
        run: curl -fsSL https://dev.tools/install.sh | sh

      - name: Run end-to-end flows
        run: devtools flow run tests/e2e/*.yaml --report junit

      - name: Upload test results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: e2e-results
          path: reports/
For the full CI setup including parallel execution, caching, and artifact management, see: GitHub Actions for YAML API Tests: Parallel Runs + Caching.
Dealing with flakiness
End-to-end tests are more flaky than unit tests because they touch real services. Common causes and fixes:
| Cause | Fix |
|---|---|
| Token expiration during long suites | Refresh tokens between flows, or use short-lived test credentials |
| Shared data conflicts | Use unique identifiers per run (RUN_ID in resource names) |
| Race conditions | Add explicit waits or polling steps for async operations |
| Environment drift | Pin dependency versions, use consistent seed data |
| Network timeouts | Set reasonable timeouts per step, not just globally |
The goal is not zero flakiness. The goal is that every failure maps to a real bug or a fixable environmental issue.
Building end-to-end tests with Flows
Flows are visual, graph-based API test workflows. Instead of writing YAML by hand or scripting pre-request hooks, you build the workflow visually and let the tool handle variable wiring.
Why graphs work for end-to-end testing
A multi-step API test is a directed acyclic graph (DAG): some steps depend on others, some can run in parallel, and data flows from producers to consumers. A list of requests with scripts is an approximation. A graph is the actual structure.
In DevTools Flows:
- Nodes are steps: HTTP requests, conditions, loops, JavaScript transforms
- Edges are dependencies: explicit connections showing execution order via depends_on
- Variables are auto-mapped: when you reference {{ Login.response.body.token }} in a request header, the dependency edge is created automatically
What this looks like in practice
- Import a HAR from a real browser session, or build from scratch
- The flow graph appears with requests as nodes and dependencies as edges
- Auto-mapped variables show which values flow between requests
- Add validation via JS nodes or if conditions at each step
- Add conditions or loops for retry logic, pagination, or error handling
- Export to YAML for Git review and CI execution
- Run locally or in CI with the DevTools CLI
The visual graph and the YAML file are two views of the same test. Edit either one.
Parallel execution
Flows analyzes the dependency graph and automatically parallelizes independent branches. If your auth flow produces a token, and you then need to test both the "create project" and "list templates" endpoints (which do not depend on each other), Flows runs them concurrently:
Login
├── Create Project (parallel)
│   └── Verify Project Stats (waits for Create Project)
└── List Templates (parallel)
This cuts execution time without manual parallelization logic.
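In the YAML, that parallelism falls out of the depends_on edges; there is nothing extra to configure. A minimal sketch, assuming an earlier Auth step and a /api/templates endpoint:

- request:
    name: CreateProject
    method: POST
    url: {{BASE_URL}}/api/projects
    headers:
      Authorization: Bearer {{Auth.token}}
      Content-Type: application/json
    body:
      name: 'Parallel demo {{run_id}}'
    depends_on: Auth

# Same dependency, different branch: runs concurrently with CreateProject
- request:
    name: ListTemplates
    method: GET
    url: {{BASE_URL}}/api/templates
    headers:
      Authorization: Bearer {{Auth.token}}
    depends_on: Auth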
Conditions and loops
Real workflows are not always linear. DevTools YAML flows support these as first-class step types:
- Conditions (if): If the create endpoint returns 409 (already exists), skip to the read step instead of failing
- Loops (for): Repeat a step N times, such as polling a status endpoint until the job completes
- For-each (for_each): Iterate over a list of items and validate each one
- if:
    name: CheckExists
    condition: CreateItem.response.status == 409
    then: GetItem
    else: HandleNew
    depends_on: CreateItem

- for:
    name: PollStatus
    iter_count: 10
    loop: CheckStatus
    depends_on: StartJob

- for_each:
    name: ValidateItems
    items: '[1, 2, 3, 4, 5]'
    loop: FetchItem
    depends_on: ListItems
These are first-class node types in Flows, not scripted workarounds.
Common pitfalls in end-to-end API testing
Testing too much in one flow
A 30-step flow that covers login, CRUD, search, admin operations, and cleanup is hard to debug when it fails. Split workflows by business domain and use run with depends_on to orchestrate them:
run:
  - flow: AuthFlow
  - flow: CrudFlow
    depends_on: AuthFlow
  - flow: SearchFlow
    depends_on: CrudFlow
Each flow should be independently runnable and independently debuggable.
Hardcoding test data
Hardcoded IDs, emails, and names will collide when tests run in parallel or on shared environments. Use:
- Run-specific identifiers: 'CI Test {{run_id}}'
- Environment variables for credentials
- Setup steps that create fresh data and teardown steps that clean up
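One way to get a run-specific identifier is to read it from the CI environment with the {{#env:...}} reference shown earlier. GITHUB_RUN_ID is the GitHub Actions variable; substitute whatever your CI system provides:

flows:
  - name: CrudFlow
    variables:
      - name: run_id
        value: '{{#env:GITHUB_RUN_ID}}'

Every resource name that interpolates {{run_id}}, such as 'CI Test Project {{run_id}}', is then unique per run.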
Skipping cleanup
If your test creates resources and does not delete them, the test environment accumulates garbage. Eventually, list endpoints slow down, unique constraints fail, and tests break for reasons unrelated to code changes.
Add teardown steps. Include delete requests at the end of each flow that remove the resources created during the test.
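A teardown can be ordinary request steps at the tail of the flow, chained after the final assertion so they only run once the checks have passed. This sketch reuses the IDs from the task management example above and only covers the happy path; cleaning up after a failed run still needs a scheduled sweep or a resettable environment:

- request:
    name: DeleteTask
    method: DELETE
    url: {{BASE_URL}}/api/tasks/{{Task.id}}
    headers:
      Authorization: Bearer {{Auth.token}}
    depends_on: ValidateStats

- request:
    name: DeleteProject
    method: DELETE
    url: {{BASE_URL}}/api/projects/{{Project.id}}
    headers:
      Authorization: Bearer {{Auth.token}}
    depends_on: DeleteTask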
Validating non-deterministic values
Timestamps, auto-incremented IDs, and random UUIDs will differ on every run. Validate in JS nodes using:
- Presence and type: if (!body?.created_at) throw new Error("missing created_at")
- Pattern matching: if (!/^[0-9a-f-]{36}$/.test(body?.id)) throw new Error("invalid UUID")
- Relative values: compare that updated_at is after created_at
Not exact values.
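Combined in a single JS node, those checks might look like this. The GetTask step and the field names are assumptions about your API:

- js:
    name: ValidateShape
    code: |
      export default function(ctx) {
        const body = ctx.GetTask?.response?.body;
        // Presence, not exact value
        if (!body?.created_at) throw new Error("missing created_at");
        if (!body?.updated_at) throw new Error("missing updated_at");
        // Pattern, not exact value
        if (!/^[0-9a-f-]{36}$/.test(body?.id)) throw new Error("invalid UUID: " + body?.id);
        // Relative comparison: updated_at must not precede created_at
        if (new Date(body.updated_at) < new Date(body.created_at)) throw new Error("updated_at precedes created_at");
        return { ok: true };
      }
    depends_on: GetTask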
For detailed patterns, see: Deterministic API Assertions: Stop Flaky JSON Tests in CI.
FAQ
How many end-to-end API tests should I have? Fewer than you think. Cover the critical user journeys: authentication, the core CRUD workflow, payment (if applicable), and the most common multi-step paths. Five to ten well-designed e2e flows catch more bugs than fifty isolated endpoint tests.
Should end-to-end tests hit real services or mocks? Real services. The whole point is testing the integration. If you mock the dependencies, you are writing integration tests, not end-to-end tests. Use a dedicated test environment instead.
How do I handle async operations (webhooks, queues)? Add polling steps using a for loop. Make a request, then use a for node to poll a status endpoint with a timeout. This is more reliable than fixed-duration sleeps and more realistic than skipping the async part entirely.
What if my API changes frequently? Validate contracts (status codes, field presence, types) in JS nodes instead of exact body snapshots. End-to-end tests should verify behavior, not exact payloads. If a new optional field appears, your tests should not break.
Can I generate end-to-end tests from browser traffic? Yes. Record a browser session, export the HAR, import it into DevTools, and the auto-generated flow gives you a starting point. You then refine the variable mapping and add validation.
Start testing workflows, not just endpoints
If your API tests pass but your users still hit bugs, the gap is almost always in the spaces between endpoints: the token that expires, the ID that does not carry over, the side effect that never fires.
End-to-end API testing closes that gap. Chain requests, pass real data between steps, validate at every point, and run the whole workflow in CI.
DevTools Flows makes this visual: build the graph, auto-map the variables, export to YAML, run in CI. Try it at dev.tools.