Test File Downloads in API Flows: Content-Length, Checksums, Headers


DevTools Team

When an API endpoint returns JSON, tests tend to be straightforward: assert the status, validate the schema, maybe assert a couple of values.

File downloads are different. A “200 OK” can still deliver a broken artifact: a body truncated mid-stream, a response cached incorrectly by a CDN, or even an HTML login page served as a PDF.

If your product exports invoices, build artifacts, or training materials (for example, PDFs generated by platforms like Scenario IQ), your API tests should validate more than “it responded”. In practice, reliable download tests come down to three things:

  • Size (Content-Length, or an explicit size you can compare against)
  • Integrity (checksums, digest headers, ETags, or metadata-derived hashes)
  • Semantics (Content-Type, Content-Disposition, cache headers, range support)

This guide focuses on making those checks deterministic in YAML-based API flows, so they are reviewable in Git and stable in CI.

Failure modes you only catch with download-specific checks

A few real-world breakages that look fine if you only assert status: 200:

  • Truncation due to proxy timeouts or interrupted streams (client gets a partial body).
  • Silent content transformation (one hop serves gzip, another serves identity, so the bytes differ even though the filename is the same).
  • Wrong representation (HTML error page returned with 200, or Content-Type: text/html instead of application/pdf).
  • CDN cache confusion (Vary missing, private data cached, stale versions returned).
  • Range bugs (resumable downloads broken, Accept-Ranges absent or ignored).

A good download test makes these failures obvious with header and checksum assertions, rather than by diffing a binary blob in Git.

The “minimum viable” download contract

At a minimum, treat a download endpoint like a contract over headers.

Here’s a practical checklist with the most useful headers to validate.

| What you are testing | Header or signal | Why it matters | Notes for determinism |
| --- | --- | --- | --- |
| Correct media type | Content-Type | Prevents “HTML login page as PDF” failures | Assert exact value (or a controlled allowlist) |
| Correct filename behavior | Content-Disposition | Ensures correct download name and attachment vs inline | Often includes quotes and encoding, assert carefully |
| Not truncated | Content-Length | Detects partial responses when length is known | Not present for chunked transfer; see below |
| Cache safety | Cache-Control, Vary | Avoids sensitive file caching and cross-user leaks | Assert policy consistent with endpoint |
| Resumable downloads | Accept-Ranges | Clients can resume, CI can retry without re-downloading | Usually bytes if supported |
| Compression/identity | Content-Encoding | Verifies byte representation you are actually hashing | You can force identity via request headers |
| Content identity | ETag or digest headers | Detects unexpected content changes | ETag semantics vary, digest headers are clearer |

You do not need all of these for every endpoint, but you should decide and encode the contract explicitly.
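As a rough illustration of encoding the contract explicitly, here is a minimal Python sketch that compares a response against a table of expected header values. The function name `check_download_headers` and the sample header values are assumptions made for this example, not part of any particular runner.

```python
def check_download_headers(status, headers, expected):
    """Return a list of contract violations; an empty list means the check passed."""
    # HTTP header names are case-insensitive, so normalize before comparing.
    actual = {name.lower(): value.strip() for name, value in headers.items()}
    problems = []
    if status != 200:
        problems.append(f"status: expected 200, got {status}")
    for name, want in expected.items():
        got = actual.get(name.lower())
        if got is None:
            problems.append(f"{name}: missing")
        elif got != want:
            problems.append(f"{name}: expected {want!r}, got {got!r}")
    return problems


# An HTML login page served with 200 fails on Content-Type alone.
contract = {"Content-Type": "application/pdf", "Cache-Control": "no-store"}
login_page = {"content-type": "text/html; charset=utf-8", "cache-control": "no-store"}
print(check_download_headers(200, login_page, contract))
```

Returning a list of problems instead of raising on the first one keeps a failing CI log readable: every broken header shows up in one run.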

Content-Length: useful, but not always present

When Content-Length is reliable

Content-Length is great when your server knows the size upfront (static object storage, generated file with known byte length). If a proxy truncates the body, Content-Length often still reflects the expected size, making mismatches detectable.

In YAML flows, prefer asserting it equals an expected value rather than “exists”, because existence alone does not catch regressions.
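Where a helper script can see the raw body, the truncation check itself is a byte count. The sketch below assumes the body has not been decompressed by the client (Content-Length describes the bytes on the wire); `length_status` is a hypothetical helper name.

```python
def length_status(headers, raw_body):
    """Classify a response as 'ok', 'mismatch', or 'unknown' by byte count."""
    lowered = {name.lower(): value for name, value in headers.items()}
    declared = lowered.get("content-length")
    if declared is None:
        # Chunked or streamed responses carry no length; use another signal.
        return "unknown"
    return "ok" if int(declared) == len(raw_body) else "mismatch"


pdf = b"%PDF-1.7\n" + b"0" * 91  # 100 bytes standing in for a real file
print(length_status({"Content-Length": "100"}, pdf))         # ok
print(length_status({"Content-Length": "100"}, pdf[:60]))    # mismatch
print(length_status({"Transfer-Encoding": "chunked"}, pdf))  # unknown
```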

When Content-Length disappears

You may not get a Content-Length when:

  • The response uses chunked transfer encoding.
  • The server is streaming a generated file.
  • A proxy changes the encoding (for example, compresses the body on the fly).

If your endpoint is intentionally streaming, size validation should come from a different signal (checksum/digest, or metadata that includes the expected byte length).
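For a streamed response, size and integrity can be checked in one pass over the chunks, against values the server published elsewhere. This is a sketch under the assumption that a metadata call has already returned the expected byte count and a hex SHA-256; `verify_stream` is a hypothetical name.

```python
import hashlib


def verify_stream(chunks, expected_bytes, expected_sha256):
    """Consume an iterable of byte chunks and check total size and SHA-256."""
    digest = hashlib.sha256()
    total = 0
    for chunk in chunks:
        digest.update(chunk)
        total += len(chunk)
    return total == expected_bytes and digest.hexdigest() == expected_sha256.lower()


payload = b"%PDF-1.7 streamed example"
expected_hex = hashlib.sha256(payload).hexdigest()
full = [payload[:8], payload[8:]]
print(verify_stream(full, len(payload), expected_hex))      # True
print(verify_stream(full[:1], len(payload), expected_hex))  # False
```

Because the hash is updated chunk by chunk, the file never has to be held in memory as a whole.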

Checksums: what to compare, and what not to assume

Prefer explicit digest headers when you control the API

If you own the API, the most testable approach is to publish an explicit integrity signal:

  • A digest header (for example, SHA-256) that represents the bytes sent.
  • Or a metadata endpoint that returns sha256 and bytes for the object.

Avoid relying on “ETag equals MD5” as a rule. Depending on storage, CDNs, and multi-part uploads, an ETag can be something else entirely.
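To see why the “ETag equals MD5” rule is fragile, it can help to classify the ETags an endpoint actually returns. The sketch below is a shape-based heuristic only: a 32-hex-character value merely looks like an MD5, and the `<hex>-<number>` shape is the form some object stores use for multipart uploads. The function name and category labels are assumptions for this example.

```python
import re

_MD5_SHAPE = re.compile(r"[0-9a-f]{32}")
_MULTIPART_SHAPE = re.compile(r"[0-9a-f]{32}-\d+")


def classify_etag(etag):
    """Guess what kind of value an ETag is from its shape alone."""
    if etag.startswith("W/"):
        return "weak"  # weak validators are never byte-for-byte hashes
    value = etag.strip('"').lower()
    if _MD5_SHAPE.fullmatch(value):
        return "md5-like"
    if _MULTIPART_SHAPE.fullmatch(value):
        return "multipart-like"
    return "opaque"


for sample in ['"d41d8cd98f00b204e9800998ecf8427e"',
               '"d41d8cd98f00b204e9800998ecf8427e-3"',
               'W/"abc"',
               '"v42"']:
    print(sample, "->", classify_etag(sample))
```

Only the first shape is even a candidate for direct hash comparison, and only if the API documents it as one.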

If you do not control the API

If the API already returns something checksum-like:

  • Treat ETag as a version identifier unless the API documents it as a content hash.
  • If a digest header is present (the older Digest, or Content-Digest and Repr-Digest from RFC 9530), prefer it.
  • If neither exists, consider adding a separate metadata call in your flow that returns a documented checksum.

The key for CI stability is: compare the download response to a stable, server-provided expected value, rather than computing hashes in an ad hoc way across different clients and proxy behaviors.
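One practical wrinkle when comparing server-provided values is that they may use different encodings. The sketch below assumes the download carries an RFC 9530 style value (`sha-256=:<base64>:`) while the metadata call returns hex; both helper names are hypothetical.

```python
import base64
import hashlib
import re


def sha256_from_content_digest(header_value):
    """Extract the SHA-256 digest as lowercase hex, or None if it is absent."""
    match = re.search(r"sha-256=:([A-Za-z0-9+/=]+):", header_value)
    if match is None:
        return None
    return base64.b64decode(match.group(1)).hex()


def digest_matches(header_value, expected_hex):
    """Compare a digest header against a hex checksum from metadata."""
    return sha256_from_content_digest(header_value) == expected_hex.lower()


# Build a sample header from known bytes so the example is self-checking.
raw = hashlib.sha256(b"%PDF-1.7 example bytes").digest()
header = "sha-256=:" + base64.b64encode(raw).decode("ascii") + ":"
print(digest_matches(header, raw.hex()))  # True
```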

A deterministic YAML flow pattern for downloads (with request chaining)

The most robust pattern is a two-step contract:

  1. Create or locate the artifact, and capture its expected bytes and sha256 (or a version token).
  2. Download it, and assert headers match the captured expectations.

Below is an illustrative YAML flow structure showing request chaining. Adjust field names to your API and runner conventions; the important part is the pattern: capture expected values, then assert them.

id: export-report-and-download
vars:
  baseUrl: ${ENV.BASE_URL}
  token: ${ENV.API_TOKEN}

steps:
  - id: createExport
    request:
      method: POST
      url: ${vars.baseUrl}/v1/reports/exports
      headers:
        authorization: Bearer ${vars.token}
        content-type: application/json
      body:
        format: pdf
        reportId: 12345
    assert:
      status: 202
    capture:
      exportId: $.exportId
      expectedBytes: $.expectedBytes
      expectedSha256: $.sha256

  # Poll this step until $.state is READY; retry and backoff syntax depends on your runner.
  - id: waitUntilReady
    request:
      method: GET
      url: ${vars.baseUrl}/v1/reports/exports/${steps.createExport.exportId}
      headers:
        authorization: Bearer ${vars.token}
    assert:
      status: 200
      json:
        - path: $.state
          equals: READY

  - id: download
    request:
      method: GET
      url: ${vars.baseUrl}/v1/reports/exports/${steps.createExport.exportId}/download
      headers:
        authorization: Bearer ${vars.token}
        accept: application/pdf
        accept-encoding: identity
    assert:
      status: 200
      headers:
        content-type: application/pdf
        content-length: ${steps.createExport.expectedBytes}
        x-content-sha256: ${steps.createExport.expectedSha256}
        cache-control: no-store

Notes:

  • The accept-encoding: identity request header is a practical trick when you want the byte representation to be consistent across environments.
  • The example asserts a custom checksum header (x-content-sha256) because it is unambiguous. If your API uses ETag or another documented header, assert that instead.
  • Keep captures close to the step that produces them, so diffs in PRs stay readable.
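The waitUntilReady step only passes once the export has finished, so in practice it needs to be retried. How a runner expresses that varies; as a language-neutral illustration, here is a small Python polling sketch. The `FAILED` state, the attempt count, and the injected `get_state` callable are assumptions made for this example.

```python
import time


def poll_until_ready(get_state, attempts=10, delay=1.0, sleep=time.sleep):
    """Call get_state() until it returns 'READY'; return the number of calls made."""
    for attempt in range(1, attempts + 1):
        state = get_state()
        if state == "READY":
            return attempt
        if state == "FAILED":  # hypothetical terminal state
            raise RuntimeError("export failed, no point in polling further")
        if attempt < attempts:
            sleep(delay)  # fixed delay keeps CI timing predictable
    raise TimeoutError(f"export not READY after {attempts} attempts")


# Simulated status endpoint: two PENDING responses, then READY.
states = iter(["PENDING", "PENDING", "READY"])
print(poll_until_ready(lambda: next(states), sleep=lambda seconds: None))  # 3
```

Bounding the attempts matters for determinism: a stuck export fails the run with a clear timeout instead of hanging the pipeline.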

[Figure: a three-step API flow. Create export (capture expectedBytes and sha256), poll until READY, then download and assert headers such as Content-Type, Content-Length, and the checksum.]

Header assertions that catch the most CI flakes

Content-Type and Content-Disposition

These two assertions prevent a surprising number of “download returned something else” regressions.

In particular, validate Content-Disposition when you expect a filename.

A pragmatic approach is:

  • Assert the value starts with attachment (followed by a semicolon when a filename is present).
  • Assert it contains a stable prefix (for example, filename="report-) rather than the exact full string when the filename includes timestamps.

If your runner supports regex or contains assertions, use them. If it only supports exact equality, move variability into a captured variable (for example, return filename from metadata, then compare).
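If the check lives in a helper script instead, parsing the header is more robust than string matching, because quoting and parameter order can vary. The sketch below borrows the MIME header parser from Python's standard `email` package, which is an approximation for HTTP headers; the helper names and the `report-` prefix are assumptions for this example.

```python
from email.message import Message


def parse_content_disposition(value):
    """Return (disposition type, filename or None) for a Content-Disposition value."""
    msg = Message()
    msg["Content-Disposition"] = value
    return msg.get_content_disposition(), msg.get_filename()


def is_expected_attachment(value, prefix, suffix):
    """True when the header says 'attachment' and the filename has a stable shape."""
    kind, filename = parse_content_disposition(value)
    return (
        kind == "attachment"
        and filename is not None
        and filename.startswith(prefix)
        and filename.endswith(suffix)
    )


header = 'attachment; filename="report-2024-05-01.pdf"'
print(parse_content_disposition(header))
print(is_expected_attachment(header, "report-", ".pdf"))
```

Checking only the prefix and extension keeps the assertion stable when the filename embeds a date or an ID.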

Cache-Control and Vary

For user-specific exports, you usually want Cache-Control: no-store (commonly sent as private, no-store). In CI, missing cache headers can show up as nondeterministic failures if intermediate layers cache aggressively.

Also look for:

  • Vary: Authorization if responses vary by auth.
  • Vary: Accept-Encoding if representation changes with compression.

Make the cache policy part of the test contract.
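Because Cache-Control and Vary are comma-separated lists, an exact string match can fail on harmless reordering. A minimal sketch of a set-based check follows; the policy it enforces (no-store required, public forbidden) is an assumption suited to user-specific exports, and the function names are hypothetical.

```python
def parse_list_header(value):
    """Split a comma-separated header into a set of lowercase tokens."""
    return {item.strip().lower() for item in value.split(",") if item.strip()}


def cache_policy_problems(headers, required_vary=()):
    """Return violations of a 'never cache user-specific exports' policy."""
    lowered = {name.lower(): value for name, value in headers.items()}
    directives = parse_list_header(lowered.get("cache-control", ""))
    vary = parse_list_header(lowered.get("vary", ""))
    problems = []
    if "no-store" not in directives:
        problems.append("Cache-Control lacks no-store")
    if "public" in directives:
        problems.append("Cache-Control marks the response as public")
    for name in required_vary:
        # 'Vary: *' already tells caches the response varies on everything.
        if name.lower() not in vary and "*" not in vary:
            problems.append(f"Vary lacks {name}")
    return problems


good = {"Cache-Control": "private, no-store", "Vary": "Authorization, Accept-Encoding"}
bad = {"Cache-Control": "public, max-age=3600"}
print(cache_policy_problems(good, ["Authorization"]))
print(cache_policy_problems(bad, ["Authorization"]))
```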

Accept-Ranges

If clients rely on resumable downloads, assert Accept-Ranges: bytes. It is a clean, deterministic header check, and it catches misconfigurations in object storage, proxies, and frameworks.

Chunked transfer and streaming downloads

When your endpoint is truly streaming:

  • Do not require Content-Length.
  • Make integrity come from a digest header or metadata call.
  • Consider testing range reads explicitly if supported.

A practical range test looks like:

  • Request with Range: bytes=0-1023
  • Assert 206 Partial Content
  • Assert Content-Range is present and correctly formatted (for example, bytes 0-1023/<total size>)

That verifies the server and edge layers respect byte ranges, which is often critical for large files.
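The range assertions above can be sketched as a single function. It assumes the request asked for one byte range and that the server may return fewer bytes when the file is shorter than the requested range; `check_range_response` is a hypothetical name.

```python
import re

_CONTENT_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")


def check_range_response(status, headers, body, first, last):
    """Validate a response to 'Range: bytes=<first>-<last>'."""
    if status != 206:
        return False  # a 200 here means the range was ignored
    lowered = {name.lower(): value for name, value in headers.items()}
    match = _CONTENT_RANGE.fullmatch(lowered.get("content-range", "").strip())
    if match is None:
        return False
    start, end, total = int(match.group(1)), int(match.group(2)), match.group(3)
    if total != "*" and int(total) <= end:
        return False  # the range cannot extend past the full size
    # The body length must agree with the range the server claims to send.
    return start == first and end <= last and len(body) == end - start + 1


chunk = b"x" * 1024
print(check_range_response(206, {"Content-Range": "bytes 0-1023/5000"}, chunk, 0, 1023))  # True
print(check_range_response(200, {}, b"x" * 5000, 0, 1023))                                # False
```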

Keeping binary artifacts out of Git, without losing test value

If you need to validate that the file content matches a “golden” artifact, committing binaries to Git is usually a bad tradeoff. It bloats diffs and makes PR review painful.

Better options for experienced teams:

  • Store golden artifacts in object storage and version them (your tests assert the version or checksum).
  • Generate artifacts during the flow, and validate integrity via checksum, not by storing the bytes.
  • Publish artifacts as CI build artifacts for debugging on failure, not as repo fixtures.

This approach pairs naturally with YAML-first workflows because the review surface stays small: changed headers, changed checksum, changed size.
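As one way to keep a failing file available for debugging without committing it, a helper can write the body to an artifacts directory only when the checksum does not match. This is a sketch: the directory is a temporary folder standing in for whatever path your CI system uploads, and `save_on_mismatch` is a hypothetical name.

```python
import hashlib
import tempfile
from pathlib import Path


def save_on_mismatch(body, expected_sha256, out_dir, name="download.bin"):
    """Write body to out_dir only when its SHA-256 differs; return the path or None."""
    actual = hashlib.sha256(body).hexdigest()
    if actual == expected_sha256.lower():
        return None  # nothing to keep on success
    target_dir = Path(out_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # Prefix with the actual hash so different failures do not overwrite each other.
    path = target_dir / f"{actual[:12]}-{name}"
    path.write_bytes(body)
    return path


artifacts_dir = tempfile.mkdtemp()  # stand-in for a CI artifacts directory
expected = hashlib.sha256(b"%PDF-1.7 expected").hexdigest()
saved = save_on_mismatch(b"<html>login</html>", expected, artifacts_dir)
print(saved.name if saved else "no mismatch")
```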

Why YAML-first flows beat Postman/Newman and Bruno for download testing

You can test file downloads in Postman/Newman and Bruno, but it usually turns into imperative scripting:

  • Postman/Newman: JS assertions in the Postman sandbox, plus collection format and environment JSON that is not optimized for review.
  • Bruno: still script-heavy for non-JSON validation, with a tool-specific format.

For download endpoints, the contract is mostly headers and a few stable values. Declarative YAML assertions are a better fit because:

  • Diffs stay readable in PRs (header changes show up line-by-line).
  • Tests stay deterministic (less custom JS logic, fewer hidden dependencies).
  • Request chaining is explicit (capture expected bytes or checksum, then assert them).

This is the core advantage of tools that run native YAML flows (like DevTools) over UI-locked formats: your download contract becomes normal code review, not a “trust the UI” exercise.

Practical tips for DevTools users

If you are generating flows from real browser traffic:

  • Record a focused HAR that includes the export and download steps (see the guide on generating a HAR file in Chrome safely).
  • Normalize volatile headers before committing (cookies, request IDs, timestamps).
  • Convert implicit chaining into explicit captures (export ID, checksum, expected bytes).

If you are migrating an existing suite:

  • Replace Postman scripts that “check the response is a PDF” with explicit header assertions.
  • Move checksum expectations to a first-class captured value, rather than recomputing inside a sandbox.

The migration mechanics are covered in Migrate from Postman to DevTools; the download-specific contract is what you encode with the patterns above.

Frequently Asked Questions

Should I assert Content-Length for every file download?
Only when the response is expected to have a known length. For streaming or chunked responses, assert integrity via a digest header or metadata instead.

Is ETag a checksum?
Not necessarily. Some systems use a content hash, others use version identifiers or multipart-derived values. Only treat it as a checksum if your API documents that behavior.

How do I make checksum tests deterministic across environments?
Compare the download response to a server-provided expected checksum (header or metadata). Also consider forcing Accept-Encoding: identity to avoid representation differences.

What is the simplest test that catches “HTML instead of a file”?
Assert Content-Type exactly, and optionally assert Content-Disposition indicates attachment when you expect a download.

How do I avoid committing binaries while still validating content?
Assert a stable checksum and byte length, and store the actual file as a CI artifact only on failure (or keep golden files in external storage with versioned hashes).

Run download contracts as code, locally and in CI

File downloads are one of those areas where UI-centric API tools tend to hide important details, and test suites rot into brittle scripts. A YAML-first approach keeps the contract reviewable: headers, sizes, and checksums in plain text, chained explicitly across steps.

If you want to codify these download checks in Git and execute them deterministically in CI, DevTools is designed for that workflow: record real traffic, convert it into readable YAML flows, and run them locally or in your pipeline. Start from https://dev.tools and keep your download behavior under code review, not trapped in a UI export format.