
Load Testing vs Stress Testing vs Performance Testing: Definitions and When to Use Each
Performance testing is the umbrella discipline of measuring how a system behaves under workload. Load testing measures behavior at expected steady-state traffic. Stress testing measures behavior past the breaking point. Spike, soak, and smoke are narrower profiles for sudden surges, long-duration drift, and sanity checks. They're related, often confused, and answer different questions.
This is a definitional reference — short, tabled, and decision-tree-led — for engineers who keep getting "can you load test this?" requests where the right test is actually one of the others.
Performance testing is the umbrella — what falls under it
Performance testing is the broad practice of evaluating system behavior in terms of speed, scalability, and reliability under workload. Every other test in this post is a kind of performance test.
Inside that umbrella, the six common profiles are:
- Smoke test — a 5-minute sanity check that the system runs at all
- Load test — sustained traffic at expected production rates
- Stress test — increasing load until the system breaks
- Spike test — a sudden surge from low to peak traffic
- Soak (endurance) test — moderate traffic held for hours or days
- Breakpoint test — a methodical climb to find the exact failure threshold
The mistake most teams make is conflating "performance testing" (the umbrella) with "load testing" (one specific profile). When a stakeholder asks for "performance testing," what they almost always need is a sequence of these — smoke first, then load, then stress or spike depending on the risk model.
Load testing — what it is and isn't
A load test verifies that the system meets its service-level objectives at expected traffic levels for a sustained period.
- Goal: confirm the system handles normal production traffic without degrading
- Typical duration: 30 minutes to a few hours
- Load shape: ramp up to target, hold steady, ramp down
- Success criteria: p95 and p99 latency stay under SLO, error rate stays below budget, no resource saturation that doesn't clear on its own
Load testing answers "can we serve our normal Tuesday-afternoon traffic without anyone noticing?" It does not tell you what happens at 3× normal, what happens during a viral spike, or whether the system slowly leaks memory over a weekend. Those are stress, spike, and soak tests respectively.
A load test is the most common starting point for a performance program because the success criterion is concrete: SLOs you've already committed to. If you don't have SLOs yet, define them first — load testing without an SLO is just generating numbers.
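To make that shape concrete, here's a minimal k6 sketch of a load test: ramp to expected traffic, hold, ramp down, with the SLO written as thresholds so the run has a pass/fail verdict. The endpoint, stage durations, and threshold numbers below are placeholders, not recommendations; size them to your own traffic and SLO.

```typescript
// Minimal k6 load test sketch: ramp to expected traffic, hold, ramp down,
// with the SLO encoded as thresholds. Endpoint, targets, and numbers are
// illustrative placeholders, not recommendations.
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Options } from 'k6/options';

export const options: Options = {
  stages: [
    { duration: '5m', target: 100 },  // ramp up to expected concurrency
    { duration: '1h', target: 100 },  // hold steady at expected traffic
    { duration: '5m', target: 0 },    // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<300', 'p(99)<800'], // latency SLO in ms
    http_req_failed: ['rate<0.01'],                // error budget
  },
};

export default function (): void {
  // Placeholder request standing in for a real hot path and traffic mix.
  const res = http.get('https://api.example.com/orders');
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1); // crude pacing; a real test models think time per scenario
}
```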
Stress testing — what "breaking point" actually means
A stress test pushes load past expected levels until the system fails, to characterize how it fails.
- Goal: find the breaking point and observe the failure mode
- Typical duration: 15 minutes to an hour, with load increasing throughout
- Load shape: continuous ramp-up with no plateau, or stepped increases until something breaks
- Success criteria: the system degrades gracefully (rejects with 429, queues, sheds load) rather than crashing or corrupting data
Stress testing is not about finding a number to brag about. It's about observing failure: does the database connection pool exhaust first, or the load balancer? Do queues back up safely, or does the worker pool fork-bomb itself? Does autoscaling kick in fast enough, or does the first 60 seconds of overload generate timeouts that propagate?
The output of a stress test is usually a paragraph in a runbook, not a graph in a slide deck. "When p99 reaches 800 ms, we're 90 seconds away from queue saturation; the auto-scaler responds in 120 seconds, so the operational SLO at peak is to alert when p99 crosses 600 ms."
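For comparison, here's what the stepped-ramp shape might look like in the same k6 style. The scenario function is the same placeholder as in the load sketch above, and the step sizes assume an expected traffic of roughly 100 concurrent users; scale them to your own baseline.

```typescript
// Stepped stress ramp: each step holds long enough to see which resource
// saturates first. Step targets are placeholders scaled off an assumed
// expected traffic of ~100 concurrent users.
import http from 'k6/http';
import { Options } from 'k6/options';

export const options: Options = {
  stages: [
    { duration: '2m', target: 100 },  // expected traffic
    { duration: '5m', target: 200 },  // 2x
    { duration: '5m', target: 300 },  // 3x
    { duration: '5m', target: 400 },  // keep stepping until something breaks
    { duration: '5m', target: 0 },    // ramp down and watch recovery
  ],
  // Deliberately no hard thresholds: the point is to observe the failure
  // mode, not to pass or fail the run.
};

export default function (): void {
  http.get('https://api.example.com/orders'); // same placeholder scenario as above
}
```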
Spike, soak, and smoke testing in one paragraph each
A spike test simulates a sudden, severe traffic surge — Black Friday, a TV-ad mention, a launch-day push notification. Load jumps from a baseline to a multiple of normal in seconds, holds briefly, then drops back. The interesting question is recovery: does the system stabilize within minutes once the spike subsides, or do retries keep it pinned?
A soak (endurance) test runs at moderate, realistic load for a long time — often 8 to 72 hours. It surfaces problems no short test can find: memory leaks, connection-pool exhaustion, log-disk fill, certificate refresh failures, stale-cache drift. Soak tests are the cheapest way to catch the bugs that take down production at 4 a.m. on day three of an incident-free release.
A smoke test is a 5-minute, low-volume run that verifies the test setup itself works — the script connects, auth succeeds, endpoints respond — before you commit to a longer test. Always run a smoke test before any other performance test. Catching a misconfigured environment after a 4-hour soak is its own kind of failure.
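If it helps to see the three shapes side by side, here's a rough sketch of the stages arrays each profile might use in k6. The numbers are placeholders chosen to show the shape, not recommendations; drop whichever one you need into the same script skeleton as the load test sketch.

```typescript
// Illustrative k6 stage shapes for spike, soak, and smoke runs. Durations
// and targets are placeholders that show the shape, not recommendations.
import http from 'k6/http';
import { Options } from 'k6/options';

// Spike: jump from baseline to a multiple of normal in seconds, hold briefly,
// drop back, then keep running to observe recovery vs. retry pile-up.
const spike = [
  { duration: '2m', target: 50 },
  { duration: '30s', target: 500 },
  { duration: '3m', target: 500 },
  { duration: '30s', target: 50 },
  { duration: '10m', target: 50 },
];

// Soak: moderate, realistic load held for hours to surface slow-burn drift.
const soak = [
  { duration: '10m', target: 80 },
  { duration: '8h', target: 80 },
  { duration: '10m', target: 0 },
];

// Smoke: a few minutes at trivial volume to prove the test setup works.
const smoke = [{ duration: '5m', target: 2 }];

export const options: Options = { stages: smoke }; // swap in the shape you need

export default function (): void {
  http.get('https://api.example.com/orders'); // placeholder scenario
}
```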
Side-by-side comparison
| Test type | Goal | Duration | Load shape | Success criterion | When to run |
|---|---|---|---|---|---|
| Smoke | Verify the test runs | 5 min | Light, flat | Script completes, no setup errors | Before every other test |
| Load | Meet SLO at expected traffic | 30 min – 2 hr | Ramp, hold steady, ramp down | p95/p99 within SLO, errors within budget | Pre-merge for hot paths, nightly for full suite |
| Stress | Find breaking point and failure mode | 15 min – 1 hr | Continuous or stepped ramp | Graceful degradation observed | Quarterly, before major launches |
| Spike | Survive sudden surges | 10–30 min | Sharp jump, brief plateau, drop | Recovers within minutes of spike end | Before known events (launches, marketing pushes) |
| Soak | Find slow-burn failures | 8–72 hr | Moderate, sustained | No drift in latency, memory, error rate | Before major releases, monthly |
| Breakpoint | Find exact failure threshold | 30 min – 2 hr | Methodical step increases | Threshold identified per resource | When sizing infrastructure or capacity planning |
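The breakpoint row is the only profile that doesn't get its own section above, so for completeness, here's one way it might look in k6: a steadily climbing arrival rate with a threshold that aborts the run once the error rate crosses a limit. The executor choice and every number here are illustrative; a stepped virtual-user ramp works just as well.

```typescript
// Breakpoint sketch: climb arrival rate steadily and abort the run once the
// error rate crosses a limit; the request rate at the moment of abort is the
// breakpoint for this resource mix. All numbers are illustrative placeholders.
import http from 'k6/http';
import { Options } from 'k6/options';

export const options: Options = {
  scenarios: {
    breakpoint: {
      executor: 'ramping-arrival-rate',
      startRate: 50,          // requests per second to start from
      timeUnit: '1s',
      preAllocatedVUs: 200,
      maxVUs: 2000,
      stages: [{ duration: '30m', target: 2000 }], // steady climb in req/s
    },
  },
  thresholds: {
    // Stop climbing once errors exceed 5% of requests.
    http_req_failed: [{ threshold: 'rate<0.05', abortOnFail: true }],
  },
};

export default function (): void {
  http.get('https://api.example.com/orders'); // placeholder scenario
}
```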
A simple decision tree for picking the right test
Ask yourself the question and the test follows:
- "Did I just change my test setup?" → Smoke test.
- "Can we handle Tuesday's normal traffic?" → Load test against your SLO.
- "What happens at 3× normal?" → Stress test.
- "Can we survive Black Friday?" → Spike test, then a load test at projected Black Friday volume.
- "Did our last release introduce a memory leak?" → Soak test for at least 8 hours.
- "How much capacity do we actually need?" → Breakpoint test, then size to ~60% of the threshold.
- "Is this API ready for production at all?" → Smoke, then load. Add others as risk warrants.
For most teams, the realistic cadence is: smoke before every test, load on a nightly schedule, soak monthly, stress and spike before major launches, breakpoint when capacity-planning.
Common misuses
A few patterns that show up in almost every performance program at some point.
Running stress tests when you needed load tests. A stress test that pushes far past expected traffic produces alarming numbers that don't reflect production. If you're trying to answer "are we safe at our current scale?", a load test at expected traffic is the right tool.
Skipping smoke before a long test. A 4-hour soak that fails in the first 30 seconds because of a typo in the script is the kind of mistake nobody admits to twice. Smoke tests cost five minutes; they're cheap insurance.
Treating one heavy load test as a substitute for soak. A 30-minute load test at peak rate doesn't catch memory leaks, file-descriptor exhaustion, or cert-rotation bugs. Soak is a separate test for separate failure modes.
Load testing in the wrong environment. Tests run against a quarter-sized staging environment generate numbers that have nothing to do with prod behavior. Either use a prod-clone or be explicit that staging numbers are directional only.
No SLOs. "We did some load testing and the response times looked OK" is not a result. Without SLO targets, every load test is a numbers exercise that nobody can act on. The first artifact of any performance program should be a written SLO; the second is the load test against it.
If you want fuller k6 scripts for each of these profiles than the sketches above, that's covered in the load test profiles guide (and the deeper API load testing pillar when it lands). For end-to-end load testing of an API specifically, see how to load test an API.
FAQ
Is performance testing the same as load testing?
No. Performance testing is the umbrella that includes load, stress, spike, soak, smoke, and breakpoint testing. Load testing is one specific profile inside that umbrella — sustained traffic at expected production rates against an SLO.
What's the difference between load testing and stress testing in one sentence?
Load testing verifies the system meets its SLOs at expected traffic; stress testing pushes traffic past expected levels until the system fails, to observe how it fails.
Do I need to run all six profiles?
No team runs all six on a frequent cadence. The realistic minimum is smoke before every test and load on a regular schedule (nightly or pre-merge for critical paths). Add soak before major releases, spike before known events, stress quarterly, and breakpoint only when capacity planning.
Where does benchmarking fit in?
Benchmarking is the comparison of measured performance against a fixed reference — another version, another configuration, another product. Any of the profiles above can be used to generate the numbers that go into a benchmark, but benchmarking is a reporting practice, not a test type.
What's the difference between load testing and capacity testing?
They overlap. "Capacity testing" usually refers to a breakpoint test: methodically increasing load to find the exact threshold at which a specific resource (CPU, memory, connections, queue depth) saturates. A load test holds at expected traffic; a capacity test climbs until something breaks.
How are these different for API testing specifically?
The same six profiles apply, but the metrics shift toward latency percentiles (p50/p95/p99), throughput in requests per second, and error rate. UI-driven load tests also measure browser render time, which doesn't apply to APIs. For API-specific guidance, the API load testing guide covers the metric set in detail.
If you've already nailed down which profile you need, the next decision is the tool. The k6 vs JMeter comparison covers the two most common choices for API-shaped workloads.