1. Worked on a team that owned a service that was operating as expected
2. First basic load test: uncovered a throttling issue
3. Scaling out was fine, but requests were responding slowly
4. Investigated: spotted a bad K8S config
5. It was common config across the organization, so it impacted most other services in the org too
6. The K8S cluster had the capacity to support this burst performance, but due to the CPU limits in place, it was throttled
7. A 1-2 line YAML change: significant performance improvement (see the sketch below)
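A minimal sketch of the kind of change, with hypothetical resource names and values (the real config and numbers differed):

```yaml
# Illustrative K8S container resources - values are assumptions
resources:
  requests:
    cpu: "500m"
  limits:
    # Before: a tight CPU limit throttled the service under burst load
    # cpu: "500m"
    # After: raise (or remove) the limit so bursts aren't throttled
    cpu: "2"
```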
Just getting started, and it had already delivered value.
Review: each line (a reconstructed sketch follows these notes)
Note: no "await" on the get! (k6's http.get is synchronous)
No VUs configured, so it defaults to a single virtual user.
Function called 10 times, sleeping 1 second each time.
Total test time is ~10 seconds.
If we had 10 VUs, the 10 iterations would be shared across them, so execution would take ~1s.
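A minimal sketch of the script being reviewed, reconstructed from these notes (the target URL is a placeholder):

```javascript
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  iterations: 10, // no VUs configured, so one VU runs all 10 iterations
};

export default function () {
  // no "await" needed: k6's http.get is synchronous
  http.get('https://example.com/'); // placeholder endpoint
  sleep(1); // 1 second per iteration => ~10s total with a single VU
}
```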
K6 can do a lot, but it doesn't need to be complicated.
Note:
http_req_duration
http_req_failed
execution time (bottom)
---
# Core Concepts - Test Lifecycle
```javascript
// 1. init code: runs once per VU, e.g. imports
export function setup() {
  // 2. setup code, OPTIONAL: runs once, e.g. prepare test data for the user;
  //    whatever it returns is passed to the default function and teardown
}
export default function (data) {
  // 3. VU code, REQUIRED, your test: e.g. hit endpoint with some data
}
export function teardown(data) {
  // 4. teardown code, OPTIONAL: runs once at the end, e.g. clean up data
}
```
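A concrete example of that data flow, assuming hypothetical endpoints, showing how setup's return value reaches the VU code and teardown:

```javascript
import http from 'k6/http';

export function setup() {
  // runs once: fetch a token the VUs will reuse (hypothetical endpoint)
  const res = http.post('https://example.com/login', { user: 'test' });
  return { token: res.json('token') };
}

export default function (data) {
  // each iteration: use the token prepared in setup()
  http.get('https://example.com/orders', {
    headers: { Authorization: `Bearer ${data.token}` },
  });
}

export function teardown(data) {
  // runs once at the end: clean up (hypothetical endpoint)
  http.del('https://example.com/test-data');
}
```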
If you want a failing CHECK to fail the test run, it must be combined with a THRESHOLD
Thresholds capture the SLOs you'd like to report on
The exit code is particularly helpful in CI to fail the build
Can combine checks with thresholds: examples in the documentation
Show K6 script and execute it - view CLI output.
Show: threshold is that 95% of checks should pass, as the service only errors ~1% of the time (sketch below).
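A minimal sketch of a check combined with a threshold along those lines (the endpoint is a placeholder):

```javascript
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  thresholds: {
    // fail the run (non-zero exit code, so CI fails the build)
    // if fewer than 95% of checks pass; the service errors ~1%
    // of the time, so 95% leaves comfortable headroom
    checks: ['rate>0.95'],
  },
};

export default function () {
  const res = http.get('https://example.com/health'); // placeholder endpoint
  check(res, {
    'status is 200': (r) => r.status === 200,
  });
}
```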
If setting up a few test accounts manually takes 30 minutes,
do that... it might be much quicker than automating it all!
Complex cases: computationally heavy endpoints, complex DB queries?
Do you really want to test the CDN? What's the point? You might get blocked, and you're not testing your origin properly.
If your test writes data: clean up, to avoid impacting subsequent test runs
So customers aren't using it and you're not impacting their experience.
You don't want other people's workloads / noise impacting your tests.
- Bad example: cloud-hosted runner
- it could be running anything for other users, so you're at their mercy
- it might run in a different region each time
Long execution time = heavy on the environment, and if it ran on every push you'd be waiting forever for your build to complete.
You can use stages for warm-up. If you don't warm up, you might not scale quickly enough!
Obviously depends on the customer traffic patterns.
Warm-up example in my final live demo (sketch below).
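A minimal sketch of a staged warm-up, with illustrative durations and targets (tune these to your own traffic patterns):

```javascript
import http from 'k6/http';

export const options = {
  stages: [
    { duration: '1m', target: 10 },  // warm-up: gentle ramp so autoscaling can react
    { duration: '2m', target: 100 }, // ramp up to full load
    { duration: '5m', target: 100 }, // sustain full load
    { duration: '1m', target: 0 },   // ramp down
  ],
};

export default function () {
  http.get('https://example.com/'); // placeholder endpoint
}
```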
FIRST: Kick off workflow
Show:
- Workflow
- API Code?
- Test Code
Go over the code, warm-up, configuration...
<<<5 min wait>>> - jump to anecdote whilst running?
Show K6 output in GHA
Show Cloudflare output
Gotcha: K6 vs Cloudflare reporting of response times (K6 measures client-side, including network latency; Cloudflare reports server-side, so the numbers won't match exactly)
Note in particular:
- P99
- Requests Per Second
- Errors
- CPU time
Once our tests were running nightly in CI and things were stable, one morning we spotted a performance drop.
Git-bisect-style: stepping back through recent changes and re-running the load tests helped me identify the issue.
In turn we also noticed it was impacting a bunch of other services and teams across the organization (who were still scratching their heads). Unlikely suspect: a minor OTEL dependency update that Dependabot had merged!
Repo contains: a basic API, plus the GitHub Actions workflow for running K6
Any questions?
- Content
- Look at config in GHA again if time.