Test automation, 2026 edition

How to test automation when the agent writes the test, not you.

Every top result for this keyword still teaches the 2019 workflow: pick Selenium, write scripts, integrate CI. That is not the workflow anymore. In 2026 the test is a Markdown #Case block, a coding agent interprets it, and the browser actions come from a fixed vocabulary of 18 tools chosen at runtime. Here is how that actually works, line by line, in open source.

Matthew Diakonov · 11 min read · 4.8 from Assrt MCP users
  • Plan file is plain Markdown at /tmp/assrt/scenario.md
  • 18 fixed tools in the agent vocabulary, including OTP and http_request
  • wait_for_stable is a live MutationObserver, not a timer

The shape, in one sentence

You write the plan. The agent writes the clicks. Real Playwright runs the browser.

No DSL to learn. No brittle selector strings you maintain forever. Tests are yours on disk, portable as plain text, editable from any IDE.

What every "how to test automation" guide still tells you

Read the top five search results for this exact query. TestingXperts, TestRail, GeeksforGeeks, TestGrid, Perfecto. They agree on a shape that has not changed since before coding agents existed: pick a tool from a list (Selenium, Cypress, Appium), choose which cases to automate (regression, performance, API), write scripts in your language of choice, set up a test environment, wire it into CI. The underlying assumption is that a human will author the browser commands.

That assumption is the part worth questioning. If an agent is writing your feature code, why would you hand it a test script to maintain? The more useful question is what the agent needs to automate a test without your help. The answer turns out to be small: a plan format it can read, a tool vocabulary it can call, and a way to know the page has finished updating.

The 2019 playbook, step by step: pick Selenium or Cypress, write test scripts, set up a test environment, integrate with CI, decide what to automate and what not to, pick an assertion library, set up reporting.

Step one: the plan is a Markdown file, not a DSL

The whole pipeline starts from a file you could write in a Slack message. Four cases, 12 lines, no imports. Each #Case header opens a scenario, and the body is 3-5 imperative lines in English. The PLAN_SYSTEM_PROMPT in assrt-mcp/src/mcp/server.ts:219 pins the format so agents can both read it and generate it, which means the plan file is simultaneously input, output, and source of truth.

/tmp/assrt/scenario.md
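An illustrative plan in that shape (the case titles and steps below are invented for this article, not copied from the repo):

```markdown
#Case 1: Homepage loads
Open the homepage.
Assert the main heading is visible.

#Case 2: Sign in with a disposable email
Click Sign In.
Use a disposable email to register.
Verify the dashboard heading loads.
```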

Notice what is not in that file. No selector strings. No waits. No timeouts. No retries. No assertion library imports. The agent fills those in per step, using the tool vocabulary described below, by looking at a live snapshot of the page. When the DOM changes because someone renamed a button, the plan stays the same, the snapshot changes, the agent adapts.

Step two: the agent picks from exactly 18 tools

The TOOLS array at assrt-mcp/src/core/agent.ts:16 is the whole menu. There are 18 entries. Browser control (navigate, snapshot, click, type_text, select_option, scroll, press_key, wait, screenshot, evaluate), email-verification primitives (create_temp_email, wait_for_verification_code, check_email_inbox), outcome calls (assert, complete_scenario, suggest_improvement), and two wildcards (http_request for external verification, wait_for_stable for the DOM). That fixed menu is what makes the agent predictable.
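To make the shape of that menu concrete, here is an illustrative sketch of a fixed 18-tool vocabulary. The names mirror the article's list; the grouping and the `ToolSpec` shape are this article's invention, not the real schema in agent.ts:

```typescript
// Illustrative sketch of a fixed tool vocabulary. The real TOOLS array in
// assrt-mcp/src/core/agent.ts carries schemas and handlers; here each
// entry is just a name and a group.
type ToolGroup = "browser" | "email" | "outcome" | "wildcard";

interface ToolSpec {
  name: string;
  group: ToolGroup;
}

const TOOLS: ToolSpec[] = [
  // Browser control
  ...["navigate", "snapshot", "click", "type_text", "select_option",
      "scroll", "press_key", "wait", "screenshot", "evaluate"]
    .map((name) => ({ name, group: "browser" as const })),
  // Email-verification primitives
  ...["create_temp_email", "wait_for_verification_code", "check_email_inbox"]
    .map((name) => ({ name, group: "email" as const })),
  // Outcome calls
  ...["assert", "complete_scenario", "suggest_improvement"]
    .map((name) => ({ name, group: "outcome" as const })),
  // Wildcards
  ...["http_request", "wait_for_stable"]
    .map((name) => ({ name, group: "wildcard" as const })),
];
```

The point of the fixed list is that the agent cannot invent an action: every step in a run maps to one of these 18 names.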

One #Case, three tool calls, one assertion

[Diagram: #Case 1, #Case 2, and #Case 3 feed the Assrt agent, which answers one case with three tool calls (navigate, click, type_text) and one assertion (assert). The full 18-tool vocabulary: navigate, snapshot, click, type_text, select_option, scroll, press_key, wait, screenshot, evaluate, create_temp_email, wait_for_verification_code, check_email_inbox, assert, complete_scenario, suggest_improvement, http_request, wait_for_stable.]

Two of those tools do not exist anywhere in the traditional test-automation guides. The first is create_temp_email, which spins up a disposable inbox so the agent can handle the email-verification arm of a signup without prompting you. The second is http_request, which lets the agent verify external side effects in the same scenario. Both are there because an agent-driven test run will regularly encounter forms and webhooks a human test script would route around.

The anchor fact: how the agent knows when to continue

This is the most specific thing on the page and it is the part other guides miss entirely. Every traditional how-to tells you to pick between a fixed sleep and a selector-based wait, and both are wrong in different ways. A sleep is flaky and slow. A selector wait fails when the app re-renders siblings. Assrt has a third option called wait_for_stable and it is literal DOM-mutation accounting.

assrt-mcp/src/core/agent.ts:911-944

What happens under the hood: the agent injects a real MutationObserver into the page under test, watching document.body for childList, subtree, and characterData changes. It then polls window.__assrt_mutations every 500ms. When the number stops going up for the configured stable window (2 seconds by default, capped at 10), the agent continues. On timeout it reports how many mutations still happened. Afterwards it disconnects the observer and deletes the counter off window so nothing leaks into the next step.
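The decision rule itself can be sketched as a pure function over sampled mutation counts. This is a simplification for illustration: the real wait_for_stable injects a live MutationObserver and polls window.__assrt_mutations inside the page, while here the samples are just an array and the function and option names are invented:

```typescript
interface StableOptions {
  pollMs: number;   // interval between samples (article: 500 ms)
  stableMs: number; // how long the count must stay flat (article default: 2 s)
  maxMs: number;    // overall cap on the wait (article: 10 s)
}

interface StableResult {
  stable: boolean;
  elapsedMs: number;
  totalMutations: number;
}

// Walk a series of cumulative mutation counts (one per poll tick) and
// report whether the count stayed flat for stableMs before maxMs ran out.
function evaluateStability(samples: number[], opts: StableOptions): StableResult {
  let flatMs = 0;
  let last = samples[0] ?? 0;
  let elapsedMs = 0;
  for (let i = 1; i < samples.length; i++) {
    elapsedMs = i * opts.pollMs;
    if (samples[i] === last) {
      flatMs += opts.pollMs; // nothing changed since the last poll
    } else {
      flatMs = 0;            // page mutated again: restart the flat window
      last = samples[i];
    }
    if (flatMs >= opts.stableMs) {
      return { stable: true, elapsedMs, totalMutations: last };
    }
    if (elapsedMs >= opts.maxMs) break; // timed out while still mutating
  }
  return { stable: false, elapsedMs: Math.min(elapsedMs, opts.maxMs), totalMutations: last };
}
```

With the article's numbers (500ms poll, 2s flat window), a page that mutates twice and then goes quiet is declared stable after four flat polls; a page that keeps mutating runs into the 10s cap and returns unstable with the final count.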

Why this matters for flaky tests

Most flake in test automation comes from the runner continuing before the app has settled. A selector wait fixes one element at a time and assumes you know which element will signal the end of loading. A MutationObserver does not care what element you care about. It just reports when nothing is changing anymore. That generalizes: the agent does not have to be told the right sentinel for every screen, which means the plan file does not have to name one.

Four capabilities the 2019 playbook leaves out

These are the corners agent-driven automation actually encounters. If your test runner cannot do these, the agent will ask you to manually stub them, and the point of automation is gone.

Disposable email for signup flows

create_temp_email returns a live inbox. wait_for_verification_code polls it for up to 120s. The agent never asks you for a test address.

Split-field OTP paste

The system prompt pins an exact DataTransfer paste expression for six-digit code inputs. Typing one char per field triggers controlled-input bugs the agent learned to route around.

External API verification

http_request is a 30s fetch wrapper. After a UI action, the same scenario can curl the Telegram Bot API to confirm the message actually arrived.

Page-stability without sentinels

wait_for_stable injects a MutationObserver, polls mutation count every 500ms, waits for the number to stop rising. No selector to maintain.

Continuous page discovery

During a run the agent opportunistically snapshots new URLs it navigates to and asks a small model for 1-2 test cases per page. Your suite grows while you test.

Human-editable plan

The plan is Markdown at /tmp/assrt/scenario.md. Save the file, fs.watch debounces for 1s, the next run uses the new text. No UI required.
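The save-to-rerun loop above hinges on a one-second debounce over fs.watch events. A minimal sketch of that debounce, so a burst of editor writes collapses into a single sync (the function and parameter names are this article's invention, not the scenario-files.ts API):

```typescript
type Timer = ReturnType<typeof setTimeout>;

// Collapse a burst of change events into one trailing call, delayMs after
// the last event -- the same shape as watching scenario.md and syncing
// once the editor has finished writing.
function debounce(fn: () => void, delayMs: number): () => void {
  let timer: Timer | undefined;
  return () => {
    if (timer !== undefined) clearTimeout(timer); // restart the window
    timer = setTimeout(() => {
      timer = undefined;
      fn();
    }, delayMs);
  };
}

// Usage sketch (not run here): wire it to fs.watch on the plan file.
// import { watch } from "node:fs";
// const sync = debounce(() => console.log("plan changed, syncing"), 1000);
// watch("/tmp/assrt/scenario.md", sync);
```

Without the debounce, most editors would trigger two or three watch events per save and the runner would sync partial file contents.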

What a single run looks like in a terminal

The CLI prints what the agent did, step by step. Each line maps back to a tool call in the TOOLS array. When the scenario ends there is a pass/fail and a video of the session. No custom reporter plugin, no dashboard to configure.

npx assrt run --url http://localhost:3000

Side by side: the old how-to vs the agent-driven how-to

| Feature | Traditional Selenium guide | Assrt (agent-driven) |
| --- | --- | --- |
| What you author | JS/Python code that calls a browser driver | Markdown #Case blocks in plain English |
| Who picks the actions | You, at authoring time | The agent, at runtime, from 18 fixed tools |
| How to wait for a page | sleep(ms) or waitForSelector | MutationObserver + 500ms poll, injected live |
| OTP verification flow | You wire up Mailosaur or a custom SMTP mock | create_temp_email + wait_for_verification_code built in |
| Verifying a webhook or external API | A separate integration test file | http_request tool inside the same scenario |
| Where the test lives | tests/*.spec.ts in your repo | /tmp/assrt/scenario.md (copy to your repo when ready) |
| Rerun on edit | Rerun your test runner manually | fs.watch on scenario.md, 1s debounce, auto-sync |

The traditional workflow still exists under the hood: real Playwright drives Chrome. The difference is who writes the Playwright calls.

The five steps, translated

If you insist on a numbered list the way the other guides have it, here is the 2026 rewrite. Each step takes minutes instead of days because the work moved up to text.

How to set up agent-driven test automation

1

Install the MCP server

npx assrt-mcp inside any project. The server registers over stdio with your coding agent. There is no daemon to keep running, no port to forward.

2

Let the agent draft the first plan

Ask your agent to call assrt_plan on a local URL. A #Case file lands at /tmp/assrt/scenario.md with 5-8 entries based on the buttons and inputs it actually sees.

3

Edit the plan in your text editor

Open scenario.md. Delete the cases you do not care about. Add one more. Save. fs.watch debounces and syncs back. The plan file is the test file.

4

Run and watch the video

assrt_test reads scenario.md, the agent executes each #Case using the 18-tool vocabulary, and a WebM recording is written at the end. Open it when a case fails.

5

Check the plan into your repo

cp /tmp/assrt/scenario.md tests/regression/smoke.md. In CI: npx assrt run --plan-file tests/regression/smoke.md --json. The plan is versioned alongside the code it tests.

Concrete numbers from the source

Round numbers you can verify by grepping the repo.

  • 18 tools in the agent vocabulary
  • 500 ms mutation poll interval in wait_for_stable
  • 120 s max OTP wait in wait_for_verification_code
  • 1 s fs.watch debounce before syncing the plan

The specific line numbers: TOOLS array at agent.ts line 16, wait_for_stable at agent.ts line 906, fs.watch at scenario-files.ts line 22, PLAN_SYSTEM_PROMPT at server.ts line 219. If any of these drift in a future release, the repo is open source and you can grep.

A short honest checklist before you switch

Fits your workflow

  • You already use Claude Code, Cursor, or another MCP-capable coding agent daily
  • Your tests target web UIs, not native mobile or desktop apps
  • You are comfortable with tests that live as .md files in your repo
  • You want the test runner to handle OTPs and webhooks inline
  • You can tolerate small runtime non-determinism in exchange for zero selector maintenance

Tests still fail sometimes. The difference is that when they do, the plan file is three sentences, the video is recorded, and the agent prints the exact tool calls it tried. Debugging is reading the transcript, not untangling a DSL.

Run your first agent-driven test in under a minute

One npx command. The agent writes the plan, you edit the plan, real Playwright drives the browser, a WebM recording auto-opens when it finishes.

Install: npx assrt-mcp

How to test automation, specific answers

What does 'test automation' actually mean in 2026, when a coding agent is already writing the feature code?

It means the unit of automation moved up one level. In 2019 you automated the clicks: you wrote Selenium or Playwright code that drove the browser. In 2026 you automate the plan: you write a sentence like 'Click Sign In, use a disposable email, verify the dashboard heading loads' and a coding agent picks the right browser tools to execute it. Assrt is a concrete example: its agent exposes exactly 18 tools (navigate, snapshot, click, type_text, select_option, scroll, press_key, wait, screenshot, evaluate, create_temp_email, wait_for_verification_code, check_email_inbox, assert, complete_scenario, suggest_improvement, http_request, wait_for_stable) and chooses among them per step. The Playwright code still runs underneath, the agent just writes it for you at runtime.

How is the #Case format different from a Gherkin feature file or a YAML DSL?

Gherkin and YAML test DSLs are schemas that a human authors and a parser compiles. The #Case format is a prompt that an agent interprets. Concretely, the plan file at /tmp/assrt/scenario.md contains blocks like '#Case 1: Homepage loads' followed by 3-5 lines of plain imperative English. The PLAN_SYSTEM_PROMPT in assrt-mcp/src/mcp/server.ts line 219 pins the shape and tells the agent what its vocabulary is; the agent then picks concrete tools per step. You never learn a syntax. If a non-engineer reads the file on GitHub they can edit it, and the next run uses the edited text. That is not possible with a compiled Given/When/Then tree.
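Because the plan is plain text, splitting it into cases needs nothing beyond string handling. Here is a sketch of what a reader for the #Case shape could look like; the exact grammar the real server accepts is not documented in this article, so treat the function and field names as illustrative:

```typescript
interface PlanCase {
  title: string;   // text after "#Case", e.g. "1: Homepage loads"
  steps: string[]; // the imperative lines under the header
}

// Split a Markdown plan into #Case blocks: each "#Case" line opens a case,
// and every non-empty line below it is a step, until the next header.
function parsePlan(markdown: string): PlanCase[] {
  const cases: PlanCase[] = [];
  let current: PlanCase | undefined;
  for (const raw of markdown.split("\n")) {
    const line = raw.trim();
    if (line.startsWith("#Case")) {
      current = { title: line.slice("#Case".length).trim(), steps: [] };
      cases.push(current);
    } else if (line.length > 0 && current) {
      current.steps.push(line);
    }
  }
  return cases;
}
```

Compare that with a Gherkin parser: there is no step-definition registry to match against, because the agent, not a compiler, decides what each English line means.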

How does the agent know when a page has finished loading after a click?

Assrt has a tool called wait_for_stable. When the agent calls it, the code at agent.ts lines 911-921 executes `window.__assrt_observer = new MutationObserver(...)` inside the page under test, then polls `window.__assrt_mutations` every 500ms. When the count stays flat for the configured stable window (2 seconds by default, 10 max), the action returns with an elapsed time and the total mutation count. This is different from every Selenium wait helper you have used. It is not `waitForElement`, it is not `sleep(2000)`, it is a live count of DOM mutations going to zero. The observer is cleaned up in a finally-style cleanup: `window.__assrt_observer.disconnect(); delete window.__assrt_mutations;` so the page is left exactly as it was.

How do you test flows that require an email verification code?

The agent has two tools for this: create_temp_email and wait_for_verification_code. When it sees an email field in the signup form, the SYSTEM_PROMPT in agent.ts tells it to call create_temp_email first, use the returned address in the form, submit, then call wait_for_verification_code with a timeout in seconds. A disposable inbox is created on demand and polled for up to 120 seconds. If the OTP is split across single-character fields (common for six-digit codes), the agent is instructed to use `evaluate` with a specific DataTransfer paste expression instead of typing one character at a time, because typing per-field triggers controlled-input bugs in many React implementations. That exact expression lives inline in the system prompt so the agent never invents its own variant.

Can the same approach verify that an action produced the right external side effect, like a Slack message or a Stripe webhook?

Yes, that is what the http_request tool is for. It is a 30-second timed fetch wrapper the agent can call with a URL, method, headers, and JSON body. After the agent does something in the UI, like 'Connect Telegram' or 'Publish post', it can call http_request against the external API (Telegram Bot API, Slack Web API, a staging webhook receiver) to confirm the event arrived. This closes a gap that every traditional test-automation guide skirts: the app can look like it worked while the backend fire-and-forget job silently failed. http_request lets a single scenario cross that boundary without leaving the test runner.
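The article describes http_request as a 30-second timed fetch wrapper taking a URL, method, headers, and JSON body. A minimal sketch of that shape, with the timeout enforced via an abort signal; the real tool's parameter names and return value are not shown in the article, so everything below is an assumption:

```typescript
interface HttpRequestArgs {
  url: string;
  method?: string;
  headers?: Record<string, string>;
  jsonBody?: unknown;
  timeoutMs?: number; // article: 30 s
}

interface BuiltInit {
  method: string;
  headers: Record<string, string>;
  body?: string;
  signal: AbortSignal;
}

// Build the fetch options: JSON-encode the body if present and attach an
// abort signal so a hung webhook endpoint cannot stall the scenario.
function buildInit(args: HttpRequestArgs): BuiltInit {
  const headers: Record<string, string> = { ...(args.headers ?? {}) };
  let body: string | undefined;
  if (args.jsonBody !== undefined) {
    headers["content-type"] = "application/json";
    body = JSON.stringify(args.jsonBody);
  }
  return {
    method: args.method ?? "GET",
    headers,
    body,
    signal: AbortSignal.timeout(args.timeoutMs ?? 30_000),
  };
}

async function httpRequest(args: HttpRequestArgs): Promise<{ status: number; text: string }> {
  const res = await fetch(args.url, buildInit(args));
  return { status: res.status, text: await res.text() };
}
```

In a scenario this would be called right after the UI action, e.g. against the Telegram Bot API's getUpdates endpoint, to confirm the message the UI claims to have sent actually arrived.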

Where does the plan live, and does saving it rerun the tests?

The plan lives at /tmp/assrt/scenario.md. The companion scenario-files.ts starts an fs.watch on that file after each test run, debounced by one second. Editing the file from your text editor, from the agent chat, or from a CI step all trigger the same sync back to central storage. Because it is a real file, a human can open it, change '#Case 3: Checkout requires email' to include a second assertion, save, and the next assrt_test run uses the new text. You do not need a UI to edit the test. grep, sed, and git are fine.

Does the agent actually write Playwright code, or does it just call its own tools?

Under the hood, the agent wraps @playwright/mcp, the Playwright MCP server Microsoft maintains. When the agent calls its `click` tool with a description and a ref, that gets translated into real Playwright commands against a real browser process. The ref values (e.g. `ref=e5`) come from Playwright MCP snapshots of the accessibility tree. So the pipeline is: Markdown plan -> agent tool call -> Playwright MCP call -> Playwright browser driver -> Chrome. The test is portable in the sense that if you exported the run, you would have a Playwright trace, not a proprietary YAML format.

What does assrt_plan add on top of just running tests?

assrt_plan navigates to a URL, takes a snapshot of the accessibility tree, and asks a small model to generate 5-8 #Case entries in the same Markdown format assrt_test consumes. The generated plan lands at /tmp/assrt/scenario.md; you review it, keep what is useful, and then assrt_test executes. It is not magic: the model proposes likely flows based on the buttons and inputs actually visible on the page, under the rule 'each case must be completable in 3-4 actions max, no login/signup unless a form is visible, no CSS or performance tests'. Treat the output as a starting point, not a finished suite.

Done with Selenium docs

The test is the plan. The plan is a file. The agent does the rest.

18 tools, one Markdown file, zero DSLs.

Try Assrt free
