How to test automation when the agent writes the test, not you.
Every top result for this keyword still teaches the 2019 workflow: pick Selenium, write scripts, integrate CI. That is not the workflow anymore. In 2026 the test is a Markdown #Case block, a coding agent interprets it, and the browser actions come from a fixed vocabulary of 18 tools chosen at runtime. Here is how that actually works, line by line, in open source.
The shape, in one sentence
You write the plan. The agent writes the clicks. Real Playwright runs the browser.
No DSL to learn. No brittle selector strings you maintain forever. Tests are yours on disk, portable as plain text, editable from any IDE.
What every "how to test automation" guide still tells you
Read the top five search results for this exact query. TestingXperts, TestRail, GeeksforGeeks, TestGrid, Perfecto. They agree on a shape that has not changed since before coding agents existed: pick a tool from a list (Selenium, Cypress, Appium), choose which cases to automate (regression, performance, API), write scripts in your language of choice, set up a test environment, wire it into CI. The underlying assumption is that a human will author the browser commands.
That assumption is the part worth questioning. If an agent is writing your feature code, why would you hand it a test script to maintain? The more useful question is what the agent needs to automate a test without your help. The answer turns out to be small: a plan format it can read, a tool vocabulary it can call, and a way to know the page has finished updating.
Step one: the plan is a Markdown file, not a DSL
The whole pipeline starts from a file you could write in a Slack message. Four cases, 12 lines, no imports. Each #Case header opens a scenario, and the body is 3-5 imperative lines in English. The PLAN_SYSTEM_PROMPT in assrt-mcp/src/mcp/server.ts:219 pins the format so agents can both read it and generate it, which means the plan file is simultaneously input, output, and source of truth.
Notice what is not in that file. No selector strings. No waits. No timeouts. No retries. No assertion library imports. The agent fills those in per step, using the tool vocabulary described below, by looking at a live snapshot of the page. When the DOM changes because someone renamed a button, the plan stays the same, the snapshot changes, the agent adapts.
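For illustration, here is a hypothetical plan in that shape. This is an invented example for an imaginary app, not a file from the repo:

```markdown
#Case 1: Homepage loads
Open the homepage.
Verify the main heading and the Sign In button are visible.

#Case 2: Signup with a disposable email
Open the signup form, use a disposable email address, and submit.
Enter the verification code from the inbox.
Verify the dashboard loads.

#Case 3: Settings page saves
Open Settings, change the display name, and save.
Verify the success message appears.

#Case 4: Logout
Log out from the account menu.
Verify the Sign In button is visible again.
```

Nothing in that file names a selector, a wait, or a framework. That is the whole point.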
Step two: the agent picks from exactly 18 tools
The TOOLS array at assrt-mcp/src/core/agent.ts:16 is the whole menu. There are 18 entries. Browser control (navigate, snapshot, click, type_text, select_option, scroll, press_key, wait, screenshot, evaluate), email-verification primitives (create_temp_email, wait_for_verification_code, check_email_inbox), outcome calls (assert, complete_scenario, suggest_improvement), and two wildcards (http_request for external verification, wait_for_stable for the DOM). That fixed menu is what makes the agent predictable.
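The exact definitions live in agent.ts, but the shape is ordinary function-calling metadata: a name and a description per tool. A minimal sketch of what such a vocabulary looks like, using the 18 names from the list above (the descriptions and schema details here are assumptions, not copied from the repo):

```typescript
// Hypothetical sketch of a fixed tool vocabulary, mirroring the 18 names above.
// The real TOOLS array in assrt-mcp/src/core/agent.ts differs in detail.
type ToolDef = {
  name: string;
  description: string;
};

const TOOLS: ToolDef[] = [
  { name: "navigate", description: "Open a URL in the browser" },
  { name: "snapshot", description: "Capture the accessibility tree" },
  { name: "click", description: "Click an element by description and ref" },
  { name: "type_text", description: "Type into a focused input" },
  { name: "select_option", description: "Pick an option in a select" },
  { name: "scroll", description: "Scroll the page" },
  { name: "press_key", description: "Press a keyboard key" },
  { name: "wait", description: "Pause for a fixed duration" },
  { name: "screenshot", description: "Save a screenshot" },
  { name: "evaluate", description: "Run a JS expression in the page" },
  { name: "create_temp_email", description: "Provision a disposable inbox" },
  { name: "wait_for_verification_code", description: "Poll the inbox for an OTP" },
  { name: "check_email_inbox", description: "List messages in the inbox" },
  { name: "assert", description: "Record a pass/fail check" },
  { name: "complete_scenario", description: "End the current #Case" },
  { name: "suggest_improvement", description: "Propose a plan edit" },
  { name: "http_request", description: "Call an external API" },
  { name: "wait_for_stable", description: "Wait until DOM mutations stop" },
];

console.log(TOOLS.length); // the whole menu: 18 entries, nothing else
```

Because the menu is closed, every step in a run transcript maps to one of these names, which is what makes the transcript readable after a failure.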
One #Case, three tool calls, one assertion
Two of those tools do not exist anywhere in the traditional test-automation guides. The first is create_temp_email, which spins up a disposable inbox so the agent can handle the email-verification arm of a signup without prompting you. The second is http_request, which lets the agent verify external side effects in the same scenario. Both are there because an agent-driven test run will regularly encounter forms and webhooks a human test script would route around.
The anchor fact: how the agent knows when to continue
This is the most specific thing on the page, and it is the part other guides miss entirely. Every traditional how-to tells you to pick between a fixed sleep and a selector-based wait, and both are wrong in different ways. A sleep is flaky and slow. A selector wait fails when the app re-renders siblings. Assrt takes a third option, wait_for_stable, which is literal DOM-mutation accounting.
Concretely: the agent injects a real MutationObserver into the page under test, watching document.body for childList, subtree, and characterData changes. It then polls window.__assrt_mutations every 500ms. When the number stops going up for the configured stable window (2 seconds by default, capped at 10 seconds), the agent continues. On timeout it reports how many mutations were still happening. Afterwards it disconnects the observer and deletes the counter off window so nothing leaks into the next step.
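Stripped of the browser plumbing, the decision the poller makes is simple: given a series of counter samples taken every 500ms, continue once the count has not grown for the stable window. A minimal sketch of that decision as a pure function (the name and signature are hypothetical, not from the repo):

```typescript
// Hypothetical reduction of the wait_for_stable polling decision.
// samples: mutation counts read every pollMs.
// stableMs: how long the count must stay flat before the page counts as settled.
function settledAfter(
  samples: number[],
  pollMs = 500,
  stableMs = 2000,
): number | null {
  const needed = Math.ceil(stableMs / pollMs); // flat polls required in a row
  let flat = 0;
  for (let i = 1; i < samples.length; i++) {
    flat = samples[i] === samples[i - 1] ? flat + 1 : 0;
    if (flat >= needed) return i * pollMs; // elapsed ms at the moment of stability
  }
  return null; // never settled within the sampled window -> report a timeout
}

// A page that mutates for 1.5s, then goes quiet:
console.log(settledAfter([3, 7, 9, 9, 9, 9, 9])); // → 3000
```

The real tool wraps this loop around a live page, but the contract is the same: the return value is either "stable after N ms" or a timeout report with the mutation count still climbing.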
Why this matters for flaky tests
Most flake in test automation comes from the runner continuing before the app has settled. A selector wait fixes one element at a time and assumes you know which element will signal the end of loading. A MutationObserver does not care what element you care about. It just reports when nothing is changing anymore. That generalizes: the agent does not have to be told the right sentinel for every screen, which means the plan file does not have to name one.
Four capabilities the 2019 playbook leaves out
These are the corners agent-driven automation actually encounters. If your test runner cannot do these, the agent will ask you to manually stub them, and the point of automation is gone.
Disposable email for signup flows
create_temp_email returns a live inbox. wait_for_verification_code polls it for up to 120s. The agent never asks you for a test address.
Split-field OTP paste
The system prompt pins an exact DataTransfer paste expression for six-digit code inputs, because typing one character per field triggers controlled-input bugs in many implementations; the pinned paste expression is how the agent routes around them.
External API verification
http_request is a 30s fetch wrapper. After a UI action, the same scenario can curl the Telegram Bot API to confirm the message actually arrived.
Page-stability without sentinels
wait_for_stable injects a MutationObserver, polls mutation count every 500ms, waits for the number to stop rising. No selector to maintain.
Continuous page discovery
During a run the agent opportunistically snapshots new URLs it navigates to and asks a small model for 1-2 test cases per page. Your suite grows while you test.
Human-editable plan
The plan is Markdown at /tmp/assrt/scenario.md. Save the file, fs.watch debounces for 1s, the next run uses the new text. No UI required.
What a single run looks like in a terminal
The CLI prints what the agent did, step by step. Each line maps back to a tool call in the TOOLS array. When the scenario ends there is a pass/fail and a video of the session. No custom reporter plugin, no dashboard to configure.
Side by side: the old how-to vs the agent-driven how-to
| Feature | Traditional Selenium guide | Assrt (agent-driven) |
|---|---|---|
| What you author | JS/Python code that calls a browser driver | Markdown #Case blocks in plain English |
| Who picks the actions | You, at authoring time | The agent, at runtime, from 18 fixed tools |
| How to wait for a page | sleep(ms) or waitForSelector | MutationObserver + 500ms poll, injected live |
| OTP verification flow | You wire up Mailosaur or a custom SMTP mock | create_temp_email + wait_for_verification_code built in |
| Verifying a webhook or external API | A separate integration test file | http_request tool inside the same scenario |
| Where the test lives | tests/*.spec.ts in your repo | /tmp/assrt/scenario.md (copy to your repo when ready) |
| Rerun on edit | Rerun your test runner manually | fs.watch on scenario.md, 1s debounce, auto-sync |
The traditional workflow still exists under the hood: real Playwright drives Chrome. The difference is who writes the Playwright calls.
The five steps, translated
If you insist on a numbered list the way the other guides have it, here is the 2026 rewrite. Each step takes minutes instead of days because the work moved up to text.
How to set up agent-driven test automation
Install the MCP server
Run npx assrt-mcp inside any project. The server registers over stdio with your coding agent. There is no daemon to keep running, no port to forward.
Let the agent draft the first plan
Ask your agent to call assrt_plan on a local URL. A #Case file lands at /tmp/assrt/scenario.md with 5-8 entries based on the buttons and inputs it actually sees.
Edit the plan in your text editor
Open scenario.md. Delete the cases you do not care about. Add one more. Save. fs.watch debounces and syncs back. The plan file is the test file.
Run and watch the video
assrt_test reads scenario.md, the agent executes each #Case using the 18-tool vocabulary, and a WebM recording is written at the end. Open it when a case fails.
Check the plan into your repo
cp /tmp/assrt/scenario.md tests/regression/smoke.md. In CI: npx assrt run --plan-file tests/regression/smoke.md --json. The plan is versioned alongside the code it tests.
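One way the CI step could look as a GitHub Actions job. The workflow scaffolding here is an assumption for illustration; only the final npx command comes from the step above:

```yaml
# Hypothetical CI wiring; only the `npx assrt run` line is from the text.
name: smoke
on: [push]
jobs:
  agent-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
      - run: npx assrt run --plan-file tests/regression/smoke.md --json
```

The --json flag means the job's output is machine-readable, so a failing #Case can fail the build without a custom reporter.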
Concrete numbers from the source
Every one of these you can verify by grepping the repo.
The specific line numbers: the TOOLS array at agent.ts line 16, wait_for_stable at agent.ts line 906, fs.watch at scenario-files.ts line 22, PLAN_SYSTEM_PROMPT at server.ts line 219. If any of these drift in a future release, the repo is open source and you can grep.
A short honest checklist before you switch
Fits your workflow
- You already use Claude Code, Cursor, or another MCP-capable coding agent daily
- Your tests target web UIs, not native mobile or desktop apps
- You are comfortable with tests that live as .md files in your repo
- You want the test runner to handle OTPs and webhooks inline
- You can tolerate small runtime non-determinism in exchange for zero selector maintenance
Tests still fail sometimes. The difference is that when they do, the plan file is three sentences, the video is recorded, and the agent prints the exact tool calls it tried. Debugging is reading the transcript, not untangling a DSL.
Run your first agent-driven test in under a minute
One npx command. The agent writes the plan, you edit the plan, real Playwright drives the browser, a WebM recording auto-opens when it finishes.
Install npx assrt-mcp →
How to test automation, specific answers
What does 'test automation' actually mean in 2026, when a coding agent is already writing the feature code?
It means the unit of automation moved up one level. In 2019 you automated the clicks: you wrote Selenium or Playwright code that drove the browser. In 2026 you automate the plan: you write a sentence like 'Click Sign In, use a disposable email, verify the dashboard heading loads' and a coding agent picks the right browser tools to execute it. Assrt is a concrete example: its agent exposes exactly 18 tools (navigate, snapshot, click, type_text, select_option, scroll, press_key, wait, screenshot, evaluate, create_temp_email, wait_for_verification_code, check_email_inbox, assert, complete_scenario, suggest_improvement, http_request, wait_for_stable) and chooses among them per step. The Playwright code still runs underneath, the agent just writes it for you at runtime.
How is the #Case format different from a Gherkin feature file or a YAML DSL?
Gherkin and YAML test DSLs are schemas that a human authors and a parser compiles. The #Case format is a prompt that an agent interprets. Concretely, the plan file at /tmp/assrt/scenario.md contains blocks like '#Case 1: Homepage loads' followed by 3-5 lines of plain imperative English. The PLAN_SYSTEM_PROMPT in assrt-mcp/src/mcp/server.ts line 219 pins the shape and tells the agent what its vocabulary is; the agent then picks concrete tools per step. You never learn a syntax. If a non-engineer reads the file on GitHub they can edit it, and the next run uses the edited text. That is not possible with a compiled Given/When/Then tree.
How does the agent know when a page has finished loading after a click?
Assrt has a tool called wait_for_stable. When the agent calls it, the code at agent.ts lines 911-921 executes `window.__assrt_observer = new MutationObserver(...)` inside the page under test, then polls `window.__assrt_mutations` every 500ms. When the count stays flat for the configured stable window (2 seconds by default, 10 max), the action returns with an elapsed time and the total mutation count. This is different from every Selenium wait helper you have used. It is not `waitForElement`, it is not `sleep(2000)`, it is a live count of DOM mutations going to zero. The observer is cleaned up in a finally-style cleanup: `window.__assrt_observer.disconnect(); delete window.__assrt_mutations;` so the page is left exactly as it was.
How do you test flows that require an email verification code?
The agent has two tools for this: create_temp_email and wait_for_verification_code. When it sees an email field in the signup form, the SYSTEM_PROMPT in agent.ts tells it to call create_temp_email first, use the returned address in the form, submit, then call wait_for_verification_code with a timeout in seconds. A disposable inbox is created on demand and polled for up to 120 seconds. If the OTP is split across single-character fields (common for six-digit codes), the agent is instructed to use `evaluate` with a specific DataTransfer paste expression instead of typing one character at a time, because typing per-field triggers controlled-input bugs in many React implementations. That exact expression lives inline in the system prompt so the agent never invents its own variant.
Can the same approach verify that an action produced the right external side effect, like a Slack message or a Stripe webhook?
Yes, that is what the http_request tool is for. It is a 30-second timed fetch wrapper the agent can call with a URL, method, headers, and JSON body. After the agent does something in the UI, like 'Connect Telegram' or 'Publish post', it can call http_request against the external API (Telegram Bot API, Slack Web API, a staging webhook receiver) to confirm the event arrived. This closes a gap that every traditional test-automation guide skirts: the app can look like it worked while the backend fire-and-forget job silently failed. http_request lets a single scenario cross that boundary without leaving the test runner.
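The wrapper itself is little more than fetch with a deadline. A sketch of the shape under stated assumptions: the parameter names, defaults, and return handling here are guesses, and only the 30-second timeout comes from the text:

```typescript
// Hypothetical sketch of a 30s-timeout fetch wrapper like http_request.
// Parameter names are assumptions; only the 30-second cap is from the text.
type HttpRequestArgs = {
  url: string;
  method?: string;
  headers?: Record<string, string>;
  body?: unknown;
};

type BuiltRequest = {
  url: string;
  init: {
    method: string;
    headers: Record<string, string>;
    body?: string;
    signal: AbortSignal;
  };
};

function buildRequest(args: HttpRequestArgs): BuiltRequest {
  return {
    url: args.url,
    init: {
      method: args.method ?? "GET",
      headers: { "content-type": "application/json", ...args.headers },
      // JSON body is serialized for the caller; GETs usually pass none.
      body: args.body === undefined ? undefined : JSON.stringify(args.body),
      signal: AbortSignal.timeout(30_000), // the 30-second cap
    },
  };
}

async function httpRequest(args: HttpRequestArgs): Promise<unknown> {
  const { url, init } = buildRequest(args);
  const res = await fetch(url, init); // aborts automatically after 30s
  return res.json();
}
```

In a scenario, the agent would call something like this after the UI action, for example against the Telegram Bot API, and then assert on the JSON it gets back.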
Where does the plan live, and does saving it rerun the tests?
The plan lives at /tmp/assrt/scenario.md. The companion scenario-files.ts starts an fs.watch on that file after each test run, debounced by one second. Editing the file from your text editor, from the agent chat, or from a CI step all trigger the same sync back to central storage. Because it is a real file, a human can open it, change '#Case 3: Checkout requires email' to include a second assertion, save, and the next assrt_test run uses the new text. You do not need a UI to edit the test. grep, sed, and git are fine.
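The one-second debounce described here is plain event coalescing: rapid saves within the window collapse into a single sync, fired after the last event in a burst. A minimal model of that collapsing rule over a series of save timestamps (the function is hypothetical; the real scenario-files.ts uses fs.watch and timers):

```typescript
// Hypothetical model of a 1s debounce: given the times (ms) at which the
// watcher saw file-change events, return the times at which a sync fires.
// A sync fires debounceMs after the *last* event in a burst; any event
// arriving inside the window resets the timer.
function syncTimes(eventsMs: number[], debounceMs = 1000): number[] {
  const fires: number[] = [];
  for (let i = 0; i < eventsMs.length; i++) {
    const next = eventsMs[i + 1];
    if (next === undefined || next - eventsMs[i] >= debounceMs) {
      fires.push(eventsMs[i] + debounceMs);
    }
  }
  return fires;
}

// Three quick saves, then one more much later: two syncs, not four.
console.log(syncTimes([0, 200, 400, 5000])); // → [1400, 6000]
```

This is why hammering Cmd+S in your editor does not trigger a run per keystroke: only the quiet moment after a burst of saves does.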
Does the agent actually write Playwright code, or does it just call its own tools?
Under the hood, the agent wraps @playwright/mcp, the Playwright MCP server Microsoft maintains. When the agent calls its `click` tool with a description and a ref, that gets translated into real Playwright commands against a real browser process. The ref values (e.g. `ref=e5`) come from Playwright MCP snapshots of the accessibility tree. So the pipeline is: Markdown plan -> agent tool call -> Playwright MCP call -> Playwright browser driver -> Chrome. The test is portable in the sense that if you exported the run, you would have a Playwright trace, not a proprietary YAML format.
What does assrt_plan add on top of just running tests?
assrt_plan navigates to a URL, takes a snapshot of the accessibility tree, and asks a small model to generate 5-8 #Case entries in the same Markdown format assrt_test consumes. The generated plan lands at /tmp/assrt/scenario.md; you review it, keep what is useful, and then assrt_test executes. It is not magic: the model just proposes likely flows based on the buttons and inputs actually visible on the page, under the rule 'each case must be completable in 3-4 actions max, no login/signup unless a form is visible, no CSS or performance tests'. Treat the output as a starting point, not a finished suite.
Done with Selenium docs
The test is the plan. The plan is a file. The agent does the rest.
18 tools, one Markdown file, zero DSLs.