Testing Guide · April 2026

Open-source Playwright test generators, April 2026: two architectures, four files on your disk

Every open-source Playwright test generator shipping this month fits into one of two shapes. One writes a .spec.ts file into a tests/ folder and assumes you have a Node project to run it. The other writes a Markdown plan with #Case blocks that an agent turns into Playwright MCP tool calls at runtime, with no tests/ folder required. Which shape you pick decides whether the output is a file you commit or a file the agent reads every time. I built the plan-executor one, so this guide walks the fork with real file paths from my own repo.

Matthew Diakonov
  • No proprietary YAML and no vendor dashboard
  • Plan is plain Markdown you can diff in git
  • Runs Playwright MCP under the hood, the same browser everyone else uses
  • Zero tests/ folder needed on the repo under test

The fork nobody draws

Most guides on this topic list a dozen open-source tools in one flat column: Codegen, Playwright 1.56 Generator agent, ZeroStep, playwright-ai, Auto Playwright, Passmark, Assrt, and so on. That list hides the single most important distinction. Some of those tools emit Playwright code that you commit and run later. Some of them execute a plan at runtime through an LLM agent driving Playwright MCP. They produce different artifacts, demand different CI setup, and fail in different ways.

Let me show the fork as a diagram, then walk each branch with the file paths it leaves behind.

Same input, two very different outputs

[Diagram: the URL to test, screenshots, and the accessibility tree feed an LLM, which hands off to either a code emitter or a plan executor.]

What ends up on your disk

A plan-executor generator drops four files on your laptop after one run, plus two auxiliary paths under ~/.assrt that appear only in specific configurations or after a test run. None of them are TypeScript. This is the concrete evidence that there is no tests/ folder required, no playwright.config.ts, no CI glue beyond a single Bash command. The paths below come from /Users/matthewdi/assrt-mcp/src/core/scenario-files.ts lines 16 to 20. Run ls /tmp/assrt after the first generator call and you will see them.

/tmp/assrt/scenario.md

The generated plan. Plain Markdown with #Case blocks. This is the artifact that would be a .spec.ts on a code emitter. Commit it, grep it, diff it across runs.

/tmp/assrt/scenario.json

Scenario metadata: {id, name, url, updatedAt}. The id is a UUID you can pass back to assrt_test to re-run the same plan.

/tmp/assrt/results/latest.json

Full pass/fail report from the most recent run. Per-case assertions, evidence strings, screenshot paths. `jq '.cases[] | select(.passed == false)'` works out of the box.
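The exact schema of latest.json is not spelled out here, but a shape consistent with the jq filter above would look roughly like this (every field name other than cases and passed is an assumption on my part):

```json
{
  "cases": [
    {
      "name": "Add a product to the cart",
      "passed": false,
      "assertions": ["cart badge shows 1"],
      "evidence": "badge text was 0 after click",
      "screenshot": "/tmp/assrt/results/case-1.jpeg"
    }
  ]
}
```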

~/.assrt/browser-profile/

Persistent Playwright profile. Cookies and auth carry over between generator runs so you do not log in again every invocation.

~/.assrt/extension-token

One-time Chrome extension token, only created if you pass --extension. Lets the generator reuse your logged-in Chrome session.

~/.assrt/playwright-output/

Per-step accessibility-tree snapshots that the Playwright MCP server writes to keep the transport payload under 2 MB. The agent reads this truncated on-disk content.

The seven steps the plan-executor generator runs

Nothing below is opinion; it is a readthrough of /Users/matthewdi/assrt-mcp/src/mcp/server.ts lines 780 to 855. You can reproduce every step by setting ANTHROPIC_LOG=debug and watching the request body print to stderr.

1

Launch a local Playwright MCP over stdio

The server resolves @playwright/mcp via createRequire in browser.ts line 284 and spawns cli.js with --viewport-size 1600x900 --output-mode file. The output mode dumps accessibility snapshots to disk rather than shoving them through the MCP transport; snapshots over 2 MB used to break Claude's API, and now they never leave the disk.
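A minimal sketch of what that launch plausibly looks like, assuming Node's child_process and createRequire. The flag values are the ones quoted above; the helper names are mine, not the real browser.ts:

```typescript
import { createRequire } from "node:module";
import { spawn, type ChildProcess } from "node:child_process";

// Hypothetical helper: assemble the CLI flags quoted in the text.
export function buildMcpArgs(): string[] {
  return ["--viewport-size", "1600x900", "--output-mode", "file"];
}

// Resolve @playwright/mcp's cli.js the way the text describes and spawn it
// as a stdio child process (a sketch of the idea, not the actual code).
export function launchPlaywrightMcp(): ChildProcess {
  const require = createRequire(import.meta.url);
  const cliPath = require.resolve("@playwright/mcp/cli.js");
  return spawn(process.execPath, [cliPath, ...buildMcpArgs()], {
    stdio: ["pipe", "pipe", "inherit"], // stdin/stdout carry the MCP transport
  });
}
```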

2

Navigate to the target URL

One call to browser.navigate(url). The browser profile at ~/.assrt/browser-profile is loaded so any existing cookies are carried in.

3

Triple-sample the page

screenshot + snapshot at scroll 0, scroll by 800, screenshot + snapshot, scroll by 800, screenshot + snapshot. Three JPEG screenshots and three accessibility-tree snapshots go into the payload. Lines 794 to 805 of server.ts.

4

Concatenate and slice the text

The three snapshot text blocks join with two blank lines between them and get sliced to 8000 characters. Line 809. This is the budget that stops the payload from exploding on very tall pages.
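The concatenate-and-slice step is simple enough to sketch. The helper name is mine; the two-blank-line separator and the 8000-character budget are from the text:

```typescript
// Join the snapshot text blocks with two blank lines between them
// ("\n\n\n" as a separator yields two empty lines), then cap the result
// at the character budget described above.
export function combineSnapshots(snapshots: string[], budget = 8000): string {
  return snapshots.join("\n\n\n").slice(0, budget);
}
```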

5

Send to claude-haiku-4-5-20251001 with 4096 max tokens

Three screenshots as base64 image parts plus one text part (URL + visible-text dump) plus the 18-line PLAN_SYSTEM_PROMPT. Haiku returns 5 to 8 #Case blocks in plain Markdown. Temperature is the default; no retries on this call.
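Under the Anthropic Messages API, a payload like the one described could be assembled roughly as follows. This is a sketch: buildPlanRequest is my name, while the model ID, max_tokens value, and three-images-plus-text layout come from the text:

```typescript
// Shape of one base64 screenshot part for the Anthropic Messages API.
type ImagePart = {
  type: "image";
  source: { type: "base64"; media_type: "image/jpeg"; data: string };
};
type TextPart = { type: "text"; text: string };

// Assemble the request body: three JPEG screenshots plus one text part
// (URL + visible-text dump), with the plan prompt as the system message.
export function buildPlanRequest(
  screenshotsBase64: string[],
  url: string,
  visibleText: string,
  systemPrompt: string,
) {
  const images: ImagePart[] = screenshotsBase64.map((data) => ({
    type: "image",
    source: { type: "base64", media_type: "image/jpeg", data },
  }));
  const text: TextPart = { type: "text", text: `${url}\n\n${visibleText}` };
  return {
    model: "claude-haiku-4-5-20251001",
    max_tokens: 4096,
    system: systemPrompt,
    messages: [{ role: "user" as const, content: [...images, text] }],
  };
}
```

Passing that object to client.messages.create() from @anthropic-ai/sdk would perform the call; there is no retry wrapper, matching the text.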

6

Write the plan to /tmp/assrt/scenario.md

writeScenarioFile() in scenario-files.ts writes the Markdown and the metadata JSON side by side, then starts the fs.watch so later edits stream back to the store.

7

Close the browser

browser.close() tears down the MCP child process. The generator phase is done. The next phase, assrt_test, re-spawns its own MCP server and drives the browser from the #Case blocks.

The one line that makes this architecture possible

Every other design choice follows from agent.ts line 207: ALWAYS call snapshot FIRST to get the accessibility tree with element refs. The agent looks at the live page before every interaction and picks refs like ref="e5" from whatever the DOM is at that instant. A code-emitting generator has to freeze a selector at generation time. A plan-executor does not. That one rule is the reason a Markdown plan can outlive a DOM rewrite.

What a run looks like

Here is the actual sequence of commands and log lines you would see on a first run, if you installed Assrt today and pointed it at a web app.

your terminal

The generator's output, side by side

Same test, two architectures. On the left, a Playwright .spec.ts that a code-emitter would write. On the right, the Markdown plan a plan-executor writes. Read both and notice which one a product manager could read without knowing what a locator is.

.spec.ts vs #Case Markdown, same test

import { test, expect } from "@playwright/test";

test("add to cart", async ({ page }) => {
  await page.goto("https://shop.example.com/product/42");
  await page.locator(".btn-primary.add-to-cart-v2").click();
  await page.locator("[data-testid='cart-count']").waitFor();
  await expect(page.locator(".cart-badge")).toHaveText("1");
});
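For contrast, the same test in plan form. This is an illustrative #Case block in the format the document describes, not verbatim generator output:

```markdown
#Case 1: Add a product to the cart
1. Navigate to https://shop.example.com/product/42
2. Click the "Add to cart" button
3. Verify the cart badge shows "1"
```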

What the plan file actually looks like

The scenario file is plain Markdown. No YAML, no binary, no proprietary DSL. A reader who has never installed Assrt can still read it and understand what is being tested. The file lives at /tmp/assrt/scenario.md.

/tmp/assrt/scenario.md
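A three-case plan in this shape, written here as an illustration of the format:

```markdown
#Case 1: Load the product page
1. Navigate to https://shop.example.com/product/42
2. Verify the product title is visible
3. Verify the "Add to cart" button is visible

#Case 2: Add the product to the cart
1. Click the "Add to cart" button
2. Verify the cart badge shows "1"

#Case 3: Open the cart
1. Click the cart icon
2. Verify the page lists one line item
3. Verify a "Checkout" button is visible
```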

The count that defines the architecture

Four numbers, read off the source. Anyone can verify each one by opening the file.

4 files on disk after one generator run (zero of them TypeScript)
18 lines in the PLAN_SYSTEM_PROMPT at server.ts 219-236
14 tools in the fixed MCP schema the agent can call
2,000 ms fs.watch debounce before scenario.md edits sync back

Code emitter vs plan executor, feature by feature

The left column is what you get from a code-emitting open-source tool shipping in April 2026, whether that is the built-in Playwright 1.56 Generator agent, Codegen, ZeroStep, playwright-ai, or any of the dozen adjacent projects. The right column is what the plan-executor shape produces, with Assrt as the reference implementation.

| Feature | Code emitter | Plan executor (Assrt) |
| --- | --- | --- |
| What the generator writes | A TypeScript file in tests/ (e.g. checkout.spec.ts) | A Markdown file at /tmp/assrt/scenario.md with #Case blocks |
| What you run to execute it | npx playwright test (needs playwright.config.ts) | assrt_test URL plan (needs only the MCP server) |
| Dependency on a Node project | Required: your repo needs @playwright/test in package.json | None: no package.json, no playwright.config.ts, no tests/ folder |
| Selectors at run time | Locked into the .spec.ts at generation time | Re-read from a fresh accessibility tree on every run |
| Hallucinated APIs | Possible: the model can invent methods that do not exist | Structurally impossible: agent calls a fixed 14-tool MCP schema |
| Editing the output | Edit the .spec.ts like any other source file | Edit scenario.md; a 2-second fs.watch debounce syncs the change back |
| Hand-tuning during runs | Not possible mid-run; you must re-generate | Edit scenario.md during a run; the next case picks up your edits |
| What breaks when DOM drifts | All matching selectors in the .spec.ts fail until rewritten | Agent adapts on the next run because snapshot is re-called |
| Where your auth state lives | In a storageState.json you maintain | In ~/.assrt/browser-profile, persistent across runs |
| CI setup to run the output | playwright install + npx playwright test + retries config | npx assrt run --plan-file scenario.md --json |
| Portability of the plan | Tied to the @playwright/test runner version | Any MCP-aware agent can read #Case Markdown |

The 18-line prompt that generates every plan

A fork test most buyers forget to run: can you read the prompt that generates your tests? For Assrt, yes. It is the PLAN_SYSTEM_PROMPT constant at /Users/matthewdi/assrt-mcp/src/mcp/server.ts:219-236. 18 lines. Four rules about output format, six rules about what cases to generate. Fork the file and ship; no waiting on a vendor.

18 lines

Generator prompt length, fully readable in one viewport

server.ts:219-236

Why edit a plan at runtime

One behavior a code emitter cannot offer: editing the plan during a run. A plan-executor can. The file watcher at scenario-files.ts:90 fires on any save with a 2-second debounce. When I am iterating on a flaky test, I keep /tmp/assrt/scenario.md open in VS Code, run assrt_test in a terminal, and tweak the next case while the current case is still running. The browser session is persistent, so cookies and login state carry over; the next case picks up my edit within two seconds of the save.
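The watch-plus-debounce pattern is small enough to sketch. This is the idea, not the actual scenario-files.ts code; the delay is parameterized, with the 2-second window from the text as the example value:

```typescript
import { watch } from "node:fs";

// Debounce helper: collapse a burst of events into one call
// after `delayMs` of quiet.
export function debounce(fn: () => void, delayMs: number): () => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return () => {
    if (timer) clearTimeout(timer);
    timer = setTimeout(fn, delayMs);
  };
}

// Watch the plan file and re-sync it to the store on save.
// syncPlanToStore is a stand-in name for the real sync function.
export function watchPlan(path: string, syncPlanToStore: () => void) {
  return watch(path, debounce(syncPlanToStore, 2000));
}
```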

With a code emitter you regenerate the file, diff, commit, rerun. That loop is fine for end-state tests but wasteful for exploratory work. Plan executors win the exploratory loop.

One concrete workflow I use

  1. Run assrt_plan on the target URL. Takes 15 to 25 seconds.
  2. Open /tmp/assrt/scenario.md and delete the three cases I do not care about.
  3. Run assrt_test against my local dev server.
  4. If a case fails, watch the WebM recording at 5x, then call assrt_diagnose to get a corrected #Case.
  5. Paste the corrected case back into scenario.md. The fs.watch re-syncs it to the store with a 2-second debounce.
  6. Commit the Markdown file to tests/scenarios/checkout.md. The next contributor gets the same reviewable, rerunnable test, no .spec.ts baggage attached.

Which shape should you pick?

Ignore the marketing. Pick based on what your test should look like six months from now.

  • Pick a code emitter if you want to integrate into an existing npx playwright test suite, rely on fixtures and worker parallelism, need exact millisecond-level timing, or have a QA team that writes and reviews TypeScript daily.
  • Pick a plan executor if the people maintaining the tests are not the same as the people who write the product's production code, if you want tests that survive DOM churn, if you want git diff on your test changes to read like a PR comment, or if you want to run generation and execution from a coding agent without adding a Node project to the repo under test.
  • Use both if the trade-offs cut differently across test suites in your org. The two shapes coexist fine; both run Playwright under the hood.

The fee difference nobody prices honestly

$0 / $0

Closed-source AI testing platforms price around $7.5K per month for the same underlying Playwright. An open-source plan executor is $0 for the generator, $0 for the runner, and the only meter that ticks is your own model-provider bill for Haiku calls.

The honest limits of the plan-executor shape

A few places where a plan-executor is the wrong tool.

  • Tests that depend on deterministic data setup through database seeding or API fixtures. You can still author the test, but the setup runs outside the plan in Bash or SQL.
  • Tests that need to assert on CSS values, computed widths, or animation easing. The MCP tool schema does not expose computed-style inspection; you are verifying observable DOM text and element presence.
  • Suites where you already have 400 .spec.ts files and a full CI pipeline. Adopting a plan-executor alongside is fine, but migrating those 400 files to Markdown for its own sake is busywork.
  • Tests that depend on sub-second timing, for example race conditions in a real-time UI. The agent makes its own pacing decisions; if you need exact timing, use a code emitter or write the timing-critical piece in Playwright directly.

Curious whether your suite should be Markdown or .spec.ts?

Book a 20-minute call and I'll walk your actual test flows and tell you, honestly, which shape fits.

Book a call

Frequently asked questions

What is actually different between an open-source Playwright test generator that emits code and one that runs a plan at runtime?

A code-emitting generator writes a TypeScript or JavaScript file you later run with the Playwright test runner. Codegen, the built-in Playwright 1.56+ Generator agent, ZeroStep, and playwright-ai all fit this shape: they produce .spec.ts in a tests/ folder, you commit it, and a CI job runs npx playwright test. A plan executor produces a Markdown file that an LLM agent reads at run time and turns into Playwright MCP tool calls on the fly. Assrt is the plan-executor case: the generator writes /tmp/assrt/scenario.md with #Case blocks, and assrt_test interprets them by calling the Playwright MCP tool schema in agent.ts lines 14 to 196. No .spec.ts ever lands on disk, and you do not need a Node project to run the output.

Where exactly is the prompt that generates my tests, and can I read it?

Yes. In Assrt's repository the plan-generation prompt lives at /Users/matthewdi/assrt-mcp/src/mcp/server.ts lines 219 to 236, exported as PLAN_SYSTEM_PROMPT. It is 18 lines. It defines the output format (#Case N: name followed by steps), six CRITICAL rules (self-contained cases, specific selectors, observable verifications, 3 to 5 actions per case, no features behind auth unless visible, 5 to 8 cases maximum), and the constraint that the browser agent cannot inspect CSS or run arbitrary JavaScript. The built-in Playwright 1.56 Generator agent uses a longer prompt baked into the VS Code extension that is not straightforward to read. BrowserStack's AI test generator, Octomind, and similar closed-source products do not publish theirs. Fork visibility is one of the cleanest open-source tests you can apply to a generator.

Which files appear on my disk after I run assrt_plan once in April 2026?

Four, and none are TypeScript. The scenario plan at /tmp/assrt/scenario.md holds the generated #Case blocks in plain Markdown. The metadata at /tmp/assrt/scenario.json holds {id, name, url, updatedAt}. The result file at /tmp/assrt/results/latest.json holds the most recent pass/fail report. The browser profile at ~/.assrt/browser-profile persists cookies and auth between runs so the generator does not start from a blank session every time. You can verify every path by running ls /tmp/assrt and ls ~/.assrt after the first command. The paths are declared in /Users/matthewdi/assrt-mcp/src/core/scenario-files.ts lines 16 to 20.

Can I edit the generated Markdown plan and have the generator pick up the change?

Yes, and the sync is live. scenario-files.ts line 90 starts an fs.watch on the plan file. When the file changes, a 2-second debounce (the check at line 97) fires and syncUploads the updated plan back to the scenario store. Open /tmp/assrt/scenario.md in any editor, delete a case, add a case, save the file, and the next run of assrt_test executes your edits. You do not need a vendor dashboard to hand-tune the output. A code-emitting generator has no equivalent: once the .spec.ts is written, the generator is done; your edits do not flow back into any shared state because there is no shared state.

Do I need a playwright.config.ts or a tests/ folder to run the output of a plan-executor generator?

No. Assrt's runtime path is: npx @assrt-ai/assrt setup, then assrt_test with a URL and a plan string. The MCP server spawns @playwright/mcp via stdio (browser.ts line 284 resolves the cli.js path), points it at your target URL, and executes the #Case blocks by calling Playwright MCP tools. You do not install Playwright as a project dependency. You do not maintain playwright.config.ts. You do not author a tests/ folder. The only files the generator needs on disk are the Markdown plan and a browser profile; the only file it needs on the test target is whatever the app under test ships today.

What breaks when a code-emitting generator's selectors drift, and does a plan-executor break the same way?

The two modes fail very differently. A .spec.ts file locks a selector like page.locator('.btn-primary.submit') at generation time. If the DOM changes that class the next day, every subsequent run fails with a TimeoutError until someone edits the code and re-commits. A plan-executor re-reads the DOM on every run. Assrt's agent calls snapshot before every click, gets a fresh accessibility tree, and picks a ref like ref=e5 from the current state; those refs are regenerated per run. When the DOM shifts, the agent adapts on the next call, not after a human edits a checked-in file. This is why the agent.ts SYSTEM_PROMPT line 207 insists: 'ALWAYS call snapshot FIRST.'

What are the 14 tools a plan executor's agent is actually allowed to call?

From /Users/matthewdi/assrt-mcp/src/core/agent.ts lines 14 to 196: navigate, snapshot, click, type_text, select_option, scroll, press_key, wait, screenshot, evaluate, create_temp_email, wait_for_verification_code, assert, complete_scenario, plus http_request and wait_for_stable in recent builds. The agent is physically unable to emit Playwright code, because the tool surface is a fixed JSON schema. This removes the hallucinated-API failure mode entirely. A code-emitter is free to invent page.fillFormFields or page.clickByLabel, and the generated .spec.ts will look plausible until you run it. A plan executor cannot, because the MCP server rejects any call not in the schema.
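A fixed tool surface in MCP terms is just a list of JSON-schema tool definitions plus a gate that rejects anything outside it. A sketch of what one entry and the rejection check might look like (names and schema details are mine, not the real agent.ts):

```typescript
// One MCP-style tool definition: a name plus a JSON schema for its input.
type ToolDef = {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, unknown>;
    required?: string[];
  };
};

// A sketch of one entry from a fixed tool list like the one described above.
export const clickTool: ToolDef = {
  name: "click",
  description: "Click the element identified by an accessibility-tree ref",
  inputSchema: {
    type: "object",
    properties: { ref: { type: "string" } },
    required: ["ref"],
  },
};

// The structural guarantee: any call whose name is not in the fixed list
// is rejected before it reaches the browser.
export function isAllowedCall(tools: ToolDef[], name: string): boolean {
  return tools.some((t) => t.name === name);
}
```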

How does the generator decide what cases to write?

Assrt's generator takes three screenshots at different scroll positions (server.ts lines 794 to 805), captures the accessibility-tree text at each, concatenates and slices to 8000 characters (line 809), and sends the three screenshots plus the truncated text to claude-haiku-4-5-20251001 with max_tokens 4096 (line 830) and PLAN_SYSTEM_PROMPT as the system message. The model returns a plan containing 5 to 8 #Case blocks. Nothing about the input payload is hidden. If you want to see exactly what Haiku saw, set the environment variable ANTHROPIC_LOG=debug and run it; the full request body prints to stderr.

Is there lock-in risk when I switch from one open-source Playwright test generator to another?

Only for code emitters. If ZeroStep adds proprietary ai() helpers, or Codegen's output starts using a library-specific import, migrating means rewriting tests. A plan-executor emits plain Markdown with #Case blocks; a reader who has never seen Assrt can still read 'Click the Login button. Type test@example.com into the email field. Verify the dashboard appears.' and execute it. Port the Markdown to another MCP-aware agent runner and it keeps working. This is the 'your tests are yours to keep' guarantee in concrete file form.

If I already have a Playwright suite, should I replace it with a plan executor?

No. The two architectures coexist. Keep your existing .spec.ts suite for the tests that benefit from full programmatic control (deterministic data setup, millisecond-level timing, fixture composition). Add a plan executor for the flows a product manager would rather describe in English than author in code, and for the edge cases the code-emitting generators keep failing to author correctly. Assrt's agent connects to a real Chrome instance via Playwright's --extension flag (agent.ts line 338), so the browser runtime is the same Playwright you already use. You are not adopting a parallel testing stack; you are writing Markdown that drives the one you have.
