AI in testing automation is two loops running at once, not one clever model
Every category page ranking for this phrase treats AI in testing automation as a bucket of three things: self-healing selectors, visual validation, and autonomous test generation. That framing misses the actual engineering problem. The hard part is not whether an LLM can click a button. The hard part is running a second model in parallel with the first, turning every URL the driver visits into a new, gradable, ready-to-execute #Case draft, without blocking the driver or blowing through the LLM quota.
Assrt does this with three constants, one extra system prompt, and a queue. Every number and every line of code below is in the open source agent.ts file, and you can change the constants, replace the prompt, or turn the whole second loop off with a single flag. That is what AI in testing automation looks like when you can read the code.
“The interesting half of AI in testing automation is not the model driving the browser. It is the worker sitting next to it, reading every new page the driver touches and turning each one into 1 or 2 new test cases, bounded to 3 in flight at a time.”
agent.ts lines 269-271 and 585-618
What the category pages get wrong
Read the top five results for this phrase today and the pattern is identical. mabl, BrowserStack, testRigor, Functionize, and TestGrid all describe AI in testing automation as three features: self-healing selectors that adjust when class names change, visual validation that diffs screenshots, and autonomous test generation that emits a plan from a URL. Each of those is a feature; none of them is an architecture. None explains how to run AI-driven tests at scale without either blocking the main run on generation or flooding your LLM quota with uncoordinated calls.
Assrt answers the architecture question with three constants. Read them. Change them. Ship your suite with them.
The two-loop architecture in one diagram
The browser in the middle is a single Chromium tab launched by @playwright/mcp. The driver and the discovery worker both read from it, but only the driver writes. Every discovery call is gated by browserBusy, so a foreground click always wins the race. The outputs are different shapes for different consumers: the driver produces tool calls and assertions; the worker produces streaming #Case Markdown.
Two models, one shared browser, different output shapes
The anchor fact: the separate discovery prompt (and its caps)
The single most important piece of code in the discovery worker is its system prompt. It is deliberately the opposite shape of the driver's. It forbids login cases. It caps at 1 to 2 cases per page. It caps each case at 3 to 4 actions. It demands references to actual visible elements. If you hand a long system prompt or a loose output contract to a worker that is going to fire on 20 pages per run, you get 20 runaway LLM calls. Tight scope is the cost of safe concurrency.
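To make the shape concrete, here is a hypothetical sketch of what such a tightly scoped discovery prompt can look like, plus a cheap structural check. The wording below is illustrative, not the actual DISCOVERY_SYSTEM_PROMPT from agent.ts; only the caps it encodes (1-2 cases, 3-4 actions, no login) come from the article.

```typescript
// Illustrative sketch of a tightly scoped discovery prompt. The real
// DISCOVERY_SYSTEM_PROMPT in agent.ts differs in wording; the caps match
// the ones described in the article.
const DISCOVERY_PROMPT_SKETCH = [
  "You draft test cases for ONE page from its accessibility tree.",
  "Output 1-2 #Case blocks, no more.",
  "Each case has 3-4 actions, no more.",
  "Never write login or signup cases.",
  "Never write CSS, responsive, or performance cases.",
  "Reference ACTUAL buttons/links/inputs visible on the page.",
].join("\n");

// A cheap structural check you could run in CI to keep the prompt's
// caps from silently drifting during edits.
function promptKeepsCaps(prompt: string): boolean {
  return (
    prompt.includes("1-2") &&
    prompt.includes("3-4") &&
    /login/i.test(prompt)
  );
}
```

The point of the check is the same as the point of the caps: when a prompt is a contract shared by 20 parallel calls, you guard it the way you guard any other constant.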
How the worker gets fed
Every driver navigate tool call funnels into queueDiscoverPage, which dedupes on origin+path, checks the SKIP_URL_PATTERNS regex list, respects the MAX_DISCOVERED_PAGES cap, and pushes the URL onto a pending queue. flushDiscovery is called both at initial navigation and after every scenario step, but it returns early if browserBusy is true or activeDiscoveries has already hit the cap. That is the whole concurrency contract.
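The contract above can be sketched in a few lines. This is a standalone illustration, not the real queueDiscoverPage from agent.ts: the constant names and regex list mirror the article, but the surrounding plumbing is simplified.

```typescript
// Hedged sketch of the queueing contract: dedupe on origin+path, skip
// known-bad URL patterns, and cap total discovered pages per run.
const MAX_DISCOVERED_PAGES = 20;
const SKIP_URL_PATTERNS: RegExp[] = [
  /\/logout/, /\/api\//, /^javascript:/, /^about:blank/, /^data:/, /^chrome:/,
];

const seenPages = new Set<string>();   // deduped on origin+path
const pendingQueue: string[] = [];

function queueDiscoverPage(url: string): boolean {
  if (SKIP_URL_PATTERNS.some((re) => re.test(url))) return false;
  let key: string;
  try {
    const u = new URL(url);
    key = u.origin + u.pathname;       // query strings don't create new pages
  } catch {
    return false;                      // unparseable URL, skip it
  }
  if (seenPages.has(key)) return false;        // already queued this page
  if (seenPages.size >= MAX_DISCOVERED_PAGES) return false;  // run-level cap
  seenPages.add(key);
  pendingQueue.push(url);
  return true;
}
```

Note what falls out of keying on origin+path: `/dashboard` and `/dashboard?tab=2` count as one page, so filter permutations never burn extra discovery calls.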
One discovery LLM call, start to finish
The call itself is a streaming Anthropic message. max_tokens is 1024, which is why discovery is cheap even when it fires 20 times. The accessibility tree is sliced to 4000 characters before it goes into the prompt, which matters because an accessibility tree on a complex dashboard can easily exceed 30 kilobytes. Every partial token is re-emitted as a discovered_cases_chunk SSE event so the UI can show the draft typing itself in real time.
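The two cost controls named above, the 4000-character slice and the per-token re-emission, can be sketched as follows. The event names match the article; the `emit` callback is a stand-in for the real SSE channel, and the token stream here is any async iterable rather than the actual Anthropic SDK stream.

```typescript
// Illustrative sketch: truncate the accessibility tree before prompting,
// then re-emit each streamed token as a discovered_cases_chunk event and
// close with discovered_cases_complete carrying the full draft.
const SNAPSHOT_CHAR_LIMIT = 4000;

function truncateSnapshot(tree: string): string {
  return tree.length > SNAPSHOT_CHAR_LIMIT
    ? tree.slice(0, SNAPSHOT_CHAR_LIMIT)
    : tree;
}

type SseEvent = { event: string; data: string };

async function streamDiscovery(
  tokens: AsyncIterable<string>,          // stand-in for the model stream
  emit: (e: SseEvent) => void,            // stand-in for the SSE channel
): Promise<string> {
  let draft = "";
  for await (const t of tokens) {
    draft += t;
    emit({ event: "discovered_cases_chunk", data: t });
  }
  emit({ event: "discovered_cases_complete", data: draft });
  return draft;
}
```

The truncation is the quiet half of the cost story: input tokens scale with the tree you send, and a 30-kilobyte dashboard tree would dwarf the 1024-token output budget.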
Architecture at a glance
Two loops, one browser
The driver owns the Playwright MCP tab and drives it through your #Case. The discovery worker reads the same accessibility tree but only when the driver is idle, gated by browserBusy at agent.ts lines 754 and 1042.
Separate system prompts
The driver uses SYSTEM_PROMPT (agent.ts lines 198-254). The discovery worker uses DISCOVERY_SYSTEM_PROMPT (lines 256-267). Different output shapes, different behaviors, different max_tokens.
Concurrency caps
MAX_CONCURRENT_DISCOVERIES = 3. MAX_DISCOVERED_PAGES = 20. Nothing past the caps makes it to the LLM. Your quota is protected by constants you can read and change.
SKIP_URL_PATTERNS
A regex list at agent.ts line 271 that short-circuits /logout, /api/, javascript:, about:blank, data:, and chrome: URLs before they ever hit the worker. Stops you from wasting a discovery call on a 204 logout redirect.
Streaming feedback
Every partial token the worker emits goes out as a discovered_cases_chunk event. The UI shows a live typing effect per discovered page. Completed drafts fire discovered_cases_complete with the final Markdown.
A real run, second by second
To make the interleaving concrete, here is a 6-frame walkthrough of a typical Assrt run. One driver Case. Four discovered pages. One queued discovery that has to wait for a slot to open up.
One scenario, two loops, no blocks
t+0.0s: you call assrt_test with one #Case
What you send, what comes back
You hand assrt_test a single #Case and it runs. While it runs, the discovery worker writes several more, grounded in the exact pages the driver actually visited. The driver's output is a test report; the worker's output is a backlog of new tests to try next.
Watch it interleave in a terminal
Run the CLI with --json and you can see the page_discovered, discovered_cases_chunk, and discovered_cases_complete events land in between the agent's own tool-call lines. The activeDiscoveries counter climbs to 3 and then sits there until one of the discoveries finishes, at which point the next URL gets dequeued.
Foreground vs background, side by side
| Feature | Manual 'generate plan' step | Assrt discovery worker |
|---|---|---|
| When it runs | Manually, before a test, as a separate 'generate plan' step that blocks you | Concurrently, while your current #Case is still executing, at every navigate |
| Which prompt drives it | Same big agent prompt; model improvises formatting and scope | Dedicated DISCOVERY_SYSTEM_PROMPT with hard caps: 1-2 cases, 3-4 actions, no login, no CSS |
| How many pages it covers | Whatever you manually point it at; adding pages is a new prompt every time | Every URL the driver visited, up to MAX_DISCOVERED_PAGES = 20, deduped by origin+path |
| Parallelism bound | One at a time, synchronous, or no bound and the quota explodes | MAX_CONCURRENT_DISCOVERIES = 3 in flight at once, fourth is queued |
| Collision with running test | Blocks the browser or produces drafts the driver cannot actually execute | flushDiscovery yields on browserBusy, so every driver click beats every discovery call |
| Grounding source | A human description or a URL + public crawl; no live accessibility tree | Playwright MCP accessibility tree from the same browser tab, truncated to 4000 chars |
| Output shape | Natural language plan; reformatting needed before any agent can run it | #Case Markdown blocks, ready to paste straight into assrt_test as plan |
| Cost per run at 10 pages | One big prompt that burns thousands of tokens; or a SaaS that hides the number | ~1024 max_tokens per discovered page on Haiku 4.5; under two dollars for a 20-page run |
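The gating behavior in the right-hand column can be sketched in a few lines. This is a standalone illustration with the article's constant names; the real flushDiscovery in agent.ts also grabs the snapshot and fires the LLM call.

```typescript
// Sketch of the concurrency contract: discovery only dequeues when the
// browser is idle and a slot under the cap is free. A finished discovery
// frees its slot and immediately tries to dequeue the next URL.
const MAX_CONCURRENT_DISCOVERIES = 3;

let browserBusy = false;
let activeDiscoveries = 0;
const pending: string[] = [];

function flushDiscovery(startOne: (url: string) => void): void {
  if (browserBusy) return;              // foreground click always wins
  while (activeDiscoveries < MAX_CONCURRENT_DISCOVERIES && pending.length > 0) {
    const url = pending.shift()!;
    activeDiscoveries++;
    startOne(url);                      // fire-and-forget background LLM call
  }
}

function discoveryFinished(startOne: (url: string) => void): void {
  activeDiscoveries--;
  flushDiscovery(startOne);             // a slot opened; dequeue the next URL
}
```

The design choice worth copying is that the cap is enforced at dequeue time, not at enqueue time: URLs past the limit wait in memory instead of being dropped or fired.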
What the driver hands the browser
For completeness, here is the full tool surface the driver uses on every run. The discovery worker sees none of these; it only reads the accessibility tree. The contract between the two loops is the browser state, nothing more.
The browser tool surface (agent.ts lines 16 through 196) covers navigate, snapshot, click, type_text, evaluate, http_request, wait_for_stable, and more.
Getting started in four steps
1. Install the MCP server: add assrt-mcp to your Claude Code or Cursor config.
2. Write one #Case: describe the most important happy path in English.
3. Run assrt_test: the driver executes while the discovery worker spawns in parallel.
4. Save the drafts: review the streamed #Case suggestions and keep the ones you want.
Going from zero to a running AI test suite with background discovery takes about three minutes. No proprietary DSL. No seat licensing. Your plans stay as #Case Markdown on your disk at /tmp/assrt/scenario.md, and the drafts the worker emits are the same format.
The part nobody else will show you
The entire discovery worker is 34 lines of scheduling code plus a 33-line LLM call. The three caps are three integers. The prompt is twelve lines. The fact that this whole concept fits in under 80 lines of TypeScript is why open source matters for AI in testing automation: the category pages want to sell you a platform; the architecture is a paragraph.
agent.ts lines 256 through 618. MIT license. No vendor lock-in. Your tests are Markdown files you can commit alongside your app.
Want to see the discovery worker fire on your own app?
30 minutes, your staging URL, one #Case. We'll show you the backlog it drafts for you in the background.
Book a call →

Frequently asked questions
What does 'AI in testing automation' actually mean at the code level, and how is Assrt different from the category pages on mabl, testRigor, Functionize, or TestGrid?
AI in testing automation at the Assrt code level is two concurrent loops sharing one browser. Loop A is the foreground agent running your #Case scenarios using tools defined at agent.ts lines 16 through 196: navigate, snapshot, click, type_text, evaluate, http_request, wait_for_stable, and so on. Loop B is the discovery worker, implemented in queueDiscoverPage at agent.ts lines 555 through 562 and generateDiscoveryCases at lines 585 through 618. Every time Loop A navigates to a new URL, Loop B pushes that URL onto a pending queue, opens up to MAX_CONCURRENT_DISCOVERIES = 3 parallel LLM calls with a dedicated DISCOVERY_SYSTEM_PROMPT, and streams 1 to 2 new #Case drafts per page as discovered_cases_chunk server-sent events. The whole process caps at MAX_DISCOVERED_PAGES = 20 per run. Category pages on mabl, testRigor, Functionize, and TestGrid describe generic categories like self-healing tests, visual validation, and autonomous generation. None of them show the two-loop architecture, the bounded concurrency, or the separate system prompt that makes this safe to run while your main test is still in flight.
Why does the discovery worker use a separate system prompt instead of reusing the main one?
Because the two loops need different output shapes. The main SYSTEM_PROMPT at agent.ts lines 198 through 254 pushes the driver to call tools, assert, and close out scenarios. If you handed that prompt to a freshly spotted URL with no context, the model would immediately try to click things on a page it is not yet attached to. The DISCOVERY_SYSTEM_PROMPT at agent.ts lines 256 through 267 is the opposite shape: it forbids login and signup cases, forbids CSS and responsive and performance cases, caps output at 1 to 2 cases, caps each case at 3 to 4 actions, and demands references to actual visible buttons and inputs. The worker reads the accessibility tree (truncated to 4000 characters at agent.ts line 586), produces clean #Case text, and never touches the live browser. Separating the prompts is the reason the two loops don't step on each other.
What stops 20 concurrent discoveries from melting the LLM quota or blocking the main test?
Three hard limits, all visible in agent.ts. First, MAX_CONCURRENT_DISCOVERIES = 3 at line 269 caps how many discovery LLM calls run at once; the fourth is queued. Second, MAX_DISCOVERED_PAGES = 20 at line 270 caps the total number of pages ever discovered in a single run; anything past that gets dropped on the floor by queueDiscoverPage. Third, flushDiscovery at lines 564 through 583 checks browserBusy before grabbing the snapshot, so the worker never fights the main agent for the single browser tab. The main loop sets browserBusy = true before every tool call and false after (lines 754 and 1042), which makes flushDiscovery yield while any foreground action is in flight. That is why you can run this on a long multi-step scenario without the discovery queue either saturating your API limits or interleaving mid-click.
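The set-true/set-false discipline described above is easy to get wrong if a tool call throws. Here is a hypothetical wrapper showing the shape; the real agent inlines the flag around each tool call rather than using a helper like this.

```typescript
// Hypothetical helper illustrating the browserBusy discipline: the flag
// goes up before the tool call and is released even if the call throws,
// so a crashed click can never wedge the discovery worker forever.
let browserBusy = false;

async function withBrowserBusy<T>(toolCall: () => Promise<T>): Promise<T> {
  browserBusy = true;
  try {
    return await toolCall();
  } finally {
    browserBusy = false;    // always released, success or failure
  }
}
```

The `finally` is the whole point: if the flag could stay stuck on `true` after an exception, every future flushDiscovery call would silently yield forever.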
If the discovery worker is drafting cases while I run my test, what happens to those drafts?
They stream back to the client as discovered_cases_chunk events with incremental text, then a single discovered_cases_complete event when the generation finishes. The driver UI receives these over the same SSE channel as the main test output. Independently, each discovered page also fires a page_discovered event that carries a URL, a (mostly empty) title, and a base64 JPEG screenshot, all at agent.ts line 574. In the web app, the side panel shows a list of pages the agent has seen, each with a preview image and a streaming block of generated #Case text. You can save those drafts, edit them, re-run them as fresh scenarios. The test you actually asked for is never blocked on any of this: the discoveries run as background tasks with activeDiscoveries tracked separately from browserBusy.
Can I run two LLMs at once, for example Haiku for the driver and Gemini for the discovery?
Not in the current default, but the architecture supports it with a trivial change because the tool schemas are translated once at agent.ts lines 277 through 301. Both providers go through the same TOOLS list, and GEMINI_FUNCTION_DECLARATIONS auto-derives Gemini function declarations from the Anthropic input_schema. The current constructor stores a single provider (lines 354 through 367), so both loops use the same one, but splitting them is a 10-line change: instantiate a second Anthropic or GoogleGenAI client solely for the DISCOVERY_SYSTEM_PROMPT path, and route generateDiscoveryCases through it. The interesting implication is that the tool schema is not the coupling point. You can run a slow deep model on discovery (generates better test ideas) and a fast cheap model on driving (cheaper per click), because the contract between them is the accessibility tree and nothing else.
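A minimal sketch of that 10-line split, under stated assumptions: `LLMClient` is a hypothetical abstraction invented here for illustration, while the real code holds concrete Anthropic and Gemini clients. Only the routing decision, two clients keyed by loop, reflects the article.

```typescript
// Hypothetical provider split: the driver and the discovery worker each
// get their own client, and the routing point is the method boundary.
interface LLMClient {
  complete(systemPrompt: string, user: string): Promise<string>;
}

class AgentWithSplitProviders {
  constructor(
    private driverClient: LLMClient,     // fast/cheap: per-click decisions
    private discoveryClient: LLMClient,  // slower/deeper: better test ideas
  ) {}

  driveStep(systemPrompt: string, state: string): Promise<string> {
    return this.driverClient.complete(systemPrompt, state);
  }

  generateDiscoveryCases(discoveryPrompt: string, tree: string): Promise<string> {
    return this.discoveryClient.complete(discoveryPrompt, tree);
  }
}
```

Because the contract between the loops is the accessibility tree, nothing in the driver ever inspects which model drafted a discovery case, which is exactly why the split stays a 10-line change.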
How is this different from just asking GPT to 'write a test plan for my app'?
GPT writing a plan from a prompt is guessing. The Assrt discovery worker is grounded in the actual accessibility tree of the actual page, captured live from the actual browser that the main agent is driving. At agent.ts line 571 the worker calls this.browser.snapshot() to pull the Playwright MCP accessibility tree, then feeds that text plus a base64 JPEG into the LLM alongside the DISCOVERY_SYSTEM_PROMPT. The cases it emits reference elements that are actually on the page because they came from a real DOM read, not from your description of the page. The DISCOVERY_SYSTEM_PROMPT explicitly requires 'Reference ACTUAL buttons/links/inputs visible on the page' (line 266). This is the difference between a generated test plan that reads like marketing and a generated test case that a downstream agent can actually execute in the same browser session.
What model is the driver running by default, and what keeps it cheap enough to pair with discovery?
claude-haiku-4-5-20251001, set at agent.ts line 9. Haiku 4.5 is the right pick for a tool-use loop that is heavy on fast decisions and light on long-form generation. The driver's max_tokens is 4096 per call (line 715), which is plenty for a tool-use response and nowhere near the reasoning budget of a coding model. The discovery worker uses max_tokens = 1024 (line 608) because its output is always a short block of #Case Markdown. With both loops on Haiku 4.5, a typical 10-step scenario that discovers 5 new pages runs under two dollars of LLM spend at list price. The same scenario on a closed AI QA platform at $7,500 a month of seat licensing is the comparison the SERP does not make.
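The budget claim above is easy to sanity-check with arithmetic. In the sketch below the token caps come from the article; the per-million-token price is a parameter rather than a hardcoded number, because list prices change and the exact figure is not something this helper should assert.

```typescript
// Back-of-envelope cost bound for the discovery loop: output tokens are
// hard-capped at 1024 per page and 20 pages per run, so the worst-case
// output spend is a constant you can compute before the run starts.
const DISCOVERY_MAX_TOKENS = 1024;
const MAX_DISCOVERED_PAGES = 20;

function worstCaseDiscoveryOutputTokens(pages: number): number {
  const capped = Math.min(pages, MAX_DISCOVERED_PAGES);  // run-level cap
  return capped * DISCOVERY_MAX_TOKENS;
}

function discoveryOutputCostUsd(
  pages: number,
  usdPerMillionOutputTokens: number,  // supply the current list price
): number {
  return (worstCaseDiscoveryOutputTokens(pages) / 1_000_000) * usdPerMillionOutputTokens;
}
```

Even at a generous placeholder price of a few dollars per million output tokens, the capped worst case (20 pages × 1024 tokens = 20,480 tokens) is cents, not dollars; the input side, the truncated trees, is where the remainder of the budget goes.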
Can I read the discovery worker source and modify its behavior?
Yes. The whole agent is a single TypeScript file, agent.ts in the assrt-mcp repo, 1,087 lines long. The discovery worker specifically lives between lines 256 and 618. Turning off discovery entirely is a one-liner: either set MAX_DISCOVERED_PAGES = 0, or comment out the queueDiscoverPage call at line 457 (inside run()) and line 775 (inside the navigate tool handler). Tightening it is just as easy: lower MAX_CONCURRENT_DISCOVERIES to 1 to serialize, or widen the SKIP_URL_PATTERNS regex list at line 271 to ignore more paths. The point of shipping this as open source instead of a SaaS is that you can read every knob, move every bound, and keep the tests that come out of it as your own Markdown files at /tmp/assrt/scenario.md.