QA automation services, itemized: 18 browser primitives you are paying a retainer for
Every services firm on the first SERP for this phrase sells you a team that composes operations into test scenarios. Strip the retainer away and what you are actually buying is a fixed set of browser primitives: navigate, click, type, read, assert, wait, verify email. Assrt ships all 18 as MCP tool handlers in one TypeScript file, including four email and HTTP primitives that let the test agent complete signup flows without a human next to a mailbox.
What the top results for this keyword are actually selling
Search "qa automation services" in 2026 and the first page is a uniform lineup: services firms (QASource, KiwiQA, Abstracta, Aegis, QAMentor), closed AI QA platforms (Rainforest, QA Wolf, Virtuoso, Autonoma, BotGauge), and analyst pages on pricing. The published rates spread from $15/hr for a dedicated tester to $312K/year for in-house teams, with closed AI suites slotting in at $12K to $24K/year plus 20 to 50 percent maintenance uplift when tests break. Every one of them bundles. None of them itemize. None publish the primitive set their engineers or their LLM compose into your test suite.
Top SERP results for "qa automation services", April 2026. None of them publish their primitive set.
The count, in four numbers
Every figure below comes from grep and wc against /Users/matthewdi/assrt-mcp/src/core/agent.ts. Not a benchmark. Not a marketing claim. Clone the repo and you get the same numbers.
“The whole primitive surface of a QA automation service, exposed as a flat Anthropic.Tool[] the model picks among by name. No DSL, no YAML, no plugin registry. Grep agent.ts for `{ name:` and you will see exactly 18 entries between line 16 and line 196.”
assrt-mcp/src/core/agent.ts:16-196
How one #Case reaches one of the 18 primitives
Three sources feed in: the plan you (or assrt_plan) wrote, the current page snapshot from Playwright MCP, and Claude Haiku reasoning over both. The 18-entry TOOLS array is the hub. On the right, the primitives fan out into real browser operations and the structured test report.
Plan + snapshot + Haiku → TOOLS[18] → Chromium + report
The complete 18-primitive inventory
Every name below is the exact string the model emits in its tool_use block. The line number is the start of the entry in the TOOLS array. You can open agent.ts and jump to any of them directly.
| # | Tool name | agent.ts line | What it does |
|---|---|---|---|
| 01 | navigate | 18 | URL transition |
| 02 | snapshot | 27 | reads a11y tree, returns refs |
| 03 | click | 32 | fires click via Playwright ref |
| 04 | type_text | 44 | clears + types into an input |
| 05 | select_option | 57 | selects values in a dropdown |
| 06 | scroll | 69 | scrolls by pixel offset |
| 07 | press_key | 81 | Enter, Tab, Escape, arrows |
| 08 | wait | 90 | wait for text, or fixed ms |
| 09 | screenshot | 101 | captures a PNG of current page |
| 10 | evaluate | 106 | runs JS in the page context |
| 11 | create_temp_email | 115 | disposable inbox for signup |
| 12 | wait_for_verification_code | 120 | polls for OTP up to 60s |
| 13 | check_email_inbox | 128 | reads disposable inbox state |
| 14 | assert | 133 | emits a test assertion |
| 15 | complete_scenario | 146 | marks the #Case finished |
| 16 | suggest_improvement | 158 | flags UX/bug issues inline |
| 17 | http_request | 172 | out-of-band API verify (Telegram, Slack) |
| 18 | wait_for_stable | 186 | waits until DOM mutations settle |
Tinted rows (tools 11-13 and 17 in this table) are the four primitives that replace the human-in-the-loop for signup, OTP, and external verification flows.
The flat array, annotated
This is the anchor fact of the page. Not a diagram, not a marketing claim, the actual shape of the data the test agent reasons over. Eighteen entries in a flat array. The model reads this schema at inference time and picks one by name.
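The shape is easy to reproduce. A minimal sketch, assuming the `Anthropic.Tool` shape (`name`, `description`, JSON Schema `input_schema`); the two entries below are abbreviations for illustration, not the full 18-entry array from agent.ts:

```typescript
// Minimal sketch of the flat tool-array shape the model reasons over.
// Tool mirrors the Anthropic.Tool shape: name + description + JSON Schema input.
type Tool = {
  name: string;
  description: string;
  input_schema: {
    type: "object";
    properties: Record<string, unknown>;
    required?: string[];
  };
};

// Two illustrative entries; the real TOOLS constant has 18 of these.
const TOOLS: Tool[] = [
  {
    name: "navigate",
    description: "Navigate the browser to a URL",
    input_schema: {
      type: "object",
      properties: { url: { type: "string" } },
      required: ["url"],
    },
  },
  {
    name: "assert",
    description: "Emit a test assertion with evidence",
    input_schema: {
      type: "object",
      properties: {
        description: { type: "string" },
        passed: { type: "boolean" },
        evidence: { type: "string" },
      },
      required: ["description", "passed", "evidence"],
    },
  },
];

// The model picks an entry by name; the runner dispatches against the same array.
console.log(TOOLS.map((t) => t.name).join(","));
```

The point of the flatness: the schema the model sees at inference time and the schema the dispatcher switches on are the same object, so there is no layer between them to drift.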
The four primitives that kill human-in-the-loop signup testing
A closed AI QA platform usually requires you to pre-provision test accounts because its primitive set stops at the form submit. A services firm writes a one-off mail-reading helper for each client and bills its maintenance separately. Assrt makes the inbox part of the primitive set: the system prompt at agent.ts:228-236 teaches the model to chain the four email primitives itself.
From a Markdown bullet to a green assertion
The sequence a services firm hides behind an engineer, a dashboard, and a screenshot, laid out here in five steps.
User action arrives as a #Case line
The plan at /tmp/assrt/scenario.md is Markdown you can edit in VS Code. Each #Case is 3 to 5 bullet points in English. No YAML, no JSON, no proprietary DSL. The runner reads this file on every invocation (scenario-files.ts:49), so edits take effect on the next run.
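A hypothetical #Case block in that format (illustrative wording, not copied from a real plan, following the 3-to-5-bullet convention described above):

```markdown
#Case Signup with email verification
- Navigate to /signup and open the registration form
- Create a disposable email and submit the form with it
- Wait for the verification code to arrive in the inbox
- Enter the code and assert the dashboard greets the new user
```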
Claude Haiku picks one of the 18 TOOLS
The agent loop at agent.ts:433 hands the #Case text plus the current page snapshot to Claude Haiku (default model at line 9). Haiku returns a tool_use block with one of the 18 names. The dispatcher switches on the name and calls the matching handler. Under the hood, every matched handler translates to a Playwright MCP call over stdio.
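The dispatch step can be sketched without the real handlers. A minimal sketch, assuming a `tool_use` block of `{ name, input }` and a handler map standing in for the Playwright MCP calls (the handler bodies here are stubs, not the real agent.ts code):

```typescript
// A tool_use block as the model emits it: a name from TOOLS plus JSON input.
type ToolUse = { name: string; input: Record<string, unknown> };

// Handlers keyed by tool name. The real handlers call Playwright MCP over
// stdio; these stubs just record what they would have done.
const handlers: Record<string, (input: Record<string, unknown>) => string> = {
  navigate: (input) => `browser_navigate → ${input.url}`,
  click: (input) => `browser_click → ref ${input.ref}`,
  assert: (input) => `assertion: ${input.description} passed=${input.passed}`,
};

// The dispatcher is a name lookup, nothing more: no DSL, no compile step.
function dispatch(call: ToolUse): string {
  const handler = handlers[call.name];
  if (!handler) throw new Error(`unknown tool: ${call.name}`);
  return handler(call.input);
}

console.log(dispatch({ name: "navigate", input: { url: "https://example.com" } }));
```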
Playwright MCP executes against real Chromium
browser.ts:284 resolves @playwright/mcp/cli.js, spawns it with --viewport-size 1600x900 and --output-mode file, and connects over stdio. A click in the agent is a browser_click Playwright MCP call; a snapshot is the native a11y tree Playwright already produces. No wrapper layer you cannot inspect.
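The stdio transport underneath is ordinary child-process plumbing. A stripped-down sketch of the pattern, not the real @playwright/mcp handshake: it spawns a throwaway `node -e` child that echoes one JSON-RPC-shaped line back, the same request/response-over-stdio loop the browser manager runs against the real CLI:

```typescript
import { spawnSync } from "node:child_process";

// Child script: read one JSON-RPC-shaped request from stdin, answer on stdout.
// The real counterpart is @playwright/mcp/cli.js answering browser_* calls.
const childScript = `
  let buf = "";
  process.stdin.on("data", (d) => (buf += d));
  process.stdin.on("end", () => {
    const req = JSON.parse(buf);
    process.stdout.write(JSON.stringify({ id: req.id, result: { ok: true, method: req.method } }));
  });
`;

const request = { id: 1, method: "browser_click", params: { ref: "e42" } };
const child = spawnSync("node", ["-e", childScript], {
  input: JSON.stringify(request),
  encoding: "utf8",
});

const response = JSON.parse(child.stdout);
console.log(response.result.method); // the child echoes the method name back
```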
Assertions and improvements are emitted as tool calls too
assert (agent.ts:133) and suggest_improvement (agent.ts:158) are not post-hoc analyses; they are tool calls the model issues in the same loop. This keeps the structured report shape: every assertion has an explicit description + passed + evidence triplet, because that is the tool schema.
complete_scenario closes the #Case
agent.ts:146 is the only way a #Case ends. The model must emit complete_scenario with summary + passed. That boolean is what you see as PASS / FAIL in the final report. The sharedBrowser at server.ts:31 is kept open, so Case 2 inherits Case 1's cookies and URL without a re-login step.
What a signup test looks like when it runs
One #Case. Eleven tool calls. No human next to the inbox. This is the exact shape you get from the CLI when the agent picks the four email primitives in sequence and wait_for_stable closes the async gap.
The 18, grouped by role
The inventory is flat, but the roles are not. Ten primitives drive the browser. Four remove humans from verification loops. Four shape the report.
The 10 interaction primitives
navigate, snapshot, click, type_text, select_option, scroll, press_key, wait, screenshot, evaluate. These are the observable-action and page-reading tools every QA runner exposes. In agent.ts they live at lines 18, 27, 32, 44, 57, 69, 81, 90, 101, 106. A Cypress or Playwright test from a services firm ultimately compiles to this same set of operations.
The 4 email / OTP / external-verify primitives
create_temp_email (115), wait_for_verification_code (120), check_email_inbox (128), http_request (172). These remove the human-in-the-loop from any flow that needs an inbox, a magic link, or a webhook verification. Closed AI QA platforms and services firms charge extra for this or skip it entirely.
The 4 report / introspection primitives
assert, complete_scenario, suggest_improvement, wait_for_stable. These shape the final report: every PASS/FAIL is an explicit assert call, every #Case ends with complete_scenario, every UX issue becomes an improvement entry. wait_for_stable is the self-healing wait that replaces the retry-three-times flake-fighting pattern.
Default model: claude-haiku-4-5-20251001
DEFAULT_ANTHROPIC_MODEL at agent.ts:9. Each tool call is a small bounded reasoning task and Haiku is the cheapest capable option. Override with --model or the model MCP parameter. Set ANTHROPIC_BASE_URL to route to a local LLM for air-gapped deploys; the 18-tool surface is identical.
The cost per run, compared
Published figures from 2026 QA automation market reports, rounded and converted per-run where possible. The Assrt cost is Anthropic Haiku token spend at published list prices for a 5-case suite averaging 12k input and 2k output tokens, with no platform fee layered on top.
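The per-run arithmetic is simple enough to check yourself. A sketch using placeholder rates of $0.25 per million input tokens and $1.25 per million output tokens — illustrative assumptions, not Anthropic's published prices; substitute the current list price for your model:

```typescript
// Per-run token cost for the 5-case suite described above.
// RATE_* are assumed illustrative prices per million tokens.
const INPUT_TOKENS = 12_000;     // average input tokens per run (page's estimate)
const OUTPUT_TOKENS = 2_000;     // average output tokens per run
const RATE_IN_PER_MTOK = 0.25;   // assumed $/1M input tokens (placeholder)
const RATE_OUT_PER_MTOK = 1.25;  // assumed $/1M output tokens (placeholder)

const costPerRun =
  (INPUT_TOKENS / 1_000_000) * RATE_IN_PER_MTOK +
  (OUTPUT_TOKENS / 1_000_000) * RATE_OUT_PER_MTOK;

console.log(costPerRun.toFixed(4)); // → 0.0055 dollars per run at the assumed rates
```

Even if the real rates are several multiples of these placeholders, the result stays in the cents-per-run range, which is the comparison that matters against a $12K-per-year platform floor.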
Services retainer vs. 18 primitives on disk
Every row names a thing a QA automation services firm will list in its SOW, and the corresponding Assrt implementation by file and line.
| Feature | Typical services retainer | Assrt |
|---|---|---|
| What the 'service' physically is | A team of engineers composing operations in a proprietary runner | 18 tool handlers in agent.ts:16-196 that LLMs compose for you |
| How the primitives are defined | Closed behind a runner you cannot inspect, or a YAML DSL that compiles at runtime | Typed Anthropic.Tool[] with input_schema per handler, readable on disk |
| How a signup flow with OTP gets tested | Either skipped (vendor asks for pre-provisioned test account) or billed as custom work | create_temp_email + wait_for_verification_code + check_email_inbox, lines 115-128 |
| How async / streaming UI waits are handled | Fixed timeouts, retries, and per-test tuning by an engineer | wait_for_stable polls DOM mutations until quiescent, agent.ts:186 |
| How out-of-band API verification works | Separate integration test harness, billed as additional scope | http_request at agent.ts:172 makes the test agent itself verify the webhook |
| Where your scenarios physically live | Vendor dashboard (TestRail, Jira, custom) behind an account boundary | /tmp/assrt/scenario.md — plain Markdown, grep-able, diff-able |
| What the underlying browser driver is | Proprietary runner (sometimes wrapping Selenium or Playwright, sometimes not) | @playwright/mcp driving real Chromium, spawned at browser.ts:284 |
| How you extend the primitive set | File a ticket with the vendor; plugin review; quarterly release | Append one entry to TOOLS, add a handler in the dispatcher |
| Typical annual cost | $180K agency / $312K in-house / $12K–24K closed AI + 20–50% maintenance uplift | $0 platform fee + Anthropic tokens (typically cents per run at Haiku rates) |
| Exit cost if you switch tools | Re-author every scenario into the new vendor's DSL | Copy /tmp/assrt/scenario.md into the next runner — it is just Markdown |
When 18 primitives are not enough and you should still hire a services firm
- Mainframe and green-screen terminals outside Chromium's reach
- Compliance regimes that require a human test authoring trail (FDA 21 CFR Part 11, DO-178C)
- Air-gapped environments where neither Anthropic nor Playwright MCP can run
- A web app with weekly CI runs and under 200 #Case scenarios
- A team with a coding agent already in the IDE
- A signup flow with OTP that a vendor keeps refusing to test
The first three items are legitimate services-firm territory. The last three are territory the 18 primitives already cover.
Want to see the 18 primitives run on your app?
20 minutes on a call, we point Assrt at your staging URL and walk through the #Case plan, the tool-call trace, and the video player together.
Book a call →

Frequently asked questions
If I hire a QA automation services firm, what am I actually paying for, mechanically?
A human who composes browser primitives into scenarios. The primitives are the same on every retainer: navigate, click, type, select, wait, snapshot/read, assert, screenshot. Different DSLs (Cypress commands, Playwright Locator calls, Selenium WebDriver, proprietary YAML) dress them up differently, but the underlying unit of work is identical. Assrt exposes its set as 18 MCP tool handlers in /Users/matthewdi/assrt-mcp/src/core/agent.ts between lines 16 and 196. They are: navigate, snapshot, click, type_text, select_option, scroll, press_key, wait, screenshot, evaluate, create_temp_email, wait_for_verification_code, check_email_inbox, assert, complete_scenario, suggest_improvement, http_request, wait_for_stable. When a services firm invoices you for a QA automation engineer, the deliverable is a composition of those 18 operations, padded with a dashboard and a markup.
What makes the email and OTP primitives in agent.ts actually different from a services retainer?
create_temp_email (line 115), wait_for_verification_code (line 120), check_email_inbox (line 128), and http_request (line 172) let the test agent itself complete a signup flow end to end. When the system prompt at agent.ts:228-236 detects a signup form, it tells the model to (1) create a disposable email, (2) submit the form, (3) poll the disposable inbox for the OTP, (4) paste the digits back into the form. No QA engineer is sitting by a mailbox. On closed AI QA platforms this step is usually skipped (the vendor asks you for a pre-provisioned test account) or priced as a separate add-on. On a services retainer it is a human cost. In agent.ts it is four tools and about 150 lines of glue in /Users/matthewdi/assrt-mcp/src/core/email.ts.
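The four-step chain is easy to model with the inbox mocked out. A sketch under stated assumptions: `createTempEmail`, `submitSignupForm`, and `waitForVerificationCode` are hypothetical stand-ins for the real handlers, and the inbox is an in-memory map rather than a disposable-mail provider:

```typescript
// Hypothetical in-memory stand-ins for the real email primitives.
const inbox = new Map<string, string[]>();

function createTempEmail(): string {
  const address = `qa-${Math.random().toString(36).slice(2, 8)}@example.test`;
  inbox.set(address, []);
  return address;
}

// Stand-in for the app under test: submitting the form "sends" an OTP mail.
function submitSignupForm(email: string): void {
  inbox.get(email)?.push("Your verification code is 482913");
}

// Poll the inbox until a 6-digit code appears (synchronous sketch of the 60s poll).
function waitForVerificationCode(email: string, maxPolls = 10): string {
  for (let i = 0; i < maxPolls; i++) {
    for (const mail of inbox.get(email) ?? []) {
      const match = mail.match(/\b(\d{6})\b/);
      if (match) return match[1];
    }
  }
  throw new Error("no OTP arrived");
}

// The chain the system prompt teaches: create inbox → submit → poll → paste back.
const email = createTempEmail();
submitSignupForm(email);
const code = waitForVerificationCode(email);
console.log(code); // → 482913
```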
What does the 18-primitive design buy me versus a YAML-style recorder?
Two things. First, every primitive lives in a readable TypeScript array (the TOOLS constant at agent.ts:16-196), so when a test misbehaves you can grep the handler name and see exactly what the agent issued to the browser. Second, the primitives map directly onto Playwright MCP's own tool names. The browser manager at browser.ts:284 resolves @playwright/mcp/cli.js and spawns it over stdio; click becomes a Playwright MCP click, snapshot becomes the accessibility-tree snapshot Playwright already exposes. There is no transpilation step from 'Assrt DSL' into browser operations, because there is no Assrt DSL. A vendor YAML has to be compiled by the vendor at runtime, which is the layer that breaks when your app changes.
How does wait_for_stable (agent.ts:186) replace the 'flaky test retry' a services firm sells as a premium add-on?
wait_for_stable polls the DOM and resolves only when no mutations have occurred for a configurable stable window (default 2 seconds, max 30 seconds of wall-clock wait). The system prompt at agent.ts:249-254 instructs the model to call wait_for_stable after any async action: submitting forms, sending chat messages, triggering search. In a consultancy workflow, the corresponding solution is 'retry three times with exponential backoff' plus a human watching the flake rate. wait_for_stable replaces both by binding the wait to the actual signal (DOM quiescence) rather than a timer guess. This is why the same #Case suite can run against a slow staging server and a fast local dev server without per-environment tuning.
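The quiescence idea itself fits in a few lines. A sketch assuming a `readMutationCount()` probe (hypothetical; the real handler watches DOM mutations inside the page): declare the page stable only when the count stops changing for a stable window, and bail out at a wall-clock budget:

```typescript
// Sketch of the stable-wait loop: sample a mutation counter at fixed ticks and
// declare the page stable once the counter is unchanged for `stableTicks` samples.
// readMutationCount is a hypothetical probe standing in for the real DOM observer.
function waitForStable(
  readMutationCount: () => number,
  stableTicks: number, // how many unchanged samples count as "quiescent"
  maxTicks: number,    // wall-clock budget, in samples
): boolean {
  let last = readMutationCount();
  let unchanged = 0;
  for (let tick = 0; tick < maxTicks; tick++) {
    const now = readMutationCount();
    unchanged = now === last ? unchanged + 1 : 0;
    last = now;
    if (unchanged >= stableTicks) return true; // DOM settled
  }
  return false; // budget exhausted while the page was still mutating
}

// Simulated page: mutations for a few ticks, then quiet.
let t = 0;
const probe = () => (t++ < 5 ? t : 99);
console.log(waitForStable(probe, 3, 30)); // → true
```

Binding the wait to the mutation signal rather than a timer is exactly why the same suite runs unmodified against a slow staging server and a fast local one.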
Does the 'services' framing even apply when I run Assrt against prod with --extension?
It applies even more. In extension mode (browser.ts:258, arg --extension) the runner attaches to your already-running Chrome via Chrome DevTools Protocol instead of launching a fresh headless browser. That means cookies, logged-in state, and the browser fingerprint your users actually have. A services firm would script around your auth and require test credentials; the sharedBrowser in extension mode uses your credentials because it is your browser. The handshake is gated on a one-time Playwright MCP extension token (saved to ~/.assrt/extension-token by browser.ts:231-241), so subsequent runs are silent. This covers the 'test on prod without creating a new account every time' case most retainer contracts refuse to touch because of compliance handwaving.
What is the fastest path from reading this page to running the 18 primitives on my own app?
Four shell lines. npx @assrt-ai/assrt setup registers the MCP server with Claude Code, Cursor, Zed, and Windsurf (sets up ~/.cursor/mcp.json, ~/.claude/mcp.json). Start your dev server. Ask your coding agent 'Use assrt_plan on http://localhost:3000' to auto-generate 5 to 8 #Case blocks into /tmp/assrt/scenario.md. Ask 'Use assrt_test' to execute them. Under the hood, assrt_test spawns @playwright/mcp, the agent loop picks tools from the 18-entry TOOLS array, Haiku reasons over each step, and you get pass/fail plus a video player on a local port. No account, no dashboard, no per-seat license.
How does the cost compare line by line to a 2026 QA automation retainer?
2026 retainer ranges, from published sources: QA agencies average around $180K/year, in-house teams $312K/year, closed AI tools $12K to $24K/year, with hidden test-maintenance fees typically 20 to 50 percent on top. Assrt charges zero platform fee. The variable cost is Anthropic tokens at the claude-haiku-4-5-20251001 rate (agent.ts:9 sets this as DEFAULT_ANTHROPIC_MODEL). A 5-case, 3-to-4-step-per-case suite uses roughly 8k to 15k input and 2k output tokens, which at Haiku pricing clears well under a penny per run. You can also set ANTHROPIC_BASE_URL and route to a local LLM, in which case the platform fee stays zero and the variable fee is your GPU time.
Can I export or re-use the 18 primitives outside of Assrt if I decide to switch?
Yes. Every primitive in TOOLS (agent.ts:16-196) is a thin adapter over a Playwright MCP tool. If you stand up your own @playwright/mcp session (npx @playwright/mcp@latest --extension or headless), you can issue the same tool names (browser_navigate, browser_click, browser_type, browser_snapshot) and drive the same Chromium. Your #Case plans in /tmp/assrt/scenario.md are Markdown; pasting them into a different LLM-driven runner works. There is no vendor format to convert. This is what 'zero vendor lock-in' means in code: the primitives are standard MCP calls, the plan format is Markdown, the runner is open source (MIT). The exit cost is a git add and a different npx binary.
What does assrt_diagnose add on top of the 18 primitives when something fails?
assrt_diagnose (in server.ts:866, separate from the primitives) takes the failing #Case text, the error evidence, and the URL, and runs one Haiku call with a diagnose system prompt (server.ts:240-268). The model emits a Root Cause, an Analysis, a Recommended Fix, and a Corrected Test Scenario in the same #Case format so you can paste the fix straight back into /tmp/assrt/scenario.md. This is the QA-engineer-triages-a-red-build role in a retainer. It costs fractions of a cent, closes in seconds, and you can re-run it with a different model via the model parameter. The whole diagnose handler is about 60 lines because, again, it is one model call, not a ReAct loop.
Why is every primitive in a single flat array rather than a tree of modules?
Because the LLM treats the tools as a flat namespace. Anthropic's tool-calling API takes an array of Tool definitions and chooses among them by name; Gemini's is the same shape. The TOOLS constant at agent.ts:16-196 is intentionally a flat array so the schema the model sees is identical to the schema the runner dispatches against. If you want to extend Assrt with your own primitive (say, a Stripe-test-card helper or a headless-browser-console-log reader) you append one entry to TOOLS and add a handler in the dispatch loop. That is the full extension protocol. No plugin registry, no config, no YAML.
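The two-step extension protocol can be shown end to end. A sketch with a hypothetical `stripe_test_card` primitive — the name, card number, and handler body are illustrative, not part of agent.ts — against minimal stand-ins for the real array and dispatch map:

```typescript
// Minimal stand-ins for the real TOOLS array and dispatch map.
type Tool = { name: string; description: string; input_schema: object };
const TOOLS: Tool[] = [];
const handlers: Record<string, (input: Record<string, unknown>) => string> = {};

// Step 1: append one entry to TOOLS so the model can see and pick the new name.
TOOLS.push({
  name: "stripe_test_card",
  description: "Fill a payment form with a Stripe test card number",
  input_schema: {
    type: "object",
    properties: { brand: { type: "string" } },
    required: ["brand"],
  },
});

// Step 2: add the matching handler in the dispatch map. That is the whole protocol.
handlers["stripe_test_card"] = (input) =>
  input.brand === "visa" ? "typed 4242 4242 4242 4242" : "unsupported brand";

console.log(handlers["stripe_test_card"]({ brand: "visa" }));
```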