QA automation services, itemized: 18 browser primitives you are paying a retainer for
Every services firm on the first SERP for this phrase sells you a team that composes operations into test scenarios. Strip the retainer away and what you are actually buying is a fixed set of browser primitives: navigate, click, type, read, assert, wait, verify email. Assrt ships all 18 as MCP tool handlers in one TypeScript file, including four email and HTTP primitives that let the test agent complete signup flows without a human next to a mailbox.
What the top results for this keyword are actually selling
Search "qa automation services" in 2026 and the first page is a uniform lineup: services firms (QASource, KiwiQA, Abstracta, Aegis, QAMentor), closed AI QA platforms (Rainforest, QA Wolf, Virtuoso, Autonoma, BotGauge), and analyst pages on pricing. The published rates spread from $15/hr for a dedicated tester to $312K/year for in-house teams, with closed AI suites slotting in at $12K to $24K/year plus 20 to 50 percent maintenance uplift when tests break. Every one of them bundles. None of them itemize. None publish the primitive set their engineers or their LLM compose into your test suite.
Top SERP results for "qa automation services", April 2026. None of them publish their primitive set.
The count, in four numbers
Every figure below comes from grep and wc against /Users/matthewdi/assrt-mcp/src/core/agent.ts. Not a benchmark. Not a marketing claim. Clone the repo and you get the same numbers.
“The whole primitive surface of a QA automation service, exposed as a flat Anthropic.Tool[] the model picks among by name. No DSL, no YAML, no plugin registry. Grep agent.ts for `{ name:` and you will see exactly 18 entries between line 16 and line 196.”
assrt-mcp/src/core/agent.ts:16-196
How one #Case reaches one of the 18 primitives
Three sources feed in: the plan you (or assrt_plan) wrote, the current page snapshot from Playwright MCP, and Claude Haiku reasoning over both. The 18-entry TOOLS array is the hub. On the right, the primitives fan out into real browser operations and the structured test report.
Plan + snapshot + Haiku → TOOLS[18] → Chromium + report
The complete 18-primitive inventory
Every name below is the exact string the model emits in its tool_use block. The line number is the start of the entry in the TOOLS array. You can open agent.ts and jump to any of them directly.
| # | Tool name | agent.ts line | What it does |
|---|---|---|---|
| 01 | navigate | 18 | URL transition |
| 02 | snapshot | 27 | reads a11y tree, returns refs |
| 03 | click | 32 | fires click via Playwright ref |
| 04 | type_text | 44 | clears + types into an input |
| 05 | select_option | 57 | selects values in a dropdown |
| 06 | scroll | 69 | scrolls by pixel offset |
| 07 | press_key | 81 | Enter, Tab, Escape, arrows |
| 08 | wait | 90 | wait for text, or fixed ms |
| 09 | screenshot | 101 | captures a PNG of current page |
| 10 | evaluate | 106 | runs JS in the page context |
| 11 | create_temp_email | 115 | disposable inbox for signup |
| 12 | wait_for_verification_code | 120 | polls for OTP up to 60s |
| 13 | check_email_inbox | 128 | reads disposable inbox state |
| 14 | assert | 133 | emits a test assertion |
| 15 | complete_scenario | 146 | marks the #Case finished |
| 16 | suggest_improvement | 158 | flags UX/bug issues inline |
| 17 | http_request | 172 | out-of-band API verify (Telegram, Slack) |
| 18 | wait_for_stable | 186 | waits until DOM mutations settle |
Tinted rows (tools 11-13 and 17 in this table) are the four primitives that replace the human-in-the-loop for signup, OTP, and external verification flows.
The flat array, annotated
This is the anchor fact of the page. Not a diagram, not a marketing claim, the actual shape of the data the test agent reasons over. Eighteen entries in a flat array. The model reads this schema at inference time and picks one by name.
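The shape is easy to reproduce. A minimal sketch, assuming the `Anthropic.Tool` shape (`name`, `description`, JSON Schema `input_schema`); the two entries below are abbreviations for illustration, not the full 18-entry array from agent.ts:

```typescript
// Minimal sketch of the flat tool-array shape the model reasons over.
// Tool mirrors the Anthropic.Tool shape: name + description + JSON Schema input.
type Tool = {
  name: string;
  description: string;
  input_schema: {
    type: "object";
    properties: Record<string, unknown>;
    required?: string[];
  };
};

// Two illustrative entries; the real TOOLS constant has 18 of these.
const TOOLS: Tool[] = [
  {
    name: "navigate",
    description: "Navigate the browser to a URL",
    input_schema: {
      type: "object",
      properties: { url: { type: "string" } },
      required: ["url"],
    },
  },
  {
    name: "assert",
    description: "Emit a test assertion with evidence",
    input_schema: {
      type: "object",
      properties: {
        description: { type: "string" },
        passed: { type: "boolean" },
        evidence: { type: "string" },
      },
      required: ["description", "passed", "evidence"],
    },
  },
];

// The model picks an entry by name; the runner dispatches against the same array.
console.log(TOOLS.map((t) => t.name).join(","));
```

The point of the flatness: the schema the model sees at inference time and the schema the dispatcher switches on are the same object, so there is no layer between them to drift.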
The four primitives that kill human-in-the-loop signup testing
A closed AI QA platform usually requires you to pre-provision test accounts because its primitive set stops at the form submit. A services firm writes a one-off mail-reading helper for each client and bills its maintenance separately. Assrt makes the inbox part of the primitive set: the system prompt at agent.ts:228-236 teaches the model to chain the four email primitives itself.
From a Markdown bullet to a green assertion
The sequence a services firm hides behind an engineer, a dashboard, and a screenshot, laid out here in five steps.
User action arrives as a #Case line
The plan at /tmp/assrt/scenario.md is Markdown you can edit in VS Code. Each #Case is 3 to 5 bullet points in English. No YAML, no JSON, no proprietary DSL. The runner reads this file on every invocation (scenario-files.ts:49), so edits take effect on the next run.
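A hypothetical #Case block in that format (illustrative wording, not copied from a real plan, following the 3-to-5-bullet convention described above):

```markdown
#Case Signup with email verification
- Navigate to /signup and open the registration form
- Create a disposable email and submit the form with it
- Wait for the verification code to arrive in the inbox
- Enter the code and assert the dashboard greets the new user
```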
Claude Haiku picks one of the 18 TOOLS
The agent loop at agent.ts:433 hands the #Case text plus the current page snapshot to Claude Haiku (default model at line 9). Haiku returns a tool_use block with one of the 18 names. The dispatcher switches on the name and calls the matching handler. Under the hood, every matched handler translates to a Playwright MCP call over stdio.
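The dispatch step can be sketched without the real handlers. A minimal sketch, assuming a `tool_use` block of `{ name, input }` and a handler map standing in for the Playwright MCP calls (the handler bodies here are stubs, not the real agent.ts code):

```typescript
// A tool_use block as the model emits it: a name from TOOLS plus JSON input.
type ToolUse = { name: string; input: Record<string, unknown> };

// Handlers keyed by tool name. The real handlers call Playwright MCP over
// stdio; these stubs just record what they would have done.
const handlers: Record<string, (input: Record<string, unknown>) => string> = {
  navigate: (input) => `browser_navigate → ${input.url}`,
  click: (input) => `browser_click → ref ${input.ref}`,
  assert: (input) => `assertion: ${input.description} passed=${input.passed}`,
};

// The dispatcher is a name lookup, nothing more: no DSL, no compile step.
function dispatch(call: ToolUse): string {
  const handler = handlers[call.name];
  if (!handler) throw new Error(`unknown tool: ${call.name}`);
  return handler(call.input);
}

console.log(dispatch({ name: "navigate", input: { url: "https://example.com" } }));
```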
Playwright MCP executes against real Chromium
browser.ts:284 resolves @playwright/mcp/cli.js, spawns it with --viewport-size 1600x900 and --output-mode file, and connects over stdio. A click in the agent is a browser_click Playwright MCP call; a snapshot is the native a11y tree Playwright already produces. No wrapper layer you cannot inspect.
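The stdio transport underneath is ordinary child-process plumbing. A stripped-down sketch of the pattern, not the real @playwright/mcp handshake: it spawns a throwaway `node -e` child that echoes one JSON-RPC-shaped line back, the same request/response-over-stdio loop the browser manager runs against the real CLI:

```typescript
import { spawnSync } from "node:child_process";

// Child script: read one JSON-RPC-shaped request from stdin, answer on stdout.
// The real counterpart is @playwright/mcp/cli.js answering browser_* calls.
const childScript = `
  let buf = "";
  process.stdin.on("data", (d) => (buf += d));
  process.stdin.on("end", () => {
    const req = JSON.parse(buf);
    process.stdout.write(JSON.stringify({ id: req.id, result: { ok: true, method: req.method } }));
  });
`;

const request = { id: 1, method: "browser_click", params: { ref: "e42" } };
const child = spawnSync("node", ["-e", childScript], {
  input: JSON.stringify(request),
  encoding: "utf8",
});

const response = JSON.parse(child.stdout);
console.log(response.result.method); // the child echoes the method name back
```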
Assertions and improvements are emitted as tool calls too
assert (agent.ts:133) and suggest_improvement (agent.ts:158) are not post-hoc analyses; they are tool calls the model issues in the same loop. This keeps the structured report shape: every assertion has an explicit description + passed + evidence triplet, because that is the tool schema.
complete_scenario closes the #Case
agent.ts:146 is the only way a #Case ends. The model must emit complete_scenario with summary + passed. That boolean is what you see as PASS / FAIL in the final report. The sharedBrowser at server.ts:31 is kept open, so Case 2 inherits Case 1's cookies and URL without a re-login step.
What a signup test looks like when it runs
One #Case. Eleven tool calls. No human next to the inbox. This is the exact shape you get from the CLI when the agent picks the four email primitives in sequence and wait_for_stable closes the async gap.
The 18, grouped by role
The inventory is flat, but the roles are not. Ten primitives drive the browser. Four remove humans from verification loops. Four shape the report.
The 10 interaction primitives
navigate, snapshot, click, type_text, select_option, scroll, press_key, wait, screenshot, evaluate. These are the observable-action and page-reading tools every QA runner exposes. In agent.ts they live at lines 18, 27, 32, 44, 57, 69, 81, 90, 101, 106. A Cypress or Playwright test from a services firm ultimately compiles to this same set of operations.
The 4 email / OTP / external-verify primitives
create_temp_email (115), wait_for_verification_code (120), check_email_inbox (128), http_request (172). These remove the human-in-the-loop from any flow that needs an inbox, a magic link, or a webhook verification. Closed AI QA platforms and services firms charge extra for this or skip it entirely.
The 4 report / introspection primitives
assert, complete_scenario, suggest_improvement, wait_for_stable. These shape the final report: every PASS/FAIL is an explicit assert call, every #Case ends with complete_scenario, every UX issue becomes an improvement entry. wait_for_stable is the self-healing wait that replaces the retry-three-times flake-fighting pattern.
Default model: claude-haiku-4-5-20251001
DEFAULT_ANTHROPIC_MODEL at agent.ts:9. Each tool call is a small bounded reasoning task and Haiku is the cheapest capable option. Override with --model or the model MCP parameter. Set ANTHROPIC_BASE_URL to route to a local LLM for air-gapped deploys; the 18-tool surface is identical.
The cost per run, compared
Published figures from 2026 QA automation market reports, rounded and converted per-run where possible. The Assrt cost is Anthropic Haiku token spend at published list prices for a 5-case suite averaging 12k input and 2k output tokens, with no platform fee layered on top.
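The per-run arithmetic is simple enough to check yourself. A sketch using placeholder rates of $0.25 per million input tokens and $1.25 per million output tokens — illustrative assumptions, not Anthropic's published prices; substitute the current list price for your model:

```typescript
// Per-run token cost for the 5-case suite described above.
// RATE_* are assumed illustrative prices per million tokens.
const INPUT_TOKENS = 12_000;     // average input tokens per run (page's estimate)
const OUTPUT_TOKENS = 2_000;     // average output tokens per run
const RATE_IN_PER_MTOK = 0.25;   // assumed $/1M input tokens (placeholder)
const RATE_OUT_PER_MTOK = 1.25;  // assumed $/1M output tokens (placeholder)

const costPerRun =
  (INPUT_TOKENS / 1_000_000) * RATE_IN_PER_MTOK +
  (OUTPUT_TOKENS / 1_000_000) * RATE_OUT_PER_MTOK;

console.log(costPerRun.toFixed(4)); // → 0.0055 dollars per run at the assumed rates
```

Even if the real rates are several multiples of these placeholders, the result stays in the cents-per-run range, which is the comparison that matters against a $12K-per-year platform floor.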
Services retainer vs. 18 primitives on disk
Every row names a thing a QA automation services firm will list in its SOW, and the corresponding Assrt implementation by file and line.
| Feature | Typical services retainer | Assrt |
|---|---|---|
| What the 'service' physically is | A team of engineers composing operations in a proprietary runner | 18 tool handlers in agent.ts:16-196 that LLMs compose for you |
| How the primitives are defined | Closed behind a runner you cannot inspect, or a YAML DSL that compiles at runtime | Typed Anthropic.Tool[] with input_schema per handler, readable on disk |
| How a signup flow with OTP gets tested | Either skipped (vendor asks for pre-provisioned test account) or billed as custom work | create_temp_email + wait_for_verification_code + check_email_inbox, lines 115-128 |
| How async / streaming UI waits are handled | Fixed timeouts, retries, and per-test tuning by an engineer | wait_for_stable polls DOM mutations until quiescent, agent.ts:186 |
| How out-of-band API verification works | Separate integration test harness, billed as additional scope | http_request at agent.ts:172 makes the test agent itself verify the webhook |
| Where your scenarios physically live | Vendor dashboard (TestRail, Jira, custom) behind an account boundary | /tmp/assrt/scenario.md — plain Markdown, grep-able, diff-able |
| What the underlying browser driver is | Proprietary runner (sometimes wrapping Selenium or Playwright, sometimes not) | @playwright/mcp driving real Chromium, spawned at browser.ts:284 |
| How you extend the primitive set | File a ticket with the vendor; plugin review; quarterly release | Append one entry to TOOLS, add a handler in the dispatcher |
| Typical annual cost | $180K agency / $312K in-house / $12K–24K closed AI + 20–50% maintenance uplift | $0 platform fee + Anthropic tokens (typically cents per run at Haiku rates) |
| Exit cost if you switch tools | Re-author every scenario into the new vendor's DSL | Copy /tmp/assrt/scenario.md into the next runner — it is just Markdown |
When 18 primitives are not enough and you should still hire a services firm
- Mainframe and green-screen terminals outside Chromium's reach
- Compliance regimes that require a human test authoring trail (FDA 21 CFR Part 11, DO-178C)
- Air-gapped environments where neither Anthropic nor Playwright MCP can run
- A web app with weekly CI runs and under 200 #Case scenarios
- A team with a coding agent already in the IDE
- A signup flow with OTP that a vendor keeps refusing to test
The first three items are legitimate services-firm territory. The last three are territory the 18 primitives already cover.
Want to see the 18 primitives run on your app?
20 minutes on a call, we point Assrt at your staging URL and walk through the #Case plan, the tool-call trace, and the video player together.
Book a call →

Frequently asked questions
If I hire a QA automation services firm, what am I actually paying for, mechanically?
A human who composes browser primitives into scenarios. The primitives are the same on every retainer: navigate, click, type, select, wait, snapshot/read, assert, screenshot. Different DSLs (Cypress commands, Playwright Locator calls, Selenium WebDriver, proprietary YAML) dress them up differently, but the underlying unit of work is identical. Assrt exposes its set as 18 MCP tool handlers in /Users/matthewdi/assrt-mcp/src/core/agent.ts between lines 16 and 196. They are: navigate, snapshot, click, type_text, select_option, scroll, press_key, wait, screenshot, evaluate, create_temp_email, wait_for_verification_code, check_email_inbox, assert, complete_scenario, suggest_improvement, http_request, wait_for_stable. When a services firm invoices you for a QA automation engineer, the deliverable is a composition of those 18 operations, padded with a dashboard and a markup.
What makes the email and OTP primitives in agent.ts actually different from a services retainer?
create_temp_email (line 115), wait_for_verification_code (line 120), check_email_inbox (line 128), and http_request (line 172) let the test agent itself complete a signup flow end to end. When the system prompt at agent.ts:228-236 detects a signup form, it tells the model to (1) create a disposable email, (2) submit the form, (3) poll the disposable inbox for the OTP, (4) paste the digits back into the form. No QA engineer is sitting by a mailbox. On closed AI QA platforms this step is usually skipped (the vendor asks you for a pre-provisioned test account) or priced as a separate add-on. On a services retainer it is a human cost. In agent.ts it is four tools and about 150 lines of glue in /Users/matthewdi/assrt-mcp/src/core/email.ts.
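The four-step chain is easy to model with the inbox mocked out. A sketch under stated assumptions: `createTempEmail`, `submitSignupForm`, and `waitForVerificationCode` are hypothetical stand-ins for the real handlers, and the inbox is an in-memory map rather than a disposable-mail provider:

```typescript
// Hypothetical in-memory stand-ins for the real email primitives.
const inbox = new Map<string, string[]>();

function createTempEmail(): string {
  const address = `qa-${Math.random().toString(36).slice(2, 8)}@example.test`;
  inbox.set(address, []);
  return address;
}

// Stand-in for the app under test: submitting the form "sends" an OTP mail.
function submitSignupForm(email: string): void {
  inbox.get(email)?.push("Your verification code is 482913");
}

// Poll the inbox until a 6-digit code appears (synchronous sketch of the 60s poll).
function waitForVerificationCode(email: string, maxPolls = 10): string {
  for (let i = 0; i < maxPolls; i++) {
    for (const mail of inbox.get(email) ?? []) {
      const match = mail.match(/\b(\d{6})\b/);
      if (match) return match[1];
    }
  }
  throw new Error("no OTP arrived");
}

// The chain the system prompt teaches: create inbox → submit → poll → paste back.
const email = createTempEmail();
submitSignupForm(email);
const code = waitForVerificationCode(email);
console.log(code); // → 482913
```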
What does the 18-primitive design buy me versus a YAML-style recorder?
Two things. First, every primitive lives in a readable TypeScript array (the TOOLS constant at agent.ts:16-196), so when a test misbehaves you can grep the handler name and see exactly what the agent issued to the browser. Second, the primitives map directly onto Playwright MCP's own tool names. The browser manager at browser.ts:284 resolves @playwright/mcp/cli.js and spawns it over stdio; click becomes a Playwright MCP click, snapshot becomes the accessibility-tree snapshot Playwright already exposes. There is no transpilation step from 'Assrt DSL' into browser operations, because there is no Assrt DSL. A vendor YAML has to be compiled by the vendor at runtime, which is the layer that breaks when your app changes.
How does wait_for_stable (agent.ts:186) replace the 'flaky test retry' a services firm sells as a premium add-on?
wait_for_stable polls the DOM and resolves only when no mutations have occurred for a configurable stable window (default 2 seconds, max 30 seconds of wall-clock wait). The system prompt at agent.ts:249-254 instructs the model to call wait_for_stable after any async action: submitting forms, sending chat messages, triggering search. In a consultancy workflow, the corresponding solution is 'retry three times with exponential backoff' plus a human watching the flake rate. wait_for_stable replaces both by binding the wait to the actual signal (DOM quiescence) rather than a timer guess. This is why the same #Case suite can run against a slow staging server and a fast local dev server without per-environment tuning.
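The quiescence idea itself fits in a few lines. A sketch assuming a `readMutationCount()` probe (hypothetical; the real handler watches DOM mutations inside the page): declare the page stable only when the count stops changing for a stable window, and bail out at a wall-clock budget:

```typescript
// Sketch of the stable-wait loop: sample a mutation counter at fixed ticks and
// declare the page stable once the counter is unchanged for `stableTicks` samples.
// readMutationCount is a hypothetical probe standing in for the real DOM observer.
function waitForStable(
  readMutationCount: () => number,
  stableTicks: number, // how many unchanged samples count as "quiescent"
  maxTicks: number,    // wall-clock budget, in samples
): boolean {
  let last = readMutationCount();
  let unchanged = 0;
  for (let tick = 0; tick < maxTicks; tick++) {
    const now = readMutationCount();
    unchanged = now === last ? unchanged + 1 : 0;
    last = now;
    if (unchanged >= stableTicks) return true; // DOM settled
  }
  return false; // budget exhausted while the page was still mutating
}

// Simulated page: mutations for a few ticks, then quiet.
let t = 0;
const probe = () => (t++ < 5 ? t : 99);
console.log(waitForStable(probe, 3, 30)); // → true
```

Binding the wait to the mutation signal rather than a timer is exactly why the same suite runs unmodified against a slow staging server and a fast local one.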
Does the 'services' framing even apply when I run Assrt against prod with --extension?
It applies even more. In extension mode (browser.ts:258, arg --extension) the runner attaches to your already-running Chrome via Chrome DevTools Protocol instead of launching a fresh headless browser. That means cookies, logged-in state, and the browser fingerprint your users actually have. A services firm would script around your auth and require test credentials; the sharedBrowser in extension mode uses your credentials because it is your browser. The handshake is gated on a one-time Playwright MCP extension token (saved to ~/.assrt/extension-token by browser.ts:231-241), so subsequent runs are silent. This covers the 'test on prod without creating a new account every time' case most retainer contracts refuse to touch because of compliance handwaving.
What is the fastest path from reading this page to running the 18 primitives on my own app?
Four shell lines. npx @assrt-ai/assrt setup registers the MCP server with Claude Code, Cursor, Zed, and Windsurf (sets up ~/.cursor/mcp.json, ~/.claude/mcp.json). Start your dev server. Ask your coding agent 'Use assrt_plan on http://localhost:3000' to auto-generate 5 to 8 #Case blocks into /tmp/assrt/scenario.md. Ask 'Use assrt_test' to execute them. Under the hood, assrt_test spawns @playwright/mcp, the agent loop picks tools from the 18-entry TOOLS array, Haiku reasons over each step, and you get pass/fail plus a video player on a local port. No account, no dashboard, no per-seat license.
How does the cost compare line by line to a 2026 QA automation retainer?
2026 retainer ranges, from published sources: QA agencies average around $180K/year, in-house teams $312K/year, closed AI tools $12K to $24K/year, with hidden test-maintenance fees typically 20 to 50 percent on top. Assrt charges zero platform fee. The variable cost is Anthropic tokens at the claude-haiku-4-5-20251001 rate (agent.ts:9 sets this as DEFAULT_ANTHROPIC_MODEL). A 5-case, 3-to-4-step-per-case suite uses roughly 8k to 15k input and 2k output tokens, which at Haiku pricing clears well under a penny per run. You can also set ANTHROPIC_BASE_URL and route to a local LLM, in which case the platform fee stays zero and the variable fee is your GPU time.
Can I export or re-use the 18 primitives outside of Assrt if I decide to switch?
Yes. Every primitive in TOOLS (agent.ts:16-196) is a thin adapter over a Playwright MCP tool. If you stand up your own @playwright/mcp session (npx @playwright/mcp@latest --extension or headless), you can issue the same tool names (browser_navigate, browser_click, browser_type, browser_snapshot) and drive the same Chromium. Your #Case plans in /tmp/assrt/scenario.md are Markdown; pasting them into a different LLM-driven runner works. There is no vendor format to convert. This is what 'zero vendor lock-in' means in code: the primitives are standard MCP calls, the plan format is Markdown, the runner is open source (MIT). The exit cost is a git add and a different npx binary.
What does assrt_diagnose add on top of the 18 primitives when something fails?
assrt_diagnose (in server.ts:866, separate from the primitives) takes the failing #Case text, the error evidence, and the URL, and runs one Haiku call with a diagnose system prompt (server.ts:240-268). The model emits a Root Cause, an Analysis, a Recommended Fix, and a Corrected Test Scenario in the same #Case format so you can paste the fix straight back into /tmp/assrt/scenario.md. This is the QA-engineer-triages-a-red-build role in a retainer. It costs fractions of a cent, closes in seconds, and you can re-run it with a different model via the model parameter. The whole diagnose handler is about 60 lines because, again, it is one model call, not a ReAct loop.
Why is every primitive in a single flat array rather than a tree of modules?
Because the LLM treats the tools as a flat namespace. Anthropic's tool-calling API takes an array of Tool definitions and chooses among them by name; Gemini's is the same shape. The TOOLS constant at agent.ts:16-196 is intentionally a flat array so the schema the model sees is identical to the schema the runner dispatches against. If you want to extend Assrt with your own primitive (say, a Stripe-test-card helper or a headless-browser-console-log reader) you append one entry to TOOLS and add a handler in the dispatch loop. That is the full extension protocol. No plugin registry, no config, no YAML.
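The two-step extension protocol can be shown end to end. A sketch with a hypothetical `stripe_test_card` primitive — the name, card number, and handler body are illustrative, not part of agent.ts — against minimal stand-ins for the real array and dispatch map:

```typescript
// Minimal stand-ins for the real TOOLS array and dispatch map.
type Tool = { name: string; description: string; input_schema: object };
const TOOLS: Tool[] = [];
const handlers: Record<string, (input: Record<string, unknown>) => string> = {};

// Step 1: append one entry to TOOLS so the model can see and pick the new name.
TOOLS.push({
  name: "stripe_test_card",
  description: "Fill a payment form with a Stripe test card number",
  input_schema: {
    type: "object",
    properties: { brand: { type: "string" } },
    required: ["brand"],
  },
});

// Step 2: add the matching handler in the dispatch map. That is the whole protocol.
handlers["stripe_test_card"] = (input) =>
  input.brand === "visa" ? "typed 4242 4242 4242 4242" : "unsupported brand";

console.log(handlers["stripe_test_card"]({ brand: "visa" }));
```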