Career, honestly
A QA automation career built on Playwright, in 2026, when an agent is the one writing the locators
This is the version of the answer I would give a friend who messaged me on Reddit asking whether to lean into Playwright as their automation track, and whether the job is going to survive the next two years of AI. The short version is yes, but the day looks different. The longer version is below, with file paths and verdicts.
Direct answer, verified 2026-05-07
Yes, a Playwright-centric QA automation career is still worth it in 2026. What changed is the leverage: the highest-paid hours used to be locator authoring; in 2026 they are plan writing, pass-criteria precision, and triaging the structured pass/fail report an agent produces after running your plan through Playwright MCP. The Playwright fundamentals in the official docs are still required. They are no longer the whole job.
What actually moved
For about a decade, the work that paid in QA automation was “turn this Jira ticket into a deterministic Playwright test.” You read the acceptance criteria, opened the app, dug through the DOM, picked a stable selector, wrote the case, watched it flake, tweaked the wait, watched it pass. Hours per case. Most of those hours were not the interesting part; the interesting part was deciding what to test and what to assert. The rest was glue.
The glue is the part that compresses. An LLM driving Playwright through the Playwright MCP server can take a paragraph of plain English (“add SKU-7842 to cart, verify subtotal increments by $19.99”), inspect the page, find the element by accessibility role, click it, and emit a structured pass or fail. It is not magic; it gets confused, it occasionally hallucinates a button, it sometimes needs a sharper pass criterion. But the average time-per-case has dropped enough that the rest of the day rebalances around the decisions that did not compress: what to test, what counts as passing, and what to do when the run is red.
The new loop, on one screen
If you are coming from a 2022-shaped Playwright job, this is the part that will feel different. The artifact you ship is no longer the `.spec.ts` file. The artifact is the plan and the report.
The loop, four stations
1. Write the plan. Author `#Case` blocks in plain English with explicit pass criteria. This is the part you actually own.
2. Run the agent. `assrt_test` runs each case in a shared browser, calls Playwright tools step by step, and records video.
3. Read the verdict. A `TestReport` JSON comes back: scenarios, steps, assertions, durations, plus a webm. You read it, not skim it.
4. Diagnose and decide. If a case is red, `assrt_diagnose` proposes a hypothesis. You pick app bug, plan bug, or flake and act.
The exact MCP tool surface, if you want to verify this against source rather than take my word for it, lives in `assrt-mcp/src/mcp/server.ts`: three tools (`assrt_test`, `assrt_plan`, `assrt_diagnose`), with the schema for each defined inline starting around line 343. The default model is `claude-haiku-4-5-20251001`; you can override it.
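If you want to poke at that surface directly rather than through an agent harness, the standard MCP TypeScript SDK will do it. A minimal sketch: the server launch command and the argument names (`url`, `plan`) here are assumptions, so check `server.ts` for the real input schema before relying on them.

```ts
// Sketch: call assrt_test from a plain MCP client. The launch invocation and
// the tool arguments are assumptions — verify against assrt-mcp/src/mcp/server.ts.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

const transport = new StdioClientTransport({
  command: "npx",
  args: ["-y", "@m13v/assrt", "mcp"], // hypothetical launch command; see the repo README
});
const client = new Client({ name: "probe", version: "0.0.1" });
await client.connect(transport);

const result = await client.callTool({
  name: "assrt_test",
  arguments: {
    url: "http://localhost:3000",
    plan: "#Case 1: Add to cart updates subtotal ...", // your #Case blocks go here
  },
});
console.log(result.content); // the structured report comes back as tool output
await client.close();
```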
The artifact: what a `TestReport` actually contains
This is the moment the career conversation gets concrete. The shape you stare at every day is defined in `assrt-mcp/src/core/types.ts`. A run produces a `TestReport` with a list of `ScenarioResult`, each scenario carrying its own `TestStep[]` and `TestAssertion[]`.
```ts
// assrt-mcp/src/core/types.ts
export interface TestStep {
  id: number;
  action: string;
  description: string;
  status: "running" | "completed" | "failed";
  reasoning?: string;
  error?: string;
  timestamp: number;
}

export interface TestAssertion {
  description: string;
  passed: boolean;
  evidence: string;
}

export interface ScenarioResult {
  name: string;
  passed: boolean;
  steps: TestStep[];
  assertions: TestAssertion[];
  summary: string;
  duration: number;
}

export interface TestReport {
  url: string;
  scenarios: ScenarioResult[];
  totalDuration: number;
  passedCount: number;
  failedCount: number;
  generatedAt: string;
}
```

Read the fields slowly. The reason the job did not vanish is right here. `reasoning` is a free-text field where the agent says why it did what it did at that step. When a case fails, you read `error` against `reasoning` against `evidence` and decide whether the app shipped a bug, your plan was loose, or the environment flaked. Nobody else on the team is going to do that for you. That is the job.
What writing a `#Case` actually looks like
Here is a real plan against a fake checkout. It is the level of specificity that produces a reliable run. Vaguer plans flake; tighter plans are slower to write but cheaper to live with.
```text
#Case 1: Add to cart updates subtotal
Navigate to /products/SKU-7842.
Click the "Add to cart" button.
Verify the cart drawer opens and shows the line item "Blue Hoodie L" with quantity 1 at $19.99.
Verify the cart subtotal in the header reads "$19.99".
passCriteria: The cart drawer is visible, the line item exists, and the header subtotal reads exactly "$19.99". If anything else is in the cart, this case fails.

#Case 2: Removing the only item empties the cart
Open the cart drawer (assume Case 1 ran first).
Click the "Remove" button next to "Blue Hoodie L".
Verify the cart drawer shows the empty state with the text "Your cart is empty" and the header subtotal reads "$0.00".
passCriteria: The empty-state text is visible, the line item is gone, and the header subtotal reads exactly "$0.00".
```
Notice what is and is not in there. There is no XPath, no `data-testid` chain, no waiting strategy. There is the user goal, the inputs, and a sharp pass criterion. The agent figures out the locator and the wait. The career skill is writing the rest of it well.
What a tight plan has
- One concrete user goal per case, not a feature umbrella
- Inputs spelled out: the SKU, the email, the exact button label
- A single pass criterion you would accept on a code review
- Negative cases written as their own `#Case`, not as a footnote (one worked example follows this list)
- Variables for anything that changes per run (test email, build id)
- Tags so CI can pick `smoke` versus `regression` cleanly
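As an example of the negative-case bullet, the failure path gets its own block. The SKU and copy here are invented; the `#Case` and `passCriteria` shape follows the checkout plan above.

```text
#Case 3: Out-of-stock SKU cannot be added
Navigate to /products/SKU-0000 (a SKU seeded as out of stock in the test environment).
Verify the "Add to cart" button is disabled or replaced by an "Out of stock" label.
Verify the header cart subtotal still reads "$0.00".
passCriteria: No line item appears in the cart and the subtotal does not change. If the cart drawer opens with an item, this case fails.
```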
How a request actually flows
If you have used MCP before, this will look familiar. If you have not, it is the part of the new world worth understanding before you take a job that says “Playwright + AI” in the JD.
One `assrt_test` call, end to end
The agent never sees a CSS selector. It sees an accessibility ref like `ref=e5` attached to a role-and-name pair, and it acts on that. This is why the new plans read human ("the Add to cart button") rather than mechanical (`[data-testid="add-to-cart"]`). It is also why the trade is real: you give up some determinism for a lot of ergonomics, and the structured report is what puts the determinism back.
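The difference is easiest to see in raw Playwright terms. Both locator lines below are real Playwright API; the agent effectively resolves your plan sentence to something like the second. The URL matches the checkout example above but is otherwise an assumption.

```ts
import { test } from '@playwright/test';

test('add to cart — two ways to say the same click', async ({ page }) => {
  await page.goto('http://localhost:3000/products/SKU-7842');

  // The 2022 habit: commit the selector, babysit it when the DOM shifts.
  await page.locator('[data-testid="add-to-cart"]').click();

  // The role-and-name resolution the agent works from. It survives markup
  // churn as long as the accessible name "Add to cart" stays stable.
  await page.getByRole('button', { name: 'Add to cart' }).click();
});
```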
The triage move that separates senior from junior
Anyone can write a plan. The career is built on what you do when the report comes back red. The agent will tell you it could not click the "Sign in" button. A junior runs it again. A senior reads the run video, opens `assrt_diagnose`, reads the proposed hypothesis, then makes one of three calls. A minimal script for the first moves follows the checklist below.
Triage, in order
- Read the failing step's `error` field literally; do not paraphrase
- Open the run video at the timestamp of the failed step
- Compare the agent's `reasoning` to your stated `passCriteria`
- Decide: app bug, plan bug, or environment flake
- If app bug, file with the runId and the step number
- If plan bug, tighten `passCriteria` and re-run before merging
- If flake, look at the previous N runs before retry-blaming the test
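The first and third moves on that list can be scripted against the report JSON alone. A minimal sketch against the `TestReport` shape above; the default path is a hypothetical placeholder, since the exact filename under `/tmp/assrt/` depends on the run.

```ts
// triage.ts — print the failing step's `error` literally, next to the
// agent's `reasoning`, so the comparison against passCriteria is one read.
import { readFileSync } from "node:fs";
import type { TestReport } from "./types"; // the interfaces shown earlier, saved locally

// Pass the report path as an argument; the fallback path is hypothetical.
const path = process.argv[2] ?? "/tmp/assrt/report.json";
const report: TestReport = JSON.parse(readFileSync(path, "utf8"));

for (const scenario of report.scenarios.filter((s) => !s.passed)) {
  console.log(`FAILED: ${scenario.name} (${scenario.duration} ms)`);
  for (const step of scenario.steps.filter((s) => s.status === "failed")) {
    console.log(`  step ${step.id} [${step.action}]: ${step.description}`);
    console.log(`  error:     ${step.error}`);     // read literally, no paraphrase
    console.log(`  reasoning: ${step.reasoning}`); // compare against passCriteria
  }
  for (const a of scenario.assertions.filter((a) => !a.passed)) {
    console.log(`  assertion: ${a.description}`);
    console.log(`  evidence:  ${a.evidence}`);
  }
}
```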
The reason this is the senior move is that the cost of getting it wrong compounds. A flake you mis-classify as an app bug eats engineering time. An app bug you mis-classify as a flake ships to production. The triage muscle is what gets paid in 2026, and it is the thing nobody is teaching on YouTube because it does not look impressive in a thumbnail.
Open-source matters more than it used to, not less
If you are picking the stack you are going to build a career on, picking the closed platform that costs $7,500 a month and outputs proprietary YAML is not the bet it looks like. Two years from now the test plans you wrote on it are not portable, the diagnoses you produced are locked in someone's dashboard, and the resume signal is “I am fluent in a vendor.” The Playwright + open-source-agent path produces artifacts that live in your repo: a plan file, a report JSON, a video. They are yours. They are also the artifacts a hiring manager can actually look at.
This is one of the reasons we built Assrt as the open-source variant. The MCP server, the CLI, and the test agent are all on GitHub. The default model is claude-haiku-4-5-20251001 but you can swap it. The reports land on disk under /tmp/assrt/. Nothing is hidden behind a login wall you might lose access to.
A 30-day plan if you are mid-career and reorienting
Concretely, what would I do if I were a five-year QA engineer reading this thread on a Sunday night, slightly worried about the next two years?
- Week one. Re-read the Playwright best-practices doc. Make sure your Page Object instinct is current. If you cannot explain what an actionability check is in two sentences, fix that first.
- Week two. Stand up an agentic runner against a personal app. With Assrt: `npx @m13v/assrt setup`, then `npx @m13v/assrt run --url http://localhost:3000 --plan-file plan.txt --json --video`. Read the resulting report end to end.
- Week three. Convert one real suite at work to plans. Pick the flakiest `.spec.ts` you own. Rewrite it as `#Case` blocks. Keep the original. Run both for two weeks; compare the failure modes. This is your portfolio writeup.
- Week four. Wire the new runner into a PR check, even on a personal repo; a minimal sketch follows this list. The CI snippet is the third resume artifact. Two thousand words of Medium prose about "the future of QA" will not move a hiring manager. A working PR check will.
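For week four, the gate itself can be a dozen lines. A sketch, assuming `--json` emits the `TestReport` to stdout (verify against your CLI version); `PREVIEW_URL` is a hypothetical CI variable, so substitute whatever your pipeline provides.

```ts
// check.ts — minimal PR gate: run the suite, exit nonzero on any red case.
// Assumes `--json` prints a TestReport to stdout; PREVIEW_URL is hypothetical.
import { execFileSync } from "node:child_process";

const out = execFileSync("npx", [
  "@m13v/assrt", "run",
  "--url", process.env.PREVIEW_URL ?? "http://localhost:3000",
  "--plan-file", "plan.txt",
  "--json",
], { encoding: "utf8" });

const report = JSON.parse(out); // TestReport, as defined earlier
console.log(`passed=${report.passedCount} failed=${report.failedCount}`);
if (report.failedCount > 0) process.exit(1); // any red case fails the check
```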
Stuck on the plan-versus-spec transition? Let's talk.
If you are mid-rewrite and the agent is producing flaky reports, a 20-minute call usually unsticks it faster than another week of trial and error.
Frequently asked questions
Is a Playwright-focused QA automation career still worth pursuing in 2026?
Yes, but the leverage points have moved. Five years ago the job was 90 percent locator authoring and selector babysitting. In 2026 the locator-writing part is increasingly handled by an LLM that drives Playwright through the Playwright MCP server. What you actually own is the test plan, the pass criteria, and the diagnosis when the report comes back red. The Playwright skill floor (understanding the page object model, knowing what an actionability check is, reading the trace viewer) is still required, because when a run fails you are the only one who can decide whether the app is broken or the plan was wrong. The job did not disappear; the day-to-day rebalanced.
What is the actual artifact a Playwright QA engineer ships in 2026?
Two artifacts, and they are not the same as in 2022. The first is a test plan written in plain English, structured as `#Case 1: <name>`, `#Case 2: <name>` blocks with the steps and the pass criteria spelled out. In Assrt that plan lives in your repo as a `.txt` file or inline in a CI script. The second is the result of executing it: a structured `TestReport` JSON object with one `ScenarioResult` per case, each containing a `TestStep[]` (action, description, status, reasoning, error, timestamp) and a `TestAssertion[]` (description, passed, evidence), plus a webm video of the run. The `.spec.ts` file you used to commit is now the agent's runtime, not your output. You read the report and decide what shipped.
If the agent writes the locators, what skills should I actually deepen?
Four. First, plan-writing: the difference between a vague case (`verify cart works`) and a tight case (`add SKU-7842 to cart, verify subtotal increments by $19.99 and the line item reads Blue Hoodie L`) is the difference between a flaky agent and a reliable one. Second, pass-criteria precision: the `passCriteria` field in `assrt_test` is free-form text, but the agent treats it as a contract. Loose criteria produce loose verdicts. Third, failure triage: when a step shows `status: failed` with `error: timeout waiting for /products/7842`, you decide app bug versus test bug. The `assrt_diagnose` tool exists for exactly this and the LLM proposes a hypothesis, but a human still picks the answer. Fourth, the same Playwright internals you always needed: viewport, trace, network mocking, storage state, isolation. Those have not gone anywhere.
Is `npx playwright codegen` plus an LLM good enough? Why bring an agent into it?
Codegen records what you click. It is fantastic for capturing a known happy path. It does not crawl the app, it does not propose cases you did not think of, it does not re-run a flaky case until it stabilizes, and it does not write a one-page diagnosis when something fails. An agentic runner like Assrt does each of those: `assrt_plan` navigates the URL and proposes `#Case` blocks based on what it actually sees on the page, `assrt_test` runs them with a video, `assrt_diagnose` analyzes the failure mode and suggests whether the app or the plan is wrong. Codegen is a recorder. The agent is a junior QA you direct.
How long does it take to get productive on this stack if I already know Playwright?
If you can read a `.spec.ts` file and explain what `await expect(page.getByRole('button', { name: 'Sign in' })).toBeVisible()` does, you are most of the way there. The new vocabulary is small: `#Case` blocks, `passCriteria` text, the `TestReport` shape, the three MCP tools (`assrt_test`, `assrt_plan`, `assrt_diagnose`). The orchestration around it (running headless versus headed, isolated profile versus extension mode against your real Chrome) maps directly to flags you have used before. A week is enough to ship your first agent-driven suite into CI. The slower part is unlearning the instinct to write the locator chain yourself when the agent can find the element from a description.
What does this look like for someone breaking into QA automation right now, with no Playwright background yet?
Honest answer: you still learn Playwright first. The agent does not exempt you from understanding what an actionability check is, why `force: true` is a smell, or how trace viewer works. It does mean you can be productive earlier. A common path: spend two weeks on the official Playwright docs and `npx playwright codegen` against a toy app. Then write five `#Case` blocks against the same toy app and run them through Assrt. Read the resulting `TestReport` JSON; pick one failed step and walk through the `error` and `reasoning` fields. By the end of a month you have done the entire loop a senior QA does: plan, run, diagnose, fix. The senior version is the same loop with bigger stakes and better instincts.
What about salary and titles? Is this still 'QA Automation Engineer' or something else?
Titles are lagging. The role still gets posted as 'QA Automation Engineer' or 'SDET', and the JD still asks for Playwright, Cypress, or Selenium. What is shifting under the title is the tool budget: instead of a $7,500 a month closed-source platform like QA Wolf, more teams are running open-source agentic runners against their own infra and writing the test plans in their repo. The compensation band has not collapsed in the way some doomers predicted. The teams that pay best are the ones who realized that the bottleneck was never locator authoring; it was deciding what to test and owning the verdict, and those are the parts that did not automate.
Will this work for my stack? React, Next.js, Tauri, mobile web?
Anything Playwright supports works, because the agent calls Playwright underneath. That covers Chromium, Firefox, and WebKit; React, Next, Vue, Svelte, plain HTML; mobile-emulation viewports; iframes; service workers; auth-gated dashboards. Native mobile is a separate world (Appium, XCUITest) and the agent does not help there. For Tauri or Electron, Playwright works against the embedded webview and the agent rides on top of that. If your bet is web in 2026, the answer is yes.
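If you want to sanity-check the emulation claim on your own app first, the plain Playwright project config is unchanged by the agent layer. A minimal example using the stock device descriptors:

```ts
// playwright.config.ts — standard device emulation; the agent rides on
// whatever Playwright project config you already run.
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  projects: [
    { name: 'desktop-chromium', use: { ...devices['Desktop Chrome'] } },
    { name: 'mobile-safari', use: { ...devices['iPhone 13'] } },
  ],
});
```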
How do I prove on a resume that I do this kind of work?
Three things land. First, a public repo with five to ten `#Case` blocks and the corresponding `TestReport` JSON checked in (sanitized of any secrets). It shows you can write tight plans and read agent output. Second, one diagnosed flake with a written-up before-and-after: the failed run, the hypothesis, the fix, the green run. This is the work nobody is doing on LeetCode. Third, a CI snippet that wires `npx @m13v/assrt run --json` into a PR check. Hiring managers in 2026 want to see that you can ship the loop, not just write a `.spec.ts` file.
Keep reading
QA engineer career paths in 2026: test infrastructure vs quality strategy
The broader split inside QA: who owns the pipeline and who owns the coverage model. Worth reading next to this one.
Playwright for beginners: a working developer's first week
If you do not have Playwright in your hands yet, start here. The agent layer makes more sense once you have written a `.spec.ts` yourself.
AI Playwright test maintenance: what self-healing actually does
Self-healing is a marketing word. Here is the part that is real, the part that is not, and what it means for your week.