AI Quality Engineering

What remains of quality engineering when AI writes the code?

Fifteen years of test architecture and quality engineering at insurers, banks, government and national utilities. Since last year I've been applying that discipline to the agentic AI layer that now sits on top of software delivery: agent orchestration, review gates for LLM output, and the audit trail regulated sectors still need. On this site you'll find how I work, the client cases where the approach was applied, four agents running in production, open-source templates on GitHub, and the pieces I publish.

What I do myself is selective. What I share isn't.

Example: a review gate I run at a client. Four checks, one verdict, in CI.
15+ years · QA & test automation
14 clients · enterprise & government
4 agents · in production, licensed
2 open-source templates · on GitHub
Cypress.io Ambassador
The approach

Four pillars beneath a software development process where quality stays under control.

Every engagement touches all four, in different mixtures. My approach is pragmatic: the solution that fits your organisation and still holds up in the long term.

01

Test architecture that outlives the team.

Software testing patterns, AC/TS traceability, per-feature coverage matrix, Language-First. The foundation for any quality architecture, with Cypress and/or Playwright. Built so AI coding agents and the surrounding Quality Gates can safeguard quality continuously.
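To make "per-feature coverage matrix" concrete, here is a minimal Python sketch. It assumes AC means acceptance criterion and TS means test scenario, and that each scenario declares the AC IDs it covers; the `Feature` model, the IDs and the helper names are hypothetical, not the structure of any client framework.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    # Hypothetical model: AC = acceptance criterion, TS = test scenario.
    name: str
    acceptance_criteria: list[str]
    # Each test scenario ID maps to the AC IDs it claims to cover.
    scenarios: dict[str, list[str]] = field(default_factory=dict)


def coverage_matrix(feature: Feature) -> dict[str, list[str]]:
    """Map every acceptance criterion to the sorted scenarios tracing to it."""
    matrix: dict[str, list[str]] = {ac: [] for ac in feature.acceptance_criteria}
    for ts, acs in feature.scenarios.items():
        for ac in acs:
            if ac in matrix:  # references to unknown ACs are ignored here
                matrix[ac].append(ts)
    return {ac: sorted(tss) for ac, tss in matrix.items()}


def uncovered(matrix: dict[str, list[str]]) -> list[str]:
    """Acceptance criteria with no test scenario behind them."""
    return [ac for ac, tss in matrix.items() if not tss]


def coverage_ratio(matrix: dict[str, list[str]]) -> float:
    """Share of acceptance criteria covered by at least one scenario."""
    if not matrix:
        return 1.0
    return (len(matrix) - len(uncovered(matrix))) / len(matrix)


login = Feature(
    name="login",
    acceptance_criteria=["AC-1", "AC-2", "AC-3"],
    scenarios={"TS-10": ["AC-1"], "TS-11": ["AC-1", "AC-2"]},
)
matrix = coverage_matrix(login)
print(matrix)                            # {'AC-1': ['TS-10', 'TS-11'], 'AC-2': ['TS-11'], 'AC-3': []}
print(uncovered(matrix))                 # ['AC-3']
print(round(coverage_ratio(matrix), 2))  # 0.67
```

Unknown AC references are silently dropped in this sketch; in a real gate they are worth flagging, since they usually point at a renamed or deleted criterion.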

02

AI coding agents in your release pipeline.

Claude Code, Cursor, Copilot. In-repo subagents, AGENTS.md, slash commands, review gates. The agents run in your repo and your CI; your source stays there and in your own LLM tenant, nowhere else.

03

Audit trail by design.

DORA, GDPR, NIS2, ISO/IEC 25010, TMMi. Every change traces back to an acceptance criterion, every gate is documented. What you do, you can defend — to an inspector or to internal audit.

04

Quality as a financial story.

A report that both your CFO and your auditor understand. What does a production bug cost? What does a green CI save? I make the invisible costs in the development process visible and actionable.

Earlier work

Where the work has been applied.

A selection. The same discipline, in different contexts — from a worldwide government rollout to a solo SaaS I build myself.

QualityProfit · Solo SaaS · 2024–present

Solo SaaS, four agents.

Founder · Full-stack with Claude Code

A customer-deployed dashboard that turns Jira / Azure DevOps / GitHub / GitLab signals into financial ROI for QA. Four in-repo subagents: release-reviewer, deploy-monitor, onboarding-smoke-tester, requirements-guard.

Python · FastAPI · Pydantic · React · Cypress · Docker · Caddy · Stripe · Claude Code
New Orange Digital Agency · 2024–present

Their AI test stack, productized.

Architecture · Framework · Claude Code skill

Built an AI-augmented Playwright architecture, framework and reusable Claude Code skill for a Dutch digital agency, designed to plug into any current or future client engagement rather than a single project. It codifies project structure, the Page Object pattern, AC/TS traceability and a per-feature coverage matrix. The agency now ships AI-augmented test suites to its clients through a single Claude Code skill: AI-testing assets, productized at agency scale.

Playwright · TypeScript · Next.js 16 · Turborepo · Tailwind v4 · Claude Code · Cursor · Copilot
Evides · National Utility · 2024–present

Quality Framework rollout.

Quality Assurance Manager

TMMi-aligned Quality Framework on top of ISO/IEC 25010, embedded in delivery pipelines for a national utility. Quality maturity expressed in financial impact — defensible to a CFO and an auditor in the same room.

TMMi · ISO/IEC 25010 · Quality Framework
RvO · NL Government · 2024–2025

Language-First in gov.

Cypress + Playwright architecture

Test architecture across multiple government departments where different testing tools, specifications, scenarios and tests share one continuous human-readable layer. Presented at CypressConf 2024 — "Beyond the Battle: Empowering Test Automation with a Language-First Approach." The same Language-First approach I now extend into AI-augmented delivery.

Cypress · Playwright · TypeScript · Lerna · Artillery · Gherkin · Blueriq · GitLab · SonarQube
VGZ · Insurance · 2022–2024

Architect for the long run.

Test Automation Architect

Cypress + Lit Elements test architecture with Cucumber traceability, integrated into Azure DevOps. Page Object discipline and spec-to-test traceability that let the team keep the suite maintainable after I left. Every change anchored to a spec, every spec traceable to an acceptance criterion. Built to outlive me; handed back to the team.

Cypress · Lit Elements · Cucumber · Azure DevOps
Ministry of Foreign Affairs · Government · 2021, 2022–2024

Global rollout, audited.

QA Architect & Test Manager

Test management for a worldwide rollout under ISO/IEC 25010 / TMap discipline. Every change traceable, every gate documented, every decision defensible to an inspector. The earlier engagement (2021) covered Cypress, Angular and Docker on Azure DevOps.

ISO/IEC 25010 · TMap · Cypress · Angular · Docker · Azure DevOps
Four agents in production

What turns out to work in CI pipelines.

Four review gates I distilled from client work over the last two years. They run in my own codebase and at a handful of teams. No catalog, no pre-order: I only show what has made it to production. Anyone who wants to try one knows where to find me.

In production · v0.4.2

release-reviewer

Reviews every push for risk patterns: secrets in the diff, coverage thresholds, destructive migrations, touched auth code. Posts a verdict on the PR with the failing rule IDs. Running on every commit in my own codebase since 2024.
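A minimal sketch of a gate in this shape: four checks, one verdict, failing rule IDs in the output. The `RR-` rule IDs, the regex patterns, the 0.8 coverage threshold and the `Change` model are hypothetical stand-ins chosen for illustration; this is not release-reviewer's actual implementation.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns and rule IDs, for illustration only.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key ID shape
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
]
DESTRUCTIVE_SQL = re.compile(r"\bDROP\s+(TABLE|COLUMN)\b", re.IGNORECASE)


@dataclass
class Change:
    path: str               # file path in the diff
    added_lines: list[str]  # lines added by the push


def touches_auth(path: str) -> bool:
    # True if any path segment starts with "auth" (auth/, authentication.py).
    return any(seg.startswith("auth") for seg in path.lower().split("/"))


def review(changes: list[Change], coverage: float, min_coverage: float = 0.8) -> dict:
    """Run four checks and return one verdict plus the failing rule IDs."""
    failed: set[str] = set()
    for change in changes:
        added = "\n".join(change.added_lines)
        if any(p.search(added) for p in SECRET_PATTERNS):
            failed.add("RR-SECRET")
        if "migrations/" in change.path and DESTRUCTIVE_SQL.search(added):
            failed.add("RR-MIGRATION")
        if touches_auth(change.path):
            failed.add("RR-AUTH")
    if coverage < min_coverage:
        failed.add("RR-COVERAGE")
    return {"verdict": "fail" if failed else "pass", "failed_rules": sorted(failed)}


changes = [
    Change("src/auth/session.py", ["token_ttl = 3600"]),
    Change("db/migrations/0042.sql", ["ALTER TABLE users DROP COLUMN legacy_id;"]),
]
print(review(changes, coverage=0.85))
# {'verdict': 'fail', 'failed_rules': ['RR-AUTH', 'RR-MIGRATION']}
```

Failing the gate outright whenever auth code is touched is a deliberate simplification here; a softer design would route that case to a mandatory human reviewer instead.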

Email me about it →
In production

deploy-monitor

Verifies that the container digests on the target VPS match the released artifact. Catches the silent drift between "CI was green" and "what's actually running in production."
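The core comparison is small. A sketch, assuming you already have two maps from service name to image digest: one from the release manifest and one collected from the container runtime on the VPS (how that collection happens is outside this sketch). `find_drift` and the sample values are hypothetical.

```python
from typing import Optional

Digest = Optional[str]


def find_drift(released: dict[str, str], running: dict[str, str]) -> dict[str, tuple[Digest, Digest]]:
    """Return {service: (released digest, running digest)} for every mismatch.

    A service missing on either side shows up with None on that side.
    """
    drift: dict[str, tuple[Digest, Digest]] = {}
    for service in sorted(released.keys() | running.keys()):
        want, got = released.get(service), running.get(service)
        if want != got:
            drift[service] = (want, got)
    return drift


# Digests shortened for readability; real sha256 digests are 64 hex characters.
released = {"api": "sha256:aaa", "web": "sha256:bbb"}
running = {"api": "sha256:aaa", "web": "sha256:ccc", "worker": "sha256:ddd"}
print(find_drift(released, running))
# {'web': ('sha256:bbb', 'sha256:ccc'), 'worker': (None, 'sha256:ddd')}
```

An empty result is the green state; anything else is drift worth alerting on, including a container that was never part of a release.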

Email me about it →
In production

onboarding-smoke-tester

Walks the full onboarding flow end-to-end through the real API on every release. Catches the "registration is broken in prod" class of regression before a customer does. Runs independently; opens an issue on failure.

Email me about it →
In production

requirements-guard

Reconciles the written spec against the live code on every PR. Flags drift between what was promised and what was built — before it reaches an auditor or a customer. The discipline the other three agents lean on.
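Stripped to its most mechanical form, reconciling a spec against code is a two-way set difference. This sketch assumes spec and code share acceptance-criterion IDs in an `AC-<number>` format; that convention, `spec_drift` and the sample files are hypothetical, and this is an illustration, not requirements-guard's implementation.

```python
import re

AC_ID = re.compile(r"\bAC-\d+\b")  # hypothetical ID convention


def ac_sort_key(ac: str) -> int:
    # Sort numerically so AC-10 comes after AC-2.
    return int(ac.split("-")[1])


def spec_drift(spec_text: str, code_files: dict[str, str]) -> dict[str, list[str]]:
    """Compare AC IDs promised in the spec with AC IDs referenced in code."""
    promised = set(AC_ID.findall(spec_text))
    referenced: set[str] = set()
    for source in code_files.values():
        referenced.update(AC_ID.findall(source))
    return {
        "promised_not_built": sorted(promised - referenced, key=ac_sort_key),
        "built_not_promised": sorted(referenced - promised, key=ac_sort_key),
    }


spec = """AC-1 User can log in.
AC-2 User can reset the password.
AC-3 Session expires after 30 minutes of inactivity."""
code = {
    "tests/test_login.py": "def test_login():  # AC-1",
    "tests/test_reset.py": "# covers AC-2 and AC-4",
}
print(spec_drift(spec, code))
# {'promised_not_built': ['AC-3'], 'built_not_promised': ['AC-4']}
```

The second list matters as much as the first: code that traces to a criterion nobody wrote down is exactly the kind of drift an auditor asks about.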

Email me about it →
An agent isn't plug-and-play. First a short working session to see if it fits your repo — and if it clicks, a focused two-to-four-week integration. Wondering if one of these would suit your team? Just drop me a note; we'll look at it together.
The context

Three sectors, one recurring conversation.

The domains I've worked in for fifteen years: insurance, financial services and government. The common question — from auditor, regulator, internal audit — is how AI-augmented delivery stays explainable to someone who doesn't read code.

Insurance & financial services

DORA is here. So is DNB.

Insurers, banks, payment platforms, asset managers. DORA, GDPR, NIS2, internal audit and third-party ICT risk — plus the regulators behind them. For Dutch insurers: DNB and AFM oversight, Wft implications, Solvency II reporting systems, IFRS 17 reconciliation pipelines. The regulator isn't asking whether you use AI any more — they're about to ask how you control it.

Government & public sector

Auditable at delivery, by default.

Ministries, public-service implementers, government IT bodies. Algoritmeregister, AVG, BIO, NPR 5326, EU AI Act. AI-assisted delivery that survives both an inspector and a change of administration — with privacy and data residency answered by architecture, not paperwork. The discipline I built at RvO and the Ministry of Foreign Affairs.

Engineering & QA leadership

Two questions. One answer needed.

CTOs, VPs of Engineering, Heads of QA in regulated organisations. Since Claude Code, Cursor and Copilot accelerated everything, two questions land on your desk: the auditor wants to know how it's controlled, the CFO wants to know what it's worth. One story for both, or you have the conversation twice.

Outside this domain

The work doesn't fit everywhere.

A generic AI vendor with no regulatory story, a one-off Cypress audit divorced from architecture, or a pure consumer-internet context where "move fast, break things" is still the operating model — that's a different field. More honest to name it here than discover it in week six.

Trust · Continuity · Data residency

The three questions your CISO, DPO and auditor ask first.

Honest answers, named risks. The Trust & Data pack (sub-processor list, DPA, regional data-flow diagram, continuity arrangements, security questionnaire) can be sent to your procurement team, DPO and internal auditor before the first POC.

Where does your code go?

Inside your repo. Inside your CI.

The agents run inside your repository and your CI runners; no proprietary cloud holds your source. LLM access goes through your existing Claude Code, Cursor or Copilot enterprise tenant: your region, your DPA, your training opt-out. The sub-processor list, regional flow diagram and DPA highlights ship with the Trust & Data pack.

Key-person risk

Solo founder. Named risk.

I'm one engineer; pretending otherwise wastes everyone's time. Continuity arrangements (runbooks, a named backup contractor, source-code escrow options) are scoped per engagement and signed before kick-off. Request the Trust & Data pack for the specifics that apply to your contract shape.

For procurement, DPO & auditor

Dutch-language briefing: DORA / AVG / Wft.

KvK-registered company, standard DPA, sub-processor list, security questionnaire (CAIQ-lite) and a Dutch-language one-pager covering the DORA, AVG and Wft implications, for procurement, DPO and internal auditor. The briefing is sent on request; use the button below to ask for it.

Request Trust & Data pack → Request the Dutch briefing (DORA / AVG / Wft) →
Tech stack

What I bring into your repo.

Pragmatic, opinionated, and chosen for AI extension — not novelty.

AI / Agents
Claude Code · Custom subagents · Hooks · Prompt engineering · AGENTS.md / SKILL.md · Cursor · GitHub Copilot · Windsurf
Testing
Cypress.io · Playwright · Jest · Cucumber / Gherkin · Postman · Artillery · JMeter · axe-core · TestNG · Selenium
Frontend
TypeScript · React · Next.js · Vue · Angular · Lit · Tailwind · Turborepo · Lerna
Backend
Python · FastAPI · Pydantic · Java · Hibernate / JPA · Node · REST · GraphQL
DevOps
Docker · GitHub Actions · GitLab CI · Azure DevOps · TeamCity · Jenkins · Caddy · SonarQube
Quality
TMMi · ISO/IEC 25010 · NPR 5326 · TMap · SHEQC Grooming · OTAP · CI/CD · Page Object pattern
Integrations
Jira · GitHub · GitLab · Azure DevOps · Blueriq · Sitecore · Stripe · AWS Cognito
Career timeline

15+ years across enterprise & government.

A selection — earlier roles span ING, SBB, Ministry of Foreign Affairs, ZLM, KPN and lecturing at The Hague University of Applied Sciences.

2024–2025: RvO (NL Government), Quality Assurance Manager
2024–present: Evides, Quality Assurance Manager
2024–present: QualityProfit, Founder, solo SaaS
2024–present: New Orange Digital Agency, AI test stack productized
2022–2024: VGZ, Test Automation Architect
2022–2024: Ministry of Foreign Affairs, Test Manager
2022–2023: Aon, Quality Automation Architect
2021: Ministry of Foreign Affairs, QA Architect
2021: CZ, Test Automation Specialist
2020: Harlem Next · Nederlandse Transplantatie Stichting, Test Automation Specialist
2019–2020: Aon, Quality Assurance Manager
2018–2019: ING, Test Automation Specialist
Also built

QualityProfit

My solo SaaS that makes quality costs visible for teams. Same discipline, in product form. The four agents above run inside it today.

qualityprofit.io →
Working session

An hour, your release pipeline, and honest questions.

No sales call. Write me a few lines about your team — that's enough — and I'll send a short agenda back. If it fits, we go further. If it doesn't, I'll say so honestly.