{"id":507,"date":"2026-05-28T09:58:00","date_gmt":"2026-05-28T09:58:00","guid":{"rendered":"https:\/\/www.berkkibarer.com\/?p=507"},"modified":"2026-05-28T09:58:01","modified_gmt":"2026-05-28T09:58:01","slug":"building-an-autonomous-semantic-web-testing-system-by-berk-kibarer-ongoing-project-notes","status":"publish","type":"post","link":"https:\/\/www.berkkibarer.com\/?p=507","title":{"rendered":"Building an Autonomous Semantic Web Testing System by Berk Kibarer (Ongoing project notes)"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">From Exploration to Assurance<\/h2>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Autonomous browser testing becomes genuinely difficult the moment the system must decide what actually matters.<\/p>\n<\/blockquote>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Introduction<\/h1>\n\n\n\n<p>Most autonomous web testing systems begin with the same architecture:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>crawl pages<\/li>\n\n\n\n<li>click links<\/li>\n\n\n\n<li>fill forms<\/li>\n\n\n\n<li>collect responses<\/li>\n\n\n\n<li>generate reports<\/li>\n<\/ul>\n\n\n\n<p>At first, this feels surprisingly powerful.<\/p>\n\n\n\n<p>The engine explores pages autonomously.<br>It generates traffic.<br>It captures requests.<br>It finds occasional issues.<\/p>\n\n\n\n<p>But after enough sessions, a deeper question emerges:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Did the system actually test anything important?\n<\/code><\/pre>\n\n\n\n<p>That question changed the architecture of our system entirely.<\/p>\n\n\n\n<p>The result was a transition from:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>autonomous exploration\n<\/code><\/pre>\n\n\n\n<p>to:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>autonomous semantic assurance\n<\/code><\/pre>\n\n\n\n<p>This article summarizes the architecture, lessons learned, and major engineering 
shifts behind that transition.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">The First Major Realization<\/h1>\n\n\n\n<h2 class=\"wp-block-heading\">Exploration Is Not Assurance<\/h2>\n\n\n\n<p>Initially, the system optimized for exploration efficiency.<\/p>\n\n\n\n<p>The engine rewarded:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>new pages<\/li>\n\n\n\n<li>new graph nodes<\/li>\n\n\n\n<li>new templates<\/li>\n\n\n\n<li>frontier expansion<\/li>\n\n\n\n<li>novelty<\/li>\n\n\n\n<li>low-cost progression<\/li>\n<\/ul>\n\n\n\n<p>This worked extremely well.<\/p>\n\n\n\n<p>Coverage exploded.<\/p>\n\n\n\n<p>Reports became larger and more sophisticated.<\/p>\n\n\n\n<p>But something important was missing.<\/p>\n\n\n\n<p>A run could show:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>150 actions<\/li>\n\n\n\n<li>70 pages<\/li>\n\n\n\n<li>multiple backend captures<\/li>\n\n\n\n<li>successful navigation<\/li>\n\n\n\n<li>replay candidates<\/li>\n<\/ul>\n\n\n\n<p>while still failing to deeply test the one semantically important form on the site.<\/p>\n\n\n\n<p>This was the first architectural turning point.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Architecture Overview<\/h1>\n\n\n\n<h2 class=\"wp-block-heading\">Autonomous Semantic Testing Architecture<\/h2>\n\n\n\n<pre class=\"wp-block-code\"><code>\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n\u2502 Browser Automation Layer \u2502\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n              \u2502\n              
\u25bc\n\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n\u2502 Exploration Engine       \u2502\n\u2502 - frontier discovery     \u2502\n\u2502 - navigation             \u2502\n\u2502 - state expansion        \u2502\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n              \u2502\n              \u25bc\n\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n\u2502 Semantic Extraction      \u2502\n\u2502 - field roles            \u2502\n\u2502 - form intent            \u2502\n\u2502 - environment signals    \u2502\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n              \u2502\n              \u25bc\n\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n\u2502 Flow Classification      \u2502\n\u2502 - auth                   \u2502\n\u2502 - contact_form           \u2502\n\u2502 - newsletter             \u2502\n\u2502 - search                 \u2502\n\u2502 - transactional          \u2502\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n              \u2502\n              \u25bc\n\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n\u2502 Flow Economics           \u2502\n\u2502 - expected gain          \u2502\n\u2502 - risk                 
  \u2502\n\u2502 - novelty                \u2502\n\u2502 - continuation value     \u2502\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n              \u2502\n              \u25bc\n\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n\u2502 High-Value Assurance     \u2502\n\u2502 - criticality            \u2502\n\u2502 - coverage states        \u2502\n\u2502 - minimum budgets        \u2502\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n              \u2502\n              \u25bc\n\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n\u2502 Backend Capture          \u2502\n\u2502 - internal writes        \u2502\n\u2502 - payload extraction     \u2502\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n              \u2502\n              \u25bc\n\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n\u2502 Mutation Planning        \u2502\n\u2502 - scoring                \u2502\n\u2502 - replay eligibility     \u2502\n\u2502 - safety filtering       \u2502\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n              \u2502\n              
\u25bc\n\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n\u2502 Controlled Replay        \u2502\n\u2502 - bounded execution      \u2502\n\u2502 - same-origin replay     \u2502\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n              \u2502\n              \u25bc\n\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n\u2502 Finding Normalization    \u2502\n\u2502 - severity               \u2502\n\u2502 - confidence             \u2502\n\u2502 - reproducibility        \u2502\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n              \u2502\n              \u25bc\n\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n\u2502 Reporting &amp; Strategy     \u2502\n\u2502 - executive summaries    \u2502\n\u2502 - traceability           \u2502\n\u2502 - assurance visibility   \u2502\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">The Shift from Page-Centric to Semantic-Centric Thinking<\/h1>\n\n\n\n<p>The biggest architectural evolution was moving away from:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>pages\n<\/code><\/pre>\n\n\n\n<p>toward:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>semantic flows\n<\/code><\/pre>\n\n\n\n<p>The 
system no longer reasons primarily about URLs.<\/p>\n\n\n\n<p>It reasons about intent.<\/p>\n\n\n\n<p>Examples:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Flow<\/th><th>Semantic Meaning<\/th><\/tr><\/thead><tbody><tr><td>Login form<\/td><td>auth<\/td><\/tr><tr><td>Email subscription<\/td><td>newsletter<\/td><\/tr><tr><td>Search box<\/td><td>search<\/td><\/tr><tr><td>Support form<\/td><td>contact_form<\/td><\/tr><tr><td>Checkout endpoint<\/td><td>transactional<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This seems simple conceptually.<\/p>\n\n\n\n<p>In practice, it changes almost every downstream decision:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>exploration prioritization<\/li>\n\n\n\n<li>replay safety<\/li>\n\n\n\n<li>mutation depth<\/li>\n\n\n\n<li>stopping logic<\/li>\n\n\n\n<li>reporting<\/li>\n\n\n\n<li>retry behavior<\/li>\n\n\n\n<li>business relevance<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Exploration vs Assurance<\/h1>\n\n\n\n<p>One of the most important discoveries was that autonomous testing actually contains two competing systems.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Exploration Engine<\/h2>\n\n\n\n<p>Optimizes for:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>novelty<\/li>\n\n\n\n<li>graph growth<\/li>\n\n\n\n<li>coverage<\/li>\n\n\n\n<li>frontier expansion<\/li>\n\n\n\n<li>low-cost discovery<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Assurance Engine<\/h2>\n\n\n\n<p>Optimizes for:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>confidence<\/li>\n\n\n\n<li>replayability<\/li>\n\n\n\n<li>mutation depth<\/li>\n\n\n\n<li>semantic importance<\/li>\n\n\n\n<li>reproducibility<\/li>\n\n\n\n<li>validation quality<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Exploration vs Assurance Model<\/h1>\n\n\n\n<pre 
class=\"wp-block-code\"><code>                  \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n                  \u2502 Exploration Engine  \u2502\n                  \u2502---------------------\u2502\n                  \u2502 novelty             \u2502\n                  \u2502 graph growth        \u2502\n                  \u2502 frontier expansion  \u2502\n                  \u2502 discovery           \u2502\n                  \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n                            \u2502\n                            \u25bc\n                 \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n                 \u2502 Flow Ranking Layer   \u2502\n                 \u2502----------------------\u2502\n                 \u2502 economics            \u2502\n                 \u2502 assurance pressure   \u2502\n                 \u2502 novelty pressure     \u2502\n                 \u2502 plateau steering     \u2502\n                 \u2502 continuity bonuses   \u2502\n                 \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n                           \u2502\n                           \u25bc\n                  \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n                  \u2502 Assurance Engine    \u2502\n                  \u2502---------------------\u2502\n                  \u2502 mutation depth      \u2502\n                  \u2502 replay validation   \u2502\n                  \u2502 backend assurance   \u2502\n                  \u2502 semantic confidence \u2502\n                  
\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n<\/code><\/pre>\n\n\n\n<p>If exploration dominates completely:<\/p>\n\n\n\n<p>the system behaves like a crawler.<\/p>\n\n\n\n<p>If assurance dominates completely:<\/p>\n\n\n\n<p>the system gets stuck retrying a few flows forever.<\/p>\n\n\n\n<p>Balancing these forces became one of the hardest engineering problems in the project.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Plateau Logic Accidentally Hid Important Failures<\/h1>\n\n\n\n<p>The engine eventually became very good at detecting:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>repeated low-yield flows<\/li>\n\n\n\n<li>frontier starvation<\/li>\n\n\n\n<li>plateau conditions<\/li>\n\n\n\n<li>no-new-destination states<\/li>\n<\/ul>\n\n\n\n<p>Initially this improved efficiency significantly.<\/p>\n\n\n\n<p>But it introduced a subtle failure mode:<\/p>\n\n\n\n<p>semantically important flows could be abandoned too early.<\/p>\n\n\n\n<p>For example:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>contact forms<\/li>\n\n\n\n<li>auth flows<\/li>\n\n\n\n<li>state-changing endpoints<\/li>\n<\/ul>\n\n\n\n<p>might receive only shallow testing before exploration economics shifted attention elsewhere.<\/p>\n\n\n\n<p>The reports looked active.<\/p>\n\n\n\n<p>But assurance was weak.<\/p>\n\n\n\n<p>This led to the introduction of:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>High-Value Semantic Flow Assurance\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">High-Value Flow Assurance<\/h1>\n\n\n\n<p>Flows now receive:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>canonical category<\/li>\n\n\n\n<li>semantic criticality<\/li>\n\n\n\n<li>assurance budgets<\/li>\n\n\n\n<li>completion states<\/li>\n\n\n\n<li>plateau resistance<\/li>\n<\/ul>\n\n\n\n<hr 
class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Semantic Flow Lifecycle<\/h1>\n\n\n\n<pre class=\"wp-block-code\"><code>Detected\n    \u2502\n    \u25bc\nSubmitted\n    \u2502\n    \u25bc\nBackend Observed\n    \u2502\n    \u25bc\nMutation Generated\n    \u2502\n    \u25bc\nReplay Eligible\n    \u2502\n    \u25bc\nReplay Executed\n    \u2502\n    \u25bc\nValidated\n    \u2502\n    \u25bc\nCompleted\n<\/code><\/pre>\n\n\n\n<p>Exceptional states:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Blocked\nValidation Failure\nSubmit No Effect\nEnvironment Hostile\n<\/code><\/pre>\n\n\n\n<p>This lifecycle became more important than raw page exploration.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Environment Classification Became Necessary<\/h1>\n\n\n\n<p>Another major lesson:<\/p>\n\n\n\n<p>many apparent testing failures were actually environment failures.<\/p>\n\n\n\n<p>The engine encountered:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Cloudflare challenges<\/li>\n\n\n\n<li>human verification gates<\/li>\n\n\n\n<li>auth redirects<\/li>\n\n\n\n<li>unstable navigation surfaces<\/li>\n\n\n\n<li>partial rendering environments<\/li>\n\n\n\n<li>anti-bot protections<\/li>\n<\/ul>\n\n\n\n<p>Without explicit environment modeling, reports became misleading.<\/p>\n\n\n\n<p>For example:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>exploration failed\n<\/code><\/pre>\n\n\n\n<p>might really mean:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>interaction_hostile\n<\/code><\/pre>\n\n\n\n<p>or:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>unstable\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Environment Classification Pipeline<\/h1>\n\n\n\n<pre class=\"wp-block-code\"><code>Environment Detection\n        \u2502\n        
\u25bc\n\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n\u2502 accessible         \u2502\n\u2502 auth_required      \u2502\n\u2502 interaction_hostile\u2502\n\u2502 unstable           \u2502\n\u2502 partial            \u2502\n\u2502 blocked            \u2502\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n          \u2502\n          \u25bc\nEnvironment Strategy Resolver\n          \u2502\n          \u25bc\nRetry Eligibility\n          \u2502\n          \u25bc\nControlled Retry\n          \u2502\n          \u25bc\nFinal Classification\n<\/code><\/pre>\n\n\n\n<p>This dramatically improved report trustworthiness.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Replay Safety Became More Important Than Replay Volume<\/h1>\n\n\n\n<p>Early mutation systems aggressively replayed everything.<\/p>\n\n\n\n<p>This generated activity.<br>It did not generate trust.<\/p>\n\n\n\n<p>The current architecture is intentionally conservative.<\/p>\n\n\n\n<p>Fields are classified semantically:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Field<\/th><th>Role<\/th><\/tr><\/thead><tbody><tr><td>email<\/td><td>user_input<\/td><\/tr><tr><td>csrf_token<\/td><td>security_token<\/td><\/tr><tr><td>action<\/td><td>routing_action<\/td><\/tr><tr><td>hp_email<\/td><td>honeypot_or_anti_spam<\/td><\/tr><tr><td>page_url<\/td><td>tracking_context<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Replay eligibility depends heavily on these roles.<\/p>\n\n\n\n<p>This reduced noisy findings dramatically.<\/p>\n\n\n\n<p>One major lesson:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>The best autonomous mutation systems are often highly selective systems.\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator 
has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Mutation Safety Pipeline<\/h1>\n\n\n\n<pre class=\"wp-block-code\"><code>Captured Request\n        \u2502\n        \u25bc\nField Role Classification\n        \u2502\n        \u25bc\nMutation Scoring\n        \u2502\n        \u25bc\nReplay Eligibility\n        \u2502\n        \u25bc\nSafety Filtering\n        \u2502\n        \u25bc\nControlled Replay\n        \u2502\n        \u25bc\nFinding Classification\n        \u2502\n        \u25bc\nSeverity &amp; Confidence Normalization\n<\/code><\/pre>\n\n\n\n<p>This pipeline turned out to be far more valuable than brute-force replaying.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Reporting Became an Engineering Problem<\/h1>\n\n\n\n<p>At some point the reports became too technically rich.<\/p>\n\n\n\n<p>They included:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>frontier economics<\/li>\n\n\n\n<li>graph growth<\/li>\n\n\n\n<li>novelty scores<\/li>\n\n\n\n<li>steering telemetry<\/li>\n\n\n\n<li>candidate rankings<\/li>\n\n\n\n<li>plateau metrics<\/li>\n<\/ul>\n\n\n\n<p>Technically useful.<\/p>\n\n\n\n<p>Humanly exhausting.<\/p>\n\n\n\n<p>Eventually we realized:<\/p>\n\n\n\n<p>The report should answer business questions first.<\/p>\n\n\n\n<p>Not engine questions.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">The Most Valuable Report Surface<\/h1>\n\n\n\n<p>The most useful report section became:<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">High-Value Flow Coverage<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Flow<\/th><th>Criticality<\/th><th>Coverage State<\/th><th>Backend Seen<\/th><th>Replay<\/th><th>Findings<\/th><\/tr><\/thead><tbody><tr><td>Contact 
Form<\/td><td>High<\/td><td>mutation_generated<\/td><td>Yes<\/td><td>No<\/td><td>application_error_response<\/td><\/tr><tr><td>Newsletter<\/td><td>Medium<\/td><td>submitted<\/td><td>Yes<\/td><td>No<\/td><td>None<\/td><\/tr><tr><td>Search<\/td><td>Low<\/td><td>completed<\/td><td>No<\/td><td>No<\/td><td>None<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This created far more trust than raw telemetry.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">One Unexpected Lesson<\/h1>\n\n\n\n<h2 class=\"wp-block-heading\">Semantic Contradictions Destroy Trust<\/h2>\n\n\n\n<p>At one point, reports showed:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Environment: accessible\nEnvironment Strategy: aborted_before_exploration\nRecorded Actions: 149\n<\/code><\/pre>\n\n\n\n<p>Technically, this happened because preflight degraded while exploration continued afterward.<\/p>\n\n\n\n<p>But semantically, the report contradicted itself.<\/p>\n\n\n\n<p>Humans noticed immediately.<\/p>\n\n\n\n<p>This became one of the most important lessons in the project:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Autonomous systems are trusted through semantic coherence, not raw technical correctness.\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">The System Is No Longer a Crawler<\/h1>\n\n\n\n<p>The system now reasons about:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>semantic importance<\/li>\n\n\n\n<li>assurance depth<\/li>\n\n\n\n<li>replay safety<\/li>\n\n\n\n<li>environment hostility<\/li>\n\n\n\n<li>retry eligibility<\/li>\n\n\n\n<li>mutation value<\/li>\n\n\n\n<li>coverage progression<\/li>\n\n\n\n<li>reproducibility<\/li>\n\n\n\n<li>business relevance<\/li>\n<\/ul>\n\n\n\n<p>At this point, the architecture behaves much more like:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>an autonomous semantic testing 
system\n<\/code><\/pre>\n\n\n\n<p>than:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>a crawler with testing features\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Final Observation<\/h1>\n\n\n\n<p>The original question was:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>How many pages did we explore?\n<\/code><\/pre>\n\n\n\n<p>The current question is:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Did we autonomously spend enough effort on the things that actually matter?\n<\/code><\/pre>\n\n\n\n<p>That single shift changed almost the entire architecture.<\/p>\n ","protected":false},"excerpt":{"rendered":"<p>From Exploration to Assurance Autonomous browser testing becomes genuinely difficult the moment the system must decide what actually matters. Introduction Most autonomous web testing systems begin with the same architecture: At first, this feels surprisingly powerful. The engine explores pages autonomously. It generates traffic. It captures requests. It finds occasional issues. But after enough sessions, a deeper question emerges: That question changed the architecture of our system entirely. The result was a transition from autonomous exploration to autonomous semantic assurance. This article summarizes the architecture, lessons learned, and major engineering shifts behind that transition. 
The First Major Realization&hellip;<\/p>\n","protected":false},"author":1,"featured_media":508,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":"","_wp_rev_ctl_limit":""},"categories":[1],"tags":[],"class_list":["post-507","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-published"],"post_mailing_queue_ids":[],"_links":{"self":[{"href":"https:\/\/www.berkkibarer.com\/index.php?rest_route=\/wp\/v2\/posts\/507","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.berkkibarer.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.berkkibarer.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.berkkibarer.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.berkkibarer.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=507"}],"version-history":[{"count":1,"href":"https:\/\/www.berkkibarer.com\/index.php?rest_route=\/wp\/v2\/posts\/507\/revisions"}],"predecessor-version":[{"id":509,"href":"https:\/\/www.berkkibarer.com\/index.php?rest_route=\/wp\/v2\/posts\/507\/revisions\/509"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.berkkibarer.com\/index.php?rest_route=\/wp\/v2\/media\/508"}],"wp:attachment":[{"href":"https:\/\/www.berkkibarer.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=507"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.berkkibarer.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=507"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.berkkibarer.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=507"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}