Glasshouse: Audit Any Website for GDPR Violations in 90 Seconds (and File a DPA Complaint in Two More)

26 min read

I scanned ten popular consumer websites. All ten violated the GDPR. The audit took 17 minutes of compute time and cost zero consulting hours.

That's not the kind of sentence you write without backing it up, so let me be precise about what "violated" means here. Across the ten sites — a mix of social, retail, travel, and news, each a household name in Europe — every single one failed at least one criterion the European Data Protection Board has been explicit about in published guidelines since 2023. Seven of them fired tracking requests before any user interaction with the consent banner, which is the textbook ePrivacy Art. 5(3) violation that French and Spanish DPAs have been fining for years. Five of them shipped a "reject all" path that, in practice, did not stop tracking — trackers continued to fire, cookies continued to persist; the only thing that changed was that the banner disappeared. None of them offered a withdrawal mechanism that meets the plain text of GDPR Art. 7(3): "It shall be as easy to withdraw consent as to give it."

I have written about pieces of this before. Browser fingerprinting — the tracking that survives every cookie control, embeds in your GPU rendering quirks, and routinely identifies users with greater than 99% accuracy. Cookie banner dark patterns — the engineering decisions that produce asymmetric buttons, multi-layer reject paths, and revocation pages that disappear without deleting anything. Each post forced me to extend the tool I was using to find the evidence. Today that tool is on GitHub. It's called Glasshouse, it's MIT-licensed, it runs on your laptop, and it does two things: it audits any website against GDPR and ePrivacy in about 90 seconds, and it turns the resulting scan into a ready-to-file complaint dossier for a Data Protection Authority of your choice in about two minutes more.


The Two Gaps#

There is a popular story about why the web is the way it is: enforcement is slow, regulators are under-resourced, the violators are large, and the law is ambiguous. Some of that is true. The slow-enforcement part is true. The under-resourced part is true. But the ambiguous part is mostly false. The law is unusually specific by regulatory standards — the ePrivacy Directive is from 2002, the GDPR from 2016, the EDPB's dark-patterns guidelines from 2023, and the consent regime hasn't materially changed since the Planet49 judgment in 2019. There is no reasonable engineer who can claim, in 2026, not to know that pre-consent tracking is illegal in Europe. The information is sitting there in publicly-available, freely-readable documents written in unusually plain English for legal texts.

So the question is not why the law fails to inform. The question is why the law fails to bite. And the honest answer, after spending several months scanning the same kinds of sites that DPAs and NGOs have been pointing at for a decade, is that compliance and enforcement both fail at exactly the same thing: distribution. The information about what your website actually does — every tracker request, every cookie set, every fingerprinting API called in the first 8 seconds after a user clicks "Accept" — is not reaching the people who could fix it or push back against it. There are two gaps. They look different but they have the same shape.

The visibility gap#

Inside a company, almost nobody runs a scan. The frontend team trusts the CMP vendor. The CMP vendor trusts whatever tag-management config the marketing team uploaded. The tag-management config trusts the agency that wrote the ad-tech pixels. The privacy notice was last updated by an external counsel who relied on a checklist drafted in 2019. None of these people have an inward-facing view of what the site actually loads in production after a user clicks through the banner. They have model views — diagrams, configuration documents, vendor self-assessments — but no instrumented look at the actual network requests, the actual cookies, the actual fingerprinting calls. Most of the time, nobody on the inside is looking.

Outside the company, the picture is very different. NOYB looks. Academics look. Journalists look. The occasional regulator looks. And when any of them looks, they find a violation. NOYB's 2021 study of 422 popular European websites found that 81% had no reject button on the first banner layer and only 18% offered an easy withdrawal mechanism. Mathur et al.'s 2019 Dark Patterns at Scale paper crawled 11,000 shopping sites and found dark patterns on 11.1% of them, with prevalence increasing with site popularity. My own scan of 10 popular consumer sites — different industries, different countries, different CMP vendors — produced 10/10 failures. The hit rate is not a coincidence. It's a function of the fact that the outside observers are the only people running the scan at all.

This asymmetry produces a specific kind of corporate stuckness. Engineering teams I've talked to are not refusing to comply. They genuinely do not know what their stack ships. When a regulator or a journalist publishes a finding, they're often surprised by their own behaviour — we didn't know the third-party fraud-prevention SDK was reading the canvas; we didn't know the new analytics version was firing before consent; we didn't know the "save preferences" button kept the tracking cookies. And the surprise is real. The mechanism by which they would have known — a regular, low-effort, locally-run scan — doesn't exist as a normal practice at most mid-size companies.

> Every dark pattern in a cookie banner exists because someone wrote a Jira ticket, a CSS class, an event listener, or a tag-management rule that produced exactly the behaviour you see. The reject button is an <a> tag instead of a <button> because a stakeholder wanted to "reduce friction on opt-in." The accept button is filled and the reject button is outlined because the design system's "primary action" token won the argument. These are features, shipped on purpose.

That's a passage from the dark patterns post. I stand by it. But "shipped on purpose" is a slightly unfair phrase if nobody on the team has the means to observe what was shipped. The visibility gap is upstream of the malice debate. Even if every engineer involved wanted to comply, they would still need a tool that tells them whether their site, as deployed, actually does.

The filing gap#

Now flip to the other side of the wall. Imagine you are an individual user — a citizen with a GDPR right to lodge a complaint under Art. 77, with a fact pattern in your hands (a screen recording, a memory, a hunch about a banner that wouldn't take "no" for an answer). Your right is real. The DPA has a public submission form. The complaint is, technically, free.

In practice, the cost is enormous. To file a complaint that the DPA will actually process and not dismiss as malformed, you need to know which authority has jurisdiction (lead supervisory authority under the GDPR one-stop-shop, or your local DPA?), which articles to cite (Art. 5(3) of ePrivacy or Art. 6 of GDPR or both?), what counts as evidence (screenshot? full HAR file? signed statement?), and what format the specific DPA prefers (the AP in the Netherlands wants their online form, BfDI in Germany wants a structured letter, CNIL in France has a particular intake flow). The realistic cost of getting that right is a weekend of reading the EDPB's guidelines and your DPA's submission rules, or €500–€2,000 paid to a privacy lawyer who already knows the dance.

This is the filing gap. And it is the reason the regulator-pace-of-enforcement story is incomplete. Regulators are slow not just because they're under-resourced — they are also slow because the funnel of well-formed complaints they actually receive is narrow. NOYB exists in large part to widen that funnel by industrialising the process: their roughly 700 cookie-banner complaints between 2021 and 2023 happened because they invested in a complaint-production pipeline that ordinary citizens do not have. When filing becomes cheap, complaints scale. When complaints scale, enforcement pressure compounds. This is not a hypothesis — it's what NOYB has demonstrated.

The visibility gap and the filing gap look like different problems. They are not. They are both problems of accessible tooling. Each side of the wall — the company that ships and the citizen who notices — is missing the same kind of artefact: a cheap, local, trustworthy way to see what's actually happening and to do something about it.

**scan mode.** Audit any website against GDPR and ePrivacy in about 90 seconds.

  • Three-variant headless Firefox captures pre- and post-consent state
  • Detects asymmetric buttons, hidden reject paths, broken withdrawal
  • Hooks Canvas, WebGL, AudioContext, and WebGPU for fingerprinting

Outputs: scored deck (.html), markdown report, raw scan.json

**file mode.** Turn a scan into a ready-to-file DPA complaint dossier.

  • Pick which findings to file — defaults are deliberately conservative
  • Nine European DPAs supported, including NL, FR, UK, IE, DE
  • Anonymised or identified complainant, your choice

Outputs: complaint.md, facts.md, articles-cited.md, evidence/

What Scan Mode Does#

scan is the larger half of the tool. You give it a URL; it gives you back a scored deck, a markdown report, and a raw JSON file you can introspect or feed back into file mode. There are no accounts, no servers, no central database. Everything runs locally.

Under the hood, scan mode launches three independent headless-Firefox sessions against the target URL using Playwright. The three variants are: ignore the consent banner (capture what the page does before any user input), accept (click "Accept all" and capture what happens after), and reject (click "Reject all" or, if the site forces you down a settings page, untick the optional categories and save). Capturing all three is necessary because each variant reveals a different class of violation: ignore exposes pre-consent tracking, accept establishes the full post-consent baseline, and reject shows whether refusal actually stops anything.

During each variant, the scanner instruments the browser to capture: every network request and its full URL, every cookie that is set (with domain, path, expiry, and value-fingerprint), every call to the Canvas, WebGL, AudioContext, and WebGPU fingerprinting APIs, every read from localStorage and IndexedDB, the TLS handshake fingerprint, and a chronological timeline of significant events. The privacy-policy and cookie-policy pages are fetched and analysed for the 13 disclosures GDPR Art. 13/14 requires. Security headers (CSP, HSTS, Referrer-Policy, Permissions-Policy) are checked against current best practice. Cross-border transfers are inferred from the geographic distribution of third-party domains.
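The fingerprinting hooks rely on the standard monkey-patching trick: wrap a prototype method before any page script runs, record the call, and forward to the original. A minimal sketch of that technique in plain Node (the function names and the fake canvas object here are illustrative, not Glasshouse's actual internals; in a real scanner the wrapper would be injected into the page, e.g. via Playwright's addInitScript):

```javascript
// Generic method hook: wraps obj[name] so every call is recorded
// in `log` before being forwarded to the original implementation.
function hookMethod(obj, name, log) {
  const original = obj[name];
  obj[name] = function (...args) {
    log.push({ api: name, at: Date.now() });
    return original.apply(this, args);
  };
}

// Illustrative stand-in for a browser prototype such as
// HTMLCanvasElement.prototype (not available in Node).
const fakeCanvas = {
  toDataURL() { return 'data:image/png;base64,...'; },
};

const calls = [];
hookMethod(fakeCanvas, 'toDataURL', calls);
fakeCanvas.toDataURL();
console.log(calls.length); // → 1
```

Installing the wrapper before any page script executes is the whole game: fingerprinting code that runs at document start still hits the instrumented method, not the native one.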

The output is scored across seven categories on a 0–10 scale and aggregated into an overall score:

| Category | What it measures |
| --- | --- |
| Consent | Banner symmetry, reject path reachability, multi-layer concealment |
| Pre-consent tracking | Network calls, cookies, fingerprinting before user interaction |
| Legal pages | Privacy policy completeness against Art. 13/14, cookie-policy alignment |
| Cross-border transfers | Third-party domains in non-adequate jurisdictions, transfer mechanism disclosure |
| Security headers | CSP, HSTS, Referrer-Policy, Permissions-Policy, X-Frame-Options |
| Cookie management | Pre-consent cookies, post-reject cookies, declared vs. observed purposes |
| Dark patterns | Asymmetric buttons, hidden reject, manipulative copy, broken withdrawal |
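The aggregation from per-category scores into an overall score can be sketched as a weighted mean. The weights below are assumptions for illustration only; the shipped rubric is documented in references/scoring.md:

```javascript
// Weighted average of per-category 0–10 scores into an overall score.
// These weights are illustrative, not Glasshouse's actual rubric.
const WEIGHTS = {
  consent: 2, preConsentTracking: 2, legalPages: 1,
  crossBorder: 1, securityHeaders: 1, cookieManagement: 2, darkPatterns: 2,
};

function overallScore(categories) {
  let total = 0, weightSum = 0;
  for (const [name, score] of Object.entries(categories)) {
    const w = WEIGHTS[name] ?? 1; // unknown categories default to weight 1
    total += w * score;
    weightSum += w;
  }
  return Math.round((total / weightSum) * 10) / 10; // one decimal place
}

console.log(overallScore({
  consent: 4.1, preConsentTracking: 2.8, legalPages: 7.2, darkPatterns: 3.5,
})); // → 4
```

The interesting design choice is not the arithmetic but the weighting: any real rubric has to decide whether a perfect legal-pages score can offset heavy pre-consent tracking, and these toy weights say it mostly cannot.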

A typical run, from cold start to a written report, looks like this:

scan mode
$ /glasshouse www.example.com
Scout: detecting consent banner...
  CMP detected: OneTrust
  Accept button: "Accept All"
  Reject button: "Reject All"
Running 3-variant scan (ignore / accept / reject)...
  [variant 1/3] ignore: 47 requests, 12 cookies set
  [variant 2/3] accept: 184 requests, 38 cookies set
  [variant 3/3] reject: 19 requests, 4 cookies set

Total wall-clock: 87 seconds.

Analysis:
  Consent: 4.1 / 10  (reject required two clicks vs accept's one)
  Pre-consent tracking: 2.8 / 10  (8 trackers fired before consent)
  Legal pages: 7.2 / 10
  Dark patterns: 3.5 / 10  (multi-layer concealment detected)
  Overall: 4.4 / 10
Wrote: ./example.com-privacy-audit.html
Wrote: ./example.com-privacy-audit.md
Wrote: /tmp/glasshouse-example.com-1763046271203.json
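Conceptually, the analysis that produces those scores is largely a comparison across the three variant captures: whatever fires in the ignore variant happened before consent, and whatever still fires in the reject variant survived an explicit refusal. A simplified sketch, with an invented tracker classifier and toy host lists:

```javascript
// Given the hosts contacted in each variant, derive two findings:
// trackers that fired before any interaction, and trackers that
// kept firing after an explicit "Reject all".
function diffVariants({ ignore, accept, reject }, isTracker) {
  return {
    preConsent: ignore.filter(isTracker),
    survivesReject: reject.filter(isTracker),
  };
}

// Toy classifier; the real scanner matches hosts against a tracker list.
const isTracker = (host) => host.endsWith('.tracker.invalid');

const findings = diffVariants({
  ignore: ['cdn.example.com', 'ads.tracker.invalid'],
  accept: ['cdn.example.com', 'ads.tracker.invalid', 'pix.tracker.invalid'],
  reject: ['cdn.example.com', 'pix.tracker.invalid'],
}, isTracker);

console.log(findings.preConsent);     // trackers fired pre-consent
console.log(findings.survivesReject); // trackers that ignored the reject
```

In this toy run, `ads.tracker.invalid` is a pre-consent finding and `pix.tracker.invalid` is a broken-reject finding, which are exactly the two failure modes the ten-site scan kept turning up.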

The scout phase is a quick reconnaissance pass that detects the consent banner and identifies the accept and reject buttons before the real scan runs. It exists because consent banners are wildly inconsistent — half of them are TCF-compliant IAB frameworks with predictable selectors, the other half are custom in-house implementations with non-standard markup. The scout takes about ten seconds and saves the full scan from running blind.
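A scout pass of this kind can be approximated as a lookup against known CMP markers, with a heuristic fallback for custom banners. The signature table and regexes below are illustrative, not the shipped detection logic:

```javascript
// Known CMPs with characteristic container markers (illustrative).
const CMP_SIGNATURES = [
  { name: 'OneTrust', marker: 'onetrust-banner-sdk' },
  { name: 'Cookiebot', marker: 'CybotCookiebotDialog' },
  { name: 'Didomi', marker: 'didomi-notice' },
];

// Fallback: phrases that usually label accept/reject buttons.
const ACCEPT_RE = /\b(accept all|agree|allow all)\b/i;
const REJECT_RE = /\b(reject all|decline|refuse)\b/i;

function scout(html) {
  const cmp = CMP_SIGNATURES.find((s) => html.includes(s.marker));
  return {
    cmp: cmp ? cmp.name : 'custom',
    hasAcceptLabel: ACCEPT_RE.test(html),
    hasRejectLabel: REJECT_RE.test(html),
  };
}

console.log(scout('<div id="onetrust-banner-sdk"><button>Accept All</button></div>'));
// → { cmp: 'OneTrust', hasAcceptLabel: true, hasRejectLabel: false }
```

A banner that matches an accept label but no reject label on the first layer is exactly the case where the full scan has to go looking for a "Manage settings" path.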

The presentation deck is a self-contained HTML file you can open in any browser. It has 13–15 slides depending on what the scanner found, including a side-by-side comparison of the three variants, a request-pulse visualisation, a timeline of pre-consent events, a fairness-scale slide for the dark-pattern verdict, and a prioritised recommendation list. The markdown report is the same content rendered as prose, suitable for review by people who do not need the visuals. The raw JSON file is the source of truth and the input to file mode.

One thing the scan deliberately does not do: it does not name the controller. The scanner has no opinion on who is legally responsible for a site, because that is a determination requiring legal context (imprint, joint-controller arrangements, processor-vs.-controller status, intra-group arrangements). The presentation surfaces the domain and any cookie domains observed; identifying the responsible legal entity is left to the user — usually a single look at the imprint or the privacy policy is enough.

What File Mode Does#

file is the half of the tool that targets the filing gap. Where scan produces a description of what a website does, file produces a complaint dossier addressed to a specific Data Protection Authority — a folder of files you can review and submit yourself. The tool never submits on your behalf. Filing remains your act; what changes is the cost of doing it well.

The interactive flow is short. You point file at a scan's JSON output. It lists the actionable findings — distinct violations the scan identified, each tagged with the articles it implicates. You pick which findings to file. The defaults are conservative: nothing is auto-included. You pick a target DPA. You pick anonymised (placeholders for your personal data, useful if you're testing or want to redact before submitting) or identified (your real name, address, and email — stored in a one-time local profile). The tool builds the dossier.
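The core of the selection step is unglamorous: filter the candidates by the ids the user included and collect the deduplicated union of the articles they implicate. A sketch with invented field names (loosely modelled on the scan JSON, not its actual schema):

```javascript
// Select findings by id and collect every article they implicate,
// deduplicated, for the articles-cited section of the dossier.
function selectFindings(candidates, includeIds) {
  const chosen = candidates.filter((c) => includeIds.includes(c.id));
  const articles = [...new Set(chosen.flatMap((c) => c.articles))];
  return { chosen, articles };
}

const candidates = [
  { id: 'a1', kind: 'preConsentTracker', articles: ['ePrivacy 5(3)', 'GDPR 6(1)'] },
  { id: 'b2', kind: 'darkPattern', articles: ['GDPR 4(11)', 'GDPR 7(1)'] },
  { id: 'c3', kind: 'headerIssue', articles: [] },
];

const { chosen, articles } = selectFindings(candidates, ['a1', 'b2']);
console.log(articles);
// → [ 'ePrivacy 5(3)', 'GDPR 6(1)', 'GDPR 4(11)', 'GDPR 7(1)' ]
```

Nothing is selected unless its id is explicitly passed in, which is the code-level expression of the conservative default: an unselected finding never reaches the dossier.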

A representative session:

file mode
$ /glasshouse file /tmp/glasshouse-example.com-1763046271203.json --list-findings
{
  "meta": { "domain": "example.com" },
  "candidates": [
    { "id": "a1b2c3d4e5", "kind": "preConsentTracker", "headline": "Trackers fired before consent", "articles": ["ePrivacy 5(3)", "GDPR 6(1)"] },
    { "id": "f6g7h8i9j0", "kind": "darkPattern", "headline": "Reject button requires two clicks vs accept's one", "articles": ["GDPR 4(11)", "GDPR 7(1)"] }
  ]
}
$ /glasshouse file <scan> --dpa nl-ap --include a1b2c3d4e5,f6g7h8i9j0 --anonymize --yes
Selecting Autoriteit Persoonsgegevens (NL).
Including 2 findings.
Using anonymised complainant profile.

Complainant placeholders remain in the dossier for the user to fill before submitting.

Wrote: ./dpa-complaint-example.com-2026-05-13/
  complaint.md
  facts.md
  articles-cited.md
  submission-checklist.md
  evidence/
    scan.json
    trackers.csv
    cookies.csv
    timeline.md

The dossier is four files plus an evidence folder. Each file has a specific role:

complaint.md (markdown)
# Complaint under the General Data Protection Regulation

**To:** Autoriteit Persoonsgegevens
**From:** [COMPLAINANT NAME]
**Date:** 2026-05-13
**Concerning:** [CONTROLLER NAME]

Pursuant to Article 77 GDPR, I lodge this complaint regarding
processing of personal data carried out by [CONTROLLER NAME] in
connection with the website www.example.com.

The processing in question is described in the attached
facts.md and is, in my view, in violation of:

- Article 5(3) of the ePrivacy Directive (2002/58/EC)
- Article 6(1) of Regulation (EU) 2016/679 (GDPR)
- Article 7(1) of Regulation (EU) 2016/679 (GDPR)

The verbatim text of each provision is reproduced in
articles-cited.md. Supporting evidence is included in the
evidence/ folder of this dossier.

I request that the Authority investigate the matters set out
in facts.md and take appropriate action under its powers in
Article 58 GDPR.

Yours sincerely,

[COMPLAINANT NAME]
[STREET]
[POSTAL CODE] [CITY]
[COUNTRY]
[COMPLAINANT EMAIL]

Nine European DPAs are supported in the current release: the Netherlands (AP), France (CNIL), the United Kingdom (ICO), Ireland (DPC), Germany at the federal level (BfDI), and Germany at the Land level for Berlin, Hamburg, Bavaria, and North Rhine-Westphalia. Each authority has a JSON adapter file specifying its submission rules, language preferences, and any local citation conventions. Adding a new DPA is one JSON file plus a validation run — the schema lives at references/dpa-adapters/_schema.json and most adapters take an hour to write if you have access to the authority's published intake guidance.
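A hedged sketch of what adapter validation might look like; the field names below are invented, and the authoritative schema is the one at references/dpa-adapters/_schema.json:

```javascript
// Minimal shape check for a DPA adapter file. The required fields
// here are illustrative; the real schema ships with the repository.
const REQUIRED_FIELDS = ['id', 'name', 'country', 'language', 'submissionUrl'];

function validateAdapter(adapter) {
  const missing = REQUIRED_FIELDS.filter((f) => !(f in adapter));
  return { valid: missing.length === 0, missing };
}

// Hypothetical adapter for the Dutch authority.
const nlAp = {
  id: 'nl-ap',
  name: 'Autoriteit Persoonsgegevens',
  country: 'NL',
  language: 'nl',
  submissionUrl: 'https://autoriteitpersoonsgegevens.nl/',
};

console.log(validateAdapter(nlAp)); // → { valid: true, missing: [] }
```

The point of keeping adapters as declarative JSON rather than code is exactly this: a contributor who knows a DPA's intake rules can add support without touching the scanner.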

The political-economy point worth making here is small but important. Most analyses of GDPR enforcement assume the binding constraint is regulator capacity — there aren't enough authorities, they don't have enough staff, the lead-supervisory-authority bottleneck (Ireland's DPC processing complaints against most of the large US tech companies) slows everything down. All of that is true. But it is not the only binding constraint. The cost of filing a well-formed complaint is also a binding constraint, and a less-discussed one. NOYB has demonstrated, by industrialising filing at scale, that complaint volume scales as filing cost drops. If filing cost drops far enough that a private individual can produce a serious complaint dossier in five minutes from a single scan, the funnel widens substantially. Cheap, locally-runnable filing tools are the inverse of the usual enforcement-scaling story: instead of trying to make regulators bigger, they try to make citizen filings smaller and more frequent.

I built file mode in part because I wanted to see whether this was possible at all. It is — the dossiers it produces are reviewable, citable, and structurally identical to the ones a privacy lawyer would produce, modulo polish. It is also, deliberately, the half of the tool with the most guardrails. The defaults are conservative because the cost of producing a filing the user has not actually endorsed is high. The tool prompts for confirmation on every consequential choice. It never submits. The complainant data, if not anonymised, is stored locally in ~/.claude/privacy-complaint/ and is never transmitted anywhere. These are not accidents of implementation. They are the only configuration in which a tool like this should exist.

How It Got Built#

I did not set out to build a privacy auditor. I set out to write a blog post.

The fingerprinting post was first. I wanted to explain, with actual numbers, why fingerprinting matters in 2026, and the prose kept hitting a wall at the same point: I'd describe a vector — Canvas, AudioContext, WebGPU — and have nothing better than "trust me" to back the claim that real sites in the real world were using it. The literature has the numbers but the literature is two years out of date by the time it's cited. So I wrote a small Playwright scanner that hooked the APIs and watched what real sites did. The first version was ugly. The second was less ugly. By the third I had something that could give me a useful answer about a specific site in under a minute.

The dark-patterns post forced the next set of features. Writing about banner asymmetry and multi-layer reject paths needs more than a script that loads a page and reads cookies — it needs a tool that actually clicks the buttons and notices what the page does between the click and the steady state. So I added the three-variant accept/reject/ignore framework. The multi-layer banner traversal — climb from layer 1 to layer 2 when the reject button is hidden behind "Manage settings" — went in because half the sites I wanted to write about used that pattern, and skipping them would have been intellectually dishonest.

Each blog post in this series taught the scanner one thing it didn't know how to do before. The skill is the artefact of the writing, not the other way around. The capstone post you're reading now is the first one where the scanner had nothing left to learn for the prose — which felt like the right moment to give it to other people.

Install + Use It#

Glasshouse is on GitHub at github.com/datagobes/glasshouse. The repository contains a SKILL.md and the supporting scripts; on a clean machine, installation is three commands.

git clone https://github.com/datagobes/glasshouse.git ~/.claude/skills/glasshouse
cd ~/.claude/skills/glasshouse
npm install && npx playwright install firefox

After that, you run it through Claude Code with /glasshouse <url> for a scan, or /glasshouse file <scan-json-path> to build a complaint dossier from an existing scan. The skill is invoked the same way as any other Claude Code skill — the slash command parses your arguments, the model orchestrates the run.

The license is MIT. There is no telemetry, no central database, no cloud service. All scan data stays on your machine.

A few things the tool deliberately does not do, because they are constraints rather than missing features: it never submits a complaint on your behalf, it never names the controller for you, and it never sends scan or complainant data anywhere but your own disk.

The tool is opinionated where it needs to be and unopinionated where it doesn't. The scoring rubric is documented in references/scoring.md and you can disagree with it; the criterion files in references/criteria/ are each a small markdown document describing exactly what the scanner checks, the legal basis it cites, and the verified enforcement examples backing it up — every claim is auditable.
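For a sense of the shape, here is what a criterion file might look like. Everything in it (the title, the fields, the check described) is a hypothetical example, not one of the 15 shipped files:

```markdown
# Criterion: reject-path symmetry (illustrative example)

**Checks:** the reject action is reachable in the same number of
clicks as the accept action, from the first banner layer.

**Legal basis:** GDPR Art. 4(11) and 7(3); EDPB Guidelines 03/2022
on deceptive design patterns.

**Signal:** the scout records click-depth for both paths; a depth
difference greater than zero scores against the Consent category.
```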

If you want to contribute, three surfaces are obvious. DPA adapters — one JSON file per supervisory authority, schema documented; new authorities are welcome (especially the German Länder not yet covered, plus the other Member States). Criterion files — there are 15 in the current release and the legal landscape is moving; new precedents from CNIL, the AEPD, or the EDPB regularly justify a new file or an update to an existing one. Scanner improvements — bot-detection evasion, additional fingerprinting hooks (WebGPU is the next frontier), or better handling of weird custom banners.

The Bigger Argument#

The two predecessor posts each ended with a version of the same observation: the violations are visible, the law is clear, and the only missing piece is the willingness to look. That's a satisfying closer to write but, as a working theory, it has always been incomplete. The willingness to look is not free. It costs tools, time, and attention. The reason "we didn't know" has been a workable defence for so long is not that it's true — it's that the cost of disproving it has been too high for the people who'd want to.

Open-source compliance tooling is politically interesting because it changes the price of that disproof. When a single command and 90 seconds reveals what a website actually loads in the eight seconds after consent, the asymmetry between insider and outsider knowledge collapses. The frontend engineer at a mid-size retailer can run the same audit a journalist or a regulator's intern would run. The citizen who notices a banner that won't take "no" for an answer can produce a complaint dossier of the same structural quality a privacy lawyer would. The cost of looking goes down on both sides of the wall — and once cost goes down, the "we didn't know" defence stops being credible.

That's the whole argument. It is, deliberately, a small one. I am not claiming Glasshouse is going to fix GDPR enforcement, or that open-source tooling alone will close the gap between what the law says and what websites do. The law is bigger than the tool and so is the gap. But every individual scan run, every individual complaint filed, every individual engineering team that pulls a site up internally and looks at what it ships — that is the unit of work, and the unit of work is now cheaper than it was yesterday.

If you take one thing away from this post, take a CLI command. Scan your own site first. Look at what your stack actually loads, what your CMP actually does, what your "reject all" button actually stops. The result will surprise you, or it won't — either is informative. Then, if you're so inclined, scan someone else's. The point of cheap tooling is precisely that it does not require an ideology. It just requires that you run it.


Related reading: Browser Fingerprinting in 2026 covers the tracking that survives cookie controls; Cookie Banner Dark Patterns in 2026 covers the engineering decisions behind the consent UIs every site ships. This post is the capstone of that series.

Related#

Cookie Banner Dark Patterns in 2026: How They Work, Why Regulators Are Cracking Down, and How to Build Symmetric Consent

I scanned 10 popular consumer sites for dark patterns. All 10 failed at least one EDPB criterion; half had a reject path that did not actually delete tracking cookies. The CSS, the GDPR violations, and what symmetric consent costs.

20 min read

Browser Fingerprinting in 2026: How It Works, Why Regulators Are Cracking Down, and How to Defend Against It

The tracking method that survives when cookies die. A technical guide to canvas, WebGL, AudioContext, and WebGPU fingerprinting — what GDPR and ePrivacy actually say, and what defenses hold up.

15 min read

Pre-Consent Tracking in 2026: How It Works, Why Regulators Are Cracking Down, and How to Build a Site That Actually Waits

I scanned 10 popular consumer sites and timed what happens in the first few hundred milliseconds of each page load. Every single one fired non-essential trackers before the consent banner appeared. Why "we use Consent Mode v2" is not a defense, what the law actually says, and what a site that genuinely waits for consent looks like in code.

23 min read