EN 301 549 Evidence Scanner

Reproducible findings + clause mapping for EU procurement. Developer‑ready JSON outputs.

Who It's For

EU public-sector vendors, agencies, and in-house teams who must submit EN 301 549-mapped evidence (not just "WCAG issues").

QA/Dev teams who need deterministic, rerunnable results and raw data for dispute resolution.

What Makes It Different

🎯 Clause Mapping

Every issue carries its WCAG 2.1 AA success criterion and the corresponding EN 301 549 clause IDs through an explicit crosswalk, rather than a generic reference to the EAA.

🔄 Reproducibility Contract

Stable issue IDs: sha1(normalizedUrl + ruleId + selector)
Pinned engine versions (axe-core@4.8.3, Chromium build), OS, timezone, viewport, device profile
Run manifest includes hashes, timings, resource caps, and raw axe JSON
Acceptance KPI: ≥95% issue reproduction on rescan of stable pages
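
A minimal sketch of how an ID of this form could be computed in Node.js. The hash input follows the formula above; the normalization steps in normalizeUrl (dropping the fragment and utm_* parameters, sorting query parameters, trimming a trailing slash) are assumptions for illustration, not the scanner's documented rules.

```javascript
const crypto = require('node:crypto');

// Hypothetical normalization rules (assumed, not taken from the scanner):
// drop the fragment, drop utm_* tracking parameters, sort the remaining
// query parameters, and trim a trailing slash from the path.
function normalizeUrl(rawUrl) {
  const url = new URL(rawUrl); // the URL parser already lowercases the host
  url.hash = '';
  const kept = [...url.searchParams.entries()]
    .filter(([key]) => !key.startsWith('utm_'))
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  url.search = new URLSearchParams(kept).toString();
  if (url.pathname.length > 1 && url.pathname.endsWith('/')) {
    url.pathname = url.pathname.slice(0, -1);
  }
  return url.toString();
}

// sha1(normalizedUrl + ruleId + selector), hex-encoded (40 characters).
function issueId(rawUrl, ruleId, selector) {
  return crypto
    .createHash('sha1')
    .update(normalizeUrl(rawUrl) + ruleId + selector)
    .digest('hex');
}
```

Under these assumed rules, https://Example.com/page/?utm_source=x#top and https://example.com/page yield the same ID for the same rule and selector, which is what makes rescans comparable.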

📋 Clear, Actionable Summary

Top issues by impact × frequency
EN 301 549 clause mapping for every finding
Deterministic, reproducible IDs per issue
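
One way to picture "impact × frequency" is the sketch below. It groups issues by ruleId and multiplies an impact weight by the occurrence count. The weights (1 to 4 for the axe-core impact levels) and the function name topIssues are assumptions for illustration; the scanner's actual scoring may differ.

```javascript
// Hypothetical weights for the four axe-core impact levels.
const IMPACT_WEIGHT = { minor: 1, moderate: 2, serious: 3, critical: 4 };

// Rank rules by (impact weight × number of occurrences), highest first.
function topIssues(issues, limit = 10) {
  const byRule = new Map();
  for (const issue of issues) {
    const entry = byRule.get(issue.ruleId) ?? {
      ruleId: issue.ruleId,
      impact: issue.impact,
      count: 0,
      pages: new Set(),
    };
    entry.count += 1;
    entry.pages.add(issue.url);
    byRule.set(issue.ruleId, entry);
  }
  return [...byRule.values()]
    .map((e) => ({
      ruleId: e.ruleId,
      impact: e.impact,
      count: e.count,
      affectedPages: e.pages.size,
      score: (IMPACT_WEIGHT[e.impact] ?? 0) * e.count,
    }))
    // Ties are broken by ruleId so the ordering stays deterministic.
    .sort((a, b) => b.score - a.score || (a.ruleId < b.ruleId ? -1 : 1))
    .slice(0, limit);
}
```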

👨‍💻 Developer-Ready Artifacts

issues.jsonl - Line-delimited JSON for streaming
summary.json - Aggregated statistics
accessibility-statement.md - Pre-filled draft

⚙️ SPA Realism

maxWait, waitForSelector[], javascriptEnabled
Device profiles (desktop/mobile) with proper viewport
Resource caps (block media >5MB), timing included
Auth support (basic auth + cookie injection)
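
The sketch below shows one plausible reading of how maxWait and waitForSelector[] interact: every selector must appear, and all of them share a single time budget. Both of those semantics are assumptions, as is the helper name waitForAll. The only library call used is Playwright's page.waitForSelector(selector, { timeout }).

```javascript
// `page` is expected to be a Playwright Page object.
// Returns the selectors that did not appear within the shared budget.
async function waitForAll(page, selectors, maxWaitMs = 8000) {
  const deadline = Date.now() + maxWaitMs;
  const missing = [];
  for (const selector of selectors) {
    // Playwright treats timeout 0 as "no timeout", so keep at least 1 ms.
    const remaining = Math.max(deadline - Date.now(), 1);
    try {
      await page.waitForSelector(selector, { timeout: remaining });
    } catch {
      // Any failure (typically a timeout) marks the selector as missing.
      missing.push(selector);
    }
  }
  return missing;
}
```

A caller could scan the page anyway when the returned list is non-empty and record the missing selectors next to the timings, so that a slow render is visible in the evidence rather than silently ignored.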

🔎 Focused Scope

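Web content only, with results delivered as JSON. The scanner does not generate PDF reports; linked PDFs are covered only by the quick checks behind scanPdfs.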

Quick Start

{
  "startUrls": [{"url": "https://example.com"}],
  "maxPages": 5,
  "outputFormats": ["json", "jsonl"]
}

Production Scan

{
  "startUrls": [{"url": "https://example.com"}],
  "sitemapUrl": "https://example.com/sitemap.xml",
  "maxPages": 1000,
  "includePatterns": ["https://example.com/**", "https://www.example.com/**"],
  "excludePatterns": ["\\?(utm_|ref=)", "/admin/"],
  "auth": {
    "basic": {"user": "username", "pass": "password"},
    "cookie": "session=abc123; Path=/; Domain=example.com"
  },
  "javascriptEnabled": true,
  "device": "desktop",
  "waitForSelector": ["main", "#app"],
  "scanPdfs": true,
  "outputFormats": ["json", "jsonl"],
  "language": "en"
}

Input Schema

Parameter	Type	Default	Description
`startUrls`	Array	Required	URLs to scan for accessibility compliance
`sitemapUrl`	String	-	XML sitemap URL for automatic discovery
`maxPages`	Integer	1000	Maximum pages to analyze (1-50,000)
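`includePatterns`	Array	`[]`	Patterns for URLs to include: globs for link discovery, regex for sitemap URL filtering (see Crawl Scope Defaults)
`excludePatterns`	Array	`[]`	Patterns for URLs to exclude: globs for link discovery, regex for sitemap URL filtering (see Crawl Scope Defaults)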
`auth.basic`	Object	-	Basic auth: `{user, pass}`
`auth.cookie`	String	-	Cookie string for authentication
`javascriptEnabled`	Boolean	`true`	Enable JS execution for SPAs
`device`	String	`"desktop"`	Device type: `desktop` or `mobile`
`rateLimit`	Number	5	Requests per second (1-20)
`maxConcurrency`	Integer	10	Concurrent page processing (1-50)
`maxWait`	Integer	8000	Page load timeout in milliseconds
`waitForSelector`	Array	`[]`	CSS selectors to wait for before scanning
`scanPdfs`	Boolean	`true`	Extract and analyze linked PDFs (JSON only; no PDF rendering)
`blockMediaOverBytes`	Integer	5242880	Block media files over this size (5MB)
`language`	String	`"auto"`	Report language: `auto`, `en`, `de`
`outputFormats`	Array	`["json","jsonl"]`	Desired output formats
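
The defaults and ranges in the table can be checked before a run is started. The sketch below is a hypothetical client-side helper (resolveInput is not part of the Actor) that fills in the documented defaults and reports out-of-range values. Treating an empty startUrls array as invalid is an assumption.

```javascript
// Defaults and ranges copied from the parameter table.
const DEFAULTS = {
  maxPages: 1000,
  includePatterns: [],
  excludePatterns: [],
  javascriptEnabled: true,
  device: 'desktop',
  rateLimit: 5,
  maxConcurrency: 10,
  maxWait: 8000,
  waitForSelector: [],
  scanPdfs: true,
  blockMediaOverBytes: 5242880,
  language: 'auto',
  outputFormats: ['json', 'jsonl'],
};
const RANGES = { maxPages: [1, 50000], rateLimit: [1, 20], maxConcurrency: [1, 50] };

// Returns the input merged with defaults plus a list of validation errors.
function resolveInput(input) {
  const errors = [];
  if (!Array.isArray(input.startUrls) || input.startUrls.length === 0) {
    errors.push('startUrls is required');
  }
  const resolved = { ...DEFAULTS, ...input };
  for (const [name, [min, max]] of Object.entries(RANGES)) {
    const value = resolved[name];
    if (typeof value !== 'number' || value < min || value > max) {
      errors.push(`${name} must be between ${min} and ${max}`);
    }
  }
  if (!['desktop', 'mobile'].includes(resolved.device)) {
    errors.push('device must be "desktop" or "mobile"');
  }
  if (!['auto', 'en', 'de'].includes(resolved.language)) {
    errors.push('language must be "auto", "en" or "de"');
  }
  return { resolved, errors };
}
```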

Output Schema

Stable Issue Format

{
  "id": "a1b2c3d4e5f6...",
  "url": "https://example.com/page",
  "selector": "main h1",
  "snippet": "<h1>...</h1>",
  "ruleId": "aria-required-attr",
  "wcag": "2.1-1.3.1",
  "en301549": ["9.1.3.1", "11.1.3.1"],
  "impact": "serious",
  "repro": ["Open page", "Tab to element", "Observe issue"],
  "recommendation": "Add aria-label attribute",
  "engine": {"axe": "4.8.3", "chromium": "123.0.0"},
  "timings": {"ttfbMs": 210, "domContentLoadedMs": 950, "scanMs": 1200},
  "timestamp": "2025-09-19T10:15:00Z"
}

Run Manifest

{
  "runId": "en301549-2025-09-19T10-15-00-a1b2c3",
  "env": {"os": "linux", "tz": "UTC", "viewport": "1366x768", "device": "desktop"},
  "engines": {"axe": "4.8.3", "playwright": "1.45.0", "chromium": "123.0.0"},
  "limits": {"blockMediaOverBytes": 5242880, "pageSizeCapBytes": 2621440},
  "crawl": {"pagesScanned": 487, "skipped": 23, "errors": 4, "durationSec": 1198}
}

Output Files

Developer Artifacts

issues.jsonl - Line-delimited JSON for streaming/processing
summary.json - Aggregated statistics and compliance scores
manifest.json - Run environment and reproducibility data
Complete results are available as issues JSONL and full JSON, alongside the summary and manifest JSON.
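
Because issues.jsonl holds one JSON object per line, it can be processed without loading the whole file. The sketch below streams a local copy with Node's readline module and tallies issues per EN 301 549 clause, using the en301549 array shown in the issue format. The helper names and the local file path are assumptions.

```javascript
const fs = require('node:fs');
const readline = require('node:readline');

// Parse one JSONL line; blank lines yield null.
function parseLine(line) {
  const trimmed = line.trim();
  return trimmed === '' ? null : JSON.parse(trimmed);
}

// Add one issue's clauses to a Map of clause -> count.
function tallyClauses(counts, issue) {
  for (const clause of issue.en301549 ?? []) {
    counts.set(clause, (counts.get(clause) ?? 0) + 1);
  }
  return counts;
}

// Stream the file line by line so memory use stays flat on large scans.
async function countClausesInFile(path) {
  const lines = readline.createInterface({
    input: fs.createReadStream(path, { encoding: 'utf8' }),
    crlfDelay: Infinity,
  });
  const counts = new Map();
  for await (const line of lines) {
    const issue = parseLine(line);
    if (issue) tallyClauses(counts, issue);
  }
  return counts;
}
```

Calling countClausesInFile('issues.jsonl') resolves to a Map that can be sorted by count to see which clauses dominate a scan.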

Client Deliverables

• Evidence artifacts in JSON/JSONL with EN 301 549 mapping

Pricing

Pay per event (as on the Apify Store page):

Actor start: $0.0001 per run
Page scanned: $0.005 per page
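
As a worked example of the two event prices above, the sketch below estimates the cost of a scan. The function name estimateCostUsd is hypothetical, and the calculation assumes that these two events are the only charges.

```javascript
// Prices in micro-dollars to keep the arithmetic in integers.
const ACTOR_START_MICRO_USD = 100; // $0.0001 per run
const PAGE_SCANNED_MICRO_USD = 5000; // $0.005 per page

function estimateCostUsd(pages, runs = 1) {
  return (runs * ACTOR_START_MICRO_USD + pages * PAGE_SCANNED_MICRO_USD) / 1e6;
}

console.log(estimateCostUsd(1000)); // 5.0001
```

A single 1,000-page scan therefore comes to about $5.00, with the start fee contributing a hundredth of a cent.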

Engine & Detectors

Core Engine

Playwright crawler with Chromium (pinned version)
@axe-core/playwright for WCAG 2.1 AA compliance
Custom detectors for enhanced coverage:

Beyond Axe-Core

Contrast resolver that handles CSS custom properties
Landmark structure validation (missing <main>, multiple <h1>)
Language consistency checker
Keyboard trap detector (2000ms timeout)
Focus order anomalies (sequence jumps across landmarks)
PDF quick checks (tagged, title, bookmarks, alt objects count)

Quality Gates

✅ Reproducibility: ≥95% identical issue IDs on stable content
✅ Performance: 1K pages ≤ 20 min on Apify standard worker
✅ Results Clarity: Grouped “Top issues” with affected pages
✅ Transparency: Raw axe nodes exposed for each violation
✅ False-Positive Control: Conservative detectors; uncertain issues tagged as "needs manual check"
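
The reproducibility gate can be checked from two runs' issue IDs. The sketch below assumes the rate is defined as the share of baseline IDs that reappear in the rescan; that definition and the function name reproductionRate are assumptions, since the document only states the ≥95% threshold.

```javascript
// Fraction of baseline issue IDs that are also present in the rescan.
function reproductionRate(baselineIds, rescanIds) {
  const baseline = new Set(baselineIds);
  if (baseline.size === 0) return 1; // nothing to reproduce
  const rescan = new Set(rescanIds);
  let reproduced = 0;
  for (const id of baseline) {
    if (rescan.has(id)) reproduced += 1;
  }
  return reproduced / baseline.size;
}

// Passes the gate when at least 95% of baseline issues reappear.
function meetsReproducibilityGate(baselineIds, rescanIds, threshold = 0.95) {
  return reproductionRate(baselineIds, rescanIds) >= threshold;
}
```

New issues that appear only in the rescan do not lower the rate under this definition; they would show up as regressions in a separate diff.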

Demo Datasets

Public Sector Example

# German government website
curl -X POST https://api.apify.com/v2/acts/barrierefix~en301549-scanner/runs \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"startUrls": [{"url": "https://www.bundesregierung.de"}], "maxPages": 10}'

E-Commerce Example

# Online shop with checkout flow
curl -X POST https://api.apify.com/v2/acts/barrierefix~en301549-scanner/runs \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"startUrls": [{"url": "https://shop.example.com"}], "includePatterns": ["/(product|cart|checkout)/"], "maxPages": 50}'

Integration Examples

CI/CD Pipeline

# .github/workflows/accessibility.yml
- name: EN 301 549 Compliance Check
  run: |
    # Fail the step if curl fails, not only if jq fails.
    set -o pipefail
    # run-sync-get-dataset-items waits for the run to finish (up to 300 s) and
    # returns the dataset items directly, so the check never reads an empty dataset.
    curl -sf -X POST "https://api.apify.com/v2/acts/barrierefix~en301549-scanner/run-sync-get-dataset-items" \
      -H "Authorization: Bearer ${{ secrets.APIFY_TOKEN }}" \
      -H "Content-Type: application/json" \
      -d '{"startUrls": [{"url": "${{ env.STAGING_URL }}"}], "maxPages": 100}' \
      | jq '.[] | select(.summary.compliance.wcag21AA.score < 95) | error("Accessibility compliance below threshold")'

Monitoring Dashboard

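// Check compliance score; datasetId is the defaultDatasetId of a finished run
const response = await fetch(`https://api.apify.com/v2/datasets/${datasetId}/items`);
if (!response.ok) {
  throw new Error(`Dataset request failed with HTTP ${response.status}`);
}
const results = await response.json();
const complianceScore = results[0]?.summary?.compliance?.wcag21AA?.score;

// Treat a missing score as a failure instead of silently passing
if (complianceScore === undefined) {
  alert('Accessibility compliance score is unavailable');
} else if (complianceScore < 95) {
  alert(`Accessibility compliance dropped to ${complianceScore}%`);
}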

Comparison vs Competitors

Feature	EN 301 549 Scanner	Equally AI	Accessibility Checker	Manual Audit
EN 301 549 mapping	✅ Complete crosswalk	❌ WCAG only	❌ WCAG only	✅ Manual
Stable issue IDs	✅ SHA1-based	❌ Unstable	❌ Unstable	❌ No IDs
Reproducibility	✅ ≥95% target (acceptance KPI)	❌ No guarantee	❌ No guarantee	❌ Manual variance
Report output	JSON/JSONL	❌ Dashboard only	❌ Limited export	✅ Custom
Raw data access	✅ JSON/JSONL	❌ UI only	❌ Limited export	❌ Documents only
SPA support	✅ Full JS support	✅ Yes	⚠️ Limited	✅ Manual
Pricing model	$0.005/page + $0.0001/run	€/result (variable)	€/month	€€€/hour
Procurement ready	✅ EU standards	❌ US focus	❌ Generic	✅ Custom

Technical Requirements

Apify Platform

Memory: 2 GB minimum, 4 GB recommended for large sites
Timeout: 20 minutes for 1,000 pages
Storage: ~50 MB per 1,000 pages (including all outputs)

Browser Requirements

Chromium: Version 123+ (automatically provided by Apify)
Playwright: Version 1.45+ (automatically installed)
Axe-core: Version 4.8.3 (pinned for reproducibility)

Support & Documentation

Getting Started

Quick Start: Test with up to 5 pages to validate outputs
Documentation: Complete API reference and examples
Support: accessibility@barrierefix.de for technical questions

Best Practices

Start Small: Test with 10-50 pages before full site scans
Authentication: Use dedicated test accounts for consistent results
Monitoring: Set up automated scans after deployments
Review Process: Combine automated results with manual expert review

License

MIT License - See LICENSE file for details.

Support: kontakt@barrierefix.de | Website: https://www.barrierefix.de/

Crawl Scope Defaults

Default behavior: If includePatterns is not provided, the scanner follows links only on the same domain as the first startUrls entry (e.g., https://example.com/**). This prevents accidental off-domain crawls from external links like “More information…”.
Allowing off-domain: Explicitly add glob patterns to includePatterns for the domains/paths you want to include, or use broad patterns like "**" if you truly want to crawl everything linked.
Pattern types:
- Link discovery uses glob patterns (e.g., https://example.com/**).
- Sitemap URL filtering uses regular expressions (in includePatterns/excludePatterns).
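
The default scope described above can be sketched as follows. The helper names are hypothetical, the default is assumed to be derived from the origin of the first start URL (matching the https://example.com/** example), and the glob matcher is a deliberately minimal stand-in that supports only "*" and "**", not the scanner's actual matcher.

```javascript
// Default include glob: everything under the first start URL's origin.
function defaultIncludePatterns(startUrls) {
  const { origin } = new URL(startUrls[0].url);
  return [`${origin}/**`];
}

// Minimal glob-to-regex conversion for illustration only:
// "**" matches any characters, "*" matches any characters except "/".
function globToRegExp(glob) {
  const escaped = glob.replace(/[.+^${}()|[\]\\?]/g, '\\$&');
  const pattern = escaped
    .replace(/\*\*/g, '\u0000') // protect "**" before handling single "*"
    .replace(/\*/g, '[^/]*')
    .replace(/\u0000/g, '.*');
  return new RegExp(`^${pattern}$`);
}

// A discovered link is in scope if it matches at least one include glob.
function isInScope(url, includeGlobs) {
  return includeGlobs.some((glob) => globToRegExp(glob).test(url));
}
```

With startUrls set to https://example.com, a link to https://example.com/about is followed while an external "More information…" link to another domain is skipped. Passing ["**"] as the include list accepts every link.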