First step

The first step to accessible PDFs.

Find out how far your agency is from the standard.

Let's review your current status together, at no cost and with no commitment. Most agencies are surprised by how simple the first step is. You decide where it goes from there.

Status check

We analyze your current PDF workflow and identify where the gaps are.

Pilot project

We process a representative set of your documents and show the benefit concretely.

Rollout

Together we plan the full rollout into your infrastructure.

Frequently asked questions

Does RockDoc also work on scanned archive PDFs?

Yes. The integrated OCR layer recognises more than 30 European languages, including their special characters and scripts (German umlauts, Swedish and Polish diacritics, Greek, Cyrillic). Where source quality is extremely poor, we combine OCR with vision-LLM-based layout inference to reconstruct structural cues even from scans.

How do you handle complex tables (nested, spanned cells)?

Complex tables are handled by a dedicated table engine based on vision LLMs with rule-based verification. Nested tables, spanned cells (rowspan/colspan), multi-row headers and footer rows are correctly tagged with <TH>/<TD> and scope attributes. Validation runs against Matterhorn Protocol checkpoint 15 ("Tables").
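To make "rule-based verification" more concrete, here is a minimal sketch of the kind of structural check such a step might run. It is not RockDoc's engine and not a Matterhorn validator. The input format (rows of dictionaries with `tag`, `rowspan`, `colspan`, `scope`) and the function name `check_table` are assumptions made for illustration. The scope values Row, Column and Both are the ones defined for PDF table attributes.

```python
from typing import Dict, List

# Scope values as defined for PDF table header cells.
VALID_SCOPES = {"Row", "Column", "Both"}


def check_table(rows: List[List[Dict]]) -> List[str]:
    """Return a list of problems; an empty list means all checks passed.

    Hypothetical input: one list per table row, one dict per cell, e.g.
    {"tag": "TH", "scope": "Column", "colspan": 2}.
    """
    problems = []
    occupied = {}  # (row, col) -> tag of the cell covering that position

    for r, row in enumerate(rows):
        col = 0
        for cell in row:
            # Skip positions already covered by a rowspan from a row above.
            while (r, col) in occupied:
                col += 1
            tag = cell["tag"]
            rowspan = cell.get("rowspan", 1)
            colspan = cell.get("colspan", 1)

            if tag not in ("TH", "TD"):
                problems.append(f"row {r}, col {col}: unexpected tag {tag!r}")
            if tag == "TH" and cell.get("scope") not in VALID_SCOPES:
                problems.append(f"row {r}, col {col}: TH without valid scope")
            if r + rowspan > len(rows):
                problems.append(f"row {r}, col {col}: rowspan runs past last row")

            # Mark every grid position this cell spans.
            for dr in range(rowspan):
                for dc in range(colspan):
                    pos = (r + dr, col + dc)
                    if pos in occupied:
                        problems.append(f"cell overlap at {pos}")
                    occupied[pos] = tag
            col += colspan

    # A regular table covers the same number of columns in every row.
    widths = set()
    for r in range(len(rows)):
        cols = [c for (rr, c) in occupied if rr == r]
        widths.add(max(cols) + 1 if cols else 0)
    if len(widths) > 1:
        problems.append(f"rows cover different column counts: {sorted(widths)}")
    return problems


# A two-row header with one rowspan and one colspan, then a data row.
good = [
    [{"tag": "TH", "scope": "Column", "rowspan": 2},
     {"tag": "TH", "scope": "Column", "colspan": 2}],
    [{"tag": "TH", "scope": "Column"}, {"tag": "TH", "scope": "Column"}],
    [{"tag": "TH", "scope": "Row"}, {"tag": "TD"}, {"tag": "TD"}],
]
```

Calling `check_table(good)` returns an empty list. Removing the scope from the row header and dropping one data cell from the last row produces two findings, one per rule.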

Can we run a proof-of-concept with real documents before purchase?

Yes, the "Pilot" is designed exactly for that. Typical duration: 4 weeks. We set up an isolated environment inside your infrastructure, process a representative sample of your documents and deliver a quantitative conformance report. You decide on a license only afterwards.

What hardware/infrastructure does on-prem require?

Reference sizing by volume:

Up to 1,000 docs/day: 16 vCPU · 64 GB RAM · 500 GB SSD · GPU optional
Up to 10,000 docs/day: 32 vCPU · 128 GB RAM · 1 TB SSD · 1× GPU recommended
100,000+ docs/day: cluster setup with 2× GPU nodes

Deployment is container-based (Docker, Kubernetes or Podman). Bare-metal and VMware installations are also supported.
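For capacity planning scripts, the reference sizing can be expressed as a simple lookup. This is a sketch only: the names `SizingTier`, `TIERS` and `reference_sizing` are hypothetical, and it assumes that any volume above 10,000 docs/day falls into the cluster tier, which the list does not state explicitly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SizingTier:
    max_docs_per_day: Optional[int]  # None means no upper bound
    spec: str


# Values copied from the reference sizing list; tier boundaries above
# 10,000 docs/day are an assumption of this sketch.
TIERS = (
    SizingTier(1_000, "16 vCPU, 64 GB RAM, 500 GB SSD, GPU optional"),
    SizingTier(10_000, "32 vCPU, 128 GB RAM, 1 TB SSD, 1x GPU recommended"),
    SizingTier(None, "cluster setup with 2x GPU nodes"),
)


def reference_sizing(docs_per_day: int) -> str:
    """Return the reference spec for the smallest tier that fits the volume."""
    if docs_per_day < 0:
        raise ValueError("docs_per_day must not be negative")
    for tier in TIERS:
        if tier.max_docs_per_day is None or docs_per_day <= tier.max_docs_per_day:
            return tier.spec
    raise AssertionError("unreachable: the last tier has no upper bound")
```

For example, `reference_sizing(4_000)` returns the 32 vCPU specification.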

How is the model updated as you improve the AI?

There are two modes: (1) online update via a signed update channel (encrypted, certificate-authenticated), and (2) air-gapped update via a signed update package on physical media for classified ("VS-NfD") environments. Updates can be rolled back, and model versions are logged in the audit trail.

How is the data-protection audit handled?

A GDPR Article 28 data-processing agreement (DPA) is standard. Technical and organisational measures are documented; ISO 27001 and BSI C5 Type 2 are in place. With on-prem operation, the data flow is simple from a GDPR perspective: no data leaves your infrastructure.

What licensing models are available?

Three models, depending on your needs:

Volume-based: licensed by the number of documents per year.
Capacity-based: licensed by FTE or throughput class.
Enterprise flat rate: unlimited processing across the group, including subsidiaries.

Public-sector procurement: via BBG framework contracts (Federal Procurement Agency) or direct award under BVergG.

What happens to legacy archives?

Bulk migration runs in the background, in parallel with live operations. Typically, the migration pipeline runs at lower priority so that new documents are processed first. Migrating a 10-million-PDF archive typically takes 6–12 weeks.
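The idea of running migration at lower priority can be sketched with a priority queue. This is an illustration, not RockDoc's scheduler: `DocumentQueue`, the `LIVE`/`MIGRATION` constants and `implied_docs_per_day` are hypothetical names, and the throughput figure assumes the pipeline runs seven days a week.

```python
import heapq
import itertools

LIVE, MIGRATION = 0, 1  # lower number is served first


class DocumentQueue:
    """Strict-priority queue: live documents always go before archive ones."""

    def __init__(self):
        self._heap = []
        # Monotonic counter keeps first-in, first-out order within a priority.
        self._counter = itertools.count()

    def submit(self, doc_id: str, priority: int) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), doc_id))

    def next_document(self) -> str:
        _priority, _seq, doc_id = heapq.heappop(self._heap)
        return doc_id

    def __len__(self) -> int:
        return len(self._heap)


def implied_docs_per_day(archive_size: int, weeks: int) -> float:
    """Average daily throughput needed to finish in the given number of weeks."""
    return archive_size / (weeks * 7)


queue = DocumentQueue()
queue.submit("archive-1", MIGRATION)
queue.submit("live-1", LIVE)
queue.submit("archive-2", MIGRATION)
queue.submit("live-2", LIVE)
order = [queue.next_document() for _ in range(len(queue))]
```

Here `order` comes out as live-1, live-2, archive-1, archive-2. Under the same seven-day assumption, migrating 10 million PDFs in 6 to 12 weeks corresponds to an average of roughly 119,000 to 238,000 archive documents per day. A strict-priority queue can starve the migration under sustained live load, so a real scheduler would likely reserve some minimum share for it.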

How do you differ from axesPDF, Pave, Equidox, allyant and axes4?

Three differentiators:

On-prem as the core architecture, not an add-on. Many competitors offer on-prem only via custom contracts.
Vision-LLM-based layout recognition for complex layouts and historical archives, giving higher accuracy on non-standard PDFs.
Mass throughput with linear scaling, built for pipelines of a million PDFs per day at banks, insurers and utilities.

Do you support PDF/UA-2 (ISO 14289-2:2024)?

PDF/UA-1 (ISO 14289-1) is the current production standard. PDF/UA-2 support is on the roadmap and will be rolled out to production once the validator landscape (PAC, veraPDF) supports PDF/UA-2 reliably, which we expect in H2 2026.