Agentic routing
Docira does not pick one model and hope it works. For every page, the router reads the page, decides which tier and provider should handle it, verifies the output against an independent OCR baseline, and records the entire decision so you can audit it after the fact.
The six-stage pipeline
Each request passes through six stages. The router sits at stages 2–4 and is the part you usually mean when you say “Docira”.
1. Ingestion. Document bytes are normalized and split into pages. PDFs are rasterised to image tiles. Office docs round-trip through a headless converter.
2. Classification. Each page is scored on a complexity vector: table density, column count, handwriting, equations, scan quality, language, and document-type hint. The output is a single 0–1 complexity score plus the feature map.
3. Tier selection. The score maps to one of three tiers: Fast (≤ 0.30), Pro (> 0.30 and ≤ 0.80), or Expert (> 0.80). Thresholds are tunable via the PW_FAST_TIER_MAX_SCORE and PW_PRO_TIER_MAX_SCORE settings.
4. Provider selection. Within the tier, the router picks the provider with the highest verified accuracy on this feature mix, subject to each provider's circuit-breaker state.
5. Verification. The VLM output is compared to a Tesseract OCR baseline. If the overlap or structural score falls below its threshold, the page is rerouted to a higher tier and the original output is kept as a fallback.
6. Delivery. Pages are stitched back into a single response with Markdown, JSON, bounding boxes, the routing trace, and the metering record.
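Stage 6 is mostly bookkeeping. The sketch below shows one way to reassemble pages, assuming they are processed independently and may finish out of order. PageResult, DocumentResponse, and stitch are hypothetical names, not Docira's schema, and the metering record is reduced to a summed per-page cost.

```python
from dataclasses import dataclass, field


@dataclass
class PageResult:
    # Hypothetical per-page result; field names are illustrative only.
    page_index: int
    markdown: str
    trace: dict
    cost_usd: float


@dataclass
class DocumentResponse:
    # Hypothetical stitched response: Markdown, per-page traces, metering total.
    markdown: str
    traces: list = field(default_factory=list)
    total_cost_usd: float = 0.0


def stitch(pages: list[PageResult]) -> DocumentResponse:
    """Reassemble independently processed pages into one response."""
    # Pages may complete out of order, so restore document order first.
    ordered = sorted(pages, key=lambda p: p.page_index)
    return DocumentResponse(
        # Separate page Markdown with a blank line.
        markdown="\n\n".join(p.markdown for p in ordered),
        # One routing trace per page, in page order.
        traces=[p.trace for p in ordered],
        # Stand-in for the metering record: the summed per-page cost.
        total_cost_usd=round(sum(p.cost_usd for p in ordered), 6),
    )
```

Sorting by page_index rather than by arrival order means a slow Expert-tier page never shifts the pages after it.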
The page classifier
The classifier runs locally on a small SmolDocling-class model, with a heuristic fallback for cold starts. It does not call any upstream provider. Its job is to be fast (typically under 50 ms per page) and to surface the features the router actually decides on.
Features the classifier surfaces
- has_tables — at least one table region detected, including tables with merged or rotated cells.
- has_equations — inline or display math; affects Expert-tier promotion.
- has_handwriting — cursive or print handwriting; influences provider selection.
- is_scanned — image-origin page (low DPI, scan artefacts, skew); triggers a different preprocessing path.
- column_count — 1, 2, or 3+; multi-column layouts get providers that preserve flow.
- language — ISO 639-1 code, plus a script flag for RTL detection.
- doc_type_hint — one of academic_paper, clinical_guideline, financial_report, contract, invoice, slide_deck, or generic. Used to apply a doc-type preset upstream.
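The heuristic fallback has to collapse this feature map into the single 0–1 score used for tier selection. A minimal sketch of one way to do that follows. PageFeatures, WEIGHTS, EXTRA_COLUMN_WEIGHT, and heuristic_complexity are hypothetical, and the weights are invented for illustration, so they will not reproduce Docira's own scores.

```python
from dataclasses import dataclass


@dataclass
class PageFeatures:
    # Mirrors the classifier features listed above; the types are assumptions.
    has_tables: bool
    has_equations: bool
    has_handwriting: bool
    is_scanned: bool
    column_count: int          # 1, 2, or 3 (3 stands for "3+")
    language: str              # ISO 639-1 code, e.g. "en"
    is_rtl: bool = False       # the script flag used for RTL detection
    doc_type_hint: str = "generic"


# Hypothetical weights for a heuristic fallback scorer; NOT Docira's real values.
WEIGHTS = {
    "has_tables": 0.25,
    "has_equations": 0.35,
    "has_handwriting": 0.35,
    "is_scanned": 0.15,
    "is_rtl": 0.20,
}
EXTRA_COLUMN_WEIGHT = 0.10  # per column beyond the first, at most two extra


def heuristic_complexity(f: PageFeatures) -> float:
    """Collapse boolean features and column count into one 0-1 score."""
    score = sum(w for name, w in WEIGHTS.items() if getattr(f, name))
    score += EXTRA_COLUMN_WEIGHT * min(max(f.column_count - 1, 0), 2)
    return min(score, 1.0)  # clamp so the result stays in [0, 1]
```

An additive score is easy to reason about when a page lands in an unexpected tier: each feature's contribution is visible on its own.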
Tier selection
Tiers exist because clean text on a single-column PDF does not need a frontier-class model, and equations on a scanned chemistry exam should not be handed to a small one. The mapping is deterministic and tunable.
| Tier | Score range | Typical pages | Cost band |
|---|---|---|---|
| fast | ≤ 0.30 | Single-column typed text, simple lists | $0.001 / page |
| pro | 0.30 – 0.80 | Multi-column, tables, charts, scanned but readable | $0.005 – $0.012 / page |
| expert | > 0.80 | Equations, dense tables, handwriting, RTL, low-DPI scans | $0.025 – $0.060 / page |
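Because the mapping is deterministic, it fits in a few lines. This sketch treats each upper bound as inclusive, matching the ≤ 0.30 and > 0.80 boundaries in the table. It assumes the PW_FAST_TIER_MAX_SCORE and PW_PRO_TIER_MAX_SCORE settings are read from environment variables, which is an assumption; load_thresholds and select_tier are hypothetical names.

```python
import os

DEFAULT_FAST_MAX = 0.30
DEFAULT_PRO_MAX = 0.80


def load_thresholds() -> tuple[float, float]:
    """Read tier thresholds, assuming the settings are environment variables."""
    fast_max = float(os.environ.get("PW_FAST_TIER_MAX_SCORE", DEFAULT_FAST_MAX))
    pro_max = float(os.environ.get("PW_PRO_TIER_MAX_SCORE", DEFAULT_PRO_MAX))
    if not 0.0 <= fast_max < pro_max <= 1.0:
        raise ValueError("expected 0 <= fast_max < pro_max <= 1")
    return fast_max, pro_max


def select_tier(score: float,
                fast_max: float = DEFAULT_FAST_MAX,
                pro_max: float = DEFAULT_PRO_MAX) -> str:
    """Map a 0-1 complexity score to a tier; upper bounds are inclusive."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"complexity score out of range: {score}")
    if score <= fast_max:
        return "fast"
    if score <= pro_max:
        return "pro"
    return "expert"
```

With the default thresholds, select_tier(0.71) returns "pro", the same tier as the sample routing trace on this page.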
Provider selection within a tier
Each tier has a pool of qualified providers. Within the pool, the router picks the one with the highest verified accuracy on the page’s feature mix: the model that has historically performed best on this combination of layout, language, and content type. Two things can change that outcome:
- Circuit breaker. If a provider's breaker has tripped (consecutive failures, quota exhaustion, or a pattern of 5xx errors), the provider is skipped until its cooldown elapses. The cooldown grows exponentially, capped at 300 s.
- Self-hosted fallback. If you set PW_VLLM_BASE_URL, the router keeps your private vLLM instance in the candidate pool, which is useful for sensitive documents that should never leave your network.
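A minimal sketch of the pick, assuming a simple consecutive-failure breaker. Only the exponential growth and the 300 s cap come from this page; failure_threshold, base_cooldown_s, the "self-hosted/vllm" provider id, and every function name are hypothetical. A production breaker would also need a half-open probe state, which is omitted here.

```python
import os
import time
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass
class CircuitBreaker:
    # Hypothetical defaults; only the 300 s cap is documented.
    failure_threshold: int = 3
    base_cooldown_s: float = 5.0
    max_cooldown_s: float = 300.0
    consecutive_failures: int = 0
    trips: int = 0
    open_until: float = 0.0

    def record_success(self) -> None:
        # A healthy call resets both the failure streak and the backoff.
        self.consecutive_failures = 0
        self.trips = 0

    def record_failure(self, now: float) -> None:
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            # Each trip doubles the cooldown, capped at max_cooldown_s.
            cooldown = min(self.base_cooldown_s * 2 ** self.trips, self.max_cooldown_s)
            self.trips += 1
            self.open_until = now + cooldown
            self.consecutive_failures = 0

    def is_open(self, now: float) -> bool:
        return now < self.open_until


def candidate_pool(tier_pool: list[str], env: Mapping[str, str] = os.environ) -> list[str]:
    """Tier pool plus the private vLLM instance when PW_VLLM_BASE_URL is set."""
    pool = list(tier_pool)
    if env.get("PW_VLLM_BASE_URL"):
        pool.append("self-hosted/vllm")  # hypothetical provider id
    return pool


def pick_provider(candidates: list[str], accuracy: dict[str, float],
                  breakers: dict[str, CircuitBreaker],
                  now: Optional[float] = None) -> Optional[str]:
    """Highest-accuracy candidate whose breaker is closed, or None if all are open.

    `accuracy` holds each provider's verified accuracy on this page's feature mix.
    """
    now = time.monotonic() if now is None else now
    available = [p for p in candidates if p not in breakers or not breakers[p].is_open(now)]
    if not available:
        return None
    return max(available, key=lambda p: accuracy.get(p, 0.0))
```

Skipping an open provider, rather than lowering its accuracy score, keeps the ranking itself stable: once the cooldown ends, the provider returns at its usual position.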
Verification and re-routing
Before returning a page’s output, Docira runs an independent Tesseract OCR pass and compares it to the VLM’s text. The comparison produces three signals:
- ocr_baseline_overlap — token-level overlap with the OCR pass.
- table_structure_score — table cell-count and header alignment match.
- layout_consistency — paragraph and column ordering match the bounding-box graph.
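This page does not define how token-level overlap is computed. One plausible definition is the Dice coefficient over token multisets, sketched below. The tokenizer and the formula are assumptions, not Docira's implementation. Matching only word characters means Markdown markup in the VLM output, such as ** or |, does not count against it.

```python
import re
from collections import Counter


def tokens(text: str) -> Counter:
    """Lowercased word tokens as a multiset (an assumed, simple tokenizer)."""
    return Counter(re.findall(r"\w+", text.lower()))


def ocr_baseline_overlap(vlm_text: str, ocr_text: str) -> float:
    """Dice overlap of token multisets in [0, 1]; 1.0 means identical token bags."""
    a, b = tokens(vlm_text), tokens(ocr_text)
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 1.0  # two empty pages agree trivially
    shared = sum((a & b).values())  # & keeps the minimum count of each token
    return 2 * shared / total
```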
If any score falls below its configured threshold (the PW_GRADE_*_MIN settings), the page is rerouted to the next tier up and the second result is returned. The original result is kept in the trace as reroute_history so you can compare the two.
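The re-routing rule can be sketched as follows. MIN_SCORES stands in for the PW_GRADE_*_MIN settings with made-up values, and process is a hypothetical callable that runs one tier and returns (output, scores). What happens when an Expert-tier page fails is not specified on this page; the sketch assumes the Expert result is returned with a reject verdict.

```python
from typing import Callable

TIERS = ["fast", "pro", "expert"]

# Hypothetical stand-ins for the PW_GRADE_*_MIN settings; values are made up.
MIN_SCORES = {
    "ocr_baseline_overlap": 0.80,
    "table_structure_score": 0.85,
    "layout_consistency": 0.80,
}


def failing_signals(scores: dict) -> list:
    """Signals below their minimum; signals absent from `scores` are skipped."""
    return [name for name, minimum in MIN_SCORES.items()
            if name in scores and scores[name] < minimum]


def attempt(tier: str, process: Callable[[str], tuple]) -> dict:
    """Run one tier and grade its output."""
    output, scores = process(tier)
    failed = failing_signals(scores)
    return {"tier": tier, "output": output, "scores": scores, "failed": failed,
            "verdict": "reject" if failed else "accept"}


def verify_and_reroute(tier: str, process: Callable[[str], tuple]) -> dict:
    """Grade the first result; on failure, rerun one tier up and return that result."""
    first = attempt(tier, process)
    if first["verdict"] == "accept" or tier == TIERS[-1]:
        # Accepted, or already Expert with nowhere to escalate (assumed behaviour).
        return {**first, "reroute_history": []}
    second = attempt(TIERS[TIERS.index(tier) + 1], process)
    # Keep the rejected first attempt so the two results can be compared.
    return {**second, "reroute_history": [first]}
```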
The routing trace
Every page in the response carries a complete routing trace. This is the differentiator: nothing the router does is hidden. For each page you can audit why a tier was picked, which provider answered, what the verification scores were, and whether the result was rerouted.
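Because every trace has the same shape, a client can turn traces into an audit log or a review queue. A minimal sketch follows, assuming each trace is a plain dict with the field names used in the sample trace on this page. summarize_trace, pages_needing_review, and the 0.85 review threshold are hypothetical.

```python
def summarize_trace(trace: dict) -> str:
    """One-line audit summary built from the routing-trace fields."""
    cls = trace["classification"]
    ver = trace.get("verification", {})
    cost = trace.get("vlm_call", {}).get("cost_usd", 0.0)
    return (
        f"page {trace['page_index']}: score={cls['complexity_score']:.2f} "
        f"tier={trace['tier_selection']['tier']} "
        f"provider={trace['provider_selection']['chosen']} "
        f"verdict={ver.get('verdict', 'n/a')} "
        f"rerouted={ver.get('reroute_attempted', False)} "
        f"cost=${cost:.4f}"
    )


def pages_needing_review(traces: list, min_overlap: float = 0.85) -> list:
    """Indices of pages that were rerouted or whose OCR overlap is below min_overlap."""
    flagged = []
    for t in traces:
        ver = t.get("verification", {})
        if ver.get("reroute_attempted") or ver.get("ocr_baseline_overlap", 1.0) < min_overlap:
            flagged.append(t["page_index"])
    return flagged
```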
```json
{
  "trace_id": "rt_01HZX7K8M2",
  "page_index": 4,
  "classification": {
    "complexity_score": 0.71,
    "features": {
      "has_tables": true,
      "has_equations": false,
      "has_handwriting": false,
      "is_scanned": false,
      "column_count": 2,
      "language": "en",
      "doc_type_hint": "academic_paper"
    },
    "duration_ms": 38
  },
  "tier_selection": {
    "tier": "pro",
    "reason": "complexity_score in [0.30, 0.80]",
    "thresholds": { "fast_max": 0.30, "pro_max": 0.80 }
  },
  "provider_selection": {
    "candidates": ["anthropic/claude-sonnet-4-6", "openai/gpt-4-1-mini", "google/gemini-2-5-pro"],
    "chosen": "anthropic/claude-sonnet-4-6",
    "reason": "highest verified accuracy on 'tables + 2-column' benchmark",
    "circuit_breaker_state": "closed"
  },
  "vlm_call": { "duration_ms": 1240, "tokens_in": 1842, "tokens_out": 967, "cost_usd": 0.0073 },
  "verification": {
    "ocr_baseline_overlap": 0.91,
    "table_structure_score": 0.94,
    "verdict": "accept",
    "reroute_attempted": false
  },
  "total_duration_ms": 1314
}
```
See also
- Grounding and bounding boxes — how to use the spatial output the router emits.
- API reference — the request/response schema in full.
- Webhooks — async result delivery for batch jobs.