Grounding and bounding boxes
Every element Docira extracts — heading, paragraph, table cell, figure, equation — comes with a bounding box that locates it on the source page. This is the spatial backbone of grounding: the bridge between the Markdown / JSON output and the original document pixels.
Why grounding matters
- Auditability. Every claim in the output points to a region on a page. RAG citations stop being approximate.
- Redaction. You can redact a specific cell or sentence in the original PDF without touching the rest of the document.
- Highlights. Reviewer UIs can draw a box around the exact text that produced an answer.
- Verification. Elements whose boxes overlap are inspected by the verifier, and a clean spatial graph is one of the signals the router uses to decide whether to reroute.
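Docira does not document how the verifier measures overlap, so the following is only an illustration of the idea: a minimal sketch that flags pairs of elements whose normalised boxes intersect. The ratio used here (intersection area over the smaller box's area), the helper names, and the 0.1 threshold are all assumptions for the example, not the verifier's actual settings.

```javascript
// Sketch only: overlap test on normalised { x, y, w, h } boxes (top-left origin).
// The metric and the default threshold are illustrative assumptions.

// Area shared by two boxes; 0 when they do not intersect.
function intersectionArea(a, b) {
  const ix = Math.max(0, Math.min(a.x + a.w, b.x + b.w) - Math.max(a.x, b.x));
  const iy = Math.max(0, Math.min(a.y + a.h, b.y + b.h) - Math.max(a.y, b.y));
  return ix * iy;
}

// Intersection divided by the smaller box's area, so a small element
// lying entirely inside a larger one scores 1.
function overlapRatio(a, b) {
  const smaller = Math.min(a.w * a.h, b.w * b.h);
  return smaller > 0 ? intersectionArea(a, b) / smaller : 0;
}

// Return [idA, idB, ratio] for every pair of elements above the threshold.
function findOverlaps(elements, threshold = 0.1) {
  const hits = [];
  for (let i = 0; i < elements.length; i++) {
    for (let j = i + 1; j < elements.length; j++) {
      const r = overlapRatio(elements[i].bbox, elements[j].bbox);
      if (r > threshold) hits.push([elements[i].id, elements[j].id, r]);
    }
  }
  return hits;
}
```

Run against the page-2 example in the element schema, this returns an empty list: the heading, paragraph, table, and figure boxes do not intersect.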
Coordinate system
Bounding boxes are normalised floats in [0, 1], relative to the page’s rendered dimensions. The origin is the top-left corner of the page; x increases to the right, y increases downward.
- x, y — top-left corner of the box.
- w, h — width and height of the box.
- rotation_deg — rotation in degrees, clockwise positive. Present when non-zero (rotated cells, vertical headers).
Normalisation means the same bbox is correct whether you render the page at 800×1000 or 2400×3000 pixels. Multiply by the target dimensions when you draw.
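Going the other way (from pixels back to normalised space) is a division by the same dimensions. This is useful when a reviewer clicks a rendered page and you need to know which element was hit. Here is a minimal sketch; the helper names are hypothetical, and it ignores rotation_deg, so rotated elements would need the point un-rotated first.

```javascript
// Sketch: map a pixel-space point back to normalised coordinates and hit-test.
// pageWidthPx / pageHeightPx are whatever size the page was rendered at.

// Inverse of "multiply by the target dimensions": divide by them.
function pointToNormalised(px, py, pageWidthPx, pageHeightPx) {
  return { x: px / pageWidthPx, y: py / pageHeightPx };
}

// True if a normalised point lies inside a normalised { x, y, w, h } box.
function containsPoint(b, p) {
  return p.x >= b.x && p.x <= b.x + b.w && p.y >= b.y && p.y <= b.y + b.h;
}

// Id of the smallest element containing the point (e.g. a cell rather than
// its whole table), or null if nothing is under it.
function elementAt(elements, px, py, pageWidthPx, pageHeightPx) {
  const p = pointToNormalised(px, py, pageWidthPx, pageHeightPx);
  let best = null;
  for (const el of elements) {
    if (!containsPoint(el.bbox, p)) continue;
    if (!best || el.bbox.w * el.bbox.h < best.bbox.w * best.bbox.h) best = el;
  }
  return best ? best.id : null;
}
```

Because the bbox is resolution-independent, a click at (100, 135) on an 800×1000 render and a click at (300, 405) on a 2400×3000 render both map to the normalised point (0.125, 0.135). Both land on the "Methods" heading (el_p2_0001) from the example schema.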
Element schema
Every element in pages[i].elements carries an id, a type, content fields appropriate to the type, and a bbox.
```json
{
  "page_index": 2,
  "elements": [
    {
      "id": "el_p2_0001",
      "type": "heading",
      "level": 1,
      "text": "Methods",
      "bbox": { "x": 0.082, "y": 0.121, "w": 0.176, "h": 0.034 },
      "rotation_deg": 0
    },
    {
      "id": "el_p2_0002",
      "type": "paragraph",
      "text": "Patients aged 18-75 with confirmed metastatic disease ...",
      "bbox": { "x": 0.080, "y": 0.171, "w": 0.420, "h": 0.082 }
    },
    {
      "id": "el_p2_0007",
      "type": "table",
      "table_id": "tbl_p2_0001",
      "bbox": { "x": 0.515, "y": 0.171, "w": 0.405, "h": 0.382 },
      "rows": 14,
      "cols": 6
    },
    {
      "id": "el_p2_0014",
      "type": "figure",
      "caption_id": "el_p2_0015",
      "bbox": { "x": 0.080, "y": 0.581, "w": 0.420, "h": 0.301 }
    }
  ]
}
```

Element types
- heading — with level 1–6.
- paragraph — text + language hint when not the document default.
- list / list_item — ordered and unordered, nested.
- table — references a table_id; cells are in a sibling tables[] collection with their own per-cell bboxes.
- figure — image or chart region; pairs with a caption_id.
- equation — LaTeX content + bbox of the rendered glyph block.
- footnote — bound to the originating element via anchor_id.
- page_header / page_footer — running content. Excluded from main-flow reading order by default.
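The cross-references in the list above (caption_id, anchor_id) are plain element ids, so a page-level lookup is enough to follow them. The following is a minimal sketch that assumes a caption or footnote sits on the same page as the element it belongs to. The helper names are hypothetical, and mainFlow only approximates the default exclusion: it drops page_header / page_footer and keeps the array order as-is.

```javascript
// Sketch: follow id references within one entry of pages[i].
// Field names come from the schema; helper names are hypothetical.

// id -> element lookup for one page.
function indexById(page) {
  return new Map(page.elements.map((el) => [el.id, el]));
}

// Main-flow elements: everything except running headers and footers.
function mainFlow(page) {
  return page.elements.filter(
    (el) => el.type !== "page_header" && el.type !== "page_footer",
  );
}

// Pair each figure with its caption element (null if not found on this page).
function figuresWithCaptions(page) {
  const byId = indexById(page);
  return page.elements
    .filter((el) => el.type === "figure")
    .map((fig) => ({ figure: fig, caption: byId.get(fig.caption_id) ?? null }));
}

// Footnotes bound to a given element via anchor_id.
function footnotesFor(page, elementId) {
  return page.elements.filter(
    (el) => el.type === "footnote" && el.anchor_id === elementId,
  );
}
```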
Using grounding downstream
Drawing highlights in a PDF viewer
Convert normalised coordinates to pixel space at render time. The page dimensions come from your renderer (PDF.js, react-pdf, native viewer); they are not in the Docira response.
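```javascript
// Page dimensions come from your renderer (e.g. the viewport size PDF.js
// reports). The page can be rendered at any pixel size, e.g. 1200 x 1600.
const pageWidthPx = 1200;
const pageHeightPx = 1600;

// Convert a normalised bbox ({ x, y, w, h } in [0, 1], top-left origin)
// into a pixel-space rect for an absolutely positioned overlay.
function bboxToRect(b, widthPx = pageWidthPx, heightPx = pageHeightPx) {
  return {
    left: b.x * widthPx,
    top: b.y * heightPx,
    width: b.w * widthPx,
    height: b.h * heightPx,
  };
}
```

Redacting a specific element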
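For each element to redact, mark the same region of the source PDF for redaction and then apply it. A black rectangle drawn on top would only hide the content visually and leave the underlying text extractable. PyMuPDF's page.add_redact_annot takes a rectangle in corner form (x0, y0, x1, y1) rather than Docira's (x, y, w, h), so scale the normalised bbox by the page's width and height in points (page.rect) and convert it first. PyMuPDF's page coordinates share Docira's top-left origin. Pass fill=(0, 0, 0) for a black box, then call page.apply_redactions() to remove the covered content.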
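The conversion itself is plain arithmetic and can be sketched in a few lines. This sketch assumes an unrotated page whose page box starts at (0, 0), with the width and height in points taken from your PDF library (for PyMuPDF, page.rect). It produces two variants: top-left origin (PyMuPDF's convention, matching Docira) and bottom-left origin (the PDF specification's default user space), for libraries that work in raw user-space coordinates. The function names are hypothetical.

```javascript
// Sketch: normalised Docira bbox -> corner-form rect [x0, y0, x1, y1] in points.
// Assumes an unrotated page whose page box starts at (0, 0).

// Top-left origin, y increasing downward (same orientation as Docira).
function bboxToTopLeftRect(b, pageWidthPt, pageHeightPt) {
  return [
    b.x * pageWidthPt,
    b.y * pageHeightPt,
    (b.x + b.w) * pageWidthPt,
    (b.y + b.h) * pageHeightPt,
  ];
}

// Bottom-left origin, y increasing upward (raw PDF user space): flip y.
function bboxToBottomLeftRect(b, pageWidthPt, pageHeightPt) {
  return [
    b.x * pageWidthPt,
    (1 - b.y - b.h) * pageHeightPt,
    (b.x + b.w) * pageWidthPt,
    (1 - b.y) * pageHeightPt,
  ];
}
```

Keeping the flip in its own function makes the origin convention explicit at the call site. Mixing up the two conventions mirrors the redaction vertically, which is easy to miss on symmetric layouts.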
RAG citations with provenance
When you index Markdown chunks for retrieval, store the ids of the source elements and their page_index alongside each chunk. On retrieval, the citation can deep-link straight to the highlighted region in the original PDF, with no approximate “page 12, paragraph 3” pointers.
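One way to carry that provenance is to attach a list of { id, page_index } sources to each chunk and resolve them back to bboxes when a chunk is retrieved. The following is a minimal sketch. The chunk shape and the helper names are hypothetical; only id, page_index, elements, and bbox come from the Docira response.

```javascript
// Sketch: keep element provenance on each indexed chunk.
// Chunk shape and helper names are hypothetical.

// Record which elements (on which page) a chunk's text came from.
function makeChunk(text, page, elementIds) {
  return {
    text,
    sources: elementIds.map((id) => ({ id, page_index: page.page_index })),
  };
}

// Resolve a retrieved chunk to { id, page_index, bbox } highlight targets,
// skipping any source id that is no longer present in the response.
function citationTargets(chunk, pages) {
  const byPage = new Map(pages.map((p) => [p.page_index, p]));
  return chunk.sources.flatMap(({ id, page_index }) => {
    const el = byPage.get(page_index)?.elements.find((e) => e.id === id);
    return el ? [{ id, page_index, bbox: el.bbox }] : [];
  });
}
```

An alternative is to copy the bbox into the chunk record at index time. That avoids keeping the full response around at query time, at the cost of a larger index.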
Rotated and scanned content
Scanned pages with skew are deskewed before classification, so their bounding boxes are reported relative to the deskewed page. The rotation applied is recorded on the page object as page.deskew_deg if you need to map back to the raw scan. Rotated table cells (vertical headers) carry per-element rotation_deg.
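To draw a rotated element (for example a vertical header cell) as a tilted outline, you need its four corners rather than an axis-aligned rect. The pivot point is not specified above, so this sketch makes explicit assumptions: bbox describes the box before rotation, rotation_deg turns it about its centre (clockwise positive on the y-down page), and a missing rotation_deg means 0. Rotation is done in pixel space so non-square pages are not distorted. Mapping a deskewed page back to the raw scan via page.deskew_deg is a similar rotation, but its pivot and sign are likewise not documented here, so confirm them before relying on it.

```javascript
// Sketch: pixel-space corners of a rotated element.
// Assumptions: rotation about the box centre, clockwise positive, missing = 0.
function rotatedCorners(el, pageWidthPx, pageHeightPx) {
  const b = el.bbox;
  const deg = el.rotation_deg ?? 0;
  const rad = (deg * Math.PI) / 180;
  const cos = Math.cos(rad);
  const sin = Math.sin(rad);

  // Unrotated box in pixels and its centre.
  const left = b.x * pageWidthPx;
  const top = b.y * pageHeightPx;
  const w = b.w * pageWidthPx;
  const h = b.h * pageHeightPx;
  const cx = left + w / 2;
  const cy = top + h / 2;

  // Corners clockwise from top-left, each rotated about (cx, cy).
  // With y pointing down, the standard rotation matrix turns clockwise.
  return [
    [left, top],
    [left + w, top],
    [left + w, top + h],
    [left, top + h],
  ].map(([px, py]) => {
    const dx = px - cx;
    const dy = py - cy;
    return [cx + dx * cos - dy * sin, cy + dx * sin + dy * cos];
  });
}
```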
See also
- Agentic routing — how the spatial graph feeds the verification step.
- API reference — the full response schema, including tables[] and elements[].
- Tables & forms — table-specific grounding examples.