Grounding and bounding boxes

Every element Docira extracts — heading, paragraph, table cell, figure, equation — comes with a bounding box that locates it on the source page. This is the spatial backbone of grounding: the bridge between the Markdown / JSON output and the original document pixels.

Why grounding matters

Coordinate system

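Bounding boxes are normalised floats in [0, 1], relative to the page’s rendered dimensions. The origin is the top-left corner of the page; x increases to the right, y increases downward. Each bbox gives the box’s top-left corner as x and y and its size as w and h, so the right edge is x + w and the bottom edge is y + h.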

Normalisation means the same bbox is correct whether you render the page at 800×1000 or 2400×3000 pixels. When you draw, multiply x and w by the target width, and y and h by the target height.
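The conversion also works in reverse: a click in pixel space can be divided by the rendered size and compared against the boxes directly. The sketch below assumes x and y give a box's top-left corner and w and h its size, as the schema suggests. `findElementAt` is a hypothetical helper, not part of the Docira API.

```javascript
// Hypothetical helper: returns the first element whose bbox contains a click.
// Assumes bbox = { x, y, w, h } with (x, y) the top-left corner, all in [0, 1].
function findElementAt(elements, clickXPx, clickYPx, pageWidthPx, pageHeightPx) {
  // Normalise the click so it can be compared with the bboxes directly.
  const nx = clickXPx / pageWidthPx;
  const ny = clickYPx / pageHeightPx;
  const hit = elements.find(({ bbox: b }) =>
    nx >= b.x && nx <= b.x + b.w && ny >= b.y && ny <= b.y + b.h
  );
  return hit ?? null;
}

const elements = [
  { id: "el_p2_0001", bbox: { x: 0.082, y: 0.121, w: 0.176, h: 0.034 } },
  { id: "el_p2_0002", bbox: { x: 0.080, y: 0.171, w: 0.420, h: 0.082 } },
];

// The same spot on the page, clicked at two different render sizes.
const hitSmall = findElementAt(elements, 120, 130, 800, 1000);
const hitLarge = findElementAt(elements, 360, 390, 2400, 3000);
console.log(hitSmall.id, hitLarge.id);
```

Both calls resolve to the same element because the click is normalised before the comparison, so the lookup does not depend on zoom level.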

Element schema

Every element in pages[i].elements carries an id, a type, content fields appropriate to the type, and a bbox.

{
  "page_index": 2,
  "elements": [
    {
      "id": "el_p2_0001",
      "type": "heading",
      "level": 1,
      "text": "Methods",
      "bbox": { "x": 0.082, "y": 0.121, "w": 0.176, "h": 0.034 },
      "rotation_deg": 0
    },
    {
      "id": "el_p2_0002",
      "type": "paragraph",
      "text": "Patients aged 18-75 with confirmed metastatic disease ...",
      "bbox": { "x": 0.080, "y": 0.171, "w": 0.420, "h": 0.082 }
    },
    {
      "id": "el_p2_0007",
      "type": "table",
      "table_id": "tbl_p2_0001",
      "bbox": { "x": 0.515, "y": 0.171, "w": 0.405, "h": 0.382 },
      "rows": 14,
      "cols": 6
    },
    {
      "id": "el_p2_0014",
      "type": "figure",
      "caption_id": "el_p2_0015",
      "bbox": { "x": 0.080, "y": 0.581, "w": 0.420, "h": 0.301 }
    }
  ]
}
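As a rough illustration of working with this structure, the sketch below indexes a page's elements by id and pairs each figure with the element its caption_id points to. The sample is trimmed from the example above; the caption element itself is not shown there, so its type and text are made up here, and both function names are hypothetical.

```javascript
// Sample page shaped like the response above. The caption element (el_p2_0015)
// is not shown in the original example; its type, text and bbox are made up.
const page = {
  page_index: 2,
  elements: [
    { id: "el_p2_0001", type: "heading", text: "Methods",
      bbox: { x: 0.082, y: 0.121, w: 0.176, h: 0.034 } },
    { id: "el_p2_0014", type: "figure", caption_id: "el_p2_0015",
      bbox: { x: 0.080, y: 0.581, w: 0.420, h: 0.301 } },
    { id: "el_p2_0015", type: "caption", text: "Hypothetical caption text",
      bbox: { x: 0.080, y: 0.890, w: 0.420, h: 0.030 } },
  ],
};

// Index elements by id so cross-references such as caption_id resolve directly.
function indexById(elements) {
  return new Map(elements.map((el) => [el.id, el]));
}

// Pair every figure with its caption element, or null if none can be found.
function figuresWithCaptions(page) {
  const byId = indexById(page.elements);
  return page.elements
    .filter((el) => el.type === "figure")
    .map((figure) => ({ figure, caption: byId.get(figure.caption_id) ?? null }));
}

const [first] = figuresWithCaptions(page);
console.log(first.figure.id, first.caption.text);
```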

Element types

Using grounding downstream

Drawing highlights in a PDF viewer

Convert normalised coordinates to pixel space at render time. The page dimensions come from your renderer (PDF.js, react-pdf, native viewer); they are not in the Docira response.

// 1. Page is rendered at any pixel size, e.g. 1200 x 1600.
const pageWidthPx = 1200;
const pageHeightPx = 1600;

// 2. Convert normalized bbox to pixel-space rect for an overlay.
function bboxToRect(b) {
  return {
    left:   b.x * pageWidthPx,
    top:    b.y * pageHeightPx,
    width:  b.w * pageWidthPx,
    height: b.h * pageHeightPx,
  };
}
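
If the viewer can zoom, it may be more convenient to pass the page size in rather than read it from module-level constants. The variant below is one possible shape: `bboxToOverlayStyle` is a hypothetical name, and rounding to whole pixels is a design choice, not a Docira requirement.

```javascript
// Hypothetical variant of bboxToRect that takes the current render size as
// arguments and returns values ready for an absolutely positioned overlay.
function bboxToOverlayStyle(bbox, pageWidthPx, pageHeightPx) {
  const px = (value) => `${Math.round(value)}px`; // whole pixels keep CSS tidy
  return {
    position: "absolute",
    left: px(bbox.x * pageWidthPx),
    top: px(bbox.y * pageHeightPx),
    width: px(bbox.w * pageWidthPx),
    height: px(bbox.h * pageHeightPx),
  };
}

// Recompute on every zoom change; the bbox itself never changes.
const headingBbox = { x: 0.082, y: 0.121, w: 0.176, h: 0.034 };
const atFullSize = bboxToOverlayStyle(headingBbox, 1200, 1600);
const atHalfSize = bboxToOverlayStyle(headingBbox, 600, 800);
console.log(atFullSize, atHalfSize);
```

In a browser, the returned object can be applied with `Object.assign(overlayElement.style, atFullSize)`, provided the overlay's parent is positioned over the rendered page.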

Redacting a specific element

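For each element to redact, cover its bbox in the source PDF with a black redaction. In PyMuPDF, page.add_redact_annot takes a rectangle in corner form (x0, y0, x1, y1), so first scale the bbox to the page’s dimensions in points (page.rect.width and page.rect.height) and convert w and h into the right and bottom edges. Pass fill=(0, 0, 0) for a black box, then call page.apply_redactions() so the underlying content is removed rather than merely covered.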
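
The coordinate conversion is the same whichever library performs the redaction, so it is sketched here in JavaScript to match the example above. PyMuPDF uses a top-left origin, while raw PDF user space typically has its origin at the bottom-left, so a library working in that space needs the y-axis flipped. `bboxToPdfRect` and its `origin` parameter are hypothetical, and the page size in points must come from your PDF library.

```javascript
// Hypothetical helper: converts a normalised bbox to a corner-form rectangle
// [x0, y0, x1, y1] in PDF points. Page size in points comes from your PDF library.
function bboxToPdfRect(bbox, pageWidthPt, pageHeightPt, origin = "top-left") {
  const x0 = bbox.x * pageWidthPt;
  const x1 = (bbox.x + bbox.w) * pageWidthPt;
  const top = bbox.y * pageHeightPt;
  const bottom = (bbox.y + bbox.h) * pageHeightPt;
  if (origin === "bottom-left") {
    // y grows upward in this space, so measure both edges from the page bottom.
    return [x0, pageHeightPt - bottom, x1, pageHeightPt - top];
  }
  return [x0, top, x1, bottom];
}

// US Letter page: 612 x 792 points.
const bbox = { x: 0.25, y: 0.5, w: 0.5, h: 0.25 };
const topLeftRect = bboxToPdfRect(bbox, 612, 792);                   // [153, 396, 459, 594]
const bottomLeftRect = bboxToPdfRect(bbox, 612, 792, "bottom-left"); // [153, 198, 459, 396]
console.log(topLeftRect, bottomLeftRect);
```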

RAG citations with provenance

When you index Markdown chunks for retrieval, store each chunk’s source element id and page_index alongside it. On retrieval, look up the element’s bbox by id, and the citation can deep-link straight to the highlighted region in the original PDF, with no approximate “page 12, paragraph 3” pointers.
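
One way this could look in practice is sketched below. The chunk record shape (doc_id, element_id and so on) and the viewer link format are hypothetical, not Docira formats. The bbox is copied into the record here to avoid a second lookup at citation time; storing only the id and resolving the bbox later works equally well.

```javascript
// Build one retrieval record per text-bearing element, keeping provenance fields.
// The record shape is hypothetical; adapt it to your vector store's metadata.
function toChunks(docId, pages) {
  return pages.flatMap((page) =>
    page.elements
      .filter((el) => typeof el.text === "string")
      .map((el) => ({
        doc_id: docId,
        element_id: el.id,
        page_index: page.page_index,
        bbox: el.bbox,
        text: el.text,
      }))
  );
}

// Turn a retrieved chunk into a citation a viewer can open and highlight.
// The URL scheme is made up for illustration; use your own viewer's routing.
function toCitation(chunk) {
  const { x, y, w, h } = chunk.bbox;
  return {
    label: `${chunk.doc_id}, page index ${chunk.page_index}`,
    href: `/viewer/${chunk.doc_id}#page=${chunk.page_index}&bbox=${x},${y},${w},${h}`,
  };
}

const pages = [{
  page_index: 2,
  elements: [
    { id: "el_p2_0002", type: "paragraph", text: "Patients aged 18-75 ...",
      bbox: { x: 0.080, y: 0.171, w: 0.420, h: 0.082 } },
    { id: "el_p2_0007", type: "table", bbox: { x: 0.515, y: 0.171, w: 0.405, h: 0.382 } },
  ],
}];

const chunks = toChunks("trial-report", pages);
const citation = toCitation(chunks[0]);
console.log(citation.href);
```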

Rotated and scanned content

Scanned pages with skew are deskewed before classification, so their bounding boxes are reported relative to the deskewed page. The rotation applied is recorded on the page object as page.deskew_deg if you need to map back to the raw scan. Rotated table cells (vertical headers) carry per-element rotation_deg.
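
Mapping a box back to the raw scan could be sketched as follows. This rests on assumptions the text above does not confirm: that deskew_deg is the angle by which the raw scan was rotated, clockwise on screen, about the page centre, and that deskewing leaves the page size unchanged. Check the sign and pivot against your own data before relying on it. `bboxToRawQuad` is a hypothetical helper.

```javascript
// Maps a bbox on the deskewed page back to a quadrilateral on the raw scan.
// ASSUMPTIONS (not confirmed by the Docira docs): deskewDeg is the clockwise
// on-screen rotation applied to the raw scan, about the page centre, and the
// page size is unchanged by deskewing.
function bboxToRawQuad(bbox, deskewDeg, pageWidthPx, pageHeightPx) {
  const theta = (-deskewDeg * Math.PI) / 180; // undo the deskew rotation
  const cos = Math.cos(theta);
  const sin = Math.sin(theta);
  const cx = pageWidthPx / 2;
  const cy = pageHeightPx / 2;

  // Corners in pixel space, clockwise from top-left. Rotate in pixels rather
  // than normalised units so non-square pages do not distort the angle.
  const corners = [
    [bbox.x, bbox.y],
    [bbox.x + bbox.w, bbox.y],
    [bbox.x + bbox.w, bbox.y + bbox.h],
    [bbox.x, bbox.y + bbox.h],
  ].map(([nx, ny]) => [nx * pageWidthPx, ny * pageHeightPx]);

  return corners.map(([px, py]) => {
    const dx = px - cx;
    const dy = py - cy;
    // Rotation in a y-down system: positive angles turn clockwise on screen.
    return [cx + dx * cos - dy * sin, cy + dx * sin + dy * cos];
  });
}

const quad = bboxToRawQuad({ x: 0.25, y: 0.25, w: 0.5, h: 0.5 }, 1.5, 1000, 1000);
console.log(quad);
```

The result is a four-corner quadrilateral, not a bbox, because an axis-aligned box on the deskewed page is generally tilted on the raw scan.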
