Directives (Data Extractor)


Directives — Tell the extractor exactly what you want

Directives are powerful one-line commands that you can optionally add to the Extra Instructions section of your Data Extractor to fine-tune the extraction process. They work much like the advanced search operators in Google and other search engines.


How to activate Directives in the Data Extractor


  • Each directive begins with @ and guides our AI engine before it processes your PDF.
  • Format: @directive_name: optional details here
  • Each directive must (1) be on its own line in the Extra Instructions text box and (2) come after any other Extra Instructions.
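
The placement rules above can be sketched in a few lines of Python. This is only an illustration of how the text for the Extra Instructions box might be assembled before you paste it in; `build_extra_instructions` is a hypothetical helper, not part of the product, and the sample instruction text is made up.

```python
def build_extra_instructions(instructions, directives):
    """Return text for the Extra Instructions box.

    Hypothetical helper: free-form instructions come first, then each
    directive on its own line, as the placement rules require.
    """
    lines = [text.strip() for text in instructions if text.strip()]
    for directive in directives:
        directive = directive.strip()
        if not directive.startswith("@"):
            raise ValueError(f"directive must begin with '@': {directive!r}")
        if "\n" in directive:
            raise ValueError("each directive must fit on a single line")
        lines.append(directive)
    return "\n".join(lines)


text = build_extra_instructions(
    ["Dates should be formatted as YYYY-MM-DD."],  # made-up instruction
    ["@ocr_mode:vision", "@parallel:true"],
)
print(text)
```

Keeping the directives in a separate list makes it hard to accidentally place one above a regular instruction.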


@ocr_mode — Tell the extractor how to “look” at your PDF


| Mode | What the extractor reads | Ideal for | How to write |
| --- | --- | --- | --- |
| (default) | Native PDF text & structure | Most PDFs | (leave the directive out) |
| vision | Screenshot plus any native text | Interactive forms (radio buttons, check-boxes, layered elements) that confuse native parsing | @ocr_mode:vision |
| vision_only | Screenshot only (ignores any embedded text) | Scans whose built-in OCR is garbage or mis-aligned | @ocr_mode:vision_only |


When to use

  • vision – the file is technically “searchable,” but complex form elements or hidden layers are throwing off field locations.
  • vision_only – the PDF came from a scanner that added poor OCR, duplicated lines, or put text in the wrong order.


Tips

  • Start with vision if results look mismatched or columns are empty.
  • Switch to vision_only if you see jumbled characters or obviously wrong words in the raw output.
  • Remove the directive entirely to fall back to the fastest default mode.
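
The tips above amount to a small decision rule, sketched here in Python. The function name and its two flags are hypothetical; they stand for judgments you make yourself by looking at the extraction results, not for anything the extractor reports.

```python
def choose_ocr_directive(mismatched_or_empty=False, garbled_text=False):
    """Return the @ocr_mode line suggested by the tips, or None for default.

    Both flags are assumptions for illustration: they describe symptoms
    you noticed in the output, not values provided by the product.
    """
    if garbled_text:
        # Jumbled characters or obviously wrong words: ignore embedded text.
        return "@ocr_mode:vision_only"
    if mismatched_or_empty:
        # Mismatched results or empty columns: add the screenshot view.
        return "@ocr_mode:vision"
    # No symptoms: leave the directive out and use the default mode.
    return None


print(choose_ocr_directive(mismatched_or_empty=True))
```

Garbled text is checked first because, per the tips, it calls for vision_only even when fields also look mismatched.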


@parallel — Process pages together or one‑by‑one


What it does

Chooses whether the extractor handles each page separately (“parallel”) or treats the whole file as a single document.


When to use

  • Multi‑page forms where every page stands alone
  • Speeding up very large files


How to write

  • @parallel:true – forces page‑by‑page mode
  • @parallel:false – keeps the file together
  • Leave it off and we’ll choose the best setting for you


@split_file — Break one long PDF into many small ones


What it does

Splits a PDF that actually contains a stack of individual documents (e.g., 50 invoices in one upload). Each piece is then extracted on its own.


When to use

  • You merged multiple docs to save upload time
  • You received a bulk scan from a vendor


How to write

@split_file: start a new document whenever the page says "Invoice Number"

If you omit the instruction text, we’ll try to detect changes automatically.


Tip

If "Extract multiple rows per document" is off, each split piece returns just one row.


@exclude_pages — Skip pages you don’t need


What it does

Filters out pages before extraction.


When to use

  • Cover sheets
  • Advertising inserts
  • Terms & Conditions or other boilerplate


How to write

@exclude_pages: ignore any page that contains "Terms and Conditions"

The rule must be based on content within each individual page.


Using both @split_file and @exclude_pages? We split first, then drop unwanted pages.
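
To make the "split first, then drop" order concrete, here is a rough Python sketch. It is not how the extractor is implemented: the real rules are written in plain language and interpreted by the AI engine, whereas this sketch uses simple substring checks as a stand-in. The function name, the sample pages, and the choice to keep a document even if all its pages are dropped are assumptions.

```python
def split_then_exclude(pages, starts_new_doc, is_unwanted):
    """Group pages into documents first, then drop unwanted pages."""
    documents = []
    for page in pages:
        # Step 1: start a new document when the split rule matches
        # (or when there is no document yet).
        if starts_new_doc(page) or not documents:
            documents.append([])
        documents[-1].append(page)
    # Step 2: filter pages inside each document, one page at a time.
    return [[p for p in doc if not is_unwanted(p)] for doc in documents]


pages = [
    "Invoice Number 1001, total due 40.00",
    "Terms and Conditions apply to all orders",
    "Invoice Number 1002, total due 75.50",
    "line items continued",
]
docs = split_then_exclude(
    pages,
    starts_new_doc=lambda p: "Invoice Number" in p,
    is_unwanted=lambda p: "Terms and Conditions" in p,
)
print(docs)
```

Because splitting happens first, the boilerplate page still belongs to the first invoice when document boundaries are decided; it is only removed afterwards.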


@parallel_extended_context — Carry key info across pages


Parallel mode is fast, but sometimes data that appears on page 1 is needed to understand page 2 (for example, an employee name, table headers, or a bold category). Tell us what that missing “bridge” is:


@parallel_extended_context: Employee name


Common scenarios

  • Timecards – Employee name on page 1 applies to hours listed on page 2

@parallel_extended_context: Employee name

  • Multi‑page tables – Column headers appear only once

@parallel_extended_context: Table headers


We automatically pull the most recent value of that field from earlier pages before extracting the current one.
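
The "most recent value from earlier pages" behavior can be pictured with a short Python sketch. This is an illustration of the idea only, under the assumption that the field can be found with a simple pattern; the extractor itself relies on its AI engine, and `carry_context`, `find_employee`, and the sample timecard pages are all hypothetical.

```python
import re


def carry_context(pages, find_value):
    """Pair each page with the latest field value seen on earlier pages."""
    latest = None
    result = []
    for page in pages:
        # Context comes from earlier pages only, before this page is read.
        result.append((latest, page))
        found = find_value(page)
        if found is not None:
            latest = found
    return result


def find_employee(page):
    """Return the employee name on a page, or None if it has none."""
    match = re.search(r"Employee name:\s*(.+)", page)
    return match.group(1).strip() if match else None


pages = [
    "Employee name: Dana Lee\nMon 8h",
    "Tue 7h\nWed 8h",
    "Employee name: Sam Roy\nMon 6h",
    "Tue 8h",
]
for context, page in carry_context(pages, find_employee):
    print(context, "|", page.replace("\n", " / "))
```

Pages that introduce a new name update the carried value, so later pages pick up the newest employee rather than the first one.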


@deep_thinking — Give the AI extra brainpower (advanced)


Most extractions run instantly. For very complex tasks—large calculations, tricky logic—you can let the AI spend more time “thinking” first.


@deep_thinking:0 (default, no extra time)

@deep_thinking:1 (a little more)

@deep_thinking:2 (moderate)

@deep_thinking:3 (maximum)


When to try it

  • Complicated reconciliations
  • Multi‑step logic that humans would “work out on paper”


Heads‑up: More thinking isn’t always better. Start with the default and increase only if you see gaps in highly analytical outputs.
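
The "start low, raise only if needed" advice can be sketched as a pair of small Python helpers. Both function names are hypothetical; the only facts taken from this article are the directive name and the 0 to 3 range.

```python
def deep_thinking_directive(level=0):
    """Return the @deep_thinking line for a level between 0 and 3."""
    is_int = isinstance(level, int) and not isinstance(level, bool)
    if not is_int or not 0 <= level <= 3:
        raise ValueError("level must be an integer from 0 to 3")
    return f"@deep_thinking:{level}"


def next_level(level):
    """Step up one level, never going past the maximum of 3."""
    return min(level + 1, 3)


level = 0  # start with the default
level = next_level(level)  # raise it only after seeing gaps in the output
print(deep_thinking_directive(level))
```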


Email-only directives (Email Extractor)


By default, the email extractor processes both the contents of the forwarded email and all of the attachments. You can ignore either the email contents or specific attachments with the following directives:


@attachments_only

Ignores the email subject/body and treats each attachment as its own document. You can use the other Data Extractor directives with the email extractor only when @attachments_only is also present.


When to try it:

  • Use if you have complex attachments that require additional data extractor directives to process correctly
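
The rule that Data Extractor directives need @attachments_only in the email extractor can be checked before you save your instructions. The Python sketch below is a hypothetical pre-flight check, not a product feature; the directive names come from this article, while the function names are assumptions.

```python
DATA_EXTRACTOR_DIRECTIVES = {
    "ocr_mode",
    "parallel",
    "split_file",
    "exclude_pages",
    "parallel_extended_context",
    "deep_thinking",
}


def directive_name(line):
    """Return the name part of a line such as '@ocr_mode:vision'."""
    return line.strip().lstrip("@").split(":", 1)[0].strip()


def check_email_directives(lines):
    """List Data Extractor directives that are missing @attachments_only."""
    names = {directive_name(l) for l in lines if l.strip().startswith("@")}
    used = names & DATA_EXTRACTOR_DIRECTIVES
    if used and "attachments_only" not in names:
        return sorted(used)
    return []


print(check_email_directives(["@ocr_mode:vision"]))
print(check_email_directives(["@attachments_only", "@ocr_mode:vision"]))
```

An empty list means the combination is consistent with the rule above; a non-empty list names the directives that would need @attachments_only added.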


@skip_attachments

Allows you to skip some or all of the attachments in a forwarded email.


@skip_attachments with no additional instructions will skip all attachments in the email. If you want to only skip certain attachments, you can append specific instructions by describing those attachments in the format: @skip_attachments: description of attachment to skip


Examples of how to use it:
  • @skip_attachments: skip attachment if it is not an invoice


Quick reference

@ocr_mode: vision | vision_only # How the extractor “looks” at your PDF

@parallel:true | false # Page‑by‑page or whole‑document mode

@split_file: your rule here # Break a big PDF into smaller pieces

@exclude_pages: your rule here # Ignore unwanted pages

@parallel_extended_context: ... # Info to carry from previous pages

@deep_thinking: 0‑3 # Allow extra reasoning time

@attachments_only # Only process data from email attachments and not the email body/subject (only works in email extractor)

@skip_attachments # Skip some or all attachments in a forwarded email (only works in email extractor)



Add any combination—one directive per line—upload your PDF, and let the extractor do the rest.

Updated on: 05/06/2025
