Directives (Data Extractor)

Directives — Tell the extractor exactly what you want

Directives are powerful one‑line commands you can optionally add to the Extra Instructions section of your Data Extractor to fine-tune your extraction. They work much like the advanced search operators in Google and other search engines.

How to activate Directives in the Data Extractor

Each directive begins with @ and guides our AI engine before it processes your PDF.
Format: @directive_name: optional details here
Each directive must be (1) on its own line in the Extra Instructions text box and (2) placed after any other Extra Instructions.
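If you generate Extra Instructions text programmatically, the two placement rules above can be checked before upload. The sketch below is a minimal, hypothetical helper (`check_directive_placement` is not part of the product). It assumes that any line starting with `@` is a directive and that a second directive on a line shows up as a space followed by `@name`.

```python
import re

# Assumption: a directive name appearing mid-line looks like " @parallel" or " @ocr_mode".
_INLINE_DIRECTIVE = re.compile(r"\s@[a-z_]+")


def check_directive_placement(extra_instructions: str) -> list[str]:
    """Return a list of layout problems; an empty list means both rules are followed."""
    problems: list[str] = []
    seen_directive = False
    for number, raw in enumerate(extra_instructions.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines carry no instructions
        if line.startswith("@"):
            seen_directive = True
            # Rule 1: only one directive per line.
            if _INLINE_DIRECTIVE.search(line):
                problems.append(f"line {number}: more than one directive on the same line")
        elif seen_directive:
            # Rule 2: free-text instructions must come before every directive.
            problems.append(f"line {number}: free-text instruction appears after a directive")
        elif _INLINE_DIRECTIVE.search(" " + line):
            # A directive buried inside a free-text line is not on its own line.
            problems.append(f"line {number}: directive is not on its own line")
    return problems


good = """Return one row per invoice.
Use ISO dates.
@ocr_mode:vision
@exclude_pages: ignore any page that contains "Terms and Conditions"
"""
print(check_directive_placement(good))  # []

bad = """@ocr_mode:vision
Return one row per invoice.
"""
print(check_directive_placement(bad))  # ['line 2: free-text instruction appears after a directive']
```

Because the regex requires whitespace before `@`, an email address such as `billing@example.com` inside a directive's details is not flagged.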

@ocr_mode — Tell the extractor how to “look” at your PDF

Mode	What the extractor reads	Ideal for	How to write
(default)	Native PDF text & structure	Most PDFs	(leave the directive out)
vision	Screenshot plus any native text	Interactive forms (radio buttons, checkboxes, layered elements) that confuse native parsing	`@ocr_mode:vision`
vision_only	Screenshot only (ignores any embedded text)	Scans whose built-in OCR is garbage or misaligned	`@ocr_mode:vision_only`

When to use

vision – the file is technically “searchable,” but complex form elements or hidden layers are throwing off field locations.
vision_only – the PDF came from a scanner that added poor OCR, which duplicates lines or puts text in the wrong order.

Tips

Start with vision if results look mismatched or columns are empty.
Switch to vision_only if you see jumbled characters or obviously wrong words in the raw output.
Remove the directive entirely to fall back to the fastest default mode.
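The tips above amount to a short decision rule. Here it is written out as a small Python function, purely as an illustration: `suggest_ocr_directive` and its two yes/no inputs are hypothetical, and you still judge the symptoms yourself by looking at the raw output.

```python
from typing import Optional


def suggest_ocr_directive(jumbled_text: bool, mismatched_fields: bool) -> Optional[str]:
    """Map the symptoms from the tips to a directive line; None means use the default mode."""
    if jumbled_text:
        # Jumbled characters or obviously wrong words: ignore the embedded text entirely.
        return "@ocr_mode:vision_only"
    if mismatched_fields:
        # Mismatched results or empty columns: add a screenshot on top of the native text.
        return "@ocr_mode:vision"
    return None  # leave the directive out to keep the fastest default mode


print(suggest_ocr_directive(jumbled_text=False, mismatched_fields=True))  # @ocr_mode:vision
```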

@parallel — Process pages together or one‑by‑one

What it does

Chooses whether the extractor handles each page separately (“parallel”) or treats the whole file as a single document.

When to use

Multi‑page forms where every page stands alone
Speeding up very large files

How to write

@parallel:true – forces page‑by‑page mode
@parallel:false – keeps the file together
Leave it off and we’ll choose the best setting for you

@split_file — Break one long PDF into many small ones

What it does

Splits a PDF that actually contains a stack of individual documents (e.g., 50 invoices in one upload). Each piece is then extracted on its own.

When to use

You merged multiple docs to save upload time
You received a bulk scan from a vendor

How to write

@split_file: start a new document whenever the page says "Invoice Number"

If you omit the instruction text, we’ll try to detect where each document begins automatically.

Tip

If the “Extract multiple rows per document” setting is off, each split piece returns just one row.
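To make the idea of a split rule concrete, here is a rough Python sketch of what “start a new document whenever the page says "Invoice Number"” means. It assumes each page is already available as a string and matches the phrase literally; the extractor interprets your rule with AI, so this is an approximation, and `split_on_marker` is a hypothetical name.

```python
def split_on_marker(pages: list[str], marker: str) -> list[list[str]]:
    """Group pages into documents, starting a new one at every page that contains marker."""
    documents: list[list[str]] = []
    for page in pages:
        # A marker page opens a new document; so does the very first page.
        if marker in page or not documents:
            documents.append([page])
        else:
            documents[-1].append(page)  # continuation page joins the current document
    return documents


pages = [
    "Invoice Number 1001 ... line items",
    "... more line items for invoice 1001",
    "Invoice Number 1002 ... line items",
]
pieces = split_on_marker(pages, "Invoice Number")
print(len(pieces))  # 2
```

With “Extract multiple rows per document” off, these two pieces would produce two rows, one per invoice.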

@exclude_pages — Skip pages you don’t need

What it does

Filters out pages before extraction.

When to use

Cover sheets
Advertising inserts
Terms & Conditions or other boilerplate

How to write

@exclude_pages: ignore any page that contains "Terms and Conditions"

The rule must be based on content within each individual page.

Using both @split_file and @exclude_pages? We split first, then drop unwanted pages.
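The order described above can be sketched in Python as two steps: split, then filter the pages inside each piece. As before, literal phrase matching stands in for the AI-interpreted rules, and `split_then_exclude` is a hypothetical helper. What the extractor does with a piece whose pages are all excluded is not covered here; this sketch would simply keep it as an empty list.

```python
def split_then_exclude(pages: list[str], split_marker: str, exclude_phrase: str) -> list[list[str]]:
    # Step 1: split the upload into documents at every page containing split_marker.
    documents: list[list[str]] = []
    for page in pages:
        if split_marker in page or not documents:
            documents.append([page])
        else:
            documents[-1].append(page)
    # Step 2: drop excluded pages inside each document, judging every page on its own content.
    return [[page for page in doc if exclude_phrase not in page] for doc in documents]


pages = [
    "Invoice Number 2001 ... totals",
    "Terms and Conditions ...",
    "Invoice Number 2002 ... totals",
    "Terms and Conditions ...",
]
result = split_then_exclude(pages, "Invoice Number", "Terms and Conditions")
print(result)  # [['Invoice Number 2001 ... totals'], ['Invoice Number 2002 ... totals']]
```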

@parallel_extended_context — Carry key info across pages

Parallel mode is fast, but sometimes data that appears on page 1 is needed to understand page 2 (for example, an employee name, table headers, or a bold category). Tell us what that missing “bridge” is:

@parallel_extended_context: Employee name

Common scenarios

Timecards – Employee name on page 1 applies to hours listed on page 2

@parallel_extended_context: Employee name

Multi‑page tables – Column headers appear only once

@parallel_extended_context: Table headers

We automatically pull the most recent value of that field from earlier pages before extracting the current one.
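Conceptually, this behaves like a forward fill: each page uses its own value if it has one, otherwise the most recent value from an earlier page. The Python sketch below illustrates only that carry-forward step. It assumes the field has already been found on the pages where it appears (with `None` where it does not), and `carry_forward` is a hypothetical name that says nothing about how the extractor actually locates the value.

```python
from typing import Optional


def carry_forward(values_per_page: list[Optional[str]]) -> list[Optional[str]]:
    """For each page, return its own value if present, else the most recent earlier value."""
    filled: list[Optional[str]] = []
    latest: Optional[str] = None
    for value in values_per_page:
        if value is not None:
            latest = value  # a new value on this page replaces the carried one
        filled.append(latest)
    return filled


# Timecard example: the employee name appears only on pages 1 and 3.
names = ["Employee A", None, "Employee B", None]
print(carry_forward(names))  # ['Employee A', 'Employee A', 'Employee B', 'Employee B']
```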

@deep_thinking — Give the AI extra brainpower (advanced)

Most extractions run instantly. For very complex tasks—large calculations, tricky logic—you can let the AI spend more time “thinking” first.

@deep_thinking:0 (default, no extra time)

@deep_thinking:1 (a little more)

@deep_thinking:2 (moderate)

@deep_thinking:3 (maximum)

When to try it

Complicated reconciliations
Multi‑step logic that humans would “work out on paper”

Heads‑up: More thinking isn’t always better. Start with the default and increase only if you see gaps in highly analytical outputs.

Email-only directives (Email Extractor)

By default, the email extractor processes both the contents of the forwarded email and all of the attachments. You can ignore either the email contents or specific attachments with the following directives:

@attachments_only

Ignores the email subject and body and treats each attachment as its own document. You can use the other Data Extractor directives with the email extractor only if @attachments_only is also used.

When to try it:

Use it if you have complex attachments that need other Data Extractor directives to be processed correctly.

@skip_attachments

Allows you to skip some or all of the attachments in a forwarded email.

@skip_attachments with no additional instructions skips all attachments in the email. To skip only certain attachments, append a description of them in this format: @skip_attachments: description of attachment to skip

Examples of how to use it:

@skip_attachments: skip attachment if it is not an invoice

Quick reference

@ocr_mode: vision | vision_only # Read pages as screenshots (with or without native text)

@parallel:true | false # Page‑by‑page or whole‑document mode

@split_file: your rule here # Break a big PDF into smaller pieces

@exclude_pages: your rule here # Ignore unwanted pages

@parallel_extended_context: ... # Info to carry from previous pages

@deep_thinking: 0‑3 # Allow extra reasoning time

@attachments_only # Only process data from email attachments and not the email body/subject (only works in email extractor)

@skip_attachments: optional description # Skip all attachments, or only those matching the description (only works in email extractor)

Add any combination—one directive per line—upload your PDF, and let the extractor do the rest.

Updated on: 05/06/2025
