Directives (Data Extractor)
Directives — Tell the extractor exactly what you want
Directives are powerful one-line commands that you can optionally add to the Extra Instructions section of your Data Extractor to fine-tune the extraction process. They work much like the advanced search operators in Google and other search engines.
How to activate Directives in the Data Extractor
- Each directive begins with @ and guides our AI engine before it processes your PDF.
- Format: @directive_name: optional details here
- Each directive must be (1) on its own line in the Extra Instructions text box and (2) placed after any other Extra Instructions.
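The format rules above can be illustrated with a short Python sketch. It is not the extractor's actual parser: the function name `split_extra_instructions` and the regular expression are assumptions made purely for illustration. It shows how a block of Extra Instructions separates into free text and the directive lines that follow it.

```python
import re

# Hypothetical pattern for "@directive_name: optional details here".
DIRECTIVE_RE = re.compile(r"^@(\w+)\s*(?::\s*(.*))?$")

def split_extra_instructions(text):
    """Separate free-text instructions from the directive lines that follow them."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    found = []
    # Directives sit below all other instructions, so read upward from the last line.
    while lines:
        match = DIRECTIVE_RE.match(lines[-1])
        if match is None:
            break
        found.append((match.group(1), (match.group(2) or "").strip()))
        lines.pop()
    # Restore top-to-bottom order before building the mapping.
    directives = dict(reversed(found))
    return lines, directives
```

In this sketch, a directive written above ordinary instruction text is not recognized as a directive, which mirrors the rule that directives go last.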
@ocr_mode — Tell the extractor how to “look” at your PDF
Mode | What the extractor reads | Ideal for | How to write |
---|---|---|---|
(default) | Native PDF text & structure | Most PDFs | (leave the directive out) |
vision | Screenshot plus any native text | Interactive forms (radio buttons, check-boxes, layered elements) that confuse native parsing | @ocr_mode: vision |
vision_only | Screenshot only (ignores any embedded text) | Scans whose built-in OCR is garbage or mis-aligned | @ocr_mode: vision_only |
When to use
- vision – the file is technically “searchable,” but complex form elements or hidden layers are throwing off field locations.
- vision_only – the PDF came from a scanner that added poor OCR, duplicated lines, or put text in the wrong order.
Tips
- Start with vision if results look mismatched or columns are empty.
- Switch to vision_only if you see jumbled characters or obviously wrong words in the raw output.
- Remove the directive entirely to fall back to the fastest default mode.
@parallel — Process pages together or one‑by‑one
What it does
Chooses whether the extractor handles each page separately (“parallel”) or treats the whole file as a single document.
When to use
- Multi‑page forms where every page stands alone
- Speeding up very large files
How to write
- @parallel:true – forces page-by-page mode
- @parallel:false – keeps the file together
- Leave it off and we’ll choose the best setting for you
@split_file — Break one long PDF into many small ones
What it does
Splits a PDF that actually contains a stack of individual documents (e.g., 50 invoices in one upload). Each piece is then extracted on its own.
When to use
- You merged multiple docs to save upload time
- You received a bulk scan from a vendor
How to write
@split_file: start a new document whenever the page says "Invoice Number"
If you omit the instruction text, we’ll try to detect automatically where one document ends and the next begins.
Tip
If Extract multiple rows per document is off, each split piece returns just one row.
@exclude_pages — Skip pages you don’t need
What it does
Filters out pages before extraction.
When to use
- Cover sheets
- Advertising inserts
- Terms & Conditions or other boilerplate
How to write
@exclude_pages: ignore any page that contains "Terms and Conditions"
The rule must be based on content within each individual page.
Using both @split_file and @exclude_pages? We split first, then drop unwanted pages.
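The order of operations can be pictured with a small Python sketch. It is only an illustration: the real directives take natural-language rules that the AI engine interprets, whereas this sketch substitutes plain substring checks. The function name `split_then_exclude` and the page-as-string representation are assumptions.

```python
def split_then_exclude(pages, split_marker, exclude_marker):
    """Split pages into documents first, then drop unwanted pages from each one."""
    documents = []
    for page in pages:
        # Start a new document at each marker page (and for the very first page).
        if split_marker in page or not documents:
            documents.append([])
        documents[-1].append(page)
    # Exclusion runs per page, after the split has already been decided.
    return [[p for p in doc if exclude_marker not in p] for doc in documents]


pages = [
    "Invoice Number 1001",
    "Terms and Conditions",
    "Invoice Number 1002",
    "Line items",
    "Terms and Conditions",
]
result = split_then_exclude(pages, "Invoice Number", "Terms and Conditions")
```

Here `result` holds two documents, each with its Terms and Conditions page removed.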
@parallel_extended_context — Carry key info across pages
Parallel mode is fast, but sometimes data that appears on page 1 is needed to understand page 2 (for example, an employee name, table headers, or a bold category). Tell us what that missing “bridge” is:
@parallel_extended_context: Employee name
Common scenarios
- Timecards – Employee name on page 1 applies to hours listed on page 2
@parallel_extended_context: Employee name
- Multi‑page tables – Column headers appear only once
@parallel_extended_context: Table headers
We automatically pull the most recent value of that field from earlier pages before extracting the current one.
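As a rough illustration of this carry-forward behavior, the Python sketch below fills a missing field on each page with the most recent value seen on an earlier page. The per-page dictionaries, the field names, and the function `carry_forward` are hypothetical and say nothing about how the extractor stores pages internally.

```python
def carry_forward(pages, field):
    """Fill `field` on each page with the most recent value seen so far."""
    last_seen = None
    enriched = []
    for page in pages:
        value = page.get(field)
        if value is not None:
            # This page states the field itself, so it becomes the new context.
            last_seen = value
        enriched.append({**page, field: last_seen})
    return enriched


timecard_pages = [
    {"employee_name": "Ana", "hours": 8},
    {"hours": 6},
    {"employee_name": "Ben", "hours": 7},
    {"hours": 5},
]
filled = carry_forward(timecard_pages, "employee_name")
```

In this example, the second page inherits "Ana" and the fourth inherits "Ben", while the input pages are left unmodified.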
@deep_thinking — Give the AI extra brainpower (advanced)
Most extractions run instantly. For very complex tasks—large calculations, tricky logic—you can let the AI spend more time “thinking” first.
- @deep_thinking:0 (default, no extra time)
- @deep_thinking:1 (a little more)
- @deep_thinking:2 (moderate)
- @deep_thinking:3 (maximum)
When to try it
- Complicated reconciliations
- Multi‑step logic that humans would “work out on paper”
Heads‑up: More thinking isn’t always better. Start with the default and increase only if you see gaps in highly analytical outputs.
Email-only directives (Email Extractor)
By default, the email extractor processes both the contents of the forwarded email and all of the attachments. You can ignore either the email contents or specific attachments with the following directives:
@attachments_only
Ignores the email subject/body and treats each attachment as its own document. You can use any of the other Data Extractor directives with the email extractor only if @attachments_only is also used.
When to try it:
- Use it if you have complex attachments that need additional Data Extractor directives to process correctly
@skip_attachments
Allows you to skip some or all of the attachments in a forwarded email.
Used with no additional instructions, @skip_attachments skips all attachments in the email. To skip only certain attachments, describe them in the format: @skip_attachments: description of attachment to skip
Examples of how to use it:
- @skip_attachments: skip attachment if it is not an invoice
Quick reference
@ocr_mode: vision | vision_only
# How the extractor reads your PDF
@parallel:true | false
# Page‑by‑page or whole‑document mode
@split_file: your rule here
# Break a big PDF into smaller pieces
@exclude_pages: your rule here
# Ignore unwanted pages
@parallel_extended_context: ...
# Info to carry from previous pages
@deep_thinking: 0‑3
# Allow extra reasoning time
@attachments_only
# Only process data from email attachments and not the email body/subject (only works in email extractor)
@skip_attachments
# Skip some or all attachments in a forwarded email (only works in email extractor)
Updated on: 05/06/2025