Super-powered Extra Instructions (i.e., Directives)
Directives are deterministic instructions you add in the Extra Instructions section of your Data Extractor to fine‑tune extraction.
Each directive has a specific syntax and must appear on its own line at the bottom of the Extra Instructions.
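As a minimal sketch of that layout, the Extra Instructions for a File Extractor might look like the following. The two free-text instruction lines are hypothetical and only stand in for whatever plain-language guidance you already use; the directive names and value forms are the ones described in this article.

```text
Extract one row per employee.
Format all dates as YYYY-MM-DD.

@parallel:false
@deep_thinking: 1
```

The ordinary instructions come first, and each directive sits on its own line at the very bottom.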
File Extractor
@parallel:true / @parallel:false
What is this?
Controls whether the extractor treats each page separately or processes the whole document as one unit.
Use when
- @parallel:true: Each page stands on its own. Example: one form or one table per page that does not need data from other pages.
- @parallel:false: Pages depend on each other. Example: page 1 has the employee name and later pages list that employee’s hours. You want the extractor to keep that connection.
Why it helps
- @parallel:true: In many documents, every page follows the same structure and holds all the data needed for its own spreadsheet row. In this case, parallel mode speeds things up while keeping rows clean (for example, one spreadsheet row per page, where each row holds only the data from its page).
- @parallel:false: Some documents rely on context set earlier. @parallel:false tells the tool to treat the uploaded document as a single unit. This prevents missing fields or mismatched rows when data spans pages (for example, when one spreadsheet row combines data from page 1 and page 2).
How to use
Add one line to the end of your Extra Instructions:
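For instance, for a document in which every page is a self-contained form, the last line of the Extra Instructions would be the following (a minimal sketch using the directive exactly as named in this section):

```text
@parallel:true
```

Use @parallel:false on that line instead when pages depend on each other.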
If you must use parallel mode but still need context from earlier or later pages while extracting data from a given page, add @parallel_extended_context: <field> (see below).
Notes
- Rule of thumb: @parallel:false works best for documents of at most 20 pages in which later pages rely on earlier pages, or vice versa.
@exclude_pages:
What is this?
Skips pages you don’t want turned into spreadsheet rows.
Use when
- You are seeing data extracted from irrelevant pages such as cover sheets, marketing content, blank pages, or Terms and Conditions.
- Your output shows junk rows that you would not expect to see.
Why it helps
- Reduces false positives and speeds up processing because fewer pages are analyzed.
How to use
Write a short rule on its own line at the bottom of your Extra Instructions that says what makes a page skippable. Examples you can paste and tweak:
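The rules below are illustrative sketches, not official examples. They assume the rule is written in plain language after the colon, following the @exclude_pages:<instructions> form used in this article; the page descriptions themselves are hypothetical. Each line is a separate alternative, so add only the one that fits your document.

```text
@exclude_pages:Skip cover sheets, blank pages, and Terms and Conditions pages
@exclude_pages:Skip any page that does not contain a timesheet table
```

A rule that names a visible trait of the page (its title, or the absence of a table) is likely easier to apply consistently than a vague one such as "skip unimportant pages".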
@parallel_extended_context:
What is this?
Tells the extractor to remember a field that appears rarely and reuse it when extracting later pages, until a new value for that field is found. Example fields: employee name, table headers, category label.
Use when
- A key value appears only once or only occasionally but applies to many pages after it.
- Later pages need that earlier value to make sense.
Why it helps
- Prevents missing or misaligned data when the page being processed does not repeat a header, name, label, or other value that its spreadsheet row depends on (for example, an employee's timesheet data is on page 3 but their name appears only on page 1).
How to use
Add one line of its own at the end of your Extra Instructions naming the field to remember, for example:
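A minimal sketch, using the @parallel_extended_context: <field> form and the employee-name case from this section. Here the name would be picked up from the page where it appears and carried forward to the following timesheet pages until a different name is found.

```text
@parallel:true
@parallel_extended_context: Employee name
```

The field label "Employee name" is illustrative; use whatever name matches the field in your own extractor.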
Notes
- Works best with @parallel:true.
- If the file is short (fewer than 20 pages), @parallel:false can also solve the same issue by processing the whole document together, so each page is extracted with the context of the pages before and after it.
@deep_thinking: 0 | 1 | 2 | 3
What is this?
Lets the AI spend more time thinking before it answers.
Use when
- The extraction requires careful logic (for example, multi‑step totals, conditional rules, reasoning across columns, or reordering data before output), or simple extraction is inconsistent because the file is very complex and needs many Extra Instructions to produce the right data.
Why it helps
- More reasoning time can improve accuracy on complex files.
How to use
Add one line with a level:
@deep_thinking: 0 (default)
@deep_thinking: 1
@deep_thinking: 2
@deep_thinking: 3
Notes
- Higher is not always better. Try level 1 or 2 first.
Email Extractor
@attachments_only
What is this?
Extracts data only from attachments, not from the email body.
Use when
- Attachments contain the data you need (for example, invoice PDFs) and the email body is not needed.
Why it helps
- Avoids mixing email body text with attachment data. Keeps rows focused and predictable.
How to use
Add one line at the end of your Extra Instructions:
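As the directive is named in this article without a colon or value, the line would presumably be just the name (a sketch under that assumption):

```text
@attachments_only
```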
Notes
- If important data is in the email body, do not use this.
@skip_attachments:
What is this?
Skips attachments that are not relevant.
Use when
- Some attachments should not be processed (for example, non‑invoices).
Why it helps
- Reduces noise and speeds processing by excluding files you do not need.
How to use
Provide a simple filter rule at the end of your Extra Instructions. Examples:
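The rules below are illustrative sketches, not official examples. They assume the filter is written in plain language after the colon, following the @skip_attachments:<rule> form used in this article; the attachment descriptions are hypothetical. Each line is a separate alternative, so add only the one that fits your emails.

```text
@skip_attachments:Skip any attachment that is not an invoice
@skip_attachments:Skip image files such as logos and email signatures
```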
Notes
- Works with or without @attachments_only.
OCR Mode
What is this?
OCR (Optical Character Recognition) turns text you can see in a PDF image into real, machine‑readable text. It helps when the PDF has no reliable text layer (for example, you try to highlight a word and either nothing or a huge block gets highlighted).
Use when
- Data is coming out incorrect and you can’t reliably select/copy text in the PDF (nothing highlights or the wrong blocks highlight).
- Forms with checkboxes/radio buttons or complex layouts confuse the default extractor.
Why it helps
- Makes the visible page content usable as text for extraction.
- Two modes:
  - @ocr_mode: vision - Use this mode when data is coming out incorrect AND the PDF has some text you can highlight. Also helpful for PDFs with form layouts (for example, radio buttons or checkboxes).
  - @ocr_mode: vision_only - Use this mode when data is coming out incorrect AND the PDF has NO text you can highlight.
How to use
Add one independent line at the end of your Extra Instructions:
@ocr_mode: vision
@ocr_mode: vision_only
Quick reference
- Independent pages → @parallel:true.
- Cross‑page context → @parallel:false or @parallel_extended_context.
- Irrelevant pages → @exclude_pages:<instructions>.
- Complex logic → increase @deep_thinking one level.
- Email attachments only → @attachments_only.
- Skip unwanted attachments → @skip_attachments:<rule>.
- Data is extracted incorrectly AND PDF has some highlightable text OR the PDF has form fields (e.g., checkboxes) → @ocr_mode: vision.
- Data is extracted incorrectly AND PDF has NO highlightable text → @ocr_mode: vision_only.
Updated on: 05/09/2025