Automate extraction with workflows
This article walks through turning a one-off extractor into a hands-free pipeline that runs every time a new document arrives. The result is the most common Lido use case — drop a PDF in a folder, see structured data show up minutes later in your sheet (and Slack, and Outlook, and your CRM, depending on what you wire up).
Before you start
You need:
- An extractor worksheet that already works on test documents in the spreadsheet UI. If you haven't built one yet, do that first — see Extract data from PDFs and documents. Workflows reuse this configuration; if it doesn't work in the UI, it won't work in a workflow.
- A trigger source for new documents — connected Google Drive, OneDrive, Lido Mailbox, Outlook account, or an external system that can hit a webhook.
- A destination for the extracted rows — usually a sheet (Insert Rows), but can be Slack, email, an external API, or all of the above.
Pick the right trigger
| Where do new documents arrive? | Use this trigger |
|---|---|
| Files dropped into a Google Drive folder | Google Drive Trigger |
| Files dropped into a OneDrive folder | OneDrive Trigger |
| Documents sent to a Lido-hosted email address | Lido Mailbox Trigger |
| Emails arriving in your Outlook inbox (with attachments to extract from) | Outlook Trigger |
| Documents pushed by another system over HTTP | Webhook Trigger |
| You want to run extraction on a schedule (e.g., grab today's PDFs at 9am) | Scheduled Trigger |
| You want to run extraction on demand from a button | Manual Trigger |
You can have multiple workflows watching different sources for the same extractor configuration.
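For the Webhook Trigger row, the sending system only needs to be able to make an HTTP POST. The sketch below builds such a request with Python's standard library. The URL is a placeholder, and the payload field names (`filename`, `content_base64`) are assumptions made for illustration, not a documented Lido schema; check your Webhook Trigger node for the real URL and the body it expects.

```python
import base64
import json
import urllib.request

# Placeholder URL: copy the real one from your Webhook Trigger node.
WEBHOOK_URL = "https://example.com/your-webhook-url"


def build_webhook_request(url: str, filename: str, pdf_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one document.

    The field names "filename" and "content_base64" are assumptions
    chosen for this sketch, not a documented schema.
    """
    payload = {
        "filename": filename,
        "content_base64": base64.b64encode(pdf_bytes).decode("ascii"),
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_webhook_request(WEBHOOK_URL, "invoice-001.pdf", b"%PDF-1.7 sample bytes")
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request, timeout=30)
```

Building the request separately from sending it makes it easy to inspect the body before pointing the script at a live workflow.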
Step-by-step: build the workflow
- Open Workflows → New Workflow.
- Add the appropriate trigger node (e.g., Google Drive Trigger). Configure:
  - Folder: the folder to watch.
  - File types: restrict to `.pdf`, `.png`, etc., to avoid extracting from unrelated files.
- Add a Data Extractor node. Connect the trigger to it.
- In the Data Extractor:
  - Worksheet Name: pick your tested extractor worksheet.
  - Source Type: `File` (or `Email` for mailbox/Outlook triggers).
  - File: `{{$item.data.file}}` (or Email: `{{$item.data.email}}`).
  - Response Format: `Objects`. Easier to use downstream.
  - Split Rows as Items: turn ON if you want each extracted row to become its own item flowing through the rest of the workflow (e.g., one workflow item per invoice line). Turn OFF if you want all extracted rows together.
- Add an Insert Rows node. Connect Data Extractor to it.
  - Worksheet: your destination tracker sheet.
  - Map each extracted field to a destination column (e.g., map `{{$item.data.Vendor Name}}` to the `Vendor` column).
- (Optional) Add a Send Slack or Send Gmail node to notify the team:
  - Message: `New invoice from {{$item.data.Vendor Name}} for ${{$item.data.Total Amount}}`
- Connect the Data Extractor's error output to a second Send Slack node so failures get reported instead of silently dropped.
- Test the workflow with a sample file before activating. Workflows have a Test button — use it.
- Activate. From now on, every new file in the watched folder goes through the pipeline automatically.
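To make the `{{$item.data.…}}` expressions used in the steps above less abstract, here is a minimal Python sketch of how such placeholders could resolve against an item. It is a conceptual model only, not Lido's template engine; the `render` function, the regular expression, and the sample item are all assumptions made for illustration.

```python
import re

# Matches {{$item.data.Field Name}}; field names may contain spaces.
PLACEHOLDER = re.compile(r"\{\{\s*\$item\.data\.([^{}]+?)\s*\}\}")


def render(template: str, item: dict) -> str:
    """Replace each placeholder with the matching value from item["data"]."""
    def lookup(match: re.Match) -> str:
        field = match.group(1)
        if field not in item["data"]:
            # A misspelled field name fails loudly instead of rendering blank.
            raise KeyError(f"extractor produced no field named {field!r}")
        return str(item["data"][field])

    return PLACEHOLDER.sub(lookup, template)


# Hypothetical item, shaped like one extracted invoice.
item = {"data": {"Vendor Name": "Acme Corp", "Total Amount": "1250.00"}}
message = render(
    "New invoice from {{$item.data.Vendor Name}} for ${{$item.data.Total Amount}}",
    item,
)
print(message)  # New invoice from Acme Corp for $1250.00
```

The practical takeaway is that the text after `$item.data.` has to match the extractor's column name exactly, including spaces and capitalization.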
Worked example: invoice processing pipeline
Goal: every invoice PDF dropped in Invoices Inbox (Google Drive) lands in the Invoice Tracker sheet, and the AP team gets a Slack message.
[Google Drive Trigger]        // watches Invoices Inbox folder, .pdf only
│
▼
[Data Extractor]              // worksheet = "Invoice Extractor", split rows OFF
│
├─→ [Insert Rows]             // destination = "Invoice Tracker" sheet
│        │
│        ▼
│   [Send Slack]              // post to #ap-inbox
│
└─(error)→ [Send Slack]       // post to #ap-errors
Each invoice now becomes a row in the tracker plus a Slack message in under a minute, with errors flagged separately.
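The control flow of this pipeline, including the separate error branch, can be modeled in a few lines of Python. This is a sketch of the logic only: `fake_extract` stands in for the Data Extractor node, and the list and dictionary stand in for the tracker sheet and the two Slack channels. None of these names come from Lido.

```python
from typing import Callable


def run_pipeline(
    file_name: str,
    extract: Callable[[str], dict],
    tracker: list,
    slack: dict,
) -> None:
    """Process one file: success goes to the tracker, failure to #ap-errors."""
    try:
        row = extract(file_name)
    except Exception as exc:  # models the Data Extractor's error output
        slack["#ap-errors"].append(f"Extraction failed for {file_name}: {exc}")
        return
    tracker.append(row)  # models Insert Rows
    slack["#ap-inbox"].append(  # models Send Slack
        f"New invoice from {row['Vendor Name']} for ${row['Total Amount']}"
    )


def fake_extract(file_name: str) -> dict:
    """Stand-in for the Data Extractor node; fails on a 'corrupt' file."""
    if "corrupt" in file_name:
        raise ValueError("unreadable scan")
    return {"Vendor Name": "Acme Corp", "Total Amount": "1250.00"}


tracker: list = []
slack: dict = {"#ap-inbox": [], "#ap-errors": []}
for name in ["invoice-001.pdf", "corrupt-scan.pdf"]:
    run_pipeline(name, fake_extract, tracker, slack)

print(len(tracker))          # 1
print(slack["#ap-errors"])   # ['Extraction failed for corrupt-scan.pdf: unreadable scan']
```

Without the `except` branch, the second file would vanish from the run with no record, which is exactly what an unconnected error output does.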
Worked example: mailbox-driven extraction
Goal: vendors email invoices to invoices@yourco.com.lido.email instead of dropping files in Drive.
[Lido Mailbox Trigger] // address = invoices@yourco.com.lido.email
│
▼
[Data Extractor] // Source Type = Email, Email = {{$item.data.email}}
│
▼
[Insert Rows]
The Lido Mailbox Trigger handles emails and their attachments. The Data Extractor with Source Type = Email looks at both the email body and any attached PDFs.
Worked example: extract-and-classify-and-route
Goal: incoming documents could be invoices, receipts, or contracts. Extract differently for each.
[Google Drive Trigger]
│
▼
[Document Classifier] // returns one of "Invoice", "Receipt", "Contract"
│
▼
[Switch] // routes based on classifier output
│
├─ Invoice ─→ [Data Extractor: invoice config] ─→ [Insert Rows: Invoice Tracker]
├─ Receipt ─→ [Data Extractor: receipt config] ─→ [Insert Rows: Receipt Tracker]
└─ Contract ─→ [Data Extractor: contract config] ─→ [Insert Rows: Contract Tracker]
This pattern handles mixed inbound documents without requiring senders to use different folders.
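The Switch step is essentially a lookup from the classifier's label to a branch. A minimal Python sketch of that idea follows; the `ROUTES` table and `route` function are hypothetical names, and the configuration and sheet names simply mirror the diagram above.

```python
# Hypothetical mapping from classifier label to (extractor config, destination sheet).
ROUTES = {
    "Invoice": ("invoice config", "Invoice Tracker"),
    "Receipt": ("receipt config", "Receipt Tracker"),
    "Contract": ("contract config", "Contract Tracker"),
}


def route(label: str) -> tuple:
    """Return the (extractor config, destination sheet) for a classifier label."""
    try:
        return ROUTES[label]
    except KeyError:
        # A label outside the three expected values: surface it rather than drop it.
        raise ValueError(f"no branch for document type {label!r}") from None


for label in ["Invoice", "Receipt", "Contract"]:
    config, sheet = route(label)
    print(f"{label}: extract with {config}, insert into {sheet}")
```

The explicit error for an unknown label is a design choice worth copying in the workflow itself: give unexpected classifier output somewhere visible to go.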
Tips
- OCR scanned PDFs first. Add an OCR PDF node between the trigger and the Data Extractor for scans. Extraction quality on scanned documents is much higher after OCR.
- Use Split Rows as Items selectively. ON for line-item documents (each line becomes its own workflow item); OFF for summary documents (one item per document).
- Choose Response Format = Objects. Arrays are clunky to work with in downstream nodes.
- Always wire up the error output of the Data Extractor. Silent failures are much worse than loud ones.
- Test with at least 5 different real samples before activating. Variability is the enemy of automated extraction.
- Use Edit Item to add metadata (source filename, timestamp, workflow run ID) before Insert Rows. Future-you will want it for debugging.
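Two of the tips above, Split Rows as Items and adding metadata with Edit Item, are easier to reason about with concrete data. The Python sketch below models both. It is an illustration under assumptions: the `rows` key used when splitting is OFF and the metadata field names are invented for this example and may not match what Lido produces.

```python
from datetime import datetime, timezone


def to_items(rows: list, split_rows: bool) -> list:
    """Model Split Rows as Items: ON gives one item per row, OFF gives one item holding all rows."""
    if split_rows:
        return [{"data": dict(row)} for row in rows]
    # The "rows" key is an assumption made for this sketch.
    return [{"data": {"rows": [dict(row) for row in rows]}}]


def add_metadata(item: dict, source_file: str, run_id: str) -> dict:
    """Model an Edit Item step: copy the item and stamp debugging metadata on it."""
    stamped = {"data": dict(item["data"])}
    stamped["data"]["Source File"] = source_file
    stamped["data"]["Run ID"] = run_id
    stamped["data"]["Processed At"] = datetime.now(timezone.utc).isoformat()
    return stamped


# Hypothetical line items extracted from one invoice.
line_items = [
    {"Description": "Widgets", "Amount": "100.00"},
    {"Description": "Shipping", "Amount": "15.00"},
]

print(len(to_items(line_items, split_rows=True)))   # 2
print(len(to_items(line_items, split_rows=False)))  # 1

stamped = add_metadata(to_items(line_items, split_rows=True)[0], "invoice-001.pdf", "run-42")
print(stamped["data"]["Source File"])  # invoice-001.pdf
```

With splitting ON, every downstream node runs once per line item, so the metadata is stamped on each row individually.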
Common mistakes
- Pointing the workflow at an extractor worksheet that hasn't been tested. Test in the UI first. Always.
- Forgetting to set the File parameter to `{{$item.data.file}}`. Leaving it as a placeholder static value breaks at runtime with a confusing error.
- Watching a folder that contains old files. Triggers can fire on existing files when first activated. Either start with an empty folder or expect a backlog burst.
- Not connecting the error output. When a file fails extraction (corrupted, unreadable scan, AI refusal), the item disappears unless you handle the error output.
- Running on every change vs. only new files. Some triggers fire on edits too. Configure the trigger to fire only on new files if that's what you want.
- Forgetting page allowance. A workflow processing 1,000 PDFs is going to use 1,000+ pages. Match your plan to your volume.
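For the page allowance point, a quick back-of-the-envelope calculation before activating a workflow can save a surprise. The Python sketch below assumes that usage is counted as one page per PDF page, which is an assumption to verify against your plan; the batch sizes and the allowance of 1,500 pages are made-up numbers.

```python
def total_pages(page_counts: list) -> int:
    """Sum the page counts of every document in a batch."""
    return sum(page_counts)


def remaining_allowance(page_counts: list, allowance: int) -> int:
    """Pages left after the batch; negative means the batch exceeds the allowance."""
    return allowance - total_pages(page_counts)


# Hypothetical batch: 600 one-page invoices and 400 three-page invoices.
batch = [1] * 600 + [3] * 400
print(total_pages(batch))                # 1800
print(remaining_allowance(batch, 1500))  # -300
```

Here 1,000 PDFs consume 1,800 pages, so a document count alone understates the volume to plan for.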
Related articles
- Extract data from PDFs and documents
- Build your first workflow
- Triggers: how workflows start
- Improve extraction accuracy
- Send extracted data to email or Slack
- Connect Google Drive
- Connect OneDrive
- Lido Mailbox
- Pricing, plans, and page allowance
Updated on: 16/04/2026