Automate extraction with workflows

This article walks through turning a one-off extractor into a hands-free pipeline that runs every time a new document arrives. The result is the most common Lido use case — drop a PDF in a folder, see structured data show up minutes later in your sheet (and Slack, and Outlook, and your CRM, depending on what you wire up).

Before you start

You need:

An extractor worksheet that already works on test documents in the spreadsheet UI. If you haven't built one yet, do that first — see Extract data from PDFs and documents. Workflows reuse this configuration; if it doesn't work in the UI, it won't work in a workflow.
A trigger source for new documents: a connected Google Drive, OneDrive, Lido Mailbox, or Outlook account, or an external system that can hit a webhook.
A destination for the extracted rows — usually a sheet (Insert Rows), but can be Slack, email, an external API, or all of the above.

Pick the right trigger

Where do new documents arrive?	Use this trigger
Files dropped into a Google Drive folder	Google Drive Trigger
Files dropped into a OneDrive folder	OneDrive Trigger
Documents sent to a Lido-hosted email address	Lido Mailbox Trigger
Emails arriving in your Outlook inbox (with attachments to extract from)	Outlook Trigger
Documents pushed by another system over HTTP	Webhook Trigger
You want to run extraction on a schedule (e.g., grab today's PDFs at 9am)	Scheduled Trigger
You want to run extraction on demand from a button	Manual Trigger

You can have multiple workflows watching different sources for the same extractor configuration.

Step-by-step: build the workflow

Open Workflows → New Workflow.
Add the appropriate trigger node (e.g., Google Drive Trigger). Configure:

Folder — the folder to watch.
File types — restrict to .pdf, .png, etc., to avoid extracting from unrelated files.

Add a Data Extractor node. Connect the trigger to it.
In the Data Extractor:

Worksheet Name — pick your tested extractor worksheet.
Source Type — File (or Email for mailbox/Outlook triggers).
File — {{$item.data.file}} (or Email — {{$item.data.email}}).
Response Format — Objects. Easier to use downstream.
Split Rows as Items — turn ON if you want each extracted row to become its own item flowing through the rest of the workflow (e.g., one workflow item per invoice line). Turn OFF if you want all extracted rows together.

Add an Insert Rows node. Connect Data Extractor to it.

Worksheet — your destination tracker sheet.
Map each extracted field to a destination column (e.g., map {{$item.data.Vendor Name}} to the Vendor column).

(Optional) Add a Send Slack or Send Gmail node to notify the team:

Message — New invoice from {{$item.data.Vendor Name}} for ${{$item.data.Total Amount}}

Connect the Data Extractor's error output to a second Send Slack node so failures get reported instead of silently dropped.
Test the workflow with a sample file before activating. Workflows have a Test button — use it.
Activate. From now on, every new file in the watched folder goes through the pipeline automatically.
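
The field references used in the steps above, such as {{$item.data.Vendor Name}}, are placeholders that get filled in from each workflow item. As a rough mental model only, the Python sketch below resolves them against a dictionary. The render_template function and the shape of the item dictionary are hypothetical illustrations, not Lido's actual template engine.

```python
import re

# Hypothetical illustration: resolve {{$item.data.<Field>}} placeholders
# against a dict shaped like a workflow item. This is not Lido's real engine.
PLACEHOLDER = re.compile(r"\{\{\s*\$item\.data\.(.+?)\s*\}\}")


def render_template(template, item):
    """Replace each {{$item.data.X}} with the value of item["data"]["X"]."""
    def substitute(match):
        field = match.group(1)
        if field not in item["data"]:
            # Fail loudly on a misspelled or missing field name.
            raise KeyError(f"item has no field named {field!r}")
        return str(item["data"][field])

    return PLACEHOLDER.sub(substitute, template)


# Made-up sample values for the two fields used in the Slack message step.
item = {"data": {"Vendor Name": "Acme Corp", "Total Amount": "1,250.00"}}
message = render_template(
    "New invoice from {{$item.data.Vendor Name}} for ${{$item.data.Total Amount}}",
    item,
)
print(message)  # New invoice from Acme Corp for $1,250.00
```

Note that the dollar sign in front of the second placeholder is literal text, so it survives into the rendered message. Field names must match the extractor's column names exactly, spaces included.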

Worked example: invoice processing pipeline

Goal: every invoice PDF dropped in Invoices Inbox (Google Drive) lands in the Invoice Tracker sheet, and the AP team gets a Slack message.

[Google Drive Trigger]      // watches Invoices Inbox folder, .pdf only
        │
        ▼
[Data Extractor]            // worksheet = "Invoice Extractor", split rows OFF
        │
        ├─→ [Insert Rows]   // destination = "Invoice Tracker" sheet
        │       │
        │       ▼
        │   [Send Slack]    // post to #ap-inbox
        │
        └─(error)→ [Send Slack]  // post to #ap-errors

Each invoice now becomes a row in the tracker plus a Slack message in under a minute, with errors flagged separately.
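
The branching in the diagram above can be pictured as ordinary success/failure handling. The Python sketch below is an analogy under stated assumptions: run_pipeline, fake_extract, and the three lists standing in for the Insert Rows and Send Slack nodes are all hypothetical, and none of them are Lido APIs.

```python
# Hypothetical stand-ins for workflow nodes; none of these are Lido APIs.
def run_pipeline(files, extract):
    """Send each file down the success branch or the error branch."""
    tracker_rows = []  # stands in for Insert Rows -> "Invoice Tracker"
    ap_inbox = []      # stands in for Send Slack -> #ap-inbox
    ap_errors = []     # stands in for Send Slack -> #ap-errors

    for name in files:
        try:
            row = extract(name)
        except ValueError as err:
            # Error output: report the failure instead of dropping the item.
            ap_errors.append(f"Extraction failed for {name}: {err}")
            continue
        tracker_rows.append(row)
        ap_inbox.append(
            f"New invoice from {row['Vendor Name']} for ${row['Total Amount']}"
        )
    return tracker_rows, ap_inbox, ap_errors


def fake_extract(name):
    """Toy extractor: fails on any file whose name contains 'corrupt'."""
    if "corrupt" in name:
        raise ValueError("unreadable scan")
    return {"Vendor Name": "Acme Corp", "Total Amount": "1,250.00", "Source": name}


rows, inbox, errors = run_pipeline(["inv-001.pdf", "corrupt-002.pdf"], fake_extract)
print(len(rows), len(inbox), errors)
```

Without the except branch, the second file would simply vanish from the results, which is the silent failure that wiring up the error output prevents.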

Worked example: mailbox-driven extraction

Goal: vendors email invoices to invoices@yourco.com.lido.email instead of dropping files in Drive.

[Lido Mailbox Trigger]     // address = invoices@yourco.com.lido.email
        │
        ▼
[Data Extractor]            // Source Type = Email, Email = {{$item.data.email}}
        │
        ▼
[Insert Rows]

The Lido Mailbox Trigger handles emails and their attachments. The Data Extractor with Source Type = Email looks at both the email body and any attached PDFs.

Worked example: extract-and-classify-and-route

Goal: incoming documents could be invoices, receipts, or contracts. Extract differently for each.

[Google Drive Trigger]
        │
        ▼
[Document Classifier]     // returns one of "Invoice", "Receipt", "Contract"
        │
        ▼
[Switch]                  // routes based on classifier output
        │
        ├─ Invoice  ─→ [Data Extractor: invoice config]  ─→ [Insert Rows: Invoice Tracker]
        ├─ Receipt  ─→ [Data Extractor: receipt config]  ─→ [Insert Rows: Receipt Tracker]
        └─ Contract ─→ [Data Extractor: contract config] ─→ [Insert Rows: Contract Tracker]

This pattern handles mixed inbound documents without requiring senders to use different folders.
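
The Switch step above is essentially a lookup from the classifier's label to a branch. A minimal Python sketch of that idea follows. The ROUTES table, route, and the filename-based classify_by_filename are hypothetical stand-ins; a real classifier reads the document contents, and how Lido's Switch node treats an unmatched label is not covered here.

```python
# Hypothetical mapping from classifier label to (extractor config, destination sheet),
# mirroring the three branches in the diagram.
ROUTES = {
    "Invoice": ("invoice config", "Invoice Tracker"),
    "Receipt": ("receipt config", "Receipt Tracker"),
    "Contract": ("contract config", "Contract Tracker"),
}


def route(label):
    """Mimic the Switch node: pick a branch from the classifier's label."""
    if label not in ROUTES:
        # The diagram shows no default branch; raising keeps unknown labels visible.
        raise ValueError(f"no branch for classifier label {label!r}")
    return ROUTES[label]


def classify_by_filename(filename):
    """Toy classifier keyed on the filename, used only to drive the example."""
    lowered = filename.lower()
    for label in ROUTES:
        if label.lower() in lowered:
            return label
    return "Unknown"


config, sheet = route(classify_by_filename("2026-05 receipt.pdf"))
print(config, "->", sheet)  # receipt config -> Receipt Tracker
```

Treating an unrecognized label as an error, rather than guessing a branch, keeps misclassified documents from landing in the wrong tracker.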

Tips

Use Page Range, not @exclude_pages, when you can. The Page Range setting only bills for the pages it processes. The @exclude_pages directive still bills for every page in the document, because Lido has to read each page to intelligently decide which ones to exclude. For high-volume workflows, configuring Page Range in your extractor worksheet (e.g., "always page 1") can cut consumption dramatically.
OCR scanned PDFs first. For scans, add an OCR PDF node between the trigger and the Data Extractor. Extraction quality is much higher on OCR'd files than on raw scans.
Use Split Rows as Items selectively. ON for line-item documents (each line becomes its own workflow item); OFF for summary documents (one item per document).
Choose Response Format = Objects. Objects let downstream nodes reference fields by name, as in {{$item.data.Vendor Name}}; arrays are clunkier to work with.
Always wire up the error output of the Data Extractor. Silent failures are much worse than loud ones.
Test with at least 5 different real samples before activating. Variability is the enemy of automated extraction.
Use Edit Item to add metadata (source filename, timestamp, workflow run ID) before Insert Rows. Future-you will want it for debugging.
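
To make the Split Rows as Items tip concrete, the Python sketch below models the two settings. It is an illustration under assumptions: to_items and the "rows" key used for the combined item are hypothetical, and the exact shape Lido gives a combined item may differ.

```python
def to_items(extracted_rows, split_rows):
    """Model how extracted rows might become workflow items.

    Assumption: with split ON each row becomes its own item; with split OFF
    a single item carries all rows together. The "rows" key is hypothetical.
    """
    if split_rows:
        return [{"data": row} for row in extracted_rows]
    return [{"data": {"rows": extracted_rows}}]


# Made-up line items from one invoice.
line_items = [
    {"Description": "Widgets", "Amount": 40.0},
    {"Description": "Shipping", "Amount": 10.0},
]

split_on = to_items(line_items, split_rows=True)
split_off = to_items(line_items, split_rows=False)
print(len(split_on), len(split_off))  # 2 1
```

With the setting ON, every downstream node (Insert Rows, Send Slack) runs once per line item, so a notification node placed after the extractor would send one message per line rather than one per document.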

Common mistakes

Pointing the workflow at an extractor worksheet that hasn't been tested. Test in the UI first. Always.
Forgetting to set the File parameter to {{$item.data.file}}. Leaving it as a placeholder static value breaks at runtime with a confusing error.
Watching a folder that contains old files. Triggers can fire on existing files when first activated. Either start with an empty folder or expect a backlog burst.
Not connecting the error output. When a file fails extraction (corrupted, unreadable scan, AI refusal), the item disappears unless you handle the error output.
Running on every change instead of only on new files. Some triggers fire on edits too. Configure the trigger to fire only on new files if that's what you want.
Forgetting page allowance. A workflow processing 1,000 PDFs uses at least 1,000 pages, and more if the documents are multi-page. Match your plan to your volume.
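
A quick back-of-the-envelope estimate helps when matching a plan to volume. The Python sketch below applies the billing behavior described in the Page Range tip: without a Page Range (or with @exclude_pages) every page counts, while a Page Range bills only the pages it processes. The pages_billed helper and the volumes are made up for illustration; check your plan for the actual accounting.

```python
def pages_billed(num_documents, pages_per_document, page_range_size=None):
    """Estimate billed pages for a batch of same-length documents.

    page_range_size=None models no Page Range (or @exclude_pages), where
    every page is billed. Otherwise only the pages in the range are billed,
    capped at the document length.
    """
    if page_range_size is None:
        billed_per_doc = pages_per_document
    else:
        billed_per_doc = min(page_range_size, pages_per_document)
    return num_documents * billed_per_doc


# Made-up volume: 1,000 PDFs of 5 pages each.
all_pages = pages_billed(1000, 5)
first_page_only = pages_billed(1000, 5, page_range_size=1)
print(all_pages, first_page_only)  # 5000 1000
```

In this example, restricting extraction to page 1 bills a fifth of the pages that processing every page would.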

Extract data from PDFs and documents
Build your first workflow
Triggers: how workflows start
Improve extraction accuracy
Send extracted data to email or Slack
Connect Google Drive
Connect OneDrive
Lido Mailbox
Pricing, plans, and page allowance

Updated on: 13/05/2026
