
Automate extraction with workflows

This article walks through turning a one-off extractor into a hands-free pipeline that runs every time a new document arrives. The result is the most common Lido use case: drop a PDF in a folder, and structured data shows up in your sheet within minutes (and in Slack, Outlook, or your CRM, depending on what you wire up).



Before you start


You need:


  • An extractor worksheet that already works on test documents in the spreadsheet UI. If you haven't built one yet, do that first — see Extract data from PDFs and documents. Workflows reuse this configuration; if it doesn't work in the UI, it won't work in a workflow.
  • A trigger source for new documents: a connected Google Drive or OneDrive account, a Lido Mailbox, a connected Outlook account, or an external system that can call a webhook.
  • A destination for the extracted rows: usually a sheet (Insert Rows), but it can also be Slack, email, an external API, or all of the above.



Pick the right trigger


Where do new documents arrive? Use the matching trigger:

  • Files dropped into a Google Drive folder → Google Drive Trigger
  • Files dropped into a OneDrive folder → OneDrive Trigger
  • Documents sent to a Lido-hosted email address → Lido Mailbox Trigger
  • Emails arriving in your Outlook inbox (with attachments to extract from) → Outlook Trigger
  • Documents pushed by another system over HTTP → Webhook Trigger
  • You want to run extraction on a schedule (e.g., grab today's PDFs at 9am) → Scheduled Trigger
  • You want to run extraction on demand from a button → Manual Trigger


You can have multiple workflows watching different sources for the same extractor configuration.
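For the Webhook Trigger, the sending system makes an HTTP request to the workflow's webhook URL. The Python sketch below only builds such a request with the standard library. The URL, the JSON field names (`filename`, `content_base64`), and the base64 encoding are hypothetical placeholders, not a documented Lido payload format; check your Webhook Trigger node for the real URL and expected body before sending anything.

```python
import base64
import json
import urllib.request

# Hypothetical placeholder: copy the real URL from your Webhook Trigger node.
WEBHOOK_URL = "https://example.com/your-webhook-url"


def build_webhook_request(url: str, filename: str, file_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST request that carries one document.

    The JSON field names below are assumptions for illustration only.
    """
    payload = {
        "filename": filename,  # hypothetical field name
        "content_base64": base64.b64encode(file_bytes).decode("ascii"),  # hypothetical
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_webhook_request(WEBHOOK_URL, "invoice-001.pdf", b"%PDF-1.7 ...")
    print(req.get_method(), req.full_url)
    # Sending is one more call once the URL and payload shape are confirmed:
    # urllib.request.urlopen(req, timeout=30)
```

Keeping request construction separate from sending makes it easy to inspect the payload before pointing it at a live workflow.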



Step-by-step: build the workflow


  1. Open Workflows → New Workflow.
  2. Add the appropriate trigger node (e.g., Google Drive Trigger) and configure:
     • Folder: the folder to watch.
     • File types: restrict to .pdf, .png, etc., to avoid extracting from unrelated files.
  3. Add a Data Extractor node and connect the trigger to it.
  4. In the Data Extractor, set:
     • Worksheet Name: your tested extractor worksheet.
     • Source Type: File (or Email for Lido Mailbox and Outlook triggers).
     • File: {{$item.data.file}} (or Email: {{$item.data.email}}).
     • Response Format: Objects, which are easier to use in downstream nodes.
     • Split Rows as Items: turn ON if you want each extracted row to become its own item flowing through the rest of the workflow (e.g., one workflow item per invoice line). Turn OFF if you want all extracted rows kept together in one item per document.
  5. Add an Insert Rows node and connect the Data Extractor to it.
     • Worksheet: your destination tracker sheet.
     • Map each extracted field to a destination column (e.g., map {{$item.data.Vendor Name}} to the Vendor column).
  6. (Optional) Add a Send Slack or Send Gmail node to notify the team:
     • Message: New invoice from {{$item.data.Vendor Name}} for ${{$item.data.Total Amount}}
  7. Connect the Data Extractor's error output to a second Send Slack node so failures get reported instead of silently dropped.
  8. Test the workflow with a sample file before activating, using the workflow's Test button.
  9. Activate. From then on, every new file in the watched folder goes through the pipeline automatically.
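The effect of Split Rows as Items (step 4) can be modeled in a few lines of Python. This is a sketch of the idea, not Lido's implementation: it assumes the Data Extractor returns a list of row objects (matching Response Format = Objects) and that each workflow item carries its fields under `data`, as the `{{$item.data...}}` expressions suggest. The helper `to_items` and the `rows` key used when splitting is OFF are hypothetical.

```python
from typing import Any

Row = dict[str, Any]
Item = dict[str, Any]


def to_items(rows: list[Row], split_rows: bool) -> list[Item]:
    """Turn extracted rows into workflow items (hypothetical model).

    split_rows=True  -> one item per row; downstream nodes run once per row.
    split_rows=False -> one item holding every row from the document.
    """
    if split_rows:
        return [{"data": row} for row in rows]
    return [{"data": {"rows": rows}}]  # "rows" key is an assumption


if __name__ == "__main__":
    invoice_lines = [
        {"Description": "Widgets", "Amount": 120.0},
        {"Description": "Shipping", "Amount": 15.0},
    ]
    print(len(to_items(invoice_lines, split_rows=True)))   # 2: Insert Rows runs twice
    print(len(to_items(invoice_lines, split_rows=False)))  # 1: Insert Rows runs once
```

The item count is what matters downstream: every node after the Data Extractor runs once per item, so ON multiplies sheet rows and notifications by the number of extracted lines.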



Worked example: invoice processing pipeline


Goal: every invoice PDF dropped in Invoices Inbox (Google Drive) lands in the Invoice Tracker sheet, and the AP team gets a Slack message.


[Google Drive Trigger]          // watches Invoices Inbox folder, .pdf only
        │
        ▼
[Data Extractor]                // worksheet = "Invoice Extractor", split rows OFF
        │
        ├─→ [Insert Rows]       // destination = "Invoice Tracker" sheet
        │        │
        │        ▼
        │   [Send Slack]        // post to #ap-inbox
        │
        └─(error)─→ [Send Slack]   // post to #ap-errors


Each invoice now becomes a row in the tracker plus a Slack message in under a minute, with errors flagged separately.
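The control flow of this pipeline, including the error branch, is roughly the following Python sketch. The callables `extract_invoice`, `insert_row`, and `post_slack` are hypothetical stand-ins for the Data Extractor, Insert Rows, and Send Slack nodes, and `COLUMN_MAP` stands in for the field-to-column mapping you set in Insert Rows. The point it illustrates: a failed extraction is reported to #ap-errors rather than disappearing.

```python
from typing import Any, Callable

Fields = dict[str, Any]

# Destination column -> extracted field, mirroring the Insert Rows mapping
# (e.g. {{$item.data.Vendor Name}} -> Vendor). Names are illustrative.
COLUMN_MAP = {"Vendor": "Vendor Name", "Total": "Total Amount"}


def process_file(
    filename: str,
    extract_invoice: Callable[[str], Fields],       # stand-in for Data Extractor
    insert_row: Callable[[dict[str, Any]], None],   # stand-in for Insert Rows
    post_slack: Callable[[str, str], None],         # stand-in for Send Slack (channel, text)
) -> bool:
    """Run one file through the pipeline; return True on success."""
    try:
        fields = extract_invoice(filename)
    except Exception as exc:
        # Error output: report the failure instead of silently dropping the item.
        post_slack("#ap-errors", f"Extraction failed for {filename}: {exc}")
        return False
    row = {column: fields.get(field) for column, field in COLUMN_MAP.items()}
    insert_row(row)
    post_slack(
        "#ap-inbox",
        f"New invoice from {fields.get('Vendor Name')} for ${fields.get('Total Amount')}",
    )
    return True
```

To try it, pass simple fakes, for example `rows.append` as `insert_row` and a function that raises `ValueError` for a corrupt file as `extract_invoice`.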



Worked example: mailbox-driven extraction


Goal: vendors email invoices to invoices@yourco.com.lido.email instead of dropping files in Drive.


[Lido Mailbox Trigger]     // address = invoices@yourco.com.lido.email
        │
        ▼
[Data Extractor]           // Source Type = Email, Email = {{$item.data.email}}
        │
        ▼
[Insert Rows]


The Lido Mailbox Trigger handles emails and their attachments. The Data Extractor with Source Type = Email looks at both the email body and any attached PDFs.
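Lido does this splitting for you; the sketch below only makes "email body plus attached PDFs" concrete. It uses Python's standard `email` package to separate a raw message into its plain-text body and its PDF attachments, which is roughly the material an Email-type extraction has to consider. It is an illustration, not how the Lido Mailbox Trigger is implemented; `split_email` is a hypothetical helper.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def split_email(raw: bytes) -> tuple[str, list[tuple[str, bytes]]]:
    """Split a raw email into its plain-text body and its PDF attachments."""
    msg = message_from_bytes(raw, policy=policy.default)  # parses to an EmailMessage
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    pdfs = []
    for part in msg.iter_attachments():
        if part.get_content_type() == "application/pdf":
            pdfs.append((part.get_filename(), part.get_payload(decode=True)))
    return body, pdfs


if __name__ == "__main__":
    # Build a sample vendor email with one PDF attached.
    sample = EmailMessage()
    sample["From"] = "vendor@example.com"
    sample["To"] = "invoices@yourco.com.lido.email"
    sample["Subject"] = "Invoice 1042"
    sample.set_content("Please find invoice 1042 attached.")
    sample.add_attachment(b"%PDF-1.7 ...", maintype="application",
                          subtype="pdf", filename="invoice-1042.pdf")
    body, pdfs = split_email(sample.as_bytes())
    print(body.strip())
    print([name for name, _ in pdfs])
```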



Worked example: extract-and-classify-and-route


Goal: incoming documents could be invoices, receipts, or contracts. Extract differently for each.


[Google Drive Trigger]
        │
        ▼
[Document Classifier]   // returns one of "Invoice", "Receipt", "Contract"
        │
        ▼
[Switch]                // routes based on classifier output
        │
        ├─ Invoice ──→ [Data Extractor: invoice config] ──→ [Insert Rows: Invoice Tracker]
        ├─ Receipt ──→ [Data Extractor: receipt config] ──→ [Insert Rows: Receipt Tracker]
        └─ Contract ─→ [Data Extractor: contract config] ─→ [Insert Rows: Contract Tracker]


This pattern handles mixed inbound documents without requiring senders to use different folders.
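The Switch node behaves like a lookup from classifier label to branch. A Python sketch of that routing is below; the config and sheet names come from the diagram above, while the `Route` type, the `ROUTES` table, and the `route` function are hypothetical. Returning `None` for an unexpected label is a design choice for the sketch.

```python
from typing import NamedTuple, Optional


class Route(NamedTuple):
    extractor_config: str    # Data Extractor configuration for this branch
    destination_sheet: str   # Insert Rows destination


# One entry per Switch branch.
ROUTES = {
    "Invoice": Route("invoice config", "Invoice Tracker"),
    "Receipt": Route("receipt config", "Receipt Tracker"),
    "Contract": Route("contract config", "Contract Tracker"),
}


def route(label: str) -> Optional[Route]:
    """Return the branch for a classifier label, or None if no case matches."""
    return ROUTES.get(label.strip())
```

A label that matches no case comes back as `None`. In a real workflow, it is worth sending such items to an error or review branch so an unexpected document type is not silently dropped.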



Tips


  • OCR scanned PDFs first. Add an OCR PDF node between the trigger and the Data Extractor for scanned documents; extraction quality is much higher.
  • Use Split Rows as Items selectively. ON for line-item documents (each line becomes its own workflow item); OFF for summary documents (one item per document).
  • Choose Response Format = Objects. Arrays are clunky to work with in downstream nodes.
  • Always wire up the error output of the Data Extractor. Silent failures are much worse than loud ones.
  • Test with at least 5 different real samples before activating. Variability is the enemy of automated extraction.
  • Use Edit Item to add metadata (source filename, timestamp, workflow run ID) before Insert Rows. Future-you will want it for debugging.
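Adding metadata with Edit Item amounts to merging a few extra fields into each item before Insert Rows. The Python sketch below shows the idea under two assumptions: items carry their fields under `data` (inferred from the `{{$item.data...}}` expressions), and the column names `Source File`, `Processed At`, and `Run ID` are hypothetical ones you would choose yourself.

```python
from datetime import datetime, timezone
from typing import Any


def add_metadata(item: dict[str, Any], filename: str, run_id: str) -> dict[str, Any]:
    """Return a copy of the item with debugging metadata added to its data."""
    data = dict(item["data"])  # copy so the original item is left unchanged
    data["Source File"] = filename
    data["Processed At"] = datetime.now(timezone.utc).isoformat(timespec="seconds")
    data["Run ID"] = run_id
    return {**item, "data": data}
```

Storing the timestamp in UTC keeps rows comparable even when runs are triggered from different time zones.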



Common mistakes


  • Pointing the workflow at an extractor worksheet that hasn't been tested. Test in the UI first. Always.
  • Forgetting to set the File parameter to {{$item.data.file}}. Leaving it as a static placeholder value breaks at runtime with a confusing error.
  • Watching a folder that contains old files. Triggers can fire on existing files when first activated. Either start with an empty folder or expect a backlog burst.
  • Not connecting the error output. When a file fails extraction (corrupted, unreadable scan, AI refusal), the item disappears unless you handle the error output.
  • Firing on every change instead of only on new files. Some triggers fire on edits too. Configure the trigger to fire only on new files if that's what you want.
  • Forgetting your page allowance. A workflow processing 1,000 PDFs will use at least 1,000 pages, and more when documents have several pages. Match your plan to your volume.
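Backlog bursts, unrelated file types, and re-firing on edits all come down to deciding which files count as new. The trigger nodes have their own settings for this; the Python sketch below only illustrates the logic, for example if you pre-filter files in a system that pushes to the Webhook Trigger. The `FileInfo` fields, the extension list, and the `new_files` helper are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime

ALLOWED_EXTENSIONS = (".pdf", ".png")  # example filter, like the trigger's File types


@dataclass(frozen=True)
class FileInfo:
    file_id: str          # stays the same when the file is edited
    name: str
    created_at: datetime  # when the file was added to the folder (timezone-aware)


def new_files(files: list[FileInfo], activated_at: datetime, seen_ids: set[str]) -> list[FileInfo]:
    """Keep files created after activation, with an allowed extension,
    that have not been processed before. Updates seen_ids in place."""
    fresh = []
    for f in files:
        if f.created_at <= activated_at:
            continue  # pre-existing file: skipping it avoids a backlog burst
        if not f.name.lower().endswith(ALLOWED_EXTENSIONS):
            continue  # unrelated file type
        if f.file_id in seen_ids:
            continue  # already processed; an edit should not re-trigger extraction
        seen_ids.add(f.file_id)
        fresh.append(f)
    return fresh
```

Because `seen_ids` persists between calls, a second pass over the same folder returns nothing new, which is the "only new files" behavior described above.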




Related articles


  • Extract data from PDFs and documents
  • Build your first workflow
  • Triggers: how workflows start
  • Improve extraction accuracy
  • Send extracted data to email or Slack
  • Connect Google Drive
  • Connect OneDrive
  • Lido Mailbox
  • Pricing, plans, and page allowance

Updated on: 16/04/2026
