Quickstart: extract data from your first document
This walkthrough takes about five minutes. By the end you'll have pulled structured data out of a PDF you upload — no setup, no integrations, no code. You can run it on the free trial.
Before you start
You need:
- A Lido account at sheets.lido.app (the free trial works).
- One sample document — an invoice, receipt, or any PDF with data on it. A scanned PDF works too.
- A list of the fields you want to pull out. For an invoice, that might be Vendor Name, Invoice Number, Invoice Date, Due Date, Total Amount.
Step-by-step
- Open Lido at sheets.lido.app and sign in (or sign up — no credit card required).
- Open a new spreadsheet and launch the Data Extractor from the toolbar.
- Upload your test document when the extractor opens.
- Add the columns you want extracted. One field per row:
Vendor Name
Invoice Number
Invoice Date
Due Date
Total Amount
Use specific names. "Invoice Total" beats "Amount". "Invoice Date" beats "Date".
- (Optional) Add instructions if anything about the document is unusual. Examples:
- "Use ISO format YYYY-MM-DD for all dates."
- "Total Amount is the grand total including tax — not the subtotal."
- "Skip line items; I only need summary fields."
- (Optional) Set Page Range if your data is on specific pages, e.g., 1 for first page only.
- Toggle Multi-row if your document has a table and you want one extracted row per item (e.g., one row per invoice line). For a one-page invoice with summary fields only, leave it off.
- Click Extract.
Within 10–30 seconds you'll see your extracted data as a row (or rows) in the spreadsheet.
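If you keep your column list in a script or a text file, the naming advice above can be checked mechanically before you click Extract. The following is a minimal sketch, not part of Lido: the function name `find_vague_columns` is hypothetical, and the list of vague words contains only the three examples this article mentions ("Amount", "Date", "Number").

```python
# Bare generic words this article calls out as too vague on their own.
# This set is illustrative; extend it for your own documents.
VAGUE_NAMES = {"amount", "date", "number"}


def find_vague_columns(columns):
    """Return the column names that consist only of a bare generic word."""
    return [name for name in columns if name.strip().lower() in VAGUE_NAMES]


columns = ["Vendor Name", "Invoice Number", "Date", "Total Amount"]
print(find_vague_columns(columns))
```

Here only "Date" is flagged, because the other names already say which vendor, number, or amount is meant.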
Try it on a second document
Upload a different invoice — same vendor or different vendor — and click Extract again. The same column configuration applies. This is the core idea of the Data Extractor: configure once, reuse on every similar document.
If the second document gives bad results:
- Check whether the column names match what's in the document. Refine them.
- Add a clarifying instruction about that document's quirks.
- For scanned PDFs that look like images, try the OCR PDF step first (see Improve extraction accuracy).
What just happened
You configured an extractor worksheet. Lido stored your column list, instructions, and settings in that worksheet. From now on, any of these paths can reuse the same configuration:
- Drag-and-drop more files into the Data Extractor — same columns, same rules.
- Build a workflow that automatically extracts from every file dropped in a Google Drive or OneDrive folder.
- Click the API button in the bottom-left of the extractor to get the exact JSON configuration for calling the Lido API from your own backend.
The configuration is the same in all three cases. You only build it once.
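One way to picture "configure once, reuse everywhere" is as a single data object that every path reads. The sketch below is only an illustration of that idea: the class `ExtractorConfig` and its field names are assumptions made for this example and are not Lido's actual schema. The real JSON is whatever the API button in the extractor gives you.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ExtractorConfig:
    # Field names are illustrative only, not Lido's real configuration format.
    columns: list
    instructions: list = field(default_factory=list)
    page_range: str = ""     # e.g. "1" for the first page only
    multi_row: bool = False  # True means one extracted row per table item


invoice_config = ExtractorConfig(
    columns=["Vendor Name", "Invoice Number", "Invoice Date", "Due Date", "Total Amount"],
    instructions=["Use ISO format YYYY-MM-DD for all dates."],
    page_range="1",
)

# The same object serializes unchanged, whichever path ends up consuming it.
print(json.dumps(asdict(invoice_config), indent=2))
```

The point of the sketch is that columns, instructions, and settings travel together, so a change made in one place applies to every document processed afterwards.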
Where to go next
- Build a workflow that extracts from new files automatically: Build your first workflow.
- Connect your file source so workflows can run on incoming documents: Connect Google Drive or Connect OneDrive.
- Use the API to extract from your own backend: Lido API: quickstart and authentication.
- Get more accurate results on tricky documents: Improve extraction accuracy.
Common mistakes
- Vague column names. "Number" forces the AI to guess what kind of number. Be specific: "Invoice Number".
- Multi-row turned on for a summary document. This makes the AI invent rows that aren't there. Only turn it on when the document actually contains a table you want each item from.
- Skipping the test step before building automation. Always extract from 3–5 sample documents in the spreadsheet UI before wiring up a workflow or API call. Catching a column-name issue at this stage is free; catching it after 500 invoices have been processed is not.
- Uploading a scanned PDF without OCR. Extraction works on scans, but quality is much higher with a real text layer. See Improve extraction accuracy.
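The "test on 3–5 sample documents" advice is easier to follow if you check the extracted rows the same way each time. The sketch below is one possible approach, assuming you have copied the rows out of the spreadsheet into Python dictionaries and that you used the optional instruction asking for YYYY-MM-DD dates. The function `check_row` is hypothetical and the sample rows are made-up placeholder data, not real extraction output.

```python
from datetime import datetime

REQUIRED = ["Vendor Name", "Invoice Number", "Invoice Date", "Due Date", "Total Amount"]
DATE_FIELDS = ["Invoice Date", "Due Date"]


def check_row(row):
    """Return a list of human-readable problems found in one extracted row."""
    problems = []
    # Every configured column should come back with a non-empty value.
    for name in REQUIRED:
        if not str(row.get(name, "")).strip():
            problems.append(f"{name}: missing")
    # Date columns should parse as year-month-day.
    for name in DATE_FIELDS:
        value = str(row.get(name, "")).strip()
        if value:
            try:
                datetime.strptime(value, "%Y-%m-%d")
            except ValueError:
                problems.append(f"{name}: not YYYY-MM-DD ({value})")
    return problems


# Made-up sample rows for illustration only.
sample_rows = [
    {"Vendor Name": "Acme Ltd", "Invoice Number": "INV-001",
     "Invoice Date": "2026-03-01", "Due Date": "2026-03-31", "Total Amount": "1200.00"},
    {"Vendor Name": "Acme Ltd", "Invoice Number": "",
     "Invoice Date": "03/01/2026", "Due Date": "2026-03-31", "Total Amount": "980.50"},
]

for number, row in enumerate(sample_rows, start=1):
    print(number, check_row(row))
```

A row that returns an empty list passed both checks. A row with problems usually points back to a column name or instruction that needs refining before you build a workflow on top of it.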
Related articles
- What is Lido?
- Extract data from PDFs and documents
- Build your first workflow
- Improve extraction accuracy
- Lido API: quickstart and authentication
- Pricing, plans, and page allowance
Updated on: 16/04/2026