
Quickstart: extract data from your first document

This walkthrough takes about five minutes. By the end you'll have pulled structured data out of a PDF you upload — no setup, no integrations, no code. You can run it on the free trial.

Before you start

You need:

A Lido account at sheets.lido.app (the free trial works).
One sample document — an invoice, receipt, or any PDF with data on it. A scanned PDF works too.
A list of the fields you want to pull out. For an invoice, that might be Vendor Name, Invoice Number, Invoice Date, Due Date, Total Amount.

Step-by-step

Open Lido at sheets.lido.app and sign in (or sign up — no credit card required).
Open a new spreadsheet and launch the Data Extractor from the toolbar.
Upload your test document when the extractor opens.
Add the columns you want extracted. One field per row:

Vendor Name
Invoice Number
Invoice Date
Due Date
Total Amount

Use specific names. "Invoice Total" beats "Amount". "Invoice Date" beats "Date".

(Optional) Add instructions if anything about the document is unusual. Examples:

"Use ISO format YYYY-MM-DD for all dates."
"Total Amount is the grand total including tax — not the subtotal."
"Skip line items; I only need summary fields."

(Optional) Set Page Range if your data is on specific pages, e.g., 1 for the first page only. Page Range bills only for the pages it processes, so if your data is always on page 1 of a 50-page PDF, you pay for 1 page, not 50. When either option would work, prefer Page Range over the @exclude_pages directive: @exclude_pages still bills for every page, because Lido has to read each one to decide what to skip.
Toggle Multi-row if your document has a table and you want one extracted row per item (e.g., one row per invoice line). For a one-page invoice with summary fields only, leave it off.
Click Extract.

Within 10–30 seconds you'll see your extracted data as a row (or rows) in the spreadsheet.
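The Page Range billing note in the steps above comes down to simple arithmetic. The Python sketch below models only the two rules this article states: Page Range bills just the pages it processes, and @exclude_pages bills every page in the file. The function names are hypothetical and are not part of any Lido API. Ignoring selected pages that don't exist in the PDF is an assumption of this sketch, and plan rates and allowances are out of scope (see Pricing, plans, and page allowance).

```python
"""Back-of-the-envelope page billing, based only on the rules in this article.

Assumption: billing is a plain count of pages. Plan rates, allowances,
and rounding are not modeled. Function names are hypothetical.
"""


def pages_billed_with_page_range(total_pages: int, selected_pages: set[int]) -> int:
    # Page Range: only the selected pages are processed, so only they are billed.
    # Assumption: page numbers outside 1..total_pages are ignored.
    return len({p for p in selected_pages if 1 <= p <= total_pages})


def pages_billed_with_exclude_pages(total_pages: int, excluded_pages: set[int]) -> int:
    # @exclude_pages: every page is read to decide what to skip, so every page
    # is billed. excluded_pages is accepted only to mirror the directive;
    # it does not change the count.
    return total_pages


if __name__ == "__main__":
    total = 50  # a 50-page PDF whose data is always on page 1
    with_range = pages_billed_with_page_range(total, {1})
    with_exclude = pages_billed_with_exclude_pages(total, set(range(2, total + 1)))
    print(f"Page Range 1: {with_range} page(s) billed")
    print(f"@exclude_pages 2-{total}: {with_exclude} page(s) billed")
```

Running it prints 1 page billed for Page Range and 50 for @exclude_pages, matching the 50-page example in the steps.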

Try it on a second document

Upload a different invoice — same vendor or different vendor — and click Extract again. The same column configuration applies. This is the core idea of the Data Extractor: configure once, reuse on every similar document.

If the second document gives bad results:

Check whether the column names match the labels used in the document, and refine them if they don't.
Add a clarifying instruction about that document's quirks.
For scanned PDFs that look like images, try the OCR PDF step first (see Improve extraction accuracy).

What just happened

You configured an extractor worksheet. Lido stored your column list, instructions, and settings in that worksheet. From now on, any of these paths can reuse the same configuration:

Drag-and-drop more files into the Data Extractor — same columns, same rules.
Build a workflow that automatically extracts from every file dropped in a Google Drive or OneDrive folder.
Click the API button in the bottom-left of the extractor to get the exact JSON configuration to call the Lido API from your own backend.

The configuration is the same in all three cases. You only build it once.

Where to go next

Build a workflow that extracts from new files automatically: Build your first workflow.
Connect your file source so workflows can run on incoming documents: Connect Google Drive or Connect OneDrive.
Use the API to extract from your own backend: Lido API: quickstart and authentication.
Get more accurate results on tricky documents: Improve extraction accuracy.

Common mistakes

Vague column names. "Number" forces the AI to guess what kind of number. Be specific: "Invoice Number".
Multi-row turned on for a summary document. This makes the AI invent rows that aren't there. Only turn it on when the document actually contains a table you want each item from.
Skipping the test step before building automation. Always extract from 3–5 sample documents in the spreadsheet UI before wiring up a workflow or API call. Catching a column-name issue at this stage is free; catching it after 500 invoices have been processed is not.
Uploading a scanned PDF without OCR. Extraction works on scans, but quality is much higher with a real text layer. See Improve extraction accuracy.

Related articles

What is Lido?
Extract data from PDFs and documents
Build your first workflow
Improve extraction accuracy
Lido API: quickstart and authentication
Pricing, plans, and page allowance

Updated on: 13/05/2026
