Writing Effective LLM Extraction Prompts
This guide applies whenever you use LLM-powered extraction, whether through AI Mode or the SDK's LLM extraction feature.
✅ Do
Be Specific About Fields
- ✅ "Extract product names, prices, and ratings"
- ❌ "Extract product data"
Specify Quantity
- ✅ "Extract first 20 companies"
- ✅ "Get top 50 articles"
- ❌ "Extract some items"
Use Clear Field Names
- ✅ "Extract company name, website URL, and description"
- ❌ "Extract company info"
Target List Data
- ✅ "Extract all job postings with title, company, and location"
- ✅ "Get product listings with name and price"
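The guidelines above can be combined into one prompt template. The following is a minimal sketch, not part of any SDK: `build_prompt` is a hypothetical helper that names the item type, lists every field, and states a quantity when one is given.

```python
from typing import List, Optional


def build_prompt(item: str, fields: List[str], limit: Optional[int] = None) -> str:
    """Assemble a list-extraction prompt naming the item type, fields, and count.

    Hypothetical helper for illustration; it only builds a string.
    """
    if not fields:
        raise ValueError("name at least one field to extract")
    # Join fields as "a, b, and c" so each one is named explicitly.
    if len(fields) == 1:
        field_text = fields[0]
    elif len(fields) == 2:
        field_text = f"{fields[0]} and {fields[1]}"
    else:
        field_text = ", ".join(fields[:-1]) + f", and {fields[-1]}"
    # State the quantity instead of leaving it open ("some items").
    scope = f"the first {limit}" if limit is not None else "all"
    return f"Extract {scope} {item} with {field_text}"


print(build_prompt("job postings", ["title", "company", "location"]))
print(build_prompt("companies", ["company name", "website URL", "description"], limit=20))
```

The first call reproduces the job postings prompt shown above; the second adds an explicit limit of 20.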
❌ Don't
Multi-Step Workflows
- ❌ "Login and then extract data"
- ❌ "Click the button and extract results"
Multiple Page Types
- ❌ "Extract from homepage and product pages"
- ❌ "Get data from different sections"
Conditional Logic
- ❌ "Extract only products cheaper than $50"
- ❌ "Get articles published this week"
Data Transformations
- ❌ "Calculate the average price"
- ❌ "Convert prices to EUR"
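One possible workaround for the two categories above, which this guide does not itself prescribe, is to extract the plain list first and apply filters and calculations in your own code afterwards. The sketch below assumes the extracted rows are available as a list of dictionaries with `name` and `price` keys; the sample rows and the `USD_TO_EUR` rate are made up for illustration.

```python
from statistics import mean

# Hypothetical rows, shaped like the result of a prompt such as
# "Extract all products with name and price".
rows = [
    {"name": "Desk lamp", "price": "$35.00"},
    {"name": "Office chair", "price": "$120.00"},
    {"name": "Notebook", "price": "$5.00"},
]

USD_TO_EUR = 0.9  # placeholder rate for illustration only, not a real quote


def parse_price(text: str) -> float:
    """Turn a price string such as "$1,299.00" into a float."""
    return float(text.replace("$", "").replace(",", "").strip())


prices = [parse_price(row["price"]) for row in rows]

# Conditional logic: keep only products cheaper than $50.
cheap = [row["name"] for row in rows if parse_price(row["price"]) < 50]

# Data transformations: average price and conversion to EUR.
average = mean(prices)
in_eur = [round(price * USD_TO_EUR, 2) for price in prices]

print(cheap)  # ['Desk lamp', 'Notebook']
print(average, in_eur)
```

Keeping the prompt limited to raw fields and doing the arithmetic afterwards means each filter or conversion can be changed without rerunning the extraction.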
Single Item Extraction
- ❌ "Get the CEO name"
- ❌ "Extract the main headline"
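Some of the patterns in this section can be caught before a prompt is sent. The sketch below is a rough heuristic, not a feature of AI Mode or the SDK: `lint_prompt` and its keyword lists are assumptions derived from the examples above. It will miss cases that keywords cannot reveal, such as multiple page types or single item extraction.

```python
import re

# Heuristic patterns based on the "Don't" examples; the keyword lists are assumptions.
CHECKS = {
    "multi-step workflow": r"\b(log ?in|click|submit|navigate)\b",
    "conditional logic": r"\b(only|cheaper than|less than|more than|this week)\b",
    "data transformation": r"\b(calculate|convert|average)\b",
    "vague quantity": r"\bsome\b",
}


def lint_prompt(prompt: str) -> list:
    """Return the names of the checks that the prompt trips."""
    text = prompt.lower()
    return [name for name, pattern in CHECKS.items() if re.search(pattern, text)]


print(lint_prompt("Login and then extract data"))             # ['multi-step workflow']
print(lint_prompt("Extract only products cheaper than $50"))  # ['conditional logic']
print(lint_prompt("Extract first 20 companies"))              # []
```

An empty result only means that none of these keywords matched, so it is not a guarantee that the prompt is suitable.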
✅ When to Use LLM Extraction
- You want to quickly extract list data from a single page
- You want to avoid manually creating extraction selectors
- You're extracting common patterns (products, articles, listings)
❌ When Not to Use LLM Extraction
- You need multi-step workflows (logins, navigation between pages)
- You need form submissions before extraction
- You need to extract from multiple different page types
- You need conditional logic or data transformations
For these cases, use Recorder Mode or the SDK's manual extraction methods to create Extract robots.