
Extract Robots

Build structured data extraction workflows programmatically using the SDK.

Creating Extract Robots

Extract robots can be created in two ways: with LLM-based extraction driven by a natural-language prompt, or with non-LLM rules built from CSS selectors.

LLM Extraction (Beta)

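Describe the data you want in a natural-language prompt:

import { Extract } from 'maxun-sdk';

const extractor = new Extract({ apiKey: process.env.MAXUN_API_KEY });

const robot = await extractor.extract('https://example.com', {
  prompt: 'Extract the first 20 product names and prices',
  llmProvider: 'anthropic',
  llmApiKey: process.env.ANTHROPIC_API_KEY
});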

See AI Mode for provider details and LLM Extraction Prompts for writing effective prompts.

Non-LLM Extraction

For non-LLM extraction, you define exact CSS selectors to capture data from web pages.

import { Extract } from 'maxun-sdk';

const extractor = new Extract({
  apiKey: process.env.MAXUN_API_KEY
});

const robot = await extractor
  .create('Product Extractor')
  .navigate('https://example.com/products')
  .captureText({
    productName: '.product-title',
    price: '.price'
  });
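Both examples read their keys from process.env. If a variable is unset, the SDK receives undefined instead of a key. A minimal sketch of a fail-fast check, where requireEnv is a hypothetical helper and not part of maxun-sdk:

```typescript
// Hypothetical helper (not part of maxun-sdk): look up a required
// environment variable and throw a clear error if it is missing or blank.
type Env = Record<string, string | undefined>;

function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Intended use with the SDK (not executed here):
//   const extractor = new Extract({ apiKey: requireEnv(process.env, "MAXUN_API_KEY") });
```

Passing the environment object in as a parameter keeps the helper easy to test without touching the real process environment.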

Key Features

1. Auto List Capture

When using captureList, you only need to provide the list item selector. Maxun automatically:

  • Detects all meaningful fields within each list item
  • Extracts clean, structured data from those fields
.captureList({
  selector: '.product-card' // That's it! Maxun finds all fields inside
})

2. Auto Pagination (Optional)

Pagination is optional. If you omit the pagination field, Maxun detects and handles pagination automatically:

.captureList({
  selector: '.product-card',
  maxItems: 100
})

3. Pagination with Selectors

For precise control, specify the pagination type and, for click-based types, the selector of the element to click:

.captureList({
  selector: '.product-card',
  pagination: {
    type: 'clickNext',
    selector: 'button.next-page'
  },
  maxItems: 100
})
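Conceptually, clickNext pagination with a maxItems limit is a loop: read the items on the current page, follow the "next" link, and stop when there is no next page or the limit is reached. The sketch below models pages as an in-memory array to show that loop. It assumes maxItems caps the total across all pages; collectPaginated and Page are hypothetical names and do not describe Maxun's implementation.

```typescript
// Conceptual sketch (not Maxun's implementation) of "clickNext"-style
// pagination with a maxItems cap, using in-memory pages instead of a browser.
interface Page<T> {
  items: T[];
  next: number | null; // index of the next page, or null on the last page
}

function collectPaginated<T>(pages: Page<T>[], maxItems: number): T[] {
  const collected: T[] = [];
  let current: number | null = 0;
  // Keep following "next" links until there are none or the cap is hit.
  while (current !== null && collected.length < maxItems) {
    const page: Page<T> | undefined = pages[current];
    if (page === undefined) break; // "next" points past the last page
    for (const item of page.items) {
      if (collected.length >= maxItems) break; // assumed: cap is global
      collected.push(item);
    }
    current = page.next;
  }
  return collected;
}
```

With three pages holding 2, 2 and 1 items, a cap of 3 stops partway through the second page, while a cap of 100 follows every link to the end.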

Pagination Types

| Type | Description | Selector required? | Example |
|---|---|---|---|
| scrollDown | Infinite scroll (downward) | ❌ No | { type: 'scrollDown' } |
| scrollUp | Infinite scroll (upward) | ❌ No | { type: 'scrollUp' } |
| clickNext | Click a "Next" button or link | ✅ Yes | { type: 'clickNext', selector: 'a.next' } |
| clickLoadMore | Click a "Load More" button | ✅ Yes | { type: 'clickLoadMore', selector: 'button.load-more' } |
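If pagination settings come from untyped input such as a JSON config file, the rule in the table (click-based types need a selector, scroll-based types do not) can be checked before building a robot. A minimal sketch; PaginationConfig and validatePagination are hypothetical and not exported by maxun-sdk:

```typescript
// Hypothetical types mirroring the pagination table; not exported by maxun-sdk.
// Scroll-based types need no selector, click-based types require one.
type PaginationConfig =
  | { type: "scrollDown" }
  | { type: "scrollUp" }
  | { type: "clickNext"; selector: string }
  | { type: "clickLoadMore"; selector: string };

// Compile-time check: omitting selector here would be a type error.
const nextPageConfig: PaginationConfig = { type: "clickNext", selector: "a.next" };

// Runtime check for configs from untyped input: returns an error message,
// or null when the config follows the table's rules.
function validatePagination(config: { type: string; selector?: string }): string | null {
  switch (config.type) {
    case "scrollDown":
    case "scrollUp":
      return null;
    case "clickNext":
    case "clickLoadMore":
      return config.selector && config.selector.trim() !== ""
        ? null
        : `${config.type} requires a non-empty selector`;
    default:
      return `unknown pagination type: ${config.type}`;
  }
}
```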

Methods

navigate(url)

Load the page at the given URL:

.navigate('https://example.com')

Data Extraction

captureText(fields, name?)

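Extract specific text fields using CSS selectors. Each key names a field and each value is the selector its text is read from; the optional name argument labels the capture:

.captureText({
  title: '.article-title',
  author: '.author-name'
}, 'Article Info')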

captureList(config, name?)

Extract data from lists with automatic field detection. See Key Features above for details on auto list capture and pagination.

// Simple - auto-detects all fields
.captureList({
  selector: '.product-item'
}, 'Products')

// With pagination
.captureList({
  selector: '.product-item',
  pagination: { type: 'scrollDown' },
  maxItems: 50
}, 'Products')

captureScreenshot(name?, options?)

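Capture a screenshot of the page. The optional name labels it, and fullPage: true captures the entire scrollable page rather than only the visible viewport:

.captureScreenshot('Homepage', { fullPage: true })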

Interaction

click(selector)

.click('button.show-more')

type(selector, text, inputType?)

.type('input[name="search"]', 'web scraping', 'text')

The inputType argument is optional. Supported values: text, email, password, number, tel, url.
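When the input type comes from configuration rather than a literal, a small type guard keeps values within the listed set. A sketch; InputType and isInputType are hypothetical names, not SDK exports:

```typescript
// Hypothetical helper: the listed input types as a union type,
// plus a guard for validating values that arrive as plain strings.
const INPUT_TYPES = ["text", "email", "password", "number", "tel", "url"] as const;
type InputType = (typeof INPUT_TYPES)[number];

function isInputType(value: string): value is InputType {
  return (INPUT_TYPES as readonly string[]).indexOf(value) !== -1;
}
```

After the guard succeeds, TypeScript narrows the value to InputType, so it can be passed on without a cast.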

scroll(direction, distance?)

.scroll('down', 500)
.scroll('up')

Waiting

waitFor(selector, timeout?)

.waitFor('.dynamic-content', 5000)

wait(milliseconds)

.wait(2000)

Configuration

setCookies(cookies)

.setCookies([
  { name: 'session', value: 'abc123', domain: '.example.com' }
])
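A convenient source of session cookies is the Cookie request header shown in browser developer tools, which packs pairs as name=value; name=value. The hypothetical helper below (not part of maxun-sdk) splits such a header into objects with the name, value and domain fields used in the setCookies example, assuming setCookies accepts exactly that shape:

```typescript
// Hypothetical helper (not part of maxun-sdk): convert a "Cookie" header
// string into { name, value, domain } objects for a single domain.
interface CookieParam {
  name: string;
  value: string;
  domain: string;
}

function cookiesFromHeader(header: string, domain: string): CookieParam[] {
  return header
    .split(";")
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part) => {
      // Split on the first "=" only, so values containing "=" stay intact.
      const eq = part.indexOf("=");
      // A pair without "=" is treated as a cookie with an empty value.
      const name = eq === -1 ? part : part.slice(0, eq).trim();
      const value = eq === -1 ? "" : part.slice(eq + 1).trim();
      return { name, value, domain };
    });
}
```

Splitting on the first "=" matters because encoded session tokens often contain "=" padding.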

Examples

List with Pagination

const robot = await extractor
  .create('News Articles')
  .navigate('https://news.example.com')
  .captureList({
    selector: 'article.news-item',
    pagination: {
      type: 'clickNext',
      selector: 'a.next-page'
    },
    maxItems: 100
  });

const result = await robot.run();

Multi-Step Workflow

const robot = await extractor
  .create('Search Results')
  .navigate('https://example.com')
  .type('input[name="q"]', 'data extraction')
  .click('button[type="submit"]')
  .waitFor('.results')
  .captureList({ selector: '.result-item' });
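A chained definition like this reads top to bottom as the order of actions. One way to picture it is a fluent builder in which each call appends a step to a list and returns the builder for the next call. The toy model below illustrates that pattern only; WorkflowSketch and Step are hypothetical and do not reflect how maxun-sdk is implemented internally.

```typescript
// Toy model (not the SDK's implementation) of a fluent robot builder:
// each chained call records one step and returns the same builder.
type Step =
  | { action: "navigate"; url: string }
  | { action: "type"; selector: string; text: string }
  | { action: "click"; selector: string }
  | { action: "waitFor"; selector: string; timeout?: number }
  | { action: "captureList"; selector: string };

class WorkflowSketch {
  readonly name: string;
  readonly steps: Step[] = [];

  constructor(name: string) {
    this.name = name;
  }
  navigate(url: string): this {
    this.steps.push({ action: "navigate", url });
    return this;
  }
  type(selector: string, text: string): this {
    this.steps.push({ action: "type", selector, text });
    return this;
  }
  click(selector: string): this {
    this.steps.push({ action: "click", selector });
    return this;
  }
  waitFor(selector: string, timeout?: number): this {
    this.steps.push({ action: "waitFor", selector, timeout });
    return this;
  }
  captureList(config: { selector: string }): this {
    this.steps.push({ action: "captureList", selector: config.selector });
    return this;
  }
}

// The same chain as the Multi-Step Workflow example, recorded as data.
const workflow = new WorkflowSketch("Search Results")
  .navigate("https://example.com")
  .type('input[name="q"]', "data extraction")
  .click('button[type="submit"]')
  .waitFor(".results")
  .captureList({ selector: ".result-item" });
```

Returning this from each method is what lets the calls chain; the recorded steps array preserves the order in which they were written.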

Form Fill

const robot = await extractor
  .create('Login and Extract')
  .navigate('https://example.com/login')
  .type('input[name="email"]', 'user@example.com', 'email')
  .type('input[name="password"]', 'password123', 'password')
  .click('button[type="submit"]')
  .waitFor('.dashboard')
  .captureText({
    username: '.user-name',
    balance: '.account-balance'
  });

Managing Robots

Get All Robots

const robots = await extractor.getRobots();

Get Specific Robot

const robot = await extractor.getRobot('robot-id');

Delete Robot

await extractor.deleteRobot('robot-id');

Running Robots

// Run immediately
const result = await robot.run();

// Run with options (a second const in the same scope needs a new name)
const resultWithOptions = await robot.run({
  waitForCompletion: true,
  timeout: 60000
});

See Robot Management for scheduling and webhooks.