Scrape
Convert webpages into clean HTML, LLM-ready Markdown, or screenshots with zero configuration.
Creating Scrape Robots
import { Scrape } from 'maxun-sdk';
const scraper = new Scrape({
  apiKey: process.env.MAXUN_API_KEY
});
const robot = await scraper.create(
  'Content Scraper',
  'https://example.com/article',
  { formats: ['markdown', 'html'] }
);
Output Formats
Markdown
const robot = await scraper.create(
  'Article Scraper',
  'https://blog.example.com/post',
  { formats: ['markdown'] }
);
const result = await robot.run();
console.log(result.data.markdown);
HTML
const robot = await scraper.create(
  'HTML Scraper',
  'https://example.com',
  { formats: ['html'] }
);
const result = await robot.run();
console.log(result.data.html);
Screenshots
// Visible viewport
const visibleRobot = await scraper.create(
  'Screenshot Bot',
  'https://example.com',
  { formats: ['screenshot-visible'] }
);
// Full page
const fullPageRobot = await scraper.create(
  'Full Page Screenshot',
  'https://example.com',
  { formats: ['screenshot-fullpage'] }
);
Multiple Formats
const robot = await scraper.create(
  'Multi-Format Scraper',
  'https://example.com',
  { formats: ['markdown', 'html', 'screenshot-visible'] }
);
const result = await robot.run();
console.log(result.data.markdown);
console.log(result.data.html);
console.log(result.data.screenshots);
Smart Queries
Smart Queries let you attach a natural language prompt to a scrape robot. After the page is scraped, an LLM analyzes the page content and returns an answer to your prompt.
The result is returned as result.data.promptResult.
const robot = await scraper.create(
  'Pricing Scraper',
  'https://example.com/pricing',
  {
    formats: ['markdown'],
    smartQueries: 'List all plan names and their monthly prices.'
  }
);
const result = await robot.run();
console.log(result.data.markdown); // full page markdown
console.log(result.data.promptResult); // "Starter: $9/mo, Growth: $29/mo, Pro: $99/mo"
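Because the answer comes back as free text, code that needs structured values may want to ask for JSON in the prompt and parse the reply defensively. The sketch below assumes `promptResult` is a string; `parseJsonAnswer` is a hypothetical helper, not part of maxun-sdk, and nothing here guarantees that the LLM actually returns valid JSON.

```javascript
// Hypothetical helper, not part of maxun-sdk.
// Tries to read a Smart Query answer as JSON; returns null when it is not valid JSON.
const FENCE = '`'.repeat(3); // Markdown code fence that models often wrap JSON in

function parseJsonAnswer(promptResult) {
  if (typeof promptResult !== 'string') return null;
  let text = promptResult.trim();
  if (text.startsWith(FENCE) && text.endsWith(FENCE)) {
    // Drop the opening fence line (which may carry a language tag) and the closing fence.
    const firstNewline = text.indexOf('\n');
    text = firstNewline === -1 ? '' : text.slice(firstNewline + 1, -FENCE.length);
  }
  try {
    return JSON.parse(text);
  } catch {
    // Free-text answers such as "Starter: $9/mo" end up here.
    return null;
  }
}
```

With a prompt along the lines of 'Return a JSON array of objects with "plan" and "monthlyPrice" keys.', `parseJsonAnswer(result.data.promptResult)` would yield an array when the model complies and `null` otherwise, so the caller can fall back to the raw text.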
More Examples
// Extract specific data points
const infoRobot = await scraper.create(
  'Company Info',
  'https://example.com/about',
  {
    formats: ['markdown'],
    smartQueries: 'What is the company founding year and headquarters location?'
  }
);
// Summarize content
const summaryRobot = await scraper.create(
  'Article Summarizer',
  'https://blog.example.com/post',
  {
    formats: ['markdown'],
    smartQueries: 'Summarize this article in 3 bullet points.'
  }
);
const result = await summaryRobot.run();
console.log(result.data.promptResult);
Examples
RAG Pipeline
const robot = await scraper.create(
  'RAG Content',
  'https://docs.example.com/guide',
  { formats: ['markdown'] }
);
const result = await robot.run();
const markdown = result.data.markdown;
// Send to embedding service
await createEmbeddings(markdown);
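Embedding services usually cap input size, so a full page of Markdown often has to be split first. The following is a minimal sketch of heading-aware chunking; `chunkMarkdown` and its `maxChars` default are illustrative assumptions, not part of maxun-sdk, and `createEmbeddings` remains your own function.

```javascript
// Hypothetical helper: split Markdown into chunks of at most maxChars characters,
// starting a new chunk at every heading so sections stay together where possible.
// A single line longer than maxChars is kept whole and yields an oversized chunk.
function chunkMarkdown(markdown, maxChars = 1000) {
  const chunks = [];
  let current = '';
  const flush = () => {
    if (current.trim() !== '') chunks.push(current.trim());
    current = '';
  };
  for (const line of markdown.split('\n')) {
    const isHeading = /^#{1,6}\s/.test(line);
    // Close the current chunk at a heading, or when this line would exceed the limit.
    if (isHeading || current.length + line.length + 1 > maxChars) flush();
    current += line + '\n';
  }
  flush();
  return chunks;
}
```

Each element of `chunkMarkdown(result.data.markdown)` could then be passed to `createEmbeddings` in place of the whole document.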
Content Aggregation
const urls = [
  'https://blog.example.com/post-1',
  'https://blog.example.com/post-2'
];
for (const url of urls) {
  const robot = await scraper.create(`Article ${url}`, url, {
    formats: ['markdown']
  });
  const result = await robot.run();
  await saveToDatabase(result.data.markdown);
}
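In the loop above, a single failing URL would abort the whole batch. A possible variant collects failures and keeps going. This is a sketch under the assumption that `create()` and `run()` reject on failure; `scrapeAll` is a hypothetical helper that takes the `scraper` instance as a parameter.

```javascript
// Hypothetical helper: scrape each URL in turn and continue when one fails.
// `scraper` is expected to expose create(name, url, options) as in the examples above.
async function scrapeAll(scraper, urls) {
  const pages = [];
  const failures = [];
  for (const url of urls) {
    try {
      const robot = await scraper.create(`Article ${url}`, url, {
        formats: ['markdown']
      });
      const result = await robot.run();
      pages.push({ url, markdown: result.data.markdown });
    } catch (error) {
      // Assumption: create() and run() reject on failure.
      failures.push({
        url,
        message: error instanceof Error ? error.message : String(error)
      });
    }
  }
  return { pages, failures };
}
```

The caller can then store `pages` and log or retry `failures` separately, for example `const { pages, failures } = await scrapeAll(scraper, urls);`.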
Managing Scrape Robots
// Get all scrape robots
const robots = await scraper.getRobots();
// Get specific robot
const robot = await scraper.getRobot('robot-id');
// Delete robot
await scraper.deleteRobot('robot-id');
Running Scrape Robots
// Run immediately
const result = await robot.run();
// Run with timeout
const timedResult = await robot.run({
  timeout: 30000
});
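For pages that fail intermittently, the timed run can be wrapped in a small retry loop. The sketch below assumes that `run()` rejects when a run fails or the timeout elapses, which the text above does not confirm; `runWithRetry`, `attempts`, and `delayMs` are hypothetical names, not SDK options.

```javascript
// Hypothetical helper: retry robot.run() a few times with a growing delay.
// Assumption: run() rejects when the run fails or the timeout elapses.
async function runWithRetry(robot, { attempts = 3, timeout = 30000, delayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await robot.run({ timeout });
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        // Wait delayMs, then 2 * delayMs, then 4 * delayMs, ... between attempts.
        await new Promise((resolve) => setTimeout(resolve, delayMs * 2 ** (attempt - 1)));
      }
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}
```

Usage would look like `const result = await runWithRetry(robot, { attempts: 3, timeout: 30000 });`.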
For scheduling, webhooks, and other robot management features, see Robot Management.