@@ -59,7 +59,7 @@ This allows your agents to gather information from websites, extract structured
 
 ## Usage Instructions
 
-Integrate Firecrawl into the workflow. Can scrape pages, search the web, crawl entire websites, map URL structures, and extract structured data using AI.
+Integrate Firecrawl into the workflow. Scrape pages, search the web, crawl entire sites, map URL structures, and extract structured data with AI.
 
 
 
@@ -74,25 +74,7 @@ Extract structured content from web pages with comprehensive metadata support. C
 | Parameter | Type | Required | Description |
 | --------- | ---- | -------- | ----------- |
 | `url` | string | Yes | The URL to scrape content from |
-| `formats` | json | No | Output formats \(markdown, html, rawHtml, links, images, screenshot\). Default: \["markdown"\] |
-| `onlyMainContent` | boolean | No | Extract only main content, excluding headers, navs, footers \(default: true\) |
-| `includeTags` | json | No | HTML tags to retain in the output |
-| `excludeTags` | json | No | HTML tags to remove from the output |
-| `maxAge` | number | No | Return cached version if younger than this age in ms \(default: 172800000\) |
-| `headers` | json | No | Custom request headers \(cookies, user-agent, etc.\) |
-| `waitFor` | number | No | Delay in milliseconds before fetching \(default: 0\) |
-| `mobile` | boolean | No | Emulate mobile device \(default: false\) |
-| `skipTlsVerification` | boolean | No | Skip TLS certificate verification \(default: true\) |
-| `timeout` | number | No | Request timeout in milliseconds |
-| `parsers` | json | No | File processing controls \(e.g., \["pdf"\]\) |
-| `actions` | json | No | Pre-scrape operations \(wait, click, scroll, screenshot, etc.\) |
-| `location` | json | No | Geographic settings \(country, languages\) |
-| `removeBase64Images` | boolean | No | Strip base64 images from output \(default: true\) |
-| `blockAds` | boolean | No | Enable ad and popup blocking \(default: true\) |
-| `proxy` | string | No | Proxy type: basic, stealth, or auto \(default: auto\) |
-| `storeInCache` | boolean | No | Cache the page \(default: true\) |
-| `zeroDataRetention` | boolean | No | Enable zero data retention mode \(default: false\) |
-| `scrapeOptions` | json | No | Options for content scraping \(legacy, prefer top-level params\) |
+| `scrapeOptions` | json | No | Options for content scraping |
 | `apiKey` | string | Yes | Firecrawl API key |
 
 #### Output
@@ -112,15 +94,6 @@ Search for information on the web using Firecrawl
 | Parameter | Type | Required | Description |
 | --------- | ---- | -------- | ----------- |
 | `query` | string | Yes | The search query to use |
-| `limit` | number | No | Maximum number of results to return \(1-100, default: 5\) |
-| `sources` | json | No | Search sources: \["web"\], \["images"\], or \["news"\] \(default: \["web"\]\) |
-| `categories` | json | No | Filter by categories: \["github"\], \["research"\], or \["pdf"\] |
-| `tbs` | string | No | Time-based search: qdr:h \(hour\), qdr:d \(day\), qdr:w \(week\), qdr:m \(month\), qdr:y \(year\) |
-| `location` | string | No | Geographic location for results \(e.g., "San Francisco, California, United States"\) |
-| `country` | string | No | ISO country code for geo-targeting \(default: US\) |
-| `timeout` | number | No | Timeout in milliseconds \(default: 60000\) |
-| `ignoreInvalidURLs` | boolean | No | Exclude invalid URLs from results \(default: false\) |
-| `scrapeOptions` | json | No | Advanced scraping configuration for search results |
 | `apiKey` | string | Yes | Firecrawl API key |
 
 #### Output
@@ -140,20 +113,6 @@ Crawl entire websites and extract structured content from all accessible pages
 | `url` | string | Yes | The website URL to crawl |
 | `limit` | number | No | Maximum number of pages to crawl \(default: 100\) |
 | `onlyMainContent` | boolean | No | Extract only main content from pages |
-| `prompt` | string | No | Natural language instruction to auto-generate crawler options |
-| `maxDiscoveryDepth` | number | No | Depth limit for URL discovery \(root pages have depth 0\) |
-| `sitemap` | string | No | Whether to use sitemap data: "skip" or "include" \(default: "include"\) |
-| `crawlEntireDomain` | boolean | No | Follow sibling/parent URLs or only child paths \(default: false\) |
-| `allowExternalLinks` | boolean | No | Follow external website links \(default: false\) |
-| `allowSubdomains` | boolean | No | Follow subdomain links \(default: false\) |
-| `ignoreQueryParameters` | boolean | No | Prevent re-scraping same path with different query params \(default: false\) |
-| `delay` | number | No | Seconds between scrapes for rate limit compliance |
-| `maxConcurrency` | number | No | Concurrent scrape limit |
-| `excludePaths` | json | No | Array of regex patterns for URLs to exclude |
-| `includePaths` | json | No | Array of regex patterns for URLs to include exclusively |
-| `webhook` | json | No | Webhook configuration for crawl notifications |
-| `scrapeOptions` | json | No | Advanced scraping configuration |
-| `zeroDataRetention` | boolean | No | Enable zero data retention \(default: false\) |
 | `apiKey` | string | Yes | Firecrawl API Key |
 
 #### Output