Thinkscape/agent-smart-fetch

Agent Smart Fetch

Better web fetching for agents.

Features

  • 🔐 Browser-like TLS/SSL + HTTP fingerprints — better success on bot-defended pages
  • 🧹 Defuddle extraction — clean readable content instead of noisy HTML
  • 🧠 Useful metadata — title, author, site, language, published date when available
  • 📦 Downloads + large file support — stream attachments and binaries to temp files
  • 🔁 Client-side <meta> redirects — follows sane meta refresh redirects with loop limits
  • 🔗 Alternate content fallback — when extraction produces no/thin content, follows qualified <link rel="alternate" type="..."> entries in <head> that match the requested output format
  • Batch fetch — fetch many URLs with bounded concurrency
  • 📝 Multiple output formats: markdown, html, text, json, raw
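
To illustrate what "bounded concurrency" in the batch fetch feature means, here is a minimal sketch. It is not the package's implementation: `mapWithConcurrency`, its parameters, and the settled-result return shape are hypothetical names chosen for this example. The idea is to start a fixed number of "lanes" that each pull the next pending item, so no more than `limit` requests are in flight at once and one failure does not abort the rest.

```typescript
// Hypothetical helper (an assumption for illustration, not part of Smart Fetch):
// runs `worker` over `items` with at most `limit` calls in flight at a time.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<PromiseSettledResult<R>[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }

  // Results are stored by input index, so output order matches input order.
  const results: PromiseSettledResult<R>[] = new Array(items.length);
  let next = 0; // index of the next item nobody has claimed yet

  // Each lane claims items one by one until the queue is empty.
  async function runLane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      try {
        const value = await worker(items[index], index);
        results[index] = { status: "fulfilled", value };
      } catch (reason) {
        // Record the failure and keep going with the remaining items.
        results[index] = { status: "rejected", reason };
      }
    }
  }

  // Never start more lanes than there are items.
  const laneCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => runLane()));
  return results;
}
```

A caller could pass any async fetcher as `worker`, for example `(url) => fetch(url).then((r) => r.text())` with a limit of 4. Returning settled results instead of throwing is a design choice for this sketch, so that a single unreachable URL leaves the other results usable.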

Smart Fetch CLI. Install globally and use smart-fetch (or sf) from the terminal.

npm install -g @thinkscape/smart-fetch
sf https://example.com

Smart Fetch for pi.dev.

Registers:

  • web_fetch
  • batch_web_fetch

Smart Fetch for OpenClaw.

Registers:

  • smart_fetch
  • batch_smart_fetch

Development

This repo is a Bun monorepo.

Install dependencies:

bun install

Run workspace-wide commands:

bun run test
bun run build
bun run check

Run package-specific commands:

bun run test:core
bun run test:pi
bun run test:openclaw
bun run test:cli

bun run build:core
bun run build:pi
bun run build:openclaw
bun run build:cli

Integration tests:

bun run test:integration

Install the local pre-commit hook:

bun run hooks:install

Versioning and publishing

Versioning is global across the monorepo.

Bump all package versions together:

bun run version:patch
bun run version:minor
bun run version:major

Create a release commit and tag:

bun run release

Local manual publish commands:

bun run publish:pi
bun run publish:openclaw
bun run publish:cli
bun run publish:all

Note: development uses Bun, but CI publishing still uses npm publish so npm Trusted Publishing works correctly.

Repository

  • GitHub: https://github.com/Thinkscape/agent-smart-fetch

About

A smarter, anti-bot-resistant way to fetch pages in OpenClaw.
