Turn any documentation into an AI-searchable knowledge base. Feed it markdown, HTML, PDFs, web pages, or plain text — publish to a database and let anyone on your team query it through AI tools like Claude, Cursor, or Slack. Powered by the Model Context Protocol.
- Ingest — Point the CLI at your content. It accepts markdown files, HTML pages, PDFs, live URLs, or even an entire website via crawl.
- Publish — Run a command that converts your content into searchable vectors and stores it in a database.
- Query — Connect any MCP-compatible AI tool to your server. Ask questions in plain English and get answers sourced directly from your documentation.
There are two distinct roles when working with Company Docs MCP. Most people on your team only need the first one.
You just need a URL. No accounts, no installation, no terminal commands.
- Get the server URL from whoever set it up (it looks like `https://company-docs-mcp.example.workers.dev/mcp`)
- Add it to your AI tool:
  - Claude: Settings > Connectors > Add custom connector > paste the URL
  - Cursor / Windsurf: Add the URL as a remote MCP server in settings (see the sketch after this list)
- Start asking questions about your documentation
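For Cursor specifically, the settings entry is a small JSON snippet. This is a sketch assuming a recent Cursor version that supports remote servers via a `.cursor/mcp.json` file; the `company-docs` name is just a label:

```json
{
  "mcpServers": {
    "company-docs": {
      "url": "https://company-docs-mcp.example.workers.dev/mcp"
    }
  }
}
```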
That's it. Cloudflare, Supabase, and the CLI are only needed by the person who sets up and maintains the server.
If you're that person, the rest of this README is for you. Follow the setup guide below to get everything running.
The system uses three services. All three offer free tiers that are sufficient for most teams.
```mermaid
flowchart TD
    A["Your Content<br/>Markdown, HTML, PDFs, URLs"]
    B["Cloudflare Workers AI"]
    C[("Supabase")]
    D["Your Team<br/>Claude, Cursor, Slack, Chat UI"]
    E["Cloudflare Worker"]

    A -- "ingest + publish" --> B
    B -- "store vectors" --> C
    D -- "ask a question" --> E
    E -- "vector search" --> C
    C -. "matching docs" .-> E
    E -. "answers" .-> D

    style A fill:#f9f9f9,stroke:#333,color:#333
    style B fill:#dbeafe,stroke:#1d4ed8,color:#333
    style C fill:#d4edda,stroke:#155724,color:#333
    style D fill:#f0fdf4,stroke:#15803d,color:#333
    style E fill:#dbeafe,stroke:#1d4ed8,color:#333
```
| Service | What it does | Why it's needed |
|---|---|---|
| Cloudflare | Hosts your server and converts text into searchable vectors using its built-in AI | This is where your server runs 24/7 so your team can query docs at any time. It also handles the AI processing that makes semantic search possible — no separate AI subscription needed. |
| Supabase | Stores your documentation in a PostgreSQL database with vector search | Powers "smart" search — asking "how do I deploy?" will find documents about releases, CI/CD, and shipping, not just pages containing the word "deploy." |
| npm package | A command-line tool that ingests your content (markdown, HTML, PDFs, URLs) and publishes it to the database | You run this on your computer whenever you add or update documentation. |
No third-party AI API keys are required. Cloudflare provides the AI capabilities through its Workers AI service, which is included with every Cloudflare account at no extra cost.
Before starting, you'll need Node.js plus free accounts on two services:
- Node.js 18 or later — the runtime that powers the CLI tool (download from nodejs.org)
- A Cloudflare account — for hosting and AI (free tier works)
- A Supabase account — for the database (free tier works)
That's it. No OpenAI, Anthropic, or Google API keys needed.
Follow these steps in order. Each one builds on the previous.
Open your terminal in the project where your documentation lives and run:
```bash
npm install company-docs-mcp
```

This downloads the CLI tool to your project. No external services are contacted yet.
Your documentation needs a database to store content and make it searchable.
- Go to supabase.com and create a new project
- Go to Settings > API and copy three values (you'll need these in Step 4):
  - Project URL (looks like `https://abc123.supabase.co`)
  - anon key (a long string starting with `eyJ`)
  - service_role key (another long string starting with `eyJ` — keep this private)
- Open the SQL Editor in the left sidebar, paste the contents of `database/schema.sql`, and click Run

This creates the database tables and search functions the system uses.
The schema file is included in the npm package at `node_modules/company-docs-mcp/database/schema.sql`.
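If you'd rather apply the schema from the terminal than paste it into the dashboard, here is a sketch using `psql`. It assumes you've copied your project's Postgres connection string from the Supabase dashboard (Settings > Database) into a `DATABASE_URL` variable, which is a placeholder name here:

```bash
# Apply the bundled schema directly to your Supabase Postgres database
psql "$DATABASE_URL" -f node_modules/company-docs-mcp/database/schema.sql
```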
The CLI needs access to Cloudflare's AI service to convert your documentation into searchable vectors. The simplest way to connect is through the Wrangler CLI (Cloudflare's command-line tool, included with this package).
Run this command:
```bash
npx wrangler login
```

A browser window will open asking you to log in to your Cloudflare account and grant permission. Click Allow and return to your terminal.
You also need your Cloudflare Account ID:
- Go to dash.cloudflare.com
- Your Account ID is shown on the right side of the overview page — copy it
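If you'd rather stay in the terminal, Wrangler can print it too:

```bash
# Shows the account you're logged in as, including its Account ID
npx wrangler whoami
```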
That's the only Cloudflare setup needed for publishing. The CLI automatically detects the login credentials that `wrangler login` saved to your computer.
Token expiration: The login session expires periodically. If you see an authentication error when publishing, just run `npx wrangler login` again.
Create a file called `.env` in your project root with these values:

```bash
# Supabase — where your documentation is stored
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_ANON_KEY=eyJ...
SUPABASE_SERVICE_KEY=eyJ...

# Cloudflare — your Account ID (from Step 3)
CLOUDFLARE_ACCOUNT_ID=your-account-id
```

Replace the placeholder values with the ones you copied from Supabase (Step 2) and Cloudflare (Step 3).
Keep this file private. Never commit `.env` to version control — it contains credentials. Add `.env` to your `.gitignore` file.
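For example:

```bash
# Keep .env out of version control (creates .gitignore if it doesn't exist)
echo ".env" >> .gitignore
```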
The system accepts multiple content formats. Use whichever fits your workflow:
| Format | How to ingest | Best for |
|---|---|---|
| Markdown (`.md`) | `npx company-docs ingest markdown --dir=./docs` | Documentation you write and maintain as files |
| HTML (`.html`) | `npm run ingest -- html ./file.html` | Exported web pages, saved articles |
| PDF (`.pdf`) | `npm run ingest -- pdf ./document.pdf` | Policies, reports, specs in PDF format |
| URL (any web page) | `npm run ingest -- url https://example.com/page` | Live web content you want to index |
| Website crawl | `npm run ingest:web -- --url=https://docs.example.com` | Entire documentation sites |
| CSV of URLs | `npm run ingest:csv -- urls.csv` | Batch-importing many web pages at once |
Note: The `npx company-docs` CLI (from the npm package) directly supports markdown ingestion and publishing. The other formats (HTML, PDF, URL, website crawl, CSV) are available when running from the cloned repository.
Markdown is the most common starting point. Create files in a directory — any folder structure works:
```
docs/
├── onboarding/
│   ├── new-hire-checklist.md
│   └── tools-and-access.md
├── engineering/
│   ├── deployment-guide.md
│   └── code-review-process.md
└── policies/
    ├── pto-policy.md
    └── expense-guidelines.md
```
You can optionally add YAML frontmatter to control how each document is categorized:
```markdown
---
title: Deployment Guide
category: engineering
tags: [deploy, ci-cd, release]
description: How to deploy to production
---

# Deployment Guide

Your content here...
```

If you don't include frontmatter, the system will auto-detect a category and extract tags from the content.
Two steps turn your content into a searchable knowledge base:
Step 1 — Ingest (converts your content into structured entries):
```bash
# Markdown (most common)
npx company-docs ingest markdown --dir=./docs

# Or from the cloned repo, for other formats:
npm run ingest -- html ./exported-page.html
npm run ingest -- pdf ./policy-document.pdf
npm run ingest -- url https://wiki.example.com/important-page
```

Step 2 — Publish (pushes entries to the database with search vectors):
```bash
npx company-docs publish
```

What happens:
- The ingest step reads your content, extracts titles and sections, and saves structured entries as JSON files in a `content/entries/` folder in your project.
- `publish` sends each entry to Cloudflare's AI to generate search vectors, then stores everything in your Supabase database. A content hash automatically skips entries that haven't changed, so re-running is fast.
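To sanity-check what ingest produced before publishing, you can peek at the generated entries (the filenames are whatever ingest created for your content):

```bash
# List the structured entries, then skim the first lines of each JSON file
ls content/entries/
head -20 content/entries/*.json
```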
To preview what would be published without actually writing to the database:
```bash
npx company-docs publish --dry-run
```

Updating content: Whenever you change your source files, run both steps again. Only changed entries are re-processed.
The server is what runs 24/7 and handles search queries from your team's AI tools. It's deployed as a Cloudflare Worker.
```bash
git clone https://github.com/southleft/company-docs-mcp.git
cd company-docs-mcp
npm install
```

Edit `wrangler.toml` with your organization name:
name = "company-docs-mcp"
main = "src/index.ts"
compatibility_date = "2024-01-01"
compatibility_flags = ["nodejs_compat"]
[ai]
binding = "AI"
[vars]
ORGANIZATION_NAME = "Your Organization"
VECTOR_SEARCH_ENABLED = "true"
VECTOR_SEARCH_MODE = "vector"The Worker caches recent search results to keep things fast. Run this command to create the cache:
```bash
npx wrangler kv namespace create CONTENT_CACHE
```

It will print an ID. Add it to `wrangler.toml`:
```toml
[[kv_namespaces]]
binding = "CONTENT_CACHE"
id = "the-id-that-was-printed"
```

Next, store your Supabase credentials as Worker secrets. These are stored securely as encrypted secrets — they never appear in plain text in the dashboard or config files.
echo "your-supabase-url" | npx wrangler secret put SUPABASE_URL
echo "your-anon-key" | npx wrangler secret put SUPABASE_ANON_KEY
echo "your-service-key" | npx wrangler secret put SUPABASE_SERVICE_KEYMake sure you're logged in (you should be from Step 3 — if not, run npx wrangler login again), then:
```bash
npm run deploy
```

Your server is now live at `https://company-docs-mcp.<your-subdomain>.workers.dev`.
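A quick way to confirm the deploy worked, assuming the root URL serves the chat UI described later (an HTTP 200 means the Worker is responding):

```bash
# Replace <your-subdomain> with your actual workers.dev subdomain
curl -s -o /dev/null -w "%{http_code}\n" "https://company-docs-mcp.<your-subdomain>.workers.dev/"
```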
Share this URL with your team:
```
https://company-docs-mcp.<your-subdomain>.workers.dev/mcp
```
Claude: Settings > Connectors > Add custom connector > paste the URL.
Cursor / Windsurf / Other MCP clients: Add the URL as a remote MCP server in your client's settings.
Once connected, your AI tool will have access to these search tools:
| Tool | What it does |
|---|---|
| `search_documentation` | Finds documentation that matches your question using semantic search |
| `search_chunks` | Searches specific sections within documents |
| `browse_by_category` | Lists all documentation in a category (categories come from frontmatter, the `--category` flag, or auto-detection) |
| `get_all_tags` | Lists every tag used across your documentation |
Since Cloudflare appears in several steps, here's a plain-language summary of what it does and when:
| When | What Cloudflare does | How it's accessed |
|---|---|---|
| Publishing docs (Step 6) | Converts your text into numerical vectors that enable semantic search | CLI calls the Cloudflare REST API using your wrangler login credentials |
| Running the server (Step 7+) | Hosts the always-on server that your team queries; generates vectors for incoming questions | Built-in — no API keys needed at runtime |
Is Cloudflare optional? No — it's required for both publishing and hosting. However, the free tier is more than sufficient and no separate AI subscription is needed. The only setup required is creating an account and running `npx wrangler login`.
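For the curious, this is roughly the kind of REST call the CLI makes during publishing. It's an illustrative sketch only: it uses Cloudflare's Workers AI REST endpoint with placeholder `$CF_ACCOUNT_ID` and `$CF_API_TOKEN` variables, whereas the CLI itself reuses your `wrangler login` credentials:

```bash
# Ask Workers AI for an embedding of one piece of text (illustrative sketch)
curl -s "https://api.cloudflare.com/client/v4/accounts/$CF_ACCOUNT_ID/ai/run/@cf/baai/bge-large-en-v1.5" \
  -H "Authorization: Bearer $CF_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"text": ["How do we deploy to production?"]}'
```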
These work anywhere via npx company-docs:
```bash
npx company-docs <command> [options]
```
| Command | Description |
|---|---|
| `ingest markdown` | Parse markdown files into `content/entries/` |
| `publish` | Push entries to the database with AI-generated vectors |
| `ingest supabase` | Same as `publish` |
| `manifest` | Generate `content/manifest.json` (used during Worker deployment) |
These are available when running from the cloned repository:
| Command | Description |
|---|---|
| `npm run ingest -- html <file>` | Ingest an HTML file |
| `npm run ingest -- pdf <file>` | Ingest a PDF document |
| `npm run ingest -- url <url>` | Ingest a single web page |
| `npm run ingest:csv -- <file>` | Ingest URLs listed in a CSV file |
| `npm run ingest:web -- --url=<url>` | Crawl and ingest an entire website |
| Option | Description | Default |
|---|---|---|
| `--dir`, `-d` | Folder containing your markdown files | `./docs` |
| `--category`, `-c` | Category label for the content (used when frontmatter doesn't set one) | `documentation` |
| `--recursive` | Include files in subfolders | `true` |
| `--verbose`, `-v` | Show detailed output | `false` |
| Option | Description |
|---|---|
| `--clear` | Delete all existing data before publishing (start fresh) |
| `--dry-run` | Preview what would change without writing to the database |
| `--verbose` | Show detailed per-entry progress |
```bash
# Ingest markdown from different folders with different categories
npx company-docs ingest markdown --dir=./docs/engineering --category=engineering
npx company-docs ingest markdown --dir=./docs/policies --category=hr
npx company-docs publish

# Ingest a PDF and a web page (from cloned repo)
npm run ingest -- pdf ./policies/employee-handbook.pdf
npm run ingest -- url https://wiki.example.com/onboarding
npx company-docs publish

# Crawl an entire documentation site (from cloned repo)
npm run ingest:web -- --url=https://docs.example.com --max-pages=50

# Full re-publish from scratch
npx company-docs publish --clear

# Preview changes
npx company-docs publish --dry-run --verbose
```

Each markdown file can optionally include a YAML frontmatter block at the very top. The system reads these fields:
```yaml
---
title: Page Title
category: engineering
tags: [deploy, ci-cd, release]
description: A short summary of this page
status: stable
version: 1.0.0
source: src/path/to/source.ts
figma: https://figma.com/...
author: Jane Smith
department: Engineering
---
```

| Field | Effect |
|---|---|
| `title` | Used as the document title (overrides the first `# Heading`) |
| `category` | Sets the browseable category for this document |
| `tags` | Adds tags for filtering and discovery |
| `description` | Stored as metadata, returned in search results |
| `status` | Stored as metadata (e.g., `draft`, `stable`, `deprecated`) |
| `version` | Stored as metadata |
| `source`, `figma`, `author`, `department` | Stored as metadata, available in search results |
All fields are optional. If no frontmatter is present, the system auto-detects a category and extracts tags from the content.
Priority order: Frontmatter values take highest priority, followed by CLI flags (like `--category`), followed by auto-detection.
The system is designed for repeated runs — you don't need to start from scratch each time:
- Content hashing — Only entries whose content has actually changed are re-processed
- Deterministic IDs — The same source file or URL always produces the same database ID, preventing duplicates
- Stale cleanup — Entries removed from your source files are automatically cleaned up from the database
```bash
# Edit your content, then re-ingest and re-publish — only changes are processed
npx company-docs ingest markdown --dir=./docs
npx company-docs publish
```

The server includes a Slack slash command so team members can search documentation directly from Slack:
```
/docs deployment process
/docs PTO policy
/docs how to set up staging
```
See `docs/SLACK_SETUP.md` for setup instructions.
The server includes a web-based chat UI at its root URL (visit the Worker URL in a browser). It has two modes:
- Search mode — Finds relevant documentation using the same vector search as MCP. No additional setup needed.
- AI chat mode — Sends your question to OpenAI GPT-4o, which searches your docs and synthesizes a conversational answer. This is the only feature that requires an OpenAI API key (set as a Worker secret: `OPENAI_API_KEY`).
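Setting that secret follows the same pattern as the Supabase secrets from the deployment step:

```bash
# Store the OpenAI key as an encrypted Worker secret
echo "sk-your-openai-key" | npx wrangler secret put OPENAI_API_KEY
```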
Customize the chat UI with environment variables in `wrangler.toml`:

```toml
[vars]
ORGANIZATION_NAME = "Your Organization"
ORGANIZATION_LOGO_URL = "https://example.com/logo.svg"
ORGANIZATION_TAGLINE = "Ask anything about our documentation"
```

See `docs/BRANDING.md` for full branding options.
By default, the system uses Cloudflare's Workers AI for embeddings (free, no extra keys). If your organization prefers OpenAI, you can switch:
```bash
OPENAI_API_KEY=sk-...
EMBEDDING_PROVIDER=openai
```

| Provider | Model | Dimensions | When to use |
|---|---|---|---|
| Workers AI (default) | `@cf/baai/bge-large-en-v1.5` | 1024 | Default. No extra keys. Free on Cloudflare. |
| OpenAI | `text-embedding-3-small` | 1536 | If your organization already standardizes on OpenAI. |
Important: The embedding provider must match the database schema. The default `schema.sql` uses 1024 dimensions (Workers AI). If switching to OpenAI, change all `vector(1024)` to `vector(1536)` in the schema before running it.
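A one-liner sketch for that substitution, assuming GNU sed and the schema path in the cloned repo (adjust the path if you're editing the copy bundled in `node_modules`):

```bash
# Swap every 1024-dimension vector column for 1536 (OpenAI) before running the schema
sed -i 's/vector(1024)/vector(1536)/g' database/schema.sql
```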
No results from search
- Verify `npx company-docs publish` completed without errors
- Check that your `.env` has the correct Supabase credentials
- Run `npx company-docs publish --dry-run` to see what entries exist
Authentication error when publishing
- Your `wrangler login` session may have expired — run `npx wrangler login` again
- Verify `CLOUDFLARE_ACCOUNT_ID` is set in your `.env`
Duplicate entries
- Re-run `npx company-docs ingest markdown` followed by `npx company-docs publish` — duplicates are cleaned up automatically
MCP client not connecting
- Make sure the Worker is deployed and accessible
- Use the `/mcp` path in the URL (not just the root URL)
- Restart your MCP client after adding the connector
Wrangler login not working
- If you have a `CLOUDFLARE_API_TOKEN` set in your environment or `.env` file, it can interfere with the login flow. Remove or comment it out, then try `npx wrangler login` again.
- Never commit `.env` files — they contain credentials
- The `SUPABASE_SERVICE_KEY` has full database access — keep it private
- The `SUPABASE_ANON_KEY` is restricted by Row Level Security policies (read-only)
- Review `docs/SECURITY_KEY_ROTATION.md` if you need to rotate credentials
MIT — see LICENSE for details.
Issues and pull requests are welcome at github.com/southleft/company-docs-mcp.