mnfst/awesome-free-llm-apis

# Awesome Free LLM APIs

LLM APIs with permanent free tiers for text inference.

All endpoints are OpenAI SDK-compatible unless noted. Each link points to the provider's API key page.
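Because most providers below speak the OpenAI wire protocol, switching between them usually means changing only the base URL, key, and model name. A stdlib-only sketch that builds the request without sending it (the env var and model alias are placeholders; pick any base URL and model from the tables below):

```python
import json
import os
import urllib.request

def chat_request(base_url: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completion request for any compatible provider."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
    )

# Same call shape for every OpenAI-compatible provider in this list.
req = chat_request(
    "https://api.groq.com/openai/v1",
    os.environ.get("GROQ_API_KEY", "placeholder"),
    "llama-3.3-70b-versatile",  # a model name from that provider's table
    "Say hello in one word.",
)
# resp = json.load(urllib.request.urlopen(req))  # uncomment to actually send
```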

## Contents

- [Provider APIs](#provider-apis)
- [Inference providers](#inference-providers)
- [Glossary](#glossary)
- [Contributing](#contributing)

## Provider APIs

APIs run by the companies that train or fine-tune the models themselves.

### Cohere 🇨🇦

Free "Trial" API key, no credit card. 1,000 API calls/month. Non-commercial use only.

Base URL: `https://api.cohere.com/v2`

| Model Name | Context | Max Output | Modality | Rate Limit |
|---|---|---|---|---|
| Command A (111B) | 256K | 4K | Text | 20 RPM |
| Command R+ | 128K | 4K | Text | 20 RPM |
| Command R | 128K | 4K | Text | 20 RPM |
| Command R7B | 128K | 4K | Text | 20 RPM |
| Embed 4 | — | — | Embeddings (Text + Image) | 2,000 inputs/min |
| Rerank 3.5 | — | — | Reranking | 10 RPM |

### Google Gemini 🇺🇸

Free tier unavailable in EU/UK/Switzerland. Free-tier prompts may be used by Google to improve products.[^1]

Base URL: `https://generativelanguage.googleapis.com/v1beta`

| Model Name | Context | Max Output | Modality | Rate Limit |
|---|---|---|---|---|
| Gemini 2.5 Flash | 1M | 65K | Text + Image + Audio + Video | 10 RPM, 250 RPD |
| Gemini 2.5 Flash-Lite | 1M | 65K | Text + Image + Audio + Video | 15 RPM, 1,000 RPD |

### Mistral AI 🇫🇷

Free "Experiment" plan, no credit card. ~1B tokens/month.

Base URL: `https://api.mistral.ai/v1`

| Model Name | Context | Max Output | Modality | Rate Limit |
|---|---|---|---|---|
| Mistral Small 4 | 256K | 256K | Text + Image + Code | ~1 RPS, 500K TPM |
| Mistral Medium 3 | 128K | 128K | Text | ~1 RPS, 500K TPM |
| Mistral Large 3 | 256K | 256K | Text | ~1 RPS, 500K TPM |
| Mistral Nemo (12B) | 128K | 128K | Text | ~1 RPS, 500K TPM |
| Codestral | 256K | 256K | Code | ~1 RPS, 500K TPM |
| Pixtral Large | 128K | 128K | Text + Image | ~1 RPS, 500K TPM |

### Z AI (Zhipu AI) 🇨🇳

Permanent free models, no credit card required.

Base URL: `https://open.bigmodel.cn/api/paas/v4`

| Model Name | Context | Max Output | Modality | Rate Limit |
|---|---|---|---|---|
| GLM-4.7-Flash | 200K | 128K | Text | 1 concurrent request |
| GLM-4.5-Flash | 128K | ~8K | Text | 1 concurrent request |
| GLM-4.6V-Flash | 128K | ~4K | Text + Image | 1 concurrent request |

## Inference providers

Third-party platforms that host open-weight models from various sources.

### Cerebras 🇺🇸

Free tier, no credit card. Ultra-fast inference (~2,600 tok/s). 1M tokens/day cap.

Base URL: `https://api.cerebras.ai/v1`

| Model Name | Context | Max Output | Modality | Rate Limit |
|---|---|---|---|---|
| llama3.1-8b | 128K (8K on free) | 8K | Text | 30 RPM, 14,400 RPD, 1M TPD |
| gpt-oss-120b | 128K (8K on free) | 8K | Text | 30 RPM, 14,400 RPD, 1M TPD |
| qwen-3-235b-a22b-instruct-2507 | 131K (8K on free) | 8K | Text | 30 RPM, 14,400 RPD, 1M TPD |
| zai-glm-4.7 | 128K (8K on free) | 8K | Text | 10 RPM, 100 RPD, 1M TPD |

### Cloudflare Workers AI 🇺🇸

10,000 Neurons/day free. 50+ models available on the free tier.

Base URL: `https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run`

| Model Name | Context | Max Output | Modality | Rate Limit |
|---|---|---|---|---|
| @cf/meta/llama-3.3-70b-instruct-fp8-fast | 131K | Shared w/ context | Text | 10K neurons/day (shared) |
| @cf/meta/llama-3.1-8b-instruct-fp8-fast | 131K | Shared w/ context | Text | 10K neurons/day (shared) |
| @cf/meta/llama-3.2-11b-vision-instruct | 131K | Shared w/ context | Text + Vision | 10K neurons/day (shared) |
| @cf/meta/llama-4-scout-17b-16e-instruct | Up to 10M | Shared w/ context | Multimodal | 10K neurons/day (shared) |
| @cf/mistralai/mistral-small-3.1-24b-instruct | 128K | Shared w/ context | Text | 10K neurons/day (shared) |
| @cf/google/gemma-4-26b-a4b-it | 256K | Shared w/ context | Text | 10K neurons/day (shared) |
| @cf/qwen/qwq-32b | 32K | Shared w/ context | Text | 10K neurons/day (shared) |
| @cf/deepseek-ai/deepseek-r1-distill-qwen-32b | 32K | Shared w/ context | Text | 10K neurons/day (shared) |
| + 42 more models | Varies | Varies | Text, Image, Audio, Embeddings | 10K neurons/day (shared) |
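Workers AI differs from the OpenAI-style endpoints above in that the model name is embedded in the URL path rather than in the JSON body. A stdlib-only sketch that builds the request without sending it (account id and token are placeholders):

```python
import json
import urllib.request

def workers_ai_request(account_id: str, api_token: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a Workers AI run request; the model name is part of the URL path."""
    url = f"https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{model}"
    payload = {"messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_token}", "Content-Type": "application/json"},
    )

req = workers_ai_request(
    "your-account-id",  # placeholder
    "your-api-token",   # placeholder
    "@cf/meta/llama-3.3-70b-instruct-fp8-fast",
    "Hello!",
)
# result = json.load(urllib.request.urlopen(req))  # uncomment to actually send
```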

### GitHub Models 🇺🇸

Free prototyping for all GitHub users. 45+ models. Per-request limits (8K in / 4K out).

Base URL: `https://models.inference.ai.azure.com`

| Model Name | Context | Max Output | Modality | Rate Limit |
|---|---|---|---|---|
| gpt-4.1 | 1M | 32K | Text | 10 RPM, 50 RPD |
| gpt-4.1-mini | 1M | 32K | Text | 15 RPM, 150 RPD |
| gpt-4o | 128K | 16K | Text + Vision | 10 RPM, 50 RPD |
| o3-mini | 200K | 100K | Text (reasoning) | 10 RPM, 50 RPD |
| o4-mini | 200K | 100K | Text (reasoning) | 10 RPM, 50 RPD |
| Llama-4-Scout-17B-16E | 512K | ~4K | Text + Vision | 15 RPM, 150 RPD |
| Llama-4-Maverick-17B-128E | 256K | ~4K | Text + Vision | 10 RPM, 50 RPD |
| Meta-Llama-3.3-70B | 131K | ~4K | Text | 15 RPM, 150 RPD |
| DeepSeek-R1 | 64K | 8K | Text (reasoning) | 15 RPM, 150 RPD |
| Mistral-Small-3.1 | 128K | ~4K | Text + Vision | 15 RPM, 150 RPD |
| + 35 more models | Varies | Varies | Text / Image | Varies by tier |

### Groq 🇺🇸

Free tier, no credit card. Ultra-fast LPU inference.[^2]

Base URL: `https://api.groq.com/openai/v1`

| Model Name | Context | Max Output | Modality | Rate Limit |
|---|---|---|---|---|
| llama-3.3-70b-versatile | 131K | 32K | Text | 30 RPM, 14,400 RPD |
| llama-3.1-8b-instant | 131K | 131K | Text | 30 RPM, 14,400 RPD |
| llama-4-scout-17b-16e-instruct | 131K | 8K | Text + Vision | 30 RPM, 14,400 RPD |
| llama-4-maverick-17b-128e-instruct | 131K | 8K | Text + Vision | 15 RPM, 500 RPD |
| qwen3-32b | 131K | 131K | Text | 30 RPM, 14,400 RPD |
| gpt-oss-120b | 131K | 32K | Text | 30 RPM, 14,400 RPD |
| kimi-k2-instruct | 262K | 262K | Text | 30 RPM, 14,400 RPD |
| deepseek-r1-distill-70b | 131K | 8K | Text | 30 RPM, 14,400 RPD |
| whisper-large-v3 | — | — | Audio → Text | 20 RPM, 2,000 RPD |
| whisper-large-v3-turbo | — | — | Audio → Text | 20 RPM, 2,000 RPD |

### Hugging Face 🇺🇸

Free Serverless Inference API + ~$0.10/month free credits. Thousands of models.

Base URL: `https://api-inference.huggingface.co/models`

| Model Name | Context | Max Output | Modality | Rate Limit |
|---|---|---|---|---|
| Meta-Llama-3.1-8B-Instruct | 128K | ~4K | Text | ~1,000 RPD |
| Mistral-7B-Instruct-v0.3 | 32K | ~4K | Text | ~1,000 RPD |
| Mixtral-8x7B-Instruct-v0.1 | 32K | ~4K | Text | ~1,000 RPD |
| Phi-3.5-mini-instruct | 128K | ~4K | Text | ~1,000 RPD |
| Qwen2.5-7B-Instruct | 131K | ~4K | Text | ~1,000 RPD |
| + thousands of community models | Varies | Varies | Text, Image, Audio, Embeddings | ~$0.10/month free credits |

### Kilo Code 🇺🇸

Free models, no credit card required. The kilo-auto/free auto-router routes to minimax/minimax-m2.5:free (80%) and stepfun/step-3.5-flash:free (20%).[^3]

Base URL: `https://api.kilo.ai/api/gateway`

| Model Name | Context | Max Output | Modality | Rate Limit |
|---|---|---|---|---|
| bytedance-seed/dola-seed-2.0-pro:free | — | — | Text | ~200 req/hr |
| x-ai/grok-code-fast-1:optimized:free | — | — | Text (code) | ~200 req/hr |
| nvidia/nemotron-3-super-120b-a12b:free | 262K | 32K | Text | ~200 req/hr |
| arcee-ai/trinity-large-thinking:free | — | — | Text (reasoning) | ~200 req/hr |
| openrouter/free | Varies | Varies | Text | ~200 req/hr |

### LLM7.io 🇬🇧

Zero-friction API gateway. No registration needed for basic access. 30+ models.

Base URL: `https://api.llm7.io/v1`

| Model Name | Context | Max Output | Modality | Rate Limit |
|---|---|---|---|---|
| deepseek-r1-0528 | — | — | Text (reasoning) | 30 RPM (120 with token) |
| deepseek-v3-0324 | — | — | Text | 30 RPM (120 with token) |
| gemini-2.5-flash-lite | — | — | Text + Vision | 30 RPM (120 with token) |
| gpt-4o-mini | — | — | Text + Vision | 30 RPM (120 with token) |
| mistral-small-3.1-24b | 32K | — | Text | 30 RPM (120 with token) |
| qwen2.5-coder-32b | — | — | Text (code) | 30 RPM (120 with token) |
| + ~24 more models | Varies | Varies | Text | 30 RPM (120 with token) |

### ModelScope 🇨🇳

Free API-Inference for registered users. Requires Alibaba Cloud account binding + real-name verification.[^4]

Base URL: `https://api-inference.modelscope.cn/v1`

| Model Name | Context | Max Output | Modality | Rate Limit |
|---|---|---|---|---|
| Qwen/Qwen3.5-35B-A3B | — | — | Text + Vision | 2,000 RPD total; ≤500 RPD/model (dynamic) |
| Qwen/Qwen3.5-27B | — | — | Text | 2,000 RPD total; ≤500 RPD/model (dynamic) |
| Qwen/Qwen-Image | — | — | Image Generation | 2,000 RPD total; model/AIGC-specific caps |
| + API-Inference-enabled models | Varies | Varies | LLM, MLLM, AIGC | Dynamic quotas + dynamic concurrency |

### NVIDIA NIM 🇺🇸

Free with NVIDIA Developer Program membership. 100+ models. No daily token cap.

Base URL: `https://integrate.api.nvidia.com/v1`

| Model Name | Context | Max Output | Modality | Rate Limit |
|---|---|---|---|---|
| deepseek-ai/deepseek-r1 | 128K | ~163K | Text (reasoning) | ~40 RPM |
| nvidia/llama-3.1-nemotron-ultra-253b-v1 | 128K | 4K | Text | ~40 RPM |
| nvidia/nemotron-3-super-120b-a12b | 262K | 262K | Text | ~40 RPM |
| nvidia/nemotron-3-nano-30b-a3b | 128K | 32K | Text | ~40 RPM |
| meta/llama-3.1-405b-instruct | 128K | 4K | Text | ~40 RPM |
| qwen/qwen2.5-72b-instruct | 128K | 8K | Text | ~40 RPM |
| google/gemma-4-31b | 128K | 8K | Text | ~40 RPM |
| mistralai/mistral-large-2-instruct | 128K | 4K | Text | ~40 RPM |
| nvidia/nemotron-nano-2-vl | 128K | 8K | Vision + Text + Video | ~40 RPM |
| minimax/minimax-m2.7 | 128K | 8K | Text | ~40 RPM |
| + 90 more models | Varies | Varies | Text, Image, Video, Speech, Embeddings | ~40 RPM |

### Ollama Cloud 🇺🇸

Free tier with qualitative usage limits. 400+ models from the Ollama library. Not OpenAI SDK-compatible; uses the Ollama API.[^5]

Base URL: `https://api.ollama.com`

| Model Name | Context | Max Output | Modality | Rate Limit |
|---|---|---|---|---|
| llama3.1:cloud | 128K | Model-dependent | Text | Session/weekly limits (unpublished) |
| deepseek-r1:cloud | 128K | Model-dependent | Text (reasoning) | Session/weekly limits (unpublished) |
| qwen2.5:cloud | 128K | Model-dependent | Text | Session/weekly limits (unpublished) |
| gemma2:cloud | 8K | Model-dependent | Text | Session/weekly limits (unpublished) |
| mistral:cloud | 32K | Model-dependent | Text | Session/weekly limits (unpublished) |
| + 400 more models | Varies | Varies | Text | Session/weekly limits (unpublished) |
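Since Ollama Cloud speaks the native Ollama API rather than the OpenAI protocol, the endpoint path and payload shape differ. A stdlib-only sketch assuming the standard Ollama `/api/chat` endpoint and a Bearer token (the auth header shape for the hosted service is an assumption here):

```python
import json
import urllib.request

def ollama_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a request for the native Ollama /api/chat endpoint (not OpenAI-shaped)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # the Ollama API streams responses by default
    }
    return urllib.request.Request(
        "https://api.ollama.com/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
    )

req = ollama_chat_request("your-key", "llama3.1:cloud", "Hello!")
# resp = json.load(urllib.request.urlopen(req))  # uncomment to actually send
```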

### OpenRouter 🇺🇸

35+ free models (marked with the :free suffix). OpenAI SDK-compatible.[^6]

Base URL: `https://openrouter.ai/api/v1`

| Model Name | Context | Max Output | Modality | Rate Limit |
|---|---|---|---|---|
| deepseek/deepseek-r1-0528:free | 163K | ~163K | Text (reasoning) | 20 RPM, 200 RPD |
| deepseek/deepseek-chat-v3-0324:free | 163K | 163K | Text | 20 RPM, 200 RPD |
| qwen/qwen3.6-plus:free | 1M | 65K | Text | 20 RPM, 200 RPD |
| qwen/qwen3-coder-480b-a35b:free | 262K | ~32K | Text | 20 RPM, 200 RPD |
| meta-llama/llama-4-scout:free | 10M | 16K | Multimodal | 20 RPM, 200 RPD |
| meta-llama/llama-4-maverick:free | 1M | 16K | Multimodal | 20 RPM, 200 RPD |
| meta-llama/llama-3.3-70b-instruct:free | 65K | ~16K | Text | 20 RPM, 200 RPD |
| google/gemma-4-31b-it:free | 256K | ~8K | Multimodal | 20 RPM, 200 RPD |
| nvidia/nemotron-3-super-120b-a12b:free | 1M | ~32K | Text | 20 RPM, 200 RPD |
| openai/gpt-oss-120b:free | 131K | 131K | Text | 20 RPM, 200 RPD |
| minimax/minimax-m2.5:free | 196K | 8K | Text | 20 RPM, 200 RPD |
| mistralai/devstral-2512:free | 256K | ~32K | Text | 20 RPM, 200 RPD |
| + ~23 more free models | Varies | Varies | Text / Image | 20 RPM, 200 RPD |
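OpenRouter's model-fallback feature accepts a `models` array in the request body and tries each entry in priority order until one succeeds, which is handy when a single free model is saturated. A stdlib-only sketch (the two model IDs come from the table above; the env var is a placeholder):

```python
import json
import os
import urllib.request

# OpenRouter model fallback: list models in priority order; the router
# falls through to the next entry if the first is unavailable.
payload = {
    "models": [
        "deepseek/deepseek-chat-v3-0324:free",
        "meta-llama/llama-3.3-70b-instruct:free",
    ],
    "messages": [{"role": "user", "content": "Hello!"}],
}
req = urllib.request.Request(
    "https://openrouter.ai/api/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {os.environ.get('OPENROUTER_API_KEY', 'placeholder')}",
        "Content-Type": "application/json",
    },
)
# resp = json.load(urllib.request.urlopen(req))  # uncomment to actually send
```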

### OVHcloud AI Endpoints 🇫🇷

Free anonymous tier (no API key, no signup): 2 RPM per IP per model. 40+ open-weight models hosted in the EU. OpenAI SDK-compatible.[^7]

Base URL: `https://oai.endpoints.kepler.ai.cloud.ovh.net/v1`

| Model Name | Context | Max Output | Modality | Rate Limit |
|---|---|---|---|---|
| Meta-Llama-3_3-70B-Instruct | 131K | ~4K | Text | 2 RPM (anonymous) |
| Meta-Llama-3_1-8B-Instruct | 131K | ~4K | Text | 2 RPM (anonymous) |
| DeepSeek-R1-Distill-Llama-70B | 131K | ~32K | Text (reasoning) | 2 RPM (anonymous) |
| Qwen3-32B | 131K | ~32K | Text | 2 RPM (anonymous) |
| Qwen3-Coder-30B-A3B-Instruct | 262K | ~32K | Text (code) | 2 RPM (anonymous) |
| Qwen2.5-VL-72B-Instruct | 128K | ~8K | Text + Vision | 2 RPM (anonymous) |
| Mixtral-8x7B-Instruct-v0.1 | 32K | ~4K | Text | 2 RPM (anonymous) |
| Mistral-Nemo-Instruct-2407 | 128K | ~4K | Text | 2 RPM (anonymous) |
| Qwen3Guard-Gen-8B | 32K | ~4K | Text (safety guard) | 2 RPM (anonymous) |
| Qwen3Guard-Gen-0.6B | 32K | ~4K | Text (safety guard) | 2 RPM (anonymous) |
| + 30 more models | Varies | Varies | Text, Vision, Code, Image, Speech | 2 RPM (anonymous) |
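Because the anonymous tier needs no key, a call is simply an OpenAI-style POST with no Authorization header. A stdlib-only sketch that builds such a request (model name taken from the table above):

```python
import json
import urllib.request

def anon_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """OpenAI-style chat request with no Authorization header (anonymous tier, 2 RPM/IP)."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = anon_chat_request("Meta-Llama-3_1-8B-Instruct", "Hello!")
# resp = json.load(urllib.request.urlopen(req))  # uncomment to send (mind the 2 RPM cap)
```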

### SiliconFlow 🇨🇳

Free tier with 14 CNY signup credits. Permanently free models available.

Base URL: `https://api.siliconflow.cn/v1`

| Model Name | Context | Max Output | Modality | Rate Limit |
|---|---|---|---|---|
| Qwen/Qwen3-8B | 131K | 131K | Text | 1,000 RPM, 50K TPM |
| deepseek-ai/DeepSeek-R1-0528-Qwen3-8B | ~33K | 16K | Text (reasoning) | 1,000 RPM, 50K TPM |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | 131K | Configurable | Text (reasoning) | 1,000 RPM, 50K TPM |
| THUDM/glm-4-9b-chat | 32K | 32K | Text | 1,000 RPM, 50K TPM |
| THUDM/GLM-4.1V-9B-Thinking | 66K | 66K | Vision + Text | 1,000 RPM, 50K TPM |
| deepseek-ai/DeepSeek-OCR | — | 8K | Vision (OCR) | 1,000 RPM, 50K TPM |
| + embedding/speech models | Varies | Varies | Embeddings, Speech | 1,000 RPM, 50K TPM |

## Glossary

| Abbreviation | Meaning |
|---|---|
| RPM | Requests per minute |
| RPD | Requests per day |
| TPM | Tokens per minute |
| TPD | Tokens per day |
| RPS | Requests per second |

## Contributing

Know a free tier that's missing? Open a PR. Include the provider, endpoint, rate limits (link to their docs), and a few notable models. Trial credits and time-limited promos don't count.

## Footnotes

[^1]: Free tier not available in the EU, UK, or Switzerland (available regions).

[^2]: Groq rate limits vary by model. Llama 4 Maverick is limited to 500 RPD; most other models get 14,400 RPD (rate limits).

[^3]: Kilo Code's free model list may change over time. nvidia/nemotron-3-super-120b-a12b:free is for trial use only; prompts are logged by NVIDIA. The kilo-auto/free auto-router routes to minimax/minimax-m2.5:free (80%) and stepfun/step-3.5-flash:free (20%).

[^4]: API-Inference is free for registered users. Current published limits are 2,000 requests/day per user (total across models), with per-model daily quotas dynamically adjusted and capped at 500; concurrency is also dynamically rate-limited. Requires Alibaba Cloud account binding and real-name verification (limits, intro).

[^5]: Ollama Cloud measures usage by GPU time, not tokens or requests. The free tier is described as "light usage", with session limits resetting every 5 hours and weekly limits every 7 days. Pro (50x more) and Max (250x more) plans are available. Not OpenAI SDK-compatible; uses the Ollama API.

[^6]: Free models default to 200 RPD. A one-time purchase of $10+ in credits unlocks 1,000 RPD for free models. OpenRouter also offers a Free Models Router (openrouter/free) and model fallbacks for chaining models in priority order.

[^7]: OVHcloud AI Endpoints offers a permanent free anonymous tier (2 requests per minute per IP, per model) with no signup or API key required; click "Get your free token" at endpoints.ai.cloud.ovh.net. Higher rate limits (400 RPM per Public Cloud project per model) require an API key and are billed pay-as-you-go per token; new Public Cloud accounts get up to $200 in free trial credits. Models are hosted in EU data centers.
