Public Beta - Now Available

Stop Paying the "Dumb Agent" Tax

UnforgeAPI is a Hybrid RAG Router that sends each query down the cheapest path that can answer it. Save up to 70% on API costs and get responses up to 3x faster.

Start Building Free

POST /v1/chat
// One endpoint. Three intelligent paths.
const response = await fetch('https://homerun-snowy.vercel.app/api/v1/chat', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    query: "What's the deadline for Project X?",
    context: "Project X deadline: Jan 15, 2026..."
  })
})

// Response: Routed to CONTEXT path (no web search!)
// → Cost: $0.0001 | Latency: 0.3s | Savings: 90%

70% Cost Reduction
3x Faster Responses
99.9% Uptime SLA
<100ms Router Latency

Three Paths. One Smart Router.

Stop burning money on unnecessary API calls. UnforgeAPI routes each query to the most cost-effective path.

Intelligent Router Brain

Our specialized classifier analyzes intent in under 100ms to determine the optimal execution path.

CHAT Path

Greetings and casual conversation are routed to the fast Llama-3-8b model. No search costs incurred.

CONTEXT Path

Questions answerable from your provided context skip web search entirely. Maximum savings.

RESEARCH Path

Factual queries that need current data are answered with Tavily search plus Llama-3-70b synthesis.

Privacy-First

Zero data retention. Queries and context exist only in ephemeral memory during the request.

BYOK Support

Bring your own Groq and Tavily keys for unlimited scaling with zero markup.

New: Turbo Mode

Deep Research for Systems

Ultra-fast structured research for agents and backend systems. Built for machines, not browsers. JSON output in seconds.

Flash-Groq Relay Architecture

1. Search: Tavily fetches raw content
2. Reason: Gemini extracts structured JSON
3. Render: Groq writes English at hardware speed
4. Report: Structured, cited, API-ready

The secret: Gemini performs compact structured reasoning. Groq renders the result into English at hardware speed. Result: deep research without the latency tax.
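
The four relay stages above can be sketched as a small async pipeline. This is only an illustration of the search, reason, render, report hand-off: `search`, `reason`, `render`, and `deepResearch` are hypothetical stubs that return canned data, not UnforgeAPI internals, and the report shape (apart from `meta.latency_ms`) is an assumption.

```javascript
// Illustrative sketch only: every function here is a hypothetical stub,
// not UnforgeAPI's internal code. No network calls are made.

// 1. Search: stand-in for a Tavily call that returns raw page content.
async function search(query) {
  return [{ url: 'https://example.com/a', content: `Raw text about ${query}` }];
}

// 2. Reason: stand-in for a Gemini call that extracts compact structured JSON.
async function reason(query, sources) {
  return {
    query,
    findings: sources.map((s, i) => ({ claim: s.content, source: i })),
  };
}

// 3. Render: stand-in for a Groq call that writes prose from the structure.
async function render(structured) {
  return structured.findings
    .map((f) => `${f.claim} [${f.source + 1}]`)
    .join(' ');
}

// 4. Report: assemble a cited payload and record end-to-end latency.
async function deepResearch(query, stages = { search, reason, render }) {
  const started = Date.now();
  const sources = await stages.search(query);
  const structured = await stages.reason(query, sources);
  const summary = await stages.render(structured);
  return {
    summary,
    findings: structured.findings,
    citations: sources.map((s) => s.url),
    meta: { latency_ms: Date.now() - started },
  };
}
```

Passing the stages in as a parameter keeps the reasoning and writing steps swappable, which mirrors the separation of models described above.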

Optimized for Latency

Separation of thinking (Gemini) and writing (Groq) eliminates the single-model bottleneck. End-to-end in ~4 seconds.

Built for Automation

Deterministic, machine-friendly output designed for APIs and agent pipelines, not conversational fluff.

BYOK Gemini Supported

Bring your own Gemini, Groq, and Tavily keys. Full control over costs with zero markup on your tokens.

POST /v1/deep-research
const report = await fetch('/api/v1/deep-research', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${API_KEY}`,
    'Content-Type': 'application/json',
    // Optional: BYOK headers
    'x-gemini-key': 'your-gemini-key',
    'x-groq-key': 'your-groq-key',
  },
  body: JSON.stringify({
    query: "Latest AI agent frameworks comparison 2026"
  })
})

// Response in ~4 seconds
// → Structured report with citations
// → meta.latency_ms: 3847
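
Because the BYOK headers are optional, one way to keep call sites tidy is a small helper that adds them only when a key is supplied. The sketch below uses the `x-gemini-key` and `x-groq-key` header names from the example above; the helper name `buildDeepResearchRequest` and its `keys` parameter are hypothetical.

```javascript
// Builds fetch() options for POST /v1/deep-research.
// Header names come from the example above; the function name and the
// `keys` parameter are hypothetical conveniences.
function buildDeepResearchRequest(apiKey, query, keys = {}) {
  const headers = {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  };
  // Attach BYOK headers only when the caller provides a key.
  if (keys.gemini) headers['x-gemini-key'] = keys.gemini;
  if (keys.groq) headers['x-groq-key'] = keys.groq;
  return { method: 'POST', headers, body: JSON.stringify({ query }) };
}

// Usage (not executed here):
// const res = await fetch('/api/v1/deep-research',
//   buildDeepResearchRequest(API_KEY, 'Latest AI agent frameworks comparison 2026'));
```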

How It Works

Integrate in minutes. See savings immediately.

Step 01

Send Your Request

POST to /v1/chat with your query and optional context (documents, emails, database rows).

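fetch('/v1/chat', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    query: "User question",
    context: "Local data..."
  })
})
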
Step 02

Router Analyzes Intent

Our Router Brain classifies the query in <100ms using pattern matching and lightweight ML.

// Router Decision
{
  "intent": "CONTEXT",
  "confidence": 0.95,
  "reason": "Answerable from provided data"
}
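
To give a feel for what pattern-based intent routing can look like, here is a toy classifier that returns the same three fields as the decision above. It is a sketch under stated assumptions: the greeting pattern, the keyword-overlap rule, the 0.5 threshold, and the confidence values are all invented for illustration and say nothing about how the actual Router Brain works.

```javascript
// Toy intent classifier. The rules, threshold, and confidence values are
// made up for illustration; they are not the real router's logic.
const GREETING = /^(hi|hello|hey|thanks|thank you|good (morning|evening))\b/i;

function classify(query, context = '') {
  if (GREETING.test(query.trim())) {
    return { intent: 'CHAT', confidence: 0.9, reason: 'Greeting or small talk' };
  }
  // Collect query keywords (4+ letters or digits) and count how many
  // also appear somewhere in the supplied context.
  const words = query.toLowerCase().match(/[a-z0-9]{4,}/g) || [];
  const haystack = context.toLowerCase();
  const hits = words.filter((w) => haystack.includes(w)).length;
  if (words.length > 0 && hits / words.length >= 0.5) {
    return { intent: 'CONTEXT', confidence: 0.8, reason: 'Answerable from provided data' };
  }
  return { intent: 'RESEARCH', confidence: 0.6, reason: 'Needs current external data' };
}
```

With the query and context from the first example on this page, two of the three keywords ("deadline" and "project") appear in the context, so this toy version picks CONTEXT.
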
Step 03

Optimal Path Execution

Query is routed to CHAT, CONTEXT, or RESEARCH path based on what is actually needed.

// Response
{
  "answer": "Based on your document...",
  "meta": {
    "routed_to": "CONTEXT",
    "cost_savings": true
  }
}
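
On the client side, the `meta.routed_to` field shown above is enough to track how often queries avoid web search. A minimal sketch, assuming responses have the shape shown in Step 03; the helper names `tallyRoutes` and `searchFreeShare` are hypothetical.

```javascript
// Count how many responses took each path, based on meta.routed_to.
// Responses with a missing or unknown path are ignored.
function tallyRoutes(responses) {
  const counts = { CHAT: 0, CONTEXT: 0, RESEARCH: 0 };
  for (const r of responses) {
    const path = r.meta && r.meta.routed_to;
    if (Object.prototype.hasOwnProperty.call(counts, path)) counts[path] += 1;
  }
  return counts;
}

// Fraction of counted responses that skipped web search (CHAT or CONTEXT).
function searchFreeShare(counts) {
  const total = counts.CHAT + counts.CONTEXT + counts.RESEARCH;
  return total === 0 ? 0 : (counts.CHAT + counts.CONTEXT) / total;
}
```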

Choose Your Infrastructure

Two paths to intelligent query routing. Pick the one that fits your workflow.

Zero configuration required. We handle the LLM infrastructure. You just call the API.

Sandbox

Free

Perfect for testing the API

  • 50 requests / day
  • Chat & Context paths only
  • ❌ Search disabled
  • System API keys
  • Community support

Start Free

Most Popular

Managed Pro

$20/month

For production applications

  • Unlimited Chat & Context
  • 1,000 Web Search requests / month
  • ✅ Full research capabilities
  • System API keys
  • Priority support
  • 50,000 req/mo fair usage policy

Start Trial

Both paths use the same intelligent routing engine

Ready to Stop Overpaying?

Join the developers who cut their AI costs by 70% with intelligent query routing.