Stop Paying the "Dumb Agent" Tax
UnforgeAPI is a Hybrid RAG Router that intelligently routes queries to the most efficient path. Save up to 70% on API costs while reducing latency by 3x.
// One endpoint. Three intelligent paths.
const response = await fetch('https://homerun-snowy.vercel.app/api/v1/chat', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    query: "What's the deadline for Project X?",
    context: "Project X deadline: Jan 15, 2026..."
  })
})
// Response: Routed to CONTEXT path (no web search!)
// → Cost: $0.0001 | Latency: 0.3s | Savings: 90%

Three Paths. One Smart Router.
Stop burning money on unnecessary API calls. UnforgeAPI routes each query to the most cost-effective path.
Intelligent Router Brain
Our specialized classifier analyzes intent in under 100ms to determine the optimal execution path.
CHAT Path
Greetings and casual conversation are routed to fast Llama-3-8b. No search costs incurred.
CONTEXT Path
Questions answerable from your provided context skip web search entirely. Maximum savings.
RESEARCH Path
Factual queries that need current data get Tavily search + Llama-3-70b synthesis.
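To make the three paths concrete, here is an illustrative sketch of how a router might pick between them. This is not UnforgeAPI's actual Router Brain (which combines pattern matching with a lightweight ML model, as described below); the regex and keyword-overlap heuristics here are assumptions chosen only to mirror the CHAT / CONTEXT / RESEARCH split.

```javascript
// Illustrative sketch only — the real Router Brain is internal to UnforgeAPI.
function classifyIntent(query, context) {
  const q = query.toLowerCase().trim();

  // CHAT: greetings and small talk need no retrieval at all.
  if (/^(hi|hello|hey|thanks|thank you)\b/.test(q)) {
    return { intent: 'CHAT' };
  }

  // CONTEXT: if the supplied context covers most of the query's key
  // terms, the answer is likely local — skip web search entirely.
  if (context) {
    const keywords = q.split(/\W+/).filter((w) => w.length > 3);
    const hits = keywords.filter((w) => context.toLowerCase().includes(w));
    if (hits.length / Math.max(keywords.length, 1) >= 0.5) {
      return { intent: 'CONTEXT' };
    }
  }

  // RESEARCH: everything else may need fresh data from the web.
  return { intent: 'RESEARCH' };
}
```

The cost ordering falls out of the routing: a CHAT or CONTEXT decision never triggers a paid search call, so only genuinely research-shaped queries incur the expensive path.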
Privacy-First
Zero data retention. Queries and context exist only in ephemeral memory during the request.
BYOK Support
Bring your own Groq and Tavily keys for unlimited scaling with zero markup.
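A BYOK request could look like the following sketch. The `x-groq-key` and `x-tavily-key` header names are assumptions modeled on the `x-gemini-key` / `x-groq-key` pattern shown in the deep-research example further down; check the API reference for the exact names.

```javascript
// Sketch only: BYOK header names are assumed by analogy with the
// deep-research endpoint's headers — verify against the API docs.
function buildChatHeaders(apiKey, byok = {}) {
  const headers = {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  };
  if (byok.groqKey) headers['x-groq-key'] = byok.groqKey;
  if (byok.tavilyKey) headers['x-tavily-key'] = byok.tavilyKey;
  return headers;
}
```

Keys are only attached when provided, so the same helper works for both managed and BYOK calls.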
Deep Research for Systems
Ultra-fast structured research for agents and backend systems. Built for machines, not browsers. JSON output in seconds.
The secret: Gemini performs compact structured reasoning. Groq renders the result into English at hardware speed. Result: deep research without the latency tax.
Optimized for Latency
Separation of thinking (Gemini) and writing (Groq) eliminates the single-model bottleneck. End-to-end in ~4 seconds.
Built for Automation
Deterministic, machine-friendly output designed for APIs and agent pipelines — not conversational fluff.
BYOK Gemini Supported
Bring your own Gemini, Groq, and Tavily keys. Full control over costs with zero markup on your tokens.
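The think/write split described above can be sketched as a two-stage pipeline. `think` and `write` here are hypothetical stand-ins for the internal Gemini and Groq calls — the real prompts and model wiring are not public.

```javascript
// Conceptual sketch of the think/write split — not the production code.
async function deepResearch(query, { think, write }) {
  // Stage 1 (Gemini's role): compact structured reasoning — e.g. a
  // small JSON plan with findings, kept short to stay fast.
  const plan = await think(query);
  // Stage 2 (Groq's role): render the plan into English at high
  // tokens-per-second, so no single model does both jobs.
  const report = await write(plan);
  return { report, plan };
}
```

Because the slow reasoning stage emits only a compact plan and the fast model only renders it, neither stage pays the other's latency.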
const report = await fetch('/api/v1/deep-research', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${API_KEY}`,
    'Content-Type': 'application/json',
    // Optional: BYOK headers
    'x-gemini-key': 'your-gemini-key',
    'x-groq-key': 'your-groq-key',
  },
  body: JSON.stringify({
    query: "Latest AI agent frameworks comparison 2026"
  })
})
// Response in ~4 seconds
// → Structured report with citations
// → meta.latency_ms: 3847

How It Works
Integrate in minutes. See savings immediately.
Send Your Request
POST to /v1/chat with your query and optional context (documents, emails, database rows).
fetch('/v1/chat', {
  method: 'POST',
  body: JSON.stringify({
    query: "User question",
    context: "Local data..."
  })
})

Router Analyzes Intent
Our Router Brain classifies the query in <100ms using pattern matching and lightweight ML.
// Router Decision
{
  "intent": "CONTEXT",
  "confidence": 0.95,
  "reason": "Answerable from provided data"
}

Optimal Path Execution
Query is routed to CHAT, CONTEXT, or RESEARCH path based on what is actually needed.
// Response
{
  "answer": "Based on your document...",
  "meta": {
    "routed_to": "CONTEXT",
    "cost_savings": true
  }
}

Choose Your Infrastructure
Two paths to intelligent query routing. Pick the one that fits your workflow.
Zero configuration required. We handle the LLM infrastructure. You just call the API.
Sandbox
Perfect for testing the API
- 50 requests / day
- Chat & Context paths only
- ❌ Search disabled
- System API keys
- Community support
Managed Pro
For production applications
- Unlimited Chat & Context
- 1,000 Web Search requests / month
- ✅ Full research capabilities
- System API keys
- Priority support
- 50,000 req/mo fair usage policy
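With daily and monthly quotas on both tiers, clients should handle the quota-exhausted case gracefully. The sketch below assumes the API signals this with HTTP 429 — a common convention, but not confirmed by the documentation above. The fetch call is injected so the retry logic is testable without a network.

```javascript
// Assumption: quota exhaustion returns HTTP 429 (verify against the API).
async function fetchWithRetry(doFetch, { retries = 3, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    const res = await doFetch();
    if (res.status !== 429) return res;
    // Exponential backoff before trying again: 500ms, 1s, 2s, ...
    await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
  }
  throw new Error('Rate limit: retries exhausted');
}
```

Usage: `fetchWithRetry(() => fetch('/v1/chat', opts))` wraps any request without changing its shape.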
Frequently Asked Questions
Ready to Stop Overpaying?
Join the developers who cut their AI costs by up to 70% with intelligent query routing.