All endpoints (except /health) require a Bearer token in the Authorization header.
Authorization: Bearer YOUR_API_KEY
Tip: Set your base URL to https://4ort.io/v1 and pass your API key — works with any OpenAI-compatible SDK or tool.
Rate limits: Max throughput is ~1,000 requests/min (burst). Requests are load-balanced across 18 providers via round-robin with automatic failover. No per-request rate limit — sustained use of 200-300 RPM is recommended to preserve daily quotas.
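As a sketch, the Bearer header can be built once and reused across every endpoint except /health; the FOURORT_API_KEY environment variable name is an assumption, not something the service defines:

```python
import os

BASE_URL = "https://4ort.io/v1"
API_KEY = os.environ.get("FOURORT_API_KEY", "YOUR_API_KEY")  # hypothetical env var

def auth_headers(api_key: str) -> dict:
    """Headers required by every endpoint except /health."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

# With any OpenAI-compatible SDK, point the client at the proxy instead:
# client = OpenAI(base_url=BASE_URL, api_key=API_KEY)
```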
POST /v1/chat/completions
OpenAI-compatible chat completions. Supports streaming, tool calling, vision, and structured output. 141+ chat models are available; the proxy auto-selects by priority, or you can specify one explicitly.
Parameters:
- messages (array) - Array of message objects with role and content. Required.
- model (string) - Model ID (e.g. google/gemini-2.5-flash). Optional; auto-selects the best available model if omitted.
- stream (boolean) - Enable SSE streaming. Default: false.
- max_tokens (integer) - Maximum tokens to generate. Optional.
- temperature (number) - Sampling temperature, 0-2. Optional.
- tools (array) - Tool/function definitions for tool calling. Optional.
- response_format (object) - Structured output format, e.g. {"type":"json_object"}. Optional.
// Request
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"}
  ],
  "model": "google/gemini-2.5-flash",
  "stream": false,
  "max_tokens": 1024
}
Auto-routing: Omit the model field and the proxy will automatically select the highest-priority healthy model. If a model fails, the request is automatically retried on the next best model.
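A sketch of the two request shapes (explicit model vs. auto-routing); `chat_payload` is a hypothetical helper, not part of any SDK, and the keys match the parameter table above:

```python
def chat_payload(messages, model=None, stream=False, max_tokens=None):
    """Build a /v1/chat/completions body; omit `model` to let the proxy auto-route."""
    body = {"messages": messages, "stream": stream}
    if model is not None:
        body["model"] = model  # pin a specific model
    if max_tokens is not None:
        body["max_tokens"] = max_tokens
    return body

pinned = chat_payload(
    [{"role": "user", "content": "What is 2+2?"}],
    model="google/gemini-2.5-flash",
    max_tokens=1024,
)
auto = chat_payload([{"role": "user", "content": "What is 2+2?"}])
assert "model" not in auto  # proxy picks the highest-priority healthy model
```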
Routing Headers (optional)
- X-Routing (round-robin | priority) - Load-balancing strategy. round-robin distributes across models; priority always picks the top model first. Default: server config.
- X-Parallel (1-5) - Number of models to try simultaneously. 1 = sequential (saves quota), 2-3 = fast failover (uses more quota). Default: server config.
- X-No-Fallback (true) - When set with an explicit model, disables all fallback behavior: if the requested model fails, the error is returned directly instead of trying other models. Useful for premium/paid models where you want consistent quality. Default: off.
// Example: fast mode (round-robin + 3 parallel attempts)
X-Routing: round-robin
X-Parallel: 3

// Example: quota-saver (priority routing, no parallelism)
X-Routing: priority
X-Parallel: 1

// Example: premium model (no fallback to free models)
X-No-Fallback: true
Throughput Benchmarks
Real-world load test results with round-robin routing and parallel fallback enabled.
Concurrency  Requests  Success  Throughput   p50 Latency  p90 Latency
10           20        100%     ~600 RPM     701ms        1.1s
20           50        100%     ~137 RPM     639ms        1.4s
30           100       100%     ~1,060 RPM   769ms        2.0s
50           200       95%      ~179 RPM     884ms        2.5s
Recommendation: Keep concurrency at 30 or below for a 100% success rate and sub-second median latency. Sustained throughput of 200-300 RPM is recommended to preserve daily provider quotas; bursts up to ~1,000 RPM are supported.
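The concurrency cap above can be enforced client-side with a semaphore; this is a minimal sketch with a stubbed `send` coroutine standing in for a real HTTP call:

```python
import asyncio

MAX_CONCURRENCY = 30  # per the benchmark table: <=30 kept success at 100%

async def run_all(requests, send):
    """Fan out requests while never exceeding MAX_CONCURRENCY in flight."""
    sem = asyncio.Semaphore(MAX_CONCURRENCY)

    async def guarded(req):
        async with sem:
            return await send(req)

    return await asyncio.gather(*(guarded(r) for r in requests))

# Demo with a stub instead of a real HTTP call:
async def fake_send(req):
    await asyncio.sleep(0)
    return {"id": req}

results = asyncio.run(run_all(range(100), fake_send))
```

`asyncio.gather` preserves input order, so results line up with the requests that produced them even though completion order varies.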
POST /v1/images/generations
OpenAI-compatible image generation. 23 image models including Nano Banana 2 (Gemini), GPT Image 1.5 (OpenAI), Qwen-Image, FLUX models, and x.ai Grok Imagine. Auto-selects best model or specify one.
Parameters:
- prompt (string) - Text description of the image to generate. Required.
- model (string) - Model ID (e.g. gemini/gemini-3.1-flash-image-preview). Optional; auto-selects best.
- size (string) - Image size, e.g. 1024x1024, 1536x1024. Default: 1024x1024.
- n (integer) - Number of images to generate (1-4). Default: 1.
- quality (string) - Quality level: standard or hd. Optional, model-dependent.
// Request
{
  "prompt": "A futuristic city skyline at sunset, cyberpunk style",
  "model": "gemini/gemini-3.1-flash-image-preview",
  "size": "1024x1024",
  "n": 1
}

// Response
{
  "created": 1740600000,
  "data": [
    {
      "b64_json": "iVBORw0KGgo..."
      // or "url": "https://..." depending on model
    }
  ]
}
Image models: Gemini models return b64_json. SiliconFlow models return url. OpenAI and x.ai models return b64_json. All responses follow the OpenAI format with a data array.
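Because the field varies by provider, a client has to handle both return shapes; a minimal sketch (the `extract_image` helper is hypothetical):

```python
import base64

def extract_image(item: dict):
    """Return ('bytes', raw_png) for b64_json items or ('url', str) for URL items."""
    if "b64_json" in item:
        return "bytes", base64.b64decode(item["b64_json"])
    if "url" in item:
        return "url", item["url"]
    raise ValueError("unexpected image item shape")

# b64_json case (Gemini, OpenAI, x.ai):
kind, value = extract_image({"b64_json": base64.b64encode(b"\x89PNG").decode()})

# url case (SiliconFlow):
kind2, value2 = extract_image({"url": "https://example.org/img.png"})
```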
POST /v1/videos/generations
Async video generation via 6 models across x.ai and fal.ai. Returns a request_id — poll GET /v1/videos/:requestId for the result.
Parameters:
- prompt (string) - Text description of the video to generate. Required.
- model (string) - Model ID. Available: xai/grok-imagine-video, fal/minimax-hailuo, fal/kling-v2.1, fal/wan-2.6, fal/hunyuan-video, fal/pika-2.2.
- duration (integer) - Video duration in seconds (model-dependent max: 5-15s). Optional.
- aspect_ratio (string) - Aspect ratio: 1:1, 16:9, 9:16, etc. Optional, varies by model.
- resolution (string) - 480p, 720p, or 1080p. Optional, varies by model.
// Step 1: Create video generation job
POST /v1/videos/generations
{
  "prompt": "A rocket launching from a tropical island at sunset",
  "model": "xai/grok-imagine-video",
  "duration": 10,
  "aspect_ratio": "16:9",
  "resolution": "720p"
}
Async workflow: Video generation takes 30-180 seconds. Poll the status endpoint every 5-10 seconds until status is "completed" or "failed". Pricing: $0.04-$0.08/second depending on model.
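The poll-until-done step can be sketched as follows; `fetch_status` is injected here so the example runs without network access, but in real use it would GET /v1/videos/:requestId with the auth headers:

```python
import time

def wait_for_video(request_id, fetch_status, interval=5.0, timeout=300.0):
    """Poll the status endpoint until the job is 'completed' or 'failed'."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch_status(request_id)
        if job.get("status") in ("completed", "failed"):
            return job
        time.sleep(interval)  # 5-10s between polls per the docs above
    raise TimeoutError(f"video job {request_id} still pending after {timeout}s")

# Stubbed demo: the job completes on the third poll.
states = iter(["queued", "processing", "completed"])
job = wait_for_video("req_123", lambda _id: {"status": next(states)}, interval=0.0)
```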
POST /v1/entities/search
Entity search across Google Knowledge Graph and Wikidata. Discover entity relationships, types, relevance scores, and structured data for SEO, knowledge base construction, or research.
Google KG scores: The score field represents how strongly Google associates the entity with the query — higher scores indicate stronger relevance. Useful for SEO topical authority analysis.
Wikidata QIDs: Wikidata returns universal entity identifiers (e.g. Q28865) useful for cross-referencing entities across systems. Free, no rate limits.
POST /v1/search
Web search via self-hosted SearXNG. Returns results from DuckDuckGo, Brave, Wikipedia, ArXiv, GitHub, and more.
Parameters:
- query (string) - Search query. Required.
- categories (string) - Comma-separated categories: general, science, it, files.
// Response
{
  "query": "typescript generics",
  "number_of_results": 5,
  "results": [
    {
      "title": "TypeScript: Documentation - Generics",
      "url": "https://www.typescriptlang.org/docs/handbook/2/generics.html",
      "content": "Generics provide a way to make components work with any data type...",
      "engine": "brave",
      "score": 1.0
    }
  ],
  "suggestions": [],
  "infoboxes": []
}
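Given a response in the shape above, ranking hits by score takes a few lines; `top_results` is a hypothetical helper, assuming only the fields shown:

```python
def top_results(response: dict, limit: int = 3):
    """Sort the `results` array by score (descending), keep title + url."""
    ranked = sorted(
        response.get("results", []),
        key=lambda r: r.get("score", 0),
        reverse=True,
    )
    return [(r["title"], r["url"]) for r in ranked[:limit]]

sample = {
    "results": [
        {"title": "Docs", "url": "https://example.org/a", "score": 1.0},
        {"title": "Blog", "url": "https://example.org/b", "score": 0.4},
    ]
}
hits = top_results(sample)
```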
GET /v1/models
List all available models. Returns chat, image, video, and entity-search models — each with a type field and pricing (0 = free).
Pricing: Chat models: pricing.input / pricing.output in $/million tokens. Image models: pricing.perImage. Video models: pricing.perSecond. Entity search: pricing.perRequest. A value of 0 means the model is free.
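Chat-model cost per request then follows directly from token counts; the pricing numbers below are made up for illustration:

```python
def chat_cost_usd(pricing: dict, input_tokens: int, output_tokens: int) -> float:
    """pricing['input'] / pricing['output'] are in $ per million tokens; 0 = free."""
    return (
        pricing["input"] * input_tokens / 1_000_000
        + pricing["output"] * output_tokens / 1_000_000
    )

# e.g. a hypothetical paid model at $0.50/M input and $1.50/M output tokens:
cost = chat_cost_usd({"input": 0.50, "output": 1.50},
                     input_tokens=2_000, output_tokens=500)
```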
GET /v1/status
Model rankings, health status, scoring breakdown, and rate limit usage. Useful for monitoring which models are currently healthy and available.
// Response includes model rankings, health data, and rate limits
{
  "totalModels": 165,
  "imageModels": 23,
  "videoModels": 6,
  "entitySearchModels": 2,
  "providers": {"gemini": 12, "groq": 8, /* ... */},
  "rateLimits": {/* per-provider daily usage */},
  "models": [/* ranked by effective priority */]
}
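Since the models array comes back ranked by effective priority, the current best model is simply the first entry; a sketch, assuming only the fields shown above:

```python
def top_ranked_model(status: dict):
    """Models are ranked by effective priority; the first entry is the current best."""
    models = status.get("models", [])
    return models[0] if models else None

best = top_ranked_model(
    {"models": [{"id": "google/gemini-2.5-flash"}, {"id": "other/model"}]}
)
```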
GET /health
Public health check endpoint. No authentication required. Returns server status and model counts.