API Reference

OpenAI-compatible endpoints for chat, image, video, and entity search


Authentication

All endpoints (except /health) require a Bearer token in the Authorization header.
Authorization: Bearer YOUR_API_KEY
Tip: Set your base URL to https://4ort.io/v1 and pass your API key; this works with any OpenAI-compatible SDK or tool.
Rate limits: Max throughput is ~1,000 requests/min (burst). Requests are load-balanced across 18 providers via round-robin with automatic failover. There is no hard per-request rate limit, but sustained use of 200-300 RPM is recommended to preserve daily quotas.
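The two notes above can be wired up once and reused. A minimal sketch: a helper that builds the required auth headers, plus a client-side pacer for staying near the recommended 200-300 RPM. `Throttle` is an illustrative helper, not part of any SDK.

```python
import time

def auth_headers(api_key: str) -> dict:
    """Headers every authenticated 4ort.io request needs."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

class Throttle:
    """Client-side pacing to stay near the recommended 200-300 RPM."""
    def __init__(self, rpm: int = 250):
        self.min_interval = 60.0 / rpm  # seconds between consecutive requests
        self._last = 0.0

    def wait(self) -> None:
        """Sleep just long enough to respect the configured rate."""
        now = time.monotonic()
        sleep_for = self._last + self.min_interval - now
        if sleep_for > 0:
            time.sleep(sleep_for)
        self._last = time.monotonic()

headers = auth_headers("YOUR_API_KEY")
print(headers["Authorization"])  # → Bearer YOUR_API_KEY
```

Call `throttle.wait()` before each request in a sustained batch job; bursts below the ~1,000 RPM ceiling do not need it.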

POST /v1/chat/completions

OpenAI-compatible chat completions. Supports streaming, tool calling, vision, and structured output. 141+ chat models are available; the highest-priority model is auto-selected, or you can specify one explicitly.
| Parameter | Type | Description |
| --- | --- | --- |
| `messages` | array | Array of message objects with `role` and `content`. Required. |
| `model` | string | Model ID (e.g. `google/gemini-2.5-flash`). Optional; auto-selects the best available model if omitted. |
| `stream` | boolean | Enable SSE streaming. Default: `false` |
| `max_tokens` | integer | Maximum tokens to generate. Optional |
| `temperature` | number | Sampling temperature, 0-2. Optional |
| `tools` | array | Tool/function definitions for tool calling. Optional |
| `response_format` | object | Structured output format, e.g. `{"type":"json_object"}`. Optional |
```json
// Request
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"}
  ],
  "model": "google/gemini-2.5-flash",
  "stream": false,
  "max_tokens": 1024
}
```

```json
// Response
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "google/gemini-2.5-flash",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "2 + 2 = 4."
    },
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 25, "completion_tokens": 8, "total_tokens": 33}
}
```
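With `stream: true`, the endpoint emits SSE chunks in the OpenAI delta format, where each chunk carries a partial `choices[0].delta`. A minimal sketch of assembling the deltas into the final text; the `fake` chunk list below is a stand-in for what an SSE client or SDK would actually yield.

```python
def collect_stream(chunks) -> str:
    """Assemble OpenAI-style streamed deltas into the final message text."""
    parts = []
    for chunk in chunks:
        delta = chunk.get("choices", [{}])[0].get("delta", {})
        if delta.get("content"):
            parts.append(delta["content"])
    return "".join(parts)

# Stand-in for SSE chunks as a streaming client would surface them:
fake = [
    {"choices": [{"delta": {"role": "assistant"}}]},       # first chunk: role only
    {"choices": [{"delta": {"content": "2 + 2 "}}]},
    {"choices": [{"delta": {"content": "= 4."}}]},
    {"choices": [{"delta": {}, "finish_reason": "stop"}]},  # terminal chunk
]
print(collect_stream(fake))  # → 2 + 2 = 4.
```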
Auto-routing: Omit the model field and the proxy will automatically select the highest-priority healthy model. If a model fails, the request is automatically retried on the next best model.

Routing Headers (optional)

| Header | Values | Description |
| --- | --- | --- |
| `X-Routing` | `round-robin`, `priority` | Load-balancing strategy. `round-robin` distributes across models; `priority` always picks the top model first. Default: server config |
| `X-Parallel` | `1`-`5` | Number of models to try simultaneously. 1 = sequential (saves quota), 2-3 = fast failover (uses more quota). Default: server config |
| `X-No-Fallback` | `true` | When set with an explicit model, disables all fallback behavior. If the requested model fails, returns the error directly instead of trying other models. Useful for premium/paid models where you want consistent quality. Default: off |
```
// Example: fast mode, round-robin + 3 parallel attempts
X-Routing: round-robin
X-Parallel: 3

// Example: quota saver, priority routing, no parallelism
X-Routing: priority
X-Parallel: 1

// Example: premium model, no fallback to free models
X-No-Fallback: true
```
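If you use the OpenAI Python SDK, per-request custom headers can ride along via the client's `extra_headers` option. A sketch of building and validating the routing headers described above; the `routing_headers` helper is illustrative, not part of any SDK.

```python
def routing_headers(routing: str = "round-robin", parallel: int = 1,
                    no_fallback: bool = False) -> dict:
    """Build the optional X-* routing headers, validating the documented ranges."""
    if routing not in ("round-robin", "priority"):
        raise ValueError("routing must be 'round-robin' or 'priority'")
    if not 1 <= parallel <= 5:
        raise ValueError("parallel must be 1-5")
    headers = {"X-Routing": routing, "X-Parallel": str(parallel)}
    if no_fallback:
        headers["X-No-Fallback"] = "true"
    return headers

# With the OpenAI Python SDK the dict is passed per request, e.g.:
#   client.chat.completions.create(..., extra_headers=routing_headers("priority", 1))
print(routing_headers("round-robin", 3))
```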

Throughput Benchmarks

Real-world load test results with round-robin routing and parallel fallback enabled.
| Concurrency | Requests | Success | Throughput | p50 Latency | p90 Latency |
| --- | --- | --- | --- | --- | --- |
| 10 | 20 | 100% | ~600 RPM | 701ms | 1.1s |
| 20 | 50 | 100% | ~137 RPM | 639ms | 1.4s |
| 30 | 100 | 100% | ~1,060 RPM | 769ms | 2.0s |
| 50 | 200 | 95% | ~179 RPM | 884ms | 2.5s |
Recommendation: For best results, keep concurrency at 30 or below to maintain a 100% success rate and sub-second median latency. Sustained throughput of 200-300 RPM is recommended to preserve daily provider quotas; bursts up to ~1,000 RPM are supported.
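The concurrency cap above is easy to enforce client-side with a semaphore. A minimal asyncio sketch; `fake_request` stands in for a real HTTP call to the API.

```python
import asyncio

async def run_capped(factories, limit: int = 30):
    """Run coroutine factories with at most `limit` requests in flight,
    matching the benchmark recommendation above."""
    sem = asyncio.Semaphore(limit)

    async def guarded(factory):
        async with sem:
            return await factory()

    return await asyncio.gather(*(guarded(f) for f in factories))

# Demo with a stand-in coroutine instead of a real HTTP call:
async def fake_request():
    await asyncio.sleep(0.01)
    return "ok"

results = asyncio.run(run_capped([fake_request] * 100, limit=30))
print(len(results), results[0])  # → 100 ok
```

In a real client, each factory would wrap one POST to /v1/chat/completions; the semaphore guarantees no more than 30 are ever in flight.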

POST /v1/images/generations

OpenAI-compatible image generation. 23 image models including Nano Banana 2 (Gemini), GPT Image 1.5 (OpenAI), Qwen-Image, FLUX models, and x.ai Grok Imagine. Auto-selects best model or specify one.
| Parameter | Type | Description |
| --- | --- | --- |
| `prompt` | string | Text description of the image to generate. Required. |
| `model` | string | Model ID (e.g. `gemini/gemini-3.1-flash-image-preview`). Optional; auto-selects the best model. |
| `size` | string | Image size, e.g. `1024x1024`, `1536x1024`. Default: `1024x1024` |
| `n` | integer | Number of images to generate (1-4). Default: 1 |
| `quality` | string | Quality level: `standard` or `hd`. Optional, model-dependent |
```json
// Request
{
  "prompt": "A futuristic city skyline at sunset, cyberpunk style",
  "model": "gemini/gemini-3.1-flash-image-preview",
  "size": "1024x1024",
  "n": 1
}
```

```json
// Response
{
  "created": 1740600000,
  "data": [
    {
      "b64_json": "iVBORw0KGgo..."  // or "url": "https://..." depending on model
    }
  ]
}
```
Image models: Gemini, OpenAI, and x.ai models return b64_json; SiliconFlow models return url. All responses follow the OpenAI format with a data array.

POST /v1/videos/generations

Async video generation via 6 models across x.ai and fal.ai. Returns a request_id; poll GET /v1/videos/:requestId for the result.
| Parameter | Type | Description |
| --- | --- | --- |
| `prompt` | string | Text description of the video to generate. Required. |
| `model` | string | Model ID. Available: `xai/grok-imagine-video`, `fal/minimax-hailuo`, `fal/kling-v2.1`, `fal/wan-2.6`, `fal/hunyuan-video`, `fal/pika-2.2` |
| `duration` | integer | Video duration in seconds (model-dependent max: 5-15s). Optional |
| `aspect_ratio` | string | Aspect ratio: `1:1`, `16:9`, `9:16`, etc. Optional, varies by model |
| `resolution` | string | `480p`, `720p`, or `1080p`. Optional, varies by model |
```
// Step 1: Create a video generation job
POST /v1/videos/generations
{
  "prompt": "A rocket launching from a tropical island at sunset",
  "model": "xai/grok-imagine-video",
  "duration": 10,
  "aspect_ratio": "16:9",
  "resolution": "720p"
}
```

```json
// Response: job created (async)
{
  "request_id": "vg_abc123def456",
  "status": "in_progress"
}
```

GET /v1/videos/:requestId

Poll for video generation status. When status is "completed", the response includes the video URL.
```
// Step 2: Poll for the result
GET /v1/videos/vg_abc123def456
```

```json
// Response: completed
{
  "status": "completed",
  "video": {
    "url": "https://...",
    "duration": 10
  }
}
```
Async workflow: Video generation takes 30-180 seconds. Poll the status endpoint every 5-10 seconds until status is "completed" or "failed". Pricing: $0.04-$0.08/second depending on model.
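The polling loop described above can be factored into a small helper with a timeout. A sketch; `fetch_status` is any callable returning the parsed GET /v1/videos/:requestId body, so the loop is shown here with a stand-in sequence instead of real HTTP calls.

```python
import time

def poll_video(fetch_status, interval: float = 5.0, timeout: float = 300.0) -> dict:
    """Poll until the job reports 'completed' or 'failed', per the
    async workflow above. Raises TimeoutError if neither arrives in time."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        body = fetch_status()
        if body.get("status") in ("completed", "failed"):
            return body
        time.sleep(interval)
    raise TimeoutError("video generation did not finish in time")

# Stand-in status sequence instead of real HTTP calls:
states = iter([
    {"status": "in_progress"},
    {"status": "completed", "video": {"url": "https://...", "duration": 10}},
])
result = poll_video(lambda: next(states), interval=0.0)
print(result["status"])  # → completed
```

In real use, `fetch_status` would be `lambda: requests.get(f"https://4ort.io/v1/videos/{request_id}", headers=headers).json()`.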

POST /v1/entities/search

Entity search across Google Knowledge Graph and Wikidata. Discover entity relationships, types, relevance scores, and structured data for SEO, knowledge base construction, or research.
| Parameter | Type | Description |
| --- | --- | --- |
| `query` | string | Search query (e.g. "Python programming"). Required. |
| `provider` | string | `google-kg` or `wikidata`. Default: `google-kg` |
| `limit` | integer | Max entities to return (1-20). Default: 10 |
| `language` | string | Language code (e.g. `en`, `es`). Default: `en` |
| `types` | array | Schema.org type filter (Google KG only), e.g. `["Person", "Organization"]`. Optional |
```json
// Request: Google Knowledge Graph
{
  "query": "Tesla",
  "provider": "google-kg",
  "limit": 5
}
```

```json
// Response
{
  "ok": true,
  "status": 200,
  "body": {
    "query": "Tesla",
    "provider": "google-kg",
    "entities": [
      {
        "id": "kg:/m/0dr90d",
        "name": "Tesla, Inc.",
        "types": ["Corporation", "Organization", "Thing"],
        "description": "Automotive company",
        "detailedDescription": "Tesla, Inc. is an American multinational...",
        "url": "https://www.tesla.com",
        "image": "https://...",
        "score": 94876
      }
    ]
  }
}
```

```json
// Request: Wikidata
{
  "query": "Python programming",
  "provider": "wikidata",
  "limit": 5
}
```

```json
// Response
{
  "ok": true,
  "status": 200,
  "body": {
    "query": "Python programming",
    "provider": "wikidata",
    "entities": [
      {
        "id": "Q28865",
        "name": "Python",
        "types": [],
        "description": "general-purpose programming language",
        "detailedDescription": null,
        "url": "https://www.wikidata.org/wiki/Q28865",
        "image": null,
        "score": null
      }
    ]
  }
}
```
Google KG scores: The score field represents how strongly Google associates the entity with the query — higher scores indicate stronger relevance. Useful for SEO topical authority analysis.
Wikidata QIDs: Wikidata returns universal entity identifiers (e.g. Q28865) useful for cross-referencing entities across systems. Free, no rate limits.
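Since the two providers use different ID schemes (Google KG `kg:/m/...` MIDs vs Wikidata `Q...` QIDs), cross-referencing starts with pulling the QIDs out of a response body. A minimal sketch over the response shape shown above:

```python
def qids(response: dict) -> list[str]:
    """Extract Wikidata QIDs from an entity-search response body,
    skipping Google KG MIDs (which start with 'kg:/')."""
    return [e["id"] for e in response["body"]["entities"]
            if e["id"].startswith("Q")]

# Sample mixing the two ID schemes from the responses above:
sample = {"body": {"entities": [
    {"id": "Q28865", "name": "Python"},
    {"id": "kg:/m/0dr90d", "name": "Tesla, Inc."},
]}}
print(qids(sample))  # → ['Q28865']
```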

GET /v1/models

List all available models. Returns chat, image, video, and entity-search models — each with a type field and pricing (0 = free).
```json
// Response (truncated)
{
  "object": "list",
  "data": [
    {
      "id": "google/gemini-2.5-flash",
      "object": "model",
      "type": "chat",
      "owned_by": "google",
      "pricing": {"input": 0, "output": 0},
      "capabilities": {"tools": true, "vision": true, "reasoning": true}
    },
    {
      "id": "gemini/gemini-3.1-flash-image-preview",
      "type": "image",
      "pricing": {"perImage": 0},
      "supported_sizes": ["1024x1024", "1536x1024"]
    },
    {
      "id": "xai/grok-imagine-video",
      "type": "video",
      "pricing": {"perSecond": 0.05},
      "async": true,
      "endpoint": "/v1/videos/generations"
    },
    {
      "id": "google-kg",
      "type": "entity-search",
      "pricing": {"perRequest": 0},
      "endpoint": "/v1/entities/search"
    }
  ]
}
```
Pricing: Chat models: pricing.input / pricing.output in $/million tokens. Image models: pricing.perImage. Video models: pricing.perSecond. Entity search: pricing.perRequest. A value of 0 means the model is free.
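For chat models, the per-request cost follows directly from the pricing fields and a response's `usage` object. A sketch of the arithmetic; the paid pricing values below are hypothetical, since the listed chat models are free (0).

```python
def chat_cost(pricing: dict, usage: dict) -> float:
    """Estimate one request's cost in dollars from a /v1/models pricing
    entry (dollars per million tokens) and a chat response `usage` object."""
    return (usage["prompt_tokens"] * pricing["input"]
            + usage["completion_tokens"] * pricing["output"]) / 1_000_000

usage = {"prompt_tokens": 25, "completion_tokens": 8, "total_tokens": 33}
print(chat_cost({"input": 0, "output": 0}, usage))       # free model → 0.0
print(chat_cost({"input": 3.0, "output": 15.0}, usage))  # hypothetical paid model → 0.000195
```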

GET /v1/status

Model rankings, health status, scoring breakdown, and rate limit usage. Useful for monitoring which models are currently healthy and available.
```
// Response includes model rankings, health data, and rate limits
{
  "totalModels": 165,
  "imageModels": 23,
  "videoModels": 6,
  "entitySearchModels": 2,
  "providers": {"gemini": 12, "groq": 8, /* ... */},
  "rateLimits": {/* per-provider daily usage */},
  "models": [/* ranked by effective priority */]
}
```

GET /health

Public health check endpoint. No authentication required. Returns server status and model counts.
```json
// Response
{"status": "ok", "uptime": 86400, "totalModels": 165}
```
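Since /health needs no authentication, it is a natural readiness gate before sending traffic. A sketch with retries; `get_health` is any callable returning the parsed /health body, shown here with a stand-in instead of a real HTTP call.

```python
import time

def wait_healthy(get_health, attempts: int = 5, delay: float = 2.0) -> bool:
    """Return True once /health reports status 'ok', retrying on
    errors or non-ok responses; False if all attempts are exhausted."""
    for _ in range(attempts):
        try:
            if get_health().get("status") == "ok":
                return True
        except Exception:
            pass  # network error: retry after the delay
        time.sleep(delay)
    return False

# Stand-in for a real GET https://4ort.io/health call:
print(wait_healthy(lambda: {"status": "ok", "totalModels": 165}, delay=0.0))  # → True
```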

Quick Start

```python
from openai import OpenAI
import requests, time

client = OpenAI(
    base_url="https://4ort.io/v1",
    api_key="YOUR_API_KEY",
)

# Chat completion
response = client.chat.completions.create(
    model="google/gemini-2.5-flash",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

# Image generation
image = client.images.generate(
    model="gemini/gemini-3.1-flash-image-preview",
    prompt="A cat astronaut floating in space",
    size="1024x1024",
)
print(image.data[0].b64_json[:50])

# Video generation (async)
headers = {"Authorization": f"Bearer {client.api_key}", "Content-Type": "application/json"}
job = requests.post(
    "https://4ort.io/v1/videos/generations",
    json={"prompt": "A rocket launch at sunset"},
    headers=headers,
).json()
while True:
    status = requests.get(f"https://4ort.io/v1/videos/{job['request_id']}", headers=headers).json()
    if status["status"] in ("completed", "failed"):
        break
    time.sleep(5)
print(status)

# Entity search
entities = requests.post(
    "https://4ort.io/v1/entities/search",
    json={"query": "Tesla", "provider": "google-kg"},
    headers=headers,
).json()
for e in entities["body"]["entities"]:
    print(f"{e['name']}: {e['score']}")
```
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://4ort.io/v1",
  apiKey: "YOUR_API_KEY",
});

// Chat completion
const response = await client.chat.completions.create({
  model: "google/gemini-2.5-flash",
  messages: [{role: "user", content: "Hello!"}],
});
console.log(response.choices[0].message.content);

// Image generation
const image = await client.images.generate({
  model: "gemini/gemini-3.1-flash-image-preview",
  prompt: "A cat astronaut floating in space",
  size: "1024x1024",
});
console.log(image.data[0].b64_json?.slice(0, 50));

// Video generation (async)
const headers = {
  "Authorization": `Bearer ${client.apiKey}`,
  "Content-Type": "application/json",
};
const job = await fetch("https://4ort.io/v1/videos/generations", {
  method: "POST",
  headers,
  body: JSON.stringify({prompt: "A rocket launch at sunset"}),
}).then(r => r.json());

let status;
do {
  await new Promise(r => setTimeout(r, 5000));
  status = await fetch(`https://4ort.io/v1/videos/${job.request_id}`, {headers})
    .then(r => r.json());
} while (status.status === "in_progress");
console.log(status);

// Entity search
const entities = await fetch("https://4ort.io/v1/entities/search", {
  method: "POST",
  headers,
  body: JSON.stringify({query: "Tesla", provider: "google-kg"}),
}).then(r => r.json());
entities.body.entities.forEach(e => console.log(`${e.name}: ${e.score}`));
```
```bash
# Chat completion
curl https://4ort.io/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hello!"}],
    "model": "google/gemini-2.5-flash"
  }'

# Image generation
curl https://4ort.io/v1/images/generations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A cat astronaut floating in space",
    "model": "gemini/gemini-3.1-flash-image-preview"
  }'

# Video generation, step 1: create the job
curl https://4ort.io/v1/videos/generations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "A rocket launch at sunset"}'
# → {"request_id": "vg_abc123", "status": "in_progress"}

# Video generation, step 2: poll for the result
curl https://4ort.io/v1/videos/vg_abc123 \
  -H "Authorization: Bearer YOUR_API_KEY"
# → {"status": "completed", "video": {"url": "...", "duration": 10}}

# Entity search: Google Knowledge Graph
curl https://4ort.io/v1/entities/search \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "Tesla", "provider": "google-kg", "limit": 5}'

# Entity search: Wikidata
curl https://4ort.io/v1/entities/search \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "Python programming", "provider": "wikidata"}'
```