API Reference

158 models · 20 providers · OpenAI-compatible


Authentication

All endpoints (except /health) require a Bearer token in the Authorization header.
Authorization: Bearer YOUR_API_KEY
Tip: Set your base URL to https://4ort.io/v1 and pass your API key — works with any OpenAI-compatible SDK or tool. Bare /chat/completions, /embeddings, etc. (without /v1/) also work.
Rate limits: Round-robin routing distributes requests across 18 locked free models (mostly NVIDIA NIM at 38 RPM each, about 684 RPM of combined headroom). Per-model RPM is tracked proactively, so no attempts are wasted on requests that would return 429. A sustained 10-15 req/sec is comfortable; higher bursts work but increase tail latency.

POST /v1/chat/completions

OpenAI-compatible chat completions. Supports streaming, tool calling, vision, and structured output. 130+ chat models auto-selected by priority, or specify one explicitly. Models not in the registry are passed through to OpenRouter automatically.
| Parameter | Type | Description |
| --- | --- | --- |
| messages | array | Array of message objects with role and content. Required. |
| model | string | Model ID (e.g. google/gemini-2.5-flash). Optional; auto-selects the best available model if omitted. |
| stream | boolean | Enable SSE streaming. Default: false |
| max_tokens | integer | Maximum tokens to generate. Optional |
| temperature | number | Sampling temperature 0-2. Optional |
| tools | array | Tool/function definitions for tool calling. Optional |
| response_format | object | Structured output format, e.g. {"type":"json_object"}. Optional |

// Request
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"}
  ],
  "model": "google/gemini-2.5-flash",
  "stream": false,
  "max_tokens": 1024
}

// Response
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "google/gemini-2.5-flash",
  "choices": [{
    "index": 0,
    "message": { "role": "assistant", "content": "2 + 2 = 4." },
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 25, "completion_tokens": 8, "total_tokens": 33}
}
Auto-routing: Omit the model field and the proxy will automatically select the highest-priority healthy model. If a model fails, the request is automatically retried on the next best model.

Routing Headers (optional)

| Header | Values | Description |
| --- | --- | --- |
| X-Routing | round-robin, priority | Load-balancing strategy. round-robin distributes across models; priority always picks the top model first. Default: server config |
| X-Parallel | 1-5 | Number of models to try simultaneously. 1 = sequential (saves quota), 2-3 = fast failover (uses more quota). Default: server config |
| X-No-Fallback | true | When set with an explicit model, disables all fallback behavior. If the requested model fails, returns the error directly instead of trying other models. Useful for premium/paid models where you want consistent quality. Default: off |
| X-App-Name | string | Tag requests with your app name for usage tracking and analytics. Optional |

// Example: fast mode (round-robin + 3 parallel attempts)
X-Routing: round-robin
X-Parallel: 3

// Example: quota-saver (priority routing, no parallelism)
X-Routing: priority
X-Parallel: 1

// Example: premium model (no fallback to free models)
X-No-Fallback: true
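The routing headers above can be assembled by a small helper before they are attached to a request. This is a minimal sketch: `build_routing_headers` is a hypothetical name, and its range checks only mirror the values listed in the table. How the server treats out-of-range values is not documented here.

```python
from typing import Dict, Optional


def build_routing_headers(
    routing: Optional[str] = None,
    parallel: Optional[int] = None,
    no_fallback: bool = False,
    app_name: Optional[str] = None,
) -> Dict[str, str]:
    """Build the optional routing headers; unset options fall back to server config."""
    headers: Dict[str, str] = {}
    if routing is not None:
        if routing not in ("round-robin", "priority"):
            raise ValueError("routing must be 'round-robin' or 'priority'")
        headers["X-Routing"] = routing
    if parallel is not None:
        if not 1 <= parallel <= 5:
            raise ValueError("parallel must be between 1 and 5")
        headers["X-Parallel"] = str(parallel)
    if no_fallback:
        headers["X-No-Fallback"] = "true"
    if app_name:
        headers["X-App-Name"] = app_name
    return headers


# Fast mode: round-robin with 3 parallel attempts.
fast = build_routing_headers(routing="round-robin", parallel=3)
# Premium mode: pin a model elsewhere in the request and disable fallback.
premium = build_routing_headers(no_fallback=True, app_name="my-app")
print(fast)
print(premium)
```

With the OpenAI Python SDK, a dictionary like this can be passed per request through the `extra_headers` argument of `client.chat.completions.create(...)`.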

Throughput Benchmarks

Real-world load test results with round-robin routing and per-model RPM limiting (April 2026).
| Concurrency | Requests | Success | p50 Latency | p90 Latency |
| --- | --- | --- | --- | --- |
| 15 | 100 | 99% | 1.6s | 90s |
| 20 | 200 | 79.5% | 963ms | 37s |
Recommendation: Stay under 15 concurrent requests for ~99% success and snappy p50. Sub-second median latency on small/medium models. Big models (Mistral Large 675B, DeepSeek 671B) can take 60-90s — set client timeout to 120s+. Big-model timeouts mostly happen at high concurrency.
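One way to follow this recommendation on the client side is to cap the number of in-flight requests. The sketch below uses a thread pool for that; `run_bounded`, `fake_send`, and the two constants are illustrative names, and `fake_send` stands in for a real API call so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

MAX_CONCURRENCY = 14          # stay under the 15 concurrent requests recommended above
CLIENT_TIMEOUT_SECONDS = 120  # big models can take 60-90s per request


def run_bounded(
    send: Callable[[T], R],
    items: Iterable[T],
    max_concurrency: int = MAX_CONCURRENCY,
) -> List[R]:
    """Call send(item) for every item with at most max_concurrency in flight.

    Results are returned in input order. `send` is a placeholder for whatever
    function performs one API call.
    """
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(send, items))


def fake_send(prompt: str) -> str:
    """Stand-in for a real request so the sketch runs without network access."""
    return prompt.upper()


results = run_bounded(fake_send, ["a", "b", "c"])
print(results)
```

In real use, `send` would wrap an SDK call made by a client constructed with `timeout=CLIENT_TIMEOUT_SECONDS`, which the OpenAI Python SDK accepts as a constructor argument.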

POST /v1/embeddings

OpenAI-compatible text embeddings across 5 models. Auto-selects highest-priority available model — no fallback (embedding dimensions differ between providers).
| Parameter | Type | Description |
| --- | --- | --- |
| input | string or array | Text or array of texts to embed. Required. |
| model | string | Model ID. Optional; auto-selects the highest-priority available model. |

Available Models

| Model ID | Dim | Notes |
| --- | --- | --- |
| 4ort/qwen3-embedding-4b | 2000 | Self-hosted Qwen3-Embedding-4B (truncated from 2560 for pgvector HNSW). Highest priority. |
| gemini/gemini-embedding-001 | 3072 | Google Gemini embeddings. Free tier. |
| openai/text-embedding-3-large | 3072 | OpenAI large embeddings. |
| openai/text-embedding-3-small | 1536 | OpenAI small embeddings. |
| siliconflow/BAAI/bge-m3 | 1024 | BAAI BGE-M3 multilingual. |

// Request
{
  "input": "The quick brown fox jumps over the lazy dog",
  "model": "4ort/qwen3-embedding-4b"
}

// Response
{
  "object": "list",
  "data": [{
    "object": "embedding",
    "embedding": [0.0023, -0.0091, 0.0152, ...],
    "index": 0
  }],
  "model": "4ort/qwen3-embedding-4b",
  "usage": {"prompt_tokens": 9, "total_tokens": 9}
}
Batch input: Pass an array of strings to input to embed multiple texts in a single request — far more efficient than one request per text.
No auto-fallback: Different embedding models have different dimensions (1024-3072), so a fallback would silently change the vector shape. Pin a model explicitly if you're storing vectors in a database.
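Because the vector shape depends on the model, a guard before writing to a vector store can catch a mismatched model early. The following is a minimal sketch: `EMBEDDING_DIMS` is copied from the Available Models table, and `check_embedding` is a hypothetical helper, not part of the API.

```python
from typing import Dict, Sequence

# Dimensions copied from the Available Models table.
EMBEDDING_DIMS: Dict[str, int] = {
    "4ort/qwen3-embedding-4b": 2000,
    "gemini/gemini-embedding-001": 3072,
    "openai/text-embedding-3-large": 3072,
    "openai/text-embedding-3-small": 1536,
    "siliconflow/BAAI/bge-m3": 1024,
}


def check_embedding(model: str, vector: Sequence[float]) -> None:
    """Raise if the vector length does not match the pinned model's dimension."""
    expected = EMBEDDING_DIMS.get(model)
    if expected is None:
        raise KeyError(f"unknown embedding model: {model}")
    if len(vector) != expected:
        raise ValueError(f"{model}: expected {expected} dims, got {len(vector)}")


# A 1024-dim vector is valid for bge-m3, so this call passes silently.
check_embedding("siliconflow/BAAI/bge-m3", [0.0] * 1024)
```

Calling it with the `model` field echoed in the embeddings response, rather than the model that was requested, also guards against a response that came from a different model than expected.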

POST /v1/images/generations

OpenAI-compatible image generation. 23 image models across Gemini (Nano Banana 2/Pro), OpenAI (GPT Image 1.5, DALL-E 3), x.ai (Grok Imagine), and SiliconFlow (FLUX, Qwen-Image, Stable Diffusion, Kolors). Auto-selects best model or specify one.
| Parameter | Type | Description |
| --- | --- | --- |
| prompt | string | Text description of the image to generate. Required. |
| model | string | Model ID (e.g. gemini/gemini-3.1-flash-image-preview). Optional; auto-selects the best model. |
| size | string | Image size, e.g. 1024x1024, 1536x1024. Default: 1024x1024 |
| n | integer | Number of images to generate (1-4). Default: 1 |
| quality | string | Quality level: standard or hd. Optional, model-dependent |

// Request
{
  "prompt": "A futuristic city skyline at sunset, cyberpunk style",
  "model": "gemini/gemini-3.1-flash-image-preview",
  "size": "1024x1024",
  "n": 1
}

// Response
{
  "created": 1740600000,
  "data": [
    {
      "b64_json": "iVBORw0KGgo..."
      // or "url": "https://..." depending on model
    }
  ]
}
Response format varies by provider: Gemini and OpenAI return base64 in b64_json (heavy — ~100KB+ per image). x.ai and SiliconFlow return temporary url. All wrapped in OpenAI's standard { data: [...] } shape. Tip: if you're feeding the response back into an LLM context window, prefer providers that return URLs.
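Since an entry in `data` may carry either `b64_json` or `url`, client code usually needs to handle both shapes. A minimal sketch of that branch follows; `read_image_item` is a hypothetical helper, and the sample payload is fake bytes rather than a real image.

```python
import base64
from typing import Any, Dict, Tuple, Union


def read_image_item(item: Dict[str, Any]) -> Tuple[str, Union[bytes, str]]:
    """Normalize one entry of response["data"].

    Returns ("bytes", raw_image_bytes) when the provider sent base64, or
    ("url", url) when it sent a temporary URL that still has to be downloaded.
    """
    if item.get("b64_json"):
        return "bytes", base64.b64decode(item["b64_json"])
    if item.get("url"):
        return "url", item["url"]
    raise ValueError("image item has neither b64_json nor url")


# Offline demo with a tiny fake payload (not a real image).
sample = {"b64_json": base64.b64encode(b"fake-png-bytes").decode("ascii")}
kind, value = read_image_item(sample)
print(kind, len(value))
```

Because the URLs are described as temporary, downloading them promptly is the safer choice if the image needs to be kept.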

POST /v1/images/edits

OpenAI-compatible image editing — edit an existing image with a text prompt. Currently routed through x.ai Grok Imagine. Supports single or multi-image inputs.
| Parameter | Type | Description |
| --- | --- | --- |
| prompt | string | Edit instruction (e.g. "Make the sky red"). Required. |
| image | string or array | URL or base64 data URI of the source image. Required (or use images). |
| images | array | Array of URLs or base64 data URIs for multi-image edits. Alternative to image. |
| model | string | Model ID. Optional; auto-selects. |
| n | integer | Number of variations to generate. Default: 1 |
| size | string | Output size. Optional |

// Request
{
  "prompt": "Add a sunset glow and dramatic clouds",
  "image": "https://example.com/photo.jpg",
  "model": "xai/grok-imagine-image"
}

POST /v1/videos/generations

Async video generation via 6 models across x.ai and fal.ai. Returns a request_id — poll GET /v1/videos/:requestId for the result.
| Parameter | Type | Description |
| --- | --- | --- |
| prompt | string | Text description of the video to generate. Required. |
| model | string | Model ID. Required. Available: xai/grok-imagine-video, fal/minimax-hailuo, fal/kling-v2.1, fal/wan-2.6, fal/hunyuan-video, fal/pika-2.2 |
| duration | integer | Video duration in seconds (model-dependent max: 5-15s). Optional |
| aspect_ratio | string | Aspect ratio: 1:1, 16:9, 9:16, 4:3, etc. Optional, varies by model |
| resolution | string | 480p, 720p, or 1080p. Optional, varies by model |
| image_url | string | URL or base64 data URI for image-to-video generation. Optional (xAI only) |
| source_video_url | string | URL of an existing video to extend (continuation). New duration is appended to the source. Source must be ≤15s. Optional (xAI only) |
// Step 1: Create video generation job
POST /v1/videos/generations
{
  "prompt": "A rocket launching from a tropical island at sunset",
  "model": "xai/grok-imagine-video",
  "duration": 10,
  "aspect_ratio": "16:9",
  "resolution": "720p"
}

// Response: job created (async)
{ "request_id": "xai_abc123def456", "status": "in_progress" }

GET /v1/videos/:requestId

Poll for video generation status. When status is "done", the response includes the video URL.
// Step 2: Poll for result
GET /v1/videos/xai_abc123def456

// Response: job finished
{ "status": "done", "video": { "url": "https://...", "duration": 10 } }
Async workflow: Video generation takes 30-180 seconds. Poll the status endpoint every 5-10 seconds until status is "done" or "failed". Pricing: $0.04-$0.08/second depending on model.
Continuation chain: To build a longer narrative, generate a base clip → grab its video.url from the status response → POST a new request with source_video_url set to it. Repeat to chain. xAI keeps style/character consistent from the source clip's last frame. Source must be ≤15 seconds.
Request ID prefix: Returned IDs are prefixed with the provider (e.g. xai_abc123) so the same status endpoint works regardless of which provider generated it.
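The polling workflow can be wrapped in a helper that stops on a terminal status or after a deadline. This is a sketch under assumptions: `wait_for_video` and its parameters are hypothetical, the status fetcher is injected so the example runs offline, and the set of terminal status strings should be trimmed to what the status endpoint actually returns.

```python
import time
from typing import Any, Callable, Dict

# Terminal statuses. "completed" is accepted defensively alongside "done";
# adjust these sets to match what the status endpoint actually returns.
SUCCESS_STATUSES = {"done", "completed"}
FAILURE_STATUSES = {"failed"}


def wait_for_video(
    fetch_status: Callable[[str], Dict[str, Any]],
    request_id: str,
    interval: float = 5.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll fetch_status(request_id) until the job finishes or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(request_id)
        state = status.get("status")
        if state in SUCCESS_STATUSES:
            return status
        if state in FAILURE_STATUSES:
            raise RuntimeError(f"video job {request_id} failed: {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"video job {request_id} still {state!r} after {timeout}s")
        sleep(interval)


# Offline demo: a fake status endpoint that finishes on the third poll.
_responses = iter([
    {"status": "in_progress"},
    {"status": "in_progress"},
    {"status": "done", "video": {"url": "https://example.com/v.mp4", "duration": 10}},
])
result = wait_for_video(lambda _id: next(_responses), "xai_abc123", sleep=lambda s: None)
print(result["video"]["url"])
```

In real use, `fetch_status` would issue `GET /v1/videos/:requestId` with the Bearer header and return the parsed JSON. For a continuation chain, the returned `video.url` becomes the `source_video_url` of the next request.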

POST /v1/entities/search

Unified entity search across 4 knowledge graph providers. Each provider returns standardized Entity objects with provider-specific metadata for SEO, academic research, and identity resolution.
| Parameter | Type | Description |
| --- | --- | --- |
| query | string | Search query. Required. |
| provider | string | One of: wikidata, openalex, crossref, orcid. Default: wikidata |
| limit | integer | Max entities to return (1-20). Default: 10 |
| language | string | Language code (Wikidata only). Default: en |

Providers

| Provider | Best For | Metadata Fields |
| --- | --- | --- |
| wikidata | General entities, QIDs, cross-linking | QID, types |
| openalex | Academic works, authors, institutions | citedByCount, worksCount, externalId |
| crossref | Scholarly works, DOI resolution | authors, publisher, journal, citationCount, ISSN, license |
| orcid | Researcher profiles, identity resolution | keywords, externalIds, urls |

// Request
{ "query": "Python programming", "provider": "wikidata", "limit": 5 }

// Response
{
  "query": "Python programming",
  "provider": "wikidata",
  "entities": [
    {
      "id": "Q28865",
      "name": "Python",
      "types": [],
      "description": "general-purpose programming language",
      "url": "https://www.wikidata.org/wiki/Q28865",
      "image": null,
      "score": null,
      "metadata": { /* provider-specific fields */ }
    }
  ]
}
Cross-linking: Wikidata QIDs are the universal identifier — many entities have ORCID iDs (P496), DOIs (P356), and OpenAlex IDs (P10283) attached. Use SPARQL to traverse these relationships.
ORCID note: ORCID search is a two-step lookup (search + per-profile fetch) so it's slower but returns much richer person data than the others.

POST /v1/entities/sparql

Raw SPARQL queries against Wikidata's graph database. Returns unprocessed bindings for maximum flexibility — relationship traversal, complex filters, and cross-linking entities to ORCID, DOI, OpenAlex.
| Parameter | Type | Description |
| --- | --- | --- |
| query | string | Valid SPARQL query against the Wikidata endpoint. Required. |

Useful Properties

| Property | Meaning |
| --- | --- |
| wdt:P31 | instance of (e.g. wd:Q5 = human) |
| wdt:P279 | subclass of |
| wdt:P361 | part of |
| wdt:P496 | ORCID iD (links to ORCID provider) |
| wdt:P356 | DOI (links to CrossRef provider) |
| wdt:P10283 | OpenAlex ID (links to OpenAlex provider) |

// Request: find Nobel Prize laureates
{
  "query": "SELECT ?item ?label WHERE { ?item wdt:P31 wd:Q5 . ?item wdt:P166 wd:Q35637 . ?item rdfs:label ?label . FILTER(lang(?label)='en') } LIMIT 5"
}

// Response
{
  "query": "SELECT ?item ...",
  "provider": "wikidata-sparql",
  "results": [
    {
      "item": { "value": "http://www.wikidata.org/entity/Q937" },
      "label": { "value": "Albert Einstein" }
    }
  ]
}
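The cross-linking properties listed above can be combined into one query that looks up the external identifiers attached to a single entity. The sketch below only builds the request payload; `crosslink_query` is a hypothetical helper, and each identifier is wrapped in OPTIONAL because an entity may lack any of them.

```python
import re


def crosslink_query(qid: str) -> str:
    """Build a SPARQL query returning ORCID, DOI and OpenAlex IDs for one QID."""
    if not re.fullmatch(r"Q[1-9]\d*", qid):
        raise ValueError(f"not a Wikidata QID: {qid!r}")
    return (
        "SELECT ?orcid ?doi ?openalex WHERE { "
        f"OPTIONAL {{ wd:{qid} wdt:P496 ?orcid . }} "
        f"OPTIONAL {{ wd:{qid} wdt:P356 ?doi . }} "
        f"OPTIONAL {{ wd:{qid} wdt:P10283 ?openalex . }} "
        "} LIMIT 10"
    )


# Q937 is the Albert Einstein entity that appears in the sample response.
payload = {"query": crosslink_query("Q937")}
print(payload["query"])
```

The resulting `payload` is what would be sent as the JSON body of `POST /v1/entities/sparql`. Validating the QID before interpolation keeps arbitrary text out of the query string.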

POST /v1/extract

Extract clean article content from any URL using Mozilla Readability. Returns title, author, date, and markdown body. Lightweight alternative to Firecrawl for article extraction.
| Parameter | Type | Description |
| --- | --- | --- |
| url | string | URL to extract content from. Required. |
| format | string | markdown or text. Default: markdown |

// Response
{
  "title": "Article Title",
  "author": "Author Name",
  "date": "2026-04-17",
  "content": "Clean article text in markdown...",
  "excerpt": "First paragraph summary...",
  "url": "https://example.com/article"
}

POST /v1/scrape

Extract structured data from any URL using CSS selectors. Server-side DOM parsing via linkedom — works on any HTML page.
| Parameter | Type | Description |
| --- | --- | --- |
| url | string | URL to scrape. Required. |
| selectors | object | Key-value pairs of CSS selectors. Each value extracts text from matching element(s). Required. |

// Request
{
  "url": "https://example.com",
  "selectors": {
    "title": "h1",
    "description": "meta[name='description']",
    "links": "a[href]"
  }
}

// Response
{
  "url": "https://example.com",
  "data": {
    "title": "Example Domain",
    "description": "An example website",
    "links": ["More information..."]
  }
}

SEO Keyword Research

DataForSEO-powered SEO keyword endpoints — search volume, difficulty, suggestions, intent classification, and AI-search keyword research.

POST /v1/seo/keywords/volume

Get monthly search volume, CPC, and competition for up to 1,000 keywords from Google Ads.
// Request
{
  "keywords": ["typescript", "javascript"],
  "location": "United States",  // or location code (e.g. 2840)
  "language": "en"
}

POST /v1/seo/keywords/research

Full keyword research data — volume, CPC, search intent, SERP info — for up to 700 keywords. Requires location + language.

POST /v1/seo/keywords/suggestions

Generate related keyword suggestions from a single seed keyword.
// Request { "keyword": "running shoes", "location": "United States", "language": "en", "limit": 100, "include_seed": true }

POST /v1/seo/keywords/difficulty

SEO difficulty scores (0-100) showing how hard it is to rank for each keyword. Up to 1,000 keywords. Requires location + language.

POST /v1/seo/keywords/intent

Classify search intent (informational, navigational, commercial, transactional) for each keyword. Up to 1,000 keywords. Requires language.

POST /v1/seo/keywords/ai-volume

AI search volume (queries to ChatGPT, Perplexity, Claude, etc.) for up to 1,000 keywords. Useful for tracking AI-driven traffic. Requires location + language.
Location: Pass either a country name (e.g. "United States") or a DataForSEO location code (e.g. 2840).
Pricing: SEO endpoints are not free — billed via the underlying DataForSEO account. Other endpoints on 4ort.io are free or use the proxy's free-tier providers.
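Because the SEO endpoints cap the number of keywords per request (1,000 for most, 700 for research) and are billed, larger lists need to be split client-side. The following sketch shows one way to batch them; `chunk_keywords` is a hypothetical helper, and de-duplicating first is a design choice here, on the assumption that repeated keywords would add cost without adding information.

```python
from typing import Iterator, List, Sequence


def chunk_keywords(keywords: Sequence[str], limit: int = 1000) -> Iterator[List[str]]:
    """Yield de-duplicated keywords in batches no larger than `limit`."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    unique = list(dict.fromkeys(keywords))  # drop repeats, keep first-seen order
    for start in range(0, len(unique), limit):
        yield unique[start:start + limit]


keywords = [f"keyword {i}" for i in range(2500)]
# volume, difficulty, intent, ai-volume: up to 1,000 keywords per call
batches = list(chunk_keywords(keywords))
# research: up to 700 keywords per call
research_batches = list(chunk_keywords(keywords, limit=700))
print([len(b) for b in batches])
print([len(b) for b in research_batches])
```

Each batch would then be sent as the `keywords` array of one request, together with the `location` and `language` fields the endpoint requires.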

GET /v1/models

List all available models — chat, image, video, embedding, and entity-search. Each entry has a type field and a pricing object. 0 = free.
// Response (truncated)
{
  "object": "list",
  "data": [
    {
      "id": "nvidia/qwen/qwen3-coder-480b-a35b-instruct",
      "object": "model",
      "type": "chat",
      "owned_by": "nvidia",
      "pricing": {"input": 0, "output": 0},
      "capabilities": {"tools": true, "vision": false, "reasoning": false}
    },
    {
      "id": "gemini/gemini-3.1-flash-image-preview",
      "type": "image",
      "pricing": {"perImage": 0.067},
      "supported_sizes": ["1024x1024", "1536x1024", "1024x1536"]
    },
    {
      "id": "xai/grok-imagine-video",
      "type": "video",
      "pricing": {"perSecond": 0.05},
      "async": true,
      "endpoint": "/v1/videos/generations"
    },
    {
      "id": "4ort/qwen3-embedding-4b",
      "type": "embedding",
      "dimensions": 2000,
      "pricing": {"input": 0}
    }
  ]
}
Pricing fields: Chat: pricing.input/output in $/million tokens. Image: pricing.perImage. Video: pricing.perSecond. Embedding: pricing.input. Entity search: pricing.perRequest.
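As a worked example of these units, the sketch below turns a `pricing` object into an estimated dollar cost. The function names are hypothetical, and the paid chat rates in the second call are made-up numbers for illustration, not values from the model list.

```python
from typing import Dict


def chat_cost_usd(pricing: Dict[str, float], prompt_tokens: int, completion_tokens: int) -> float:
    """Chat pricing is quoted in $ per million tokens, split into input and output."""
    return (prompt_tokens * pricing["input"] + completion_tokens * pricing["output"]) / 1_000_000


def video_cost_usd(pricing: Dict[str, float], seconds: float) -> float:
    """Video pricing is quoted per generated second."""
    return seconds * pricing["perSecond"]


# Free NVIDIA model from the sample response: both rates are 0.
free_cost = chat_cost_usd({"input": 0, "output": 0}, 25, 8)
# Hypothetical paid rates, for illustration only.
paid_cost = chat_cost_usd({"input": 0.30, "output": 2.50}, 1_000_000, 200_000)
# 10-second clip at the 0.05 $/s shown for xai/grok-imagine-video.
clip_cost = video_cost_usd({"perSecond": 0.05}, 10)
print(free_cost, round(paid_cost, 4), round(clip_cost, 4))
```

The token counts can be taken from the `usage` object of a chat response, and the clip length from the `duration` field of the video status response.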

GET /v1/services

Programmatic service discovery — returns all enabled endpoints, their parameter schemas, and provider lists. Useful for agents that need to discover capabilities at runtime.
// Response (excerpted)
{
  "services": {
    "chat": { "endpoint": "POST /v1/chat/completions", "params": { /* ... */ } },
    "images": { "endpoint": "POST /v1/images/generations", "providers": ["gemini/...", "openai/..."] },
    "videos": { "endpoints": ["POST /v1/videos/generations", "GET /v1/videos/:requestId"] },
    "entity_search": { "providers": ["wikidata", "openalex", "crossref", "orcid"] },
    "sparql": { "endpoint": "POST /v1/entities/sparql" },
    "embeddings": { "providers": ["4ort/qwen3-embedding-4b", "gemini/..."] },
    "web_search": { "endpoint": "POST /v1/search" },
    "seo_keywords": { "endpoints": ["POST /v1/seo/keywords/volume", /* ... */] }
  }
}

GET /v1/status

Live model rankings, health, rate-limit usage, and routing breakdown. Updated continuously from real request data — useful for monitoring which models are healthy and available right now.
// Response (excerpted)
{
  "totalModels": 158,
  "imageModels": 23,
  "videoModels": 6,
  "embeddingModels": 5,
  "entitySearchModels": 4,
  "providers": {"nvidia": 35, "openrouter": 85, /* ... */},
  "rateLimits": {/* per-provider daily/RPM usage */},
  "models": [/* ranked by effective priority + health */]
}

GET /health

Public health check. No authentication required. Returns server status and model counts.
// Response {"status": "ok", "uptime": 86400, "totalModels": 158}

GET /v1/inference/health

Status of the self-hosted inference server (where 4ort/* models run). Returns CPU, memory, per-model status, queue depth, and request stats. Cached for 3 seconds — safe to poll often.
// Response (excerpted)
{
  "available": true,
  "status": "ok",  // ok | degraded | down
  "uptime_seconds": 12345,
  "system": {
    "cpu_percent": 42.3,
    "memory_used_gb": 28.4,
    "memory_total_gb": 256,
    "load_avg": [3.2, 3.5, 3.8]
  },
  "models": {
    "fast": {
      "name": "Qwen3.5-0.8B",
      "status": "idle",
      "queue_depth": 0,
      "requests_ok": 1542,
      "avg_latency_ms": 342
    },
    "quality": { "name": "GLM-4.7-Flash-REAP-23B", "status": "processing", /* ... */ },
    "embedding": { "name": "Qwen3-Embedding-4B", "status": "idle", /* ... */ }
  }
}

Quick Start

# Python (OpenAI SDK + requests)
import time

import requests
from openai import OpenAI

client = OpenAI(
    base_url="https://4ort.io/v1",
    api_key="YOUR_API_KEY",
)

# Chat completion
response = client.chat.completions.create(
    model="google/gemini-2.5-flash",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

# Image generation
image = client.images.generate(
    model="gemini/gemini-3.1-flash-image-preview",
    prompt="A cat astronaut floating in space",
    size="1024x1024",
)
print(image.data[0].b64_json[:50])

# Embeddings
emb = client.embeddings.create(
    model="4ort/qwen3-embedding-4b",
    input=["hello world", "goodbye world"],
)
print(f"vec dim: {len(emb.data[0].embedding)}")

# Video generation (async)
headers = {
    "Authorization": f"Bearer {client.api_key}",
    "Content-Type": "application/json",
}
job = requests.post(
    "https://4ort.io/v1/videos/generations",
    json={"prompt": "A rocket launch at sunset", "model": "xai/grok-imagine-video"},
    headers=headers,
).json()
while True:
    status = requests.get(
        f"https://4ort.io/v1/videos/{job['request_id']}",
        headers=headers,
    ).json()
    if status["status"] in ("done", "failed"):
        break
    time.sleep(5)
print(status)

# Entity search (Wikidata)
entities = requests.post(
    "https://4ort.io/v1/entities/search",
    json={"query": "Albert Einstein", "provider": "wikidata"},
    headers=headers,
).json()
for e in entities["entities"]:
    print(f"{e['name']}: {e['description']}")

# Web search
search = requests.post(
    "https://4ort.io/v1/search",
    json={"query": "latest typescript features", "engines": "brave,duckduckgo"},
    headers=headers,
).json()
for r in search["results"][:3]:
    print(r["title"], r["url"])
// JavaScript (OpenAI SDK + fetch); uses top-level await, so run it as an ES module
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://4ort.io/v1",
  apiKey: "YOUR_API_KEY",
});

// Chat completion
const response = await client.chat.completions.create({
  model: "google/gemini-2.5-flash",
  messages: [{role: "user", content: "Hello!"}],
});
console.log(response.choices[0].message.content);

// Image generation
const image = await client.images.generate({
  model: "gemini/gemini-3.1-flash-image-preview",
  prompt: "A cat astronaut floating in space",
  size: "1024x1024",
});
console.log(image.data[0].b64_json?.slice(0, 50));

// Embeddings
const emb = await client.embeddings.create({
  model: "4ort/qwen3-embedding-4b",
  input: ["hello world", "goodbye world"],
});
console.log(`vec dim: ${emb.data[0].embedding.length}`);

// Video generation (async)
const headers = {
  "Authorization": `Bearer ${client.apiKey}`,
  "Content-Type": "application/json",
};
const job = await fetch("https://4ort.io/v1/videos/generations", {
  method: "POST",
  headers,
  body: JSON.stringify({
    prompt: "A rocket launch at sunset",
    model: "xai/grok-imagine-video",
  }),
}).then(r => r.json());

let status;
do {
  await new Promise(r => setTimeout(r, 5000));
  status = await fetch(`https://4ort.io/v1/videos/${job.request_id}`, {headers})
    .then(r => r.json());
} while (!["done", "failed"].includes(status.status));
console.log(status);

// Entity search (Wikidata)
const entities = await fetch("https://4ort.io/v1/entities/search", {
  method: "POST",
  headers,
  body: JSON.stringify({query: "Albert Einstein", provider: "wikidata"}),
}).then(r => r.json());
entities.entities.forEach(e => console.log(`${e.name}: ${e.description}`));
# curl

# Chat completion
curl https://4ort.io/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hello!"}],
    "model": "google/gemini-2.5-flash"
  }'

# Image generation
curl https://4ort.io/v1/images/generations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A cat astronaut floating in space",
    "model": "gemini/gemini-3.1-flash-image-preview"
  }'

# Embeddings
curl https://4ort.io/v1/embeddings \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input": "hello world", "model": "4ort/qwen3-embedding-4b"}'

# Video generation, step 1: create job
curl https://4ort.io/v1/videos/generations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "A rocket launch at sunset", "model": "xai/grok-imagine-video"}'
# → {"request_id": "xai_abc123", "model": "xai/grok-imagine-video"}

# Video generation, step 2: poll for result
curl https://4ort.io/v1/videos/xai_abc123 \
  -H "Authorization: Bearer YOUR_API_KEY"
# → {"status": "done", "video": {"url": "...", "duration": 10}}

# Web search
curl https://4ort.io/v1/search \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "typescript generics", "engines": "brave,duckduckgo"}'

# Article extraction
curl https://4ort.io/v1/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/article"}'

# Entity search (Wikidata)
curl https://4ort.io/v1/entities/search \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "Python programming", "provider": "wikidata"}'