Swaps models intelligently. 74 KB of Python, zero dependencies, works for you.
Single daemon, stdlib-only Python, zero external dependencies. Designed for the RTX 5090, but works on any NVIDIA GPU.
Priority-based leases with a TTL. No workload exceeds the GPU budget, and idle leases expire automatically.
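To make the lease idea concrete, here is a minimal sketch of a TTL-based lease table with a fixed VRAM budget. It is not Corino-Gate's code: the `Lease` and `LeaseTable` names, the "higher number wins" priority convention, and the injectable clock are all assumptions for illustration, and priority-based preemption is left out.

```python
import time
from dataclasses import dataclass


@dataclass
class Lease:
    tag: str
    vram_mib: int
    priority: int       # assumption: a higher number means more important
    expires_at: float   # clock value at which the lease lapses


class LeaseTable:
    """Toy in-memory lease table (hypothetical, not the broker's own class)."""

    def __init__(self, budget_mib, clock=time.monotonic):
        self.budget_mib = budget_mib
        self.clock = clock
        self.leases = {}

    def expire(self):
        """Drop leases whose TTL has elapsed and return their tags."""
        now = self.clock()
        dead = [t for t, l in self.leases.items() if l.expires_at <= now]
        for t in dead:
            del self.leases[t]
        return dead

    def acquire(self, tag, vram_mib, priority, ttl_s):
        """Grant the lease only if it keeps total usage within the budget."""
        self.expire()
        # Re-acquiring an existing tag replaces it, so exclude it from the sum.
        used = sum(l.vram_mib for t, l in self.leases.items() if t != tag)
        if used + vram_mib > self.budget_mib:
            return False
        self.leases[tag] = Lease(tag, vram_mib, priority, self.clock() + ttl_s)
        return True
```

A request that would push usage past the budget is refused until an earlier lease is released or its TTL runs out.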
Adaptive throttling that scales with GPU contention. Prevents model load/unload storms that kill throughput.
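One way to read "scales with contention" is a cooldown between model swaps that grows as more clients wait. The sketch below assumes exactly that. The function names, the doubling rule, and the 2 s / 60 s bounds are illustrative guesses, not values taken from the broker.

```python
def swap_cooldown_s(waiters, base_s=2.0, max_s=60.0):
    """Cooldown between swaps: doubles per waiting client, capped at max_s."""
    return min(max_s, base_s * (2 ** max(0, waiters)))


def may_swap(last_swap_at, now, waiters):
    """Allow a new load/unload only once the current cooldown has elapsed."""
    return now - last_swap_at >= swap_cooldown_s(waiters)
```

With no contention a swap is allowed after 2 seconds. With one waiter the same swap has to wait 4 seconds, which damps back-and-forth reloading.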
The /cofit endpoint tells you which models can run in parallel within the available VRAM.
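The co-residency question reduces to "which groups of models fit in the budget together". A brute-force sketch of that check follows. The `cofit_sets` function and the model names and sizes are hypothetical, and the real endpoint's request and response shapes are not documented here.

```python
from itertools import combinations


def cofit_sets(models, budget_mib):
    """Return every group of two or more models whose combined VRAM fits."""
    names = sorted(models)
    fits = []
    for k in range(2, len(names) + 1):
        for combo in combinations(names, k):
            if sum(models[n] for n in combo) <= budget_mib:
                fits.append(combo)
    return fits


# Hypothetical model sizes in MiB.
example = {"llm": 15000, "sdxl": 12000, "whisper": 3000}
print(cofit_sets(example, budget_mib=20000))
# [('llm', 'whisper'), ('sdxl', 'whisper')]
```

Enumerating every subset is exponential in the number of models, which is acceptable for the handful of models a single GPU typically hosts.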
Computes optimal sequential and parallel schedules for multi-model pipelines, to maximize GPU utilization.
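As an illustration of turning a pipeline into parallel groups, the sketch below packs models into sequential "waves", where each wave runs its models side by side. It uses first-fit decreasing, a simple greedy heuristic that is not guaranteed to be optimal and is not claimed to be the broker's algorithm. The `plan_waves` name and the sizes are assumptions.

```python
def plan_waves(models, budget_mib):
    """Greedy first-fit-decreasing packing of models into parallel waves."""
    waves = []   # each wave is a list of model names that run together
    free = []    # remaining VRAM per wave, same indexing as `waves`
    # Largest models first; ties broken by name for a stable result.
    for name, need in sorted(models.items(), key=lambda kv: (-kv[1], kv[0])):
        if need > budget_mib:
            raise ValueError(f"{name} needs {need} MiB, budget is {budget_mib}")
        for i, room in enumerate(free):
            if need <= room:
                waves[i].append(name)
                free[i] -= need
                break
        else:
            # No existing wave has room, so open a new one.
            waves.append([name])
            free.append(budget_mib - need)
    return waves
```

For example, `plan_waves({"llm": 15000, "sdxl": 12000, "whisper": 3000, "embed": 2000}, 20000)` yields two waves: `llm`, `whisper` and `embed` together, then `sdxl` alone.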
Three-level system: warn, pressure, emergency. Manages OOM scores. Prevents system-wide crashes from runaway processes.
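The three levels can be pictured as thresholds on a usage fraction. The sketch below is only that picture. The text does not say which memory is measured or where the thresholds sit, so the 80% / 90% / 97% cut-offs and the `pressure_level` name are assumptions.

```python
def pressure_level(used, total, warn=0.80, pressure=0.90, emergency=0.97):
    """Map a usage fraction to 'ok', 'warn', 'pressure' or 'emergency'."""
    frac = used / total
    if frac >= emergency:
        return "emergency"
    if frac >= pressure:
        return "pressure"
    if frac >= warn:
        return "warn"
    return "ok"
```

A caller would take progressively stronger action per level, for instance logging at `warn` and shedding low-priority work at `emergency`.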
Detects unmanaged GPU processes consuming VRAM outside the broker. Full visibility into rogue workloads.
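A plausible way to spot unmanaged processes is to compare what `nvidia-smi` reports against the PIDs the broker knows about. The sketch below assumes the broker tracks a PID per lease, which the text does not confirm. The helper names are hypothetical, while the `nvidia-smi` query flags are standard.

```python
import subprocess

# Lists compute processes as "pid, used_memory_in_MiB", one per line.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(text):
    """Parse nvidia-smi CSV output into {pid: used_mib}."""
    procs = {}
    for line in text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 2 or not all(p.isdigit() for p in parts):
            continue  # skip blank lines and non-numeric values such as [N/A]
        procs[int(parts[0])] = int(parts[1])
    return procs


def find_rogues(procs, leased_pids):
    """Keep only processes that hold VRAM without a known lease."""
    return {pid: mib for pid, mib in procs.items() if pid not in leased_pids}


def scan(leased_pids):
    """Run nvidia-smi once and report unleased GPU processes."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return find_rogues(parse_compute_apps(out), leased_pids)
```

Keeping the parsing separate from the subprocess call makes the logic testable on machines without a GPU.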
Unused leases are demoted automatically. No resource hoarding: GPU time goes to the workloads that actually need it.
Monitors GPU memory fragmentation. Alerts when allocation patterns degrade performance below SLO thresholds.
Drains active requests before evicting. No killed inferences. Graceful handoff between priority levels.
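Draining before eviction usually means two things: stop admitting new requests, then wait for in-flight ones to finish. A minimal sketch with the standard `threading` module follows. The `DrainGate` class is hypothetical and says nothing about how the daemon itself tracks requests.

```python
import threading


class DrainGate:
    """Counts in-flight requests and lets an evictor wait for them to finish."""

    def __init__(self):
        self._cond = threading.Condition()
        self._inflight = 0
        self._draining = False

    def begin(self):
        """Admit a request unless a drain has started."""
        with self._cond:
            if self._draining:
                return False
            self._inflight += 1
            return True

    def end(self):
        """Mark a request as finished and wake any waiting evictor."""
        with self._cond:
            self._inflight -= 1
            if self._inflight == 0:
                self._cond.notify_all()

    def drain(self, timeout_s):
        """Refuse new work, then wait until idle. True means safe to evict."""
        with self._cond:
            self._draining = True
            return self._cond.wait_for(
                lambda: self._inflight == 0, timeout=timeout_s
            )
```

If `drain` times out it returns `False`, and the caller can decide whether to wait longer or force the eviction.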
After a crash, the broker starts in conservative mode, with reduced allocations until stability is confirmed. Self-healing.
Edit policy.json and POST to /reload. No restarts. Zero-downtime configuration.
Per-priority-class latency targets. The broker enforces response-time contracts across competing workloads.
All endpoints accept and return JSON. The daemon binds to 127.0.0.1 by default.
| Method | Path | Description |
|---|---|---|
| POST | /acquire | Request a VRAM lease |
| POST | /release | Free a lease |
| POST | /renew | Extend lease TTL |
| POST | /cofit | Co-residency advisor |
| POST | /priority | Set daily focus priority |
| POST | /reload | Hot-reload policy.json |
| GET | /status | Full broker snapshot |
| GET | /leases | Active leases |
| GET | /ledger | GPU state ledger |
| GET | /health | Health check |
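Because every endpoint speaks JSON over local HTTP, the standard library is enough to call the API from a script. The sketch below uses the default address 127.0.0.1:18600. The `build_request` and `call` helpers are hypothetical, and the payload field names (`tag`, `vram_mib`) are borrowed from the Python client example on the assumption that the REST API uses the same ones.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:18600"  # default bind address and port


def build_request(method, path, payload=None):
    """Build a JSON request for a broker endpoint without sending it."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def call(method, path, payload=None, timeout=5.0):
    """Send the request and decode the JSON response body."""
    req = build_request(method, path, payload)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the daemon running, `call("GET", "/health")` would return the health check as a Python object, and `call("POST", "/acquire", {"tag": "my_job", "vram_mib": 15000})` would request a lease if the field names match.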
A single daemon mediates all GPU access. Clients acquire a lease before touching VRAM.
Copy the files and create the symlink. No pip install, no venv, no Docker.
# Copy to tools
cp -r corino-gate/ ~/tools/gpu_gate/
chmod +x ~/tools/gpu_gate/gate
ln -sf ~/tools/gpu_gate/gate ~/bin/gate
One process, one port. Runs on 127.0.0.1:18600.
python3 ~/tools/gpu_gate/gated.py &
CLI for humans, Python client for scripts, REST API for everything else.
# CLI
gate status
gate cofit
# Python client
from gate_client import gate_lease

with gate_lease(tag="my_job", vram_mib=15000) as lease:
    run_inference()
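For readers curious how a lease context manager can guarantee release, here is a sketch of the pattern. It is not the source of `gate_lease`. The `lease_scope` name, the injected `call(method, path, payload)` transport, and releasing by `tag` are all assumptions.

```python
from contextlib import contextmanager


@contextmanager
def lease_scope(call, tag, vram_mib):
    """Acquire a lease on entry and always release it on exit."""
    lease = call("POST", "/acquire", {"tag": tag, "vram_mib": vram_mib})
    try:
        yield lease
    finally:
        # Runs even if the body raises, so the VRAM is never left held.
        call("POST", "/release", {"tag": tag})
```

The `finally` clause is the point of the pattern: a crash inside the `with` body still frees the lease instead of waiting for the TTL to expire.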
Download Corino-Gate. Zero dependencies. One file. Full control.
Download corino-gate.zip