v2.0 - Zero Dependencies

Corino-Gate

Swaps models intelligently. 74 KB of Python, zero dependencies, works for you.

Download .zip | Quick Start →
$ gate status
GPU 0  RTX 4060  2.1/8.0 GiB free
Leases: 1 active  |  Priority: focus
RAM guard: OK  |  Fragmentation: low
$ gate cofit
llama3.2:7b + mistral:7b = 6.8G ✓

Everything you need to tame GPU contention

Single daemon, stdlib-only Python, zero external dependencies. Designed for the RTX 5090, but works on any NVIDIA GPU.

🔒

VRAM Admission Control

Priority-based leases with TTL. No workload exceeds the GPU budget. Automatic expiration of idle leases.

Anti-Thrash Engine

Adaptive throttling that scales with GPU contention. Prevents model load/unload storms that kill throughput.
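One plausible way to make throttling adaptive is to count recent model swaps in a sliding window and grow the delay before the next swap as that count rises. The sketch below assumes this approach; `SwapThrottle` and its parameters are hypothetical, and the page does not say how Corino-Gate actually measures contention.

```python
from collections import deque


class SwapThrottle:
    """Hypothetical back-off: the delay doubles with each swap seen in the window."""

    def __init__(self, window_s: float = 60.0, base_delay_s: float = 1.0,
                 max_delay_s: float = 30.0) -> None:
        self.window_s = window_s
        self.base_delay_s = base_delay_s
        self.max_delay_s = max_delay_s
        self.swaps: deque[float] = deque()

    def record_swap(self, now: float) -> None:
        self.swaps.append(now)

    def delay(self, now: float) -> float:
        # Forget swaps that have fallen out of the sliding window.
        while self.swaps and now - self.swaps[0] > self.window_s:
            self.swaps.popleft()
        recent = len(self.swaps)
        if recent == 0:
            return 0.0
        # Exponential growth, capped so a busy GPU never stalls indefinitely.
        return min(self.max_delay_s, self.base_delay_s * 2 ** (recent - 1))
```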

🧩

Co-Residency Advisor

The /cofit endpoint tells you which models can run in parallel, fitting within VRAM constraints.
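The core of a co-residency check can be sketched as a search for groups of models whose combined footprint fits in the available VRAM. The function name `cofit`, the model names, and the sizes below are illustrative assumptions, not the endpoint's real request or response format.

```python
from itertools import combinations


def cofit(models: dict[str, int], free_mib: int) -> list[tuple[str, ...]]:
    """Return every group of two or more models whose summed VRAM fits in free_mib."""
    names = sorted(models)
    fits = []
    for size in range(2, len(names) + 1):
        for group in combinations(names, size):
            if sum(models[name] for name in group) <= free_mib:
                fits.append(group)
    return fits


# Hypothetical footprints in MiB.
groups = cofit({"model_a": 3000, "model_b": 3500, "model_c": 6000}, free_mib=7000)
```

The search is exponential in the number of models, which is acceptable for the handful of models a single GPU usually hosts.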

📋

Execution Planner

Computes optimal sequential and parallel schedules for multi-model pipelines. Maximum GPU utilization.

🛡

RAM Pressure Guard

Three-level system: warn, pressure, emergency. Manages OOM scores. Prevents system-wide crashes from runaway processes.
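A three-level guard like this can be approximated by comparing the fraction of available RAM against thresholds. The sketch below parses `/proc/meminfo`-style text on Linux; the threshold values and function names are assumptions for illustration, not Corino-Gate's configured defaults.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def pressure_level(available_kb: int, total_kb: int, warn: float = 0.20,
                   pressure: float = 0.10, emergency: float = 0.05) -> str:
    """Map the available-memory fraction to a level (thresholds are hypothetical)."""
    fraction = available_kb / total_kb
    if fraction <= emergency:
        return "emergency"
    if fraction <= pressure:
        return "pressure"
    if fraction <= warn:
        return "warn"
    return "ok"
```

On a live system the input would come from reading `/proc/meminfo` and passing its `MemAvailable` and `MemTotal` fields to `pressure_level`.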

🔍

Squatter Enforcement

Detects unmanaged GPU processes consuming VRAM outside the broker. Full visibility into rogue workloads.

Probationary Leases

Unused leases are auto-demoted. No resource hoarding. GPU time goes to workloads that actually need it.

🧊

Fragmentation Detection

Monitors GPU memory fragmentation. Alerts when allocation patterns degrade performance below SLO thresholds.

🔄

Soft Preemption

Drains active requests before evicting. No killed inferences. Graceful handoff between priority levels.

🚨

Boot Safe Mode

After crashes, starts in conservative mode. Reduced allocations until stability is confirmed. Self-healing.

Hot-Reload Policy

Edit policy.json, then POST to /reload. No restarts. Zero downtime configuration.
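When a running daemon re-reads a config file on request, it helps to replace that file atomically so the daemon never sees a half-written policy. The sketch below shows one way to do that before triggering the reload. The policy contents are placeholders, and sending an empty JSON object as the `/reload` body is an assumption, since the page does not document what the endpoint expects.

```python
import json
import os
import tempfile
import urllib.request


def write_policy_atomically(path: str, policy: dict) -> None:
    """Write to a temp file in the same directory, then rename it over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(policy, handle, indent=2)
        os.replace(tmp_path, path)  # atomic when both paths share a filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def reload_request(base_url: str = "http://127.0.0.1:18600") -> urllib.request.Request:
    """Build the POST /reload request; the empty JSON body is an assumption."""
    return urllib.request.Request(
        base_url + "/reload",
        data=b"{}",
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the daemon running, `urllib.request.urlopen(reload_request())` would send the request.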

Latency SLOs

Per-priority-class latency targets. The broker enforces response time contracts across competing workloads.

REST API on port 18600

All endpoints accept and return JSON. The daemon binds to 127.0.0.1 by default.

Method  Path       Description
POST    /acquire   Request a VRAM lease
POST    /release   Free a lease
POST    /renew     Extend lease TTL
POST    /cofit     Co-residency advisor
POST    /priority  Set daily focus priority
POST    /reload    Hot-reload policy.json
GET     /status    Full broker snapshot
GET     /leases    Active leases
GET     /ledger    GPU state ledger
GET     /health    Health check

How it works

A single daemon mediates all GPU access. Clients acquire leases before touching VRAM.

Clients
Ollama
PyTorch
llama.cpp
ComfyUI
Custom Scripts
↓ acquire / release / renew ↓
gated.py — :18600
policy.json
ram_guard.py
lifecycle.py
↓ nvidia-smi ↓
NVIDIA GPU (RTX 5090 / 23.9 GiB)

Up and running in 60 seconds

Install

Copy files and create the symlink. No pip install, no venv, no Docker.

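bash
# Create the target directories if they do not exist yet
mkdir -p ~/tools ~/bin
# Copy to tools
cp -r corino-gate/ ~/tools/gpu_gate/
chmod +x ~/tools/gpu_gate/gate
# Symlink the CLI (assumes ~/bin is on your PATH)
ln -sf ~/tools/gpu_gate/gate ~/bin/gate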

Start the daemon

One process, one port. Runs on 127.0.0.1:18600.

bash
python3 ~/tools/gpu_gate/gated.py &

Use it

CLI for humans, Python client for scripts, REST API for everything else.

bash
# CLI
gate status
gate cofit

python
from gate_client import gate_lease

with gate_lease(tag="my_job", vram_mib=15000) as lease:
    run_inference()

Rated by the AIs that built it

DeepSeek V3.2
10/10
"Flawless resource isolation"
NVIDIA NIM
10/10
"Production-grade memory management"
Claude Opus
10/10
"Elegant zero-dependency design"

Minimal by design

🐍
Python 3.11+
stdlib only
💻
nvidia-smi
any NVIDIA GPU
🦙
Ollama
optional

Ready to take control of your GPU?

Download Corino-Gate. Zero dependencies. One file. Full control.

Download corino-gate.zip