Open Source

Open source AI inference efficiency and attribution

Measure, attribute, and act on energy consumption across GPU infrastructure — turning raw hardware telemetry into the operational intelligence that infrastructure, finance, and sustainability teams need.

Latest release: v0.8.0 · License: Apache 2.0
Quick install
$ kubectl label node <gpu-node> aitra-ai.github.io/gpu=true && helm repo add aitra https://aitra-ai.github.io/helm-charts && helm install aitra-meter aitra/aitra-meter --namespace aitra-system --create-namespace
Full guide →
J/token measurement
Joules per output token — continuously measured across every workload × model × hardware combination in your cluster.
Multi-hardware support
NVIDIA via NVML, AMD via ROCm, CPU+DRAM, and Jetson — through the Zeus energy sidecar (default) or a pure-Go NVML backend (alternative). No hardware lock-in.
Rich attribution
Every measurement labeled by namespace, workload, model, and team. Full dimensional data model for chargeback, cost allocation, and capacity planning.
Carbon & cost tracking
$/M tokens and gCO₂/token, derived from J/token using your electricity price and grid carbon intensity. Idle GPU power is tracked separately to surface and eliminate waste.
CNCF integrations
Native integration with Prometheus, Grafana, OpenTelemetry, KEDA, and OpenCost. ServiceMonitor auto-registers with kube-prometheus-stack on install.
Simple operation
One Helm install. No infrastructure changes. No application code changes. Developed in Go — statically linked binaries, all pods running within 60 seconds.
How it works

GPU energy, measured at the token level

Aitra Meter reads GPU power directly via NVML or the Zeus sidecar, correlates it with token output from your inference server, and computes J/token continuously — per workload, per model, per hardware tier.

Aitra Meter runs entirely inside a single Kubernetes cluster as four components: a DaemonSet agent, an aggregation service, a SQLite store, and a dashboard. Installation is one Helm chart, with no changes to your inference server code.

Read the architecture docs
aitra_j_per_token

Joules per output token — workload × model × hardware. The energy cost of a single generated token.

aitra_co2_per_token_grams

gCO₂ per token — J/token × grid carbon intensity. Track and report your inference carbon footprint.

aitra_cost_per_million_tokens_usd

$/M tokens — J/token × electricity cost. The per-token cost visible to finance and platform teams.

aitra_idle_time_ratio

Fraction of the last hour spent idle per node. Surface over-provisioned GPU capacity instantly.
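The derived metrics above follow from J/token by unit conversion, since electricity price and grid intensity are usually quoted per kWh (1 kWh = 3.6 MJ). The sketch below shows the arithmetic only. The function names, the example price of 0.12 USD/kWh, and the example intensity of 400 gCO₂/kWh are assumptions for illustration, and how a node is classified as idle is not specified here, so idle time is taken as an input.

```python
JOULES_PER_KWH = 3_600_000.0  # 1 kWh = 3.6e6 J


def cost_per_million_tokens_usd(j_per_token: float,
                                usd_per_kwh: float) -> float:
    """Energy for one million tokens, converted to kWh, times price."""
    kwh_per_million = j_per_token * 1_000_000 / JOULES_PER_KWH
    return kwh_per_million * usd_per_kwh


def co2_per_token_grams(j_per_token: float,
                        grid_g_co2_per_kwh: float) -> float:
    """Energy for one token, converted to kWh, times grid intensity."""
    return j_per_token / JOULES_PER_KWH * grid_g_co2_per_kwh


def idle_time_ratio(idle_seconds: float,
                    window_seconds: float = 3600.0) -> float:
    """Fraction of the window (default: one hour) a node spent idle."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return idle_seconds / window_seconds


# Example inputs (assumed values, not measurements).
cost = cost_per_million_tokens_usd(1.25, usd_per_kwh=0.12)
co2 = co2_per_token_grams(1.25, grid_g_co2_per_kwh=400.0)
idle = idle_time_ratio(900.0)

print(f"{cost:.4f} USD per million tokens")
print(f"{co2:.6f} gCO2 per token")
print(f"{idle:.2f} idle ratio")
```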

Full metrics reference →
Cloud Native

Cloud native from the ground up

Designed to drop into your existing cloud native stack — not replace it. Works alongside the tools you already run on Kubernetes.

Prometheus
ServiceMonitor auto-registers with kube-prometheus-stack on install. All aitra_* metrics available instantly in PromQL.
Grafana
Pre-built dashboard JSON with six views: J/token live table, cluster trend, namespace chargeback, idle consumption, carbon, and cost. Auto-provisioned via sidecar.
OpenTelemetry
OTLP export of gen_ai.infrastructure.energy.* metrics to any OTel collector. Opt-in via a single Helm value.
KEDA
Scale on aitra_j_per_token and aitra_idle_time_ratio. Reference ScaledObjects included in examples/keda/.
OpenCost
Complementary — OpenCost gives $/GPU-hr, Aitra Meter gives $/M tokens. Combined Grafana panel in examples/grafana/.
Inference Servers
vLLM, TGI, SGLang, and Ollama via their /metrics endpoints. Extensible InferenceMetricsProvider interface for custom servers.
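To give a sense of what reading token output from a /metrics endpoint involves, here is a minimal sketch that sums one counter from Prometheus text exposition output. It is not the InferenceMetricsProvider interface itself. The function name `sum_counter` is hypothetical, the metric and label names in the sample are illustrative, and the parser deliberately ignores edge cases such as escaped characters in label values.

```python
def sum_counter(exposition: str, metric: str) -> float:
    """Sum every sample of `metric` across all label sets.

    Handles lines of the form `name{labels} value [timestamp]` and
    `name value [timestamp]`; comment lines starting with '#' are skipped.
    """
    total = 0.0
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            name, _, rest = line.partition("{")
            # Everything after the last '}' is the value (and timestamp).
            value_part = rest[rest.rfind("}") + 1:]
        else:
            name, _, value_part = line.partition(" ")
        if name.strip() != metric:
            continue
        fields = value_part.split()
        if fields:
            total += float(fields[0])
    return total


# Sample scrape body; metric and label names are assumptions for this example.
SAMPLE = """\
# HELP vllm:generation_tokens_total Number of generation tokens processed.
# TYPE vllm:generation_tokens_total counter
vllm:generation_tokens_total{model_name="llama-3-8b"} 10500.0
vllm:generation_tokens_total{model_name="mistral-7b"} 2500.0
vllm:prompt_tokens_total{model_name="llama-3-8b"} 99.0
"""

tokens = sum_counter(SAMPLE, "vllm:generation_tokens_total")
print(tokens)
```

Scraping the counter at the start and end of a measurement window and taking the difference gives the token count for that window.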
Open Source

Open source and community-driven

All Aitra Meter components are available on GitHub under the Apache 2.0 License. There are three high-value contribution paths: a new inference server integration, a new energy backend, or a new measurement agent in any language.

Open Governance

Aitra Meter is a SODA Foundation project with public governance, a published roadmap, and open maintainership.
