Open Source

Open source AI inference efficiency and attribution

Measure, attribute, and act on energy consumption across GPU infrastructure — turning raw hardware telemetry into the operational intelligence that infrastructure, finance, and sustainability teams need.


v0.3.0 — Latest release Apache 2.0 Changelog

What's in Aitra Meter v0.3.0

Everything Aitra Meter measures and ships, including the v0.3.0 additions: the DCGM energy provider, the model-level AI-efficiency metric family, cost-budget and TTFT alerting, per-GPU and per-model attribution, and Grafana as the default dashboard surface.

How it works

GPU energy, measured at the token level

Aitra Meter reads GPU power directly via NVML (default), DCGM, or the Zeus community sidecar, correlates it with token output from your inference server, and computes J/token continuously — per workload, per model, per hardware tier.

Aitra Meter runs entirely inside a single Kubernetes cluster as four components: a DaemonSet agent, an aggregation service, a SQLite store, and a dashboard. It installs with one Helm install and requires no changes to your inference server code.
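As a rough illustration of the J/token idea, the sketch below integrates sampled GPU power over a time window and divides the result by the output tokens generated in that window. It is not Aitra Meter's implementation: the PowerSample type, the function names, and the choice of trapezoidal integration are assumptions made for this example, and real samples would come from a source such as NVML or DCGM.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class PowerSample:
    """One GPU power reading (hypothetical type for this sketch)."""
    t: float      # sample time in seconds
    watts: float  # instantaneous power draw in watts


def energy_joules(samples: Sequence[PowerSample]) -> float:
    """Integrate power over time with the trapezoidal rule (W x s = J)."""
    total = 0.0
    for a, b in zip(samples, samples[1:]):
        total += 0.5 * (a.watts + b.watts) * (b.t - a.t)
    return total


def joules_per_token(samples: Sequence[PowerSample],
                     tokens_start: float,
                     tokens_end: float) -> Optional[float]:
    """Energy in the window divided by output tokens produced in the window.

    tokens_start and tokens_end are readings of a monotonically increasing
    output-token counter taken at the window boundaries. Returns None when
    no tokens were generated, since the ratio is undefined in that case.
    """
    tokens = tokens_end - tokens_start
    if tokens <= 0:
        return None
    return energy_joules(samples) / tokens


# A GPU drawing a steady 300 W for 2 s while the server emits 1200 tokens.
window = [PowerSample(0.0, 300.0), PowerSample(1.0, 300.0), PowerSample(2.0, 300.0)]
print(joules_per_token(window, 0, 1200))  # 0.5
```

Returning None for an idle window keeps zero-token periods out of the ratio; those periods are more naturally described by an idle-time metric than by J/token.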

Read the architecture docs

aitra_j_per_token

Joules per output token — workload × model × hardware. The energy cost of a single generated token.

aitra_co2_per_token_grams

gCO₂ per token — J/token × grid carbon intensity. Track and report your inference carbon footprint.

aitra_cost_per_million_tokens_usd

$/M tokens — J/token × electricity cost. The per-token cost visible to finance and platform teams.

aitra_idle_time_ratio

Fraction of the last hour spent idle per node. Surface over-provisioned GPU capacity instantly.

aitra_model_energy_per_1m_tokens

Joules per one million output tokens, by model — the headline primitive of the v0.3.0 model-level AI-efficiency family.
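The derived metrics above follow from J/token by unit conversion. The sketch below shows the arithmetic under two assumptions that the descriptions do not state explicitly: grid carbon intensity is expressed in gCO₂ per kWh, and electricity cost in USD per kWh. The function names are illustrative and are not Aitra Meter APIs.

```python
JOULES_PER_KWH = 3_600_000.0  # 1 kWh = 3.6e6 J


def co2_per_token_grams(j_per_token: float, grid_g_co2_per_kwh: float) -> float:
    """gCO2 per token: convert J/token to kWh/token, then apply grid intensity."""
    return j_per_token / JOULES_PER_KWH * grid_g_co2_per_kwh


def cost_per_million_tokens_usd(j_per_token: float, usd_per_kwh: float) -> float:
    """USD per million tokens: energy for 1M tokens in kWh times the kWh price."""
    kwh_per_million = j_per_token * 1_000_000 / JOULES_PER_KWH
    return kwh_per_million * usd_per_kwh


def energy_per_1m_tokens_joules(j_per_token: float) -> float:
    """Joules per one million output tokens."""
    return j_per_token * 1_000_000


# At 3.6 J/token, one million tokens consume 1 kWh, so the cost per million
# tokens equals the kWh price and the CO2 per million tokens equals the
# grid intensity figure.
print(energy_per_1m_tokens_joules(0.5))  # 500000.0
```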

Full metrics reference →

Cloud Native

Cloud native from the ground up

Designed to drop into your existing cloud native stack — not replace it. Works alongside the tools you already run on Kubernetes.

Prometheus

ServiceMonitor auto-registers with kube-prometheus-stack on install. All aitra_* metrics available instantly in PromQL.

Grafana

The default visualization surface. Audience-specific dashboards — platform, finance, and sustainability — auto-provisioned via Helm. Standalone dashboard available as an opt-in.

OpenTelemetry

OTLP export of gen_ai.infrastructure.energy.* metrics to any OTel collector. Opt-in via a single Helm value.

KEDA

Scale on aitra_j_per_token and aitra_idle_time_ratio. Reference ScaledObjects included in examples/keda/.

OpenCost

Complementary: OpenCost gives $/GPU-hr; Aitra Meter gives $/M tokens. Cost-per-token recording rules integrate directly with OpenCost. A combined Grafana panel is included in examples/grafana/.

Inference Servers

vLLM, TGI, SGLang, and Ollama via their /metrics endpoints. Extensible InferenceMetricsProvider interface for custom servers.
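To make the /metrics integration concrete, the sketch below reads an output-token counter from Prometheus text exposition output, which is the kind of work a custom provider would do. The real InferenceMetricsProvider interface is not shown here, so the class name and method below are hypothetical; the metric name vllm:generation_tokens_total and its model_name label are used as an assumed example of what a vLLM server exposes.

```python
def sum_counter(metrics_text: str, metric_name: str) -> float:
    """Sum every sample of one metric across all label sets.

    Handles the two sample shapes in the Prometheus text format:
    `name{labels} value` and `name value`. Comment lines are skipped.
    """
    total = 0.0
    for raw in metrics_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(metric_name + "{"):
            # Label values may contain spaces, so split after the closing brace.
            rest = line[line.rindex("}") + 1:]
        elif line.startswith(metric_name + " "):
            rest = line[len(metric_name):]
        else:
            continue
        total += float(rest.split()[0])
    return total


class ExampleTokenProvider:
    """Hypothetical provider shape: maps a /metrics payload to a token count."""

    def __init__(self, metric_name: str) -> None:
        self.metric_name = metric_name

    def output_tokens_total(self, metrics_text: str) -> float:
        return sum_counter(metrics_text, self.metric_name)


SAMPLE = """\
# TYPE vllm:generation_tokens_total counter
vllm:generation_tokens_total{model_name="llama"} 1200.0
vllm:generation_tokens_total{model_name="mistral"} 300.0
vllm:prompt_tokens_total{model_name="llama"} 50.0
"""

provider = ExampleTokenProvider("vllm:generation_tokens_total")
print(provider.output_tokens_total(SAMPLE))  # 1500.0
```

Two successive readings of this counter give the token delta for a measurement window, which is the denominator of J/token.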

Alerting

Cost-budget and Time-To-First-Token (TTFT) alerts ship with runbooks and Alertmanager rules. Budgets are Helm-configurable per namespace.
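The logic behind a per-namespace cost budget can be sketched as a comparison of energy spend against a configured limit. This is only an illustration of the idea: the shipped alerts are Alertmanager rules, and the function name, the input shapes, and the USD-per-kWh pricing model below are all assumptions.

```python
from typing import Dict

JOULES_PER_KWH = 3_600_000.0  # 1 kWh = 3.6e6 J


def namespaces_over_budget(energy_joules_by_namespace: Dict[str, float],
                           usd_per_kwh: float,
                           budget_usd_by_namespace: Dict[str, float]) -> Dict[str, float]:
    """Return {namespace: spend_usd} for namespaces whose spend exceeds budget.

    Namespaces without a configured budget are ignored.
    """
    over = {}
    for namespace, joules in energy_joules_by_namespace.items():
        budget = budget_usd_by_namespace.get(namespace)
        if budget is None:
            continue
        spend = joules / JOULES_PER_KWH * usd_per_kwh
        if spend > budget:
            over[namespace] = spend
    return over


# "chat" used 2000 kWh (about $200 at $0.10/kWh) against a $150 budget;
# "batch" used 100 kWh (about $10) against a $50 budget.
usage = {"chat": 7.2e9, "batch": 3.6e8}
budgets = {"chat": 150.0, "batch": 50.0}
print(sorted(namespaces_over_budget(usage, 0.10, budgets)))  # ['chat']
```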

Open source and community-driven

All Aitra Meter components are available on GitHub under the Apache 2.0 License. Three contribution paths are especially valuable: support for a new inference server, a new energy backend, or a new measurement agent in any language.


Open Governance