The Right Surface for the Work: Instrumenting AI Usage Beyond Token Counts

4 minute read

Most AI usage tracking answers the wrong question. It tells you how many tokens you burned. It does not tell you whether the work even belonged on a paid plan or a metered API in the first place.

That distinction is the whole game.

Tokens are the raw material, not the decision

A token count is a volume reading. It tells you how much you consumed, not whether you consumed it in the right place.

A paid plan is a fixed-cost commitment. You have already spent the money, so the marginal cost of one more request is effectively zero until you hit a limit. A metered API is the opposite: every call is incremental spend that scales directly with usage. The same task can be nearly free on one surface and a noticeable line item on the other, and the token count alone will not tell you which situation you are in.

So the decision I actually care about is which surface a piece of work belonged on. Token usage is an input to that decision, not the answer to it. If I only look at tokens, I am optimizing the bill. If I look at the surface, I am deciding how the work should run. Visibility belongs at the decision layer, not just the token layer.

Where API usage is the right answer

APIs are the right answer for product features, automation, evals, backend workflows, and anything that needs to run as software.

Some concrete cases where I reach for a metered API instead of a paid plan:

Product features. A feature that summarizes user input or drafts a reply has to run unattended, at request time, with predictable latency and a per-call cost I can model. That is an API call, not a chat session.
Automation. A nightly job that triages new GitHub issues, or Autobot picking up scoped repo tasks, runs on its own schedule with no human waiting. The spend should scale with the work done, which is exactly what metered billing does.
Evals. Running a prompt across hundreds of test cases needs reproducibility and the ability to diff one run against the next. That means API plus code, not a window I am typing into by hand.
Backend workflows. Classification, extraction, and enrichment steps where the model is one stage in a larger pipeline. The model output feeds the next step; nobody is sitting there reading it.

The common thread is that the work runs as software, on its own schedule, and the cost should scale with how much of it happens. Paying a flat subscription for that would either cap my throughput or waste the commitment on quiet days.

Where human-in-the-loop work lives

A lot of my AI usage is human-in-the-loop: coding, debugging, planning, writing, research, repo exploration, agent experiments.

For that kind of work, I want to know more than “how many tokens did I use?” I want to know:

which project caused the usage
which tool or model handled it
whether it used a paid plan or a metered API
whether the output was worth it
whether that workflow belongs somewhere else

This is the work that fits a paid plan well. It is interactive, bursty, and hard to predict. I might spend an afternoon deep in one repo and barely touch AI the next day. A fixed-cost plan absorbs that unevenness without making me think about the meter on every message. The question is no longer “what did this cost?” but “is this the surface this work should stay on?”

Sessions-by-project report attributing AI usage across repositories and internal automation projects — Project-level attribution shows where AI usage is actually going, instead of relying on memory.

Project attribution answers “where,” but it does not answer “on which surface.” For that I want the usage broken down by paid plan versus metered API, so I can see whether a workflow is quietly running on the wrong one. A sanitized view of that breakdown looks like this:

Project	Tool / Model	Surface	Worth keeping here?
Coding agent on repo A	IDE assistant (paid plan)	Paid plan	Yes — interactive, bursty
Issue triage automation	Bedrock model (API)	Metered API	Yes — runs unattended, scales with work
Bulk doc summarization	Chat UI (paid plan)	Paid plan	No — should move to a metered API
Eval suite	Direct API calls	Metered API	Yes — reproducible, diffable

The interesting rows are the ones where the surface and the workload disagree — like bulk summarization being run by hand in a chat window when it should be a scripted API job. You cannot spot that from a token count. You can spot it the moment usage is attributed by project, tool, and surface.

What I’m building for it

That is the layer I’m building into Yaah for myself.

Not token shaming. Not trying to use less AI. Just making the usage visible enough that I can keep building with it intentionally.

AI is becoming part of the operating system for how I work. I want instrumentation for that.

This is the same thesis I wrote about when I explained why AI coding agents need an operations layer: the model is only part of the system, and the real work is the visibility, cost controls, and telemetry around it. Surface attribution is one more signal in that operations layer. Tokens tell me the volume; the surface tells me whether the work is in the right place to begin with.

I’m building this for myself first, but if you’re also trying to understand AI usage across paid plans, APIs, coding tools, and agents, I’d be interested in comparing notes.

I share more of my DevOps, cloud, AI automation, and project work at jonprice.io.

Share on

X Facebook LinkedIn Bluesky

Jon Price

The Right Surface for the Work: Instrumenting AI Usage Beyond Token Counts

Tokens are the raw material, not the decision

Where API usage is the right answer

Where human-in-the-loop work lives

What I’m building for it

Share on

You may also enjoy

Central Auth: The Boring Platform Project That Keeps My Apps From Turning Into Permission Spaghetti

Building and Deploying Serverless Applications on AWS: A Practical Guide

The Role of Cloud Platforms in Serverless Architectures

The Role of Monitoring and Debugging in Serverless Architectures