The One Number That Tells You If Your AI Marketing Agent Is Working

The internet is filling up with AI agent ROI frameworks. Seven dimensions. Weighted scoring matrices. Cost-benefit models you need a spreadsheet to run. They look thorough. They are also useless for someone shipping today.
I have been running marketing agents in production for over a year. I have watched teams burn cycles building evaluation dashboards that nobody looked at after week two. The problem with multi-metric ROI frameworks is not that they are wrong. It is that they diffuse attention. When everything matters, nothing gets watched.
Here is a better approach. Pick exactly one number. Check it every Monday. If it moves in the right direction, keep going. If it does not, stop and fix the thing that broke. Everything else is theatre.
Why one number beats seven
The human brain can track one metric with discipline. It cannot track seven. By week three of a multi-metric dashboard, most teams are looking at whichever number makes them feel better and ignoring the rest.
There is a deeper reason too. An agent deployed in production is a system, not a feature. It interacts with your CRM, your email platform, your content pipeline. Each of those systems has its own metrics. If you try to isolate the agent effect across seven different dimensions, you spend more time arguing about attribution than improving the agent.
One number forces clarity. It says: this is what we are trying to change. Everything the agent does either pushes this number up or it does not. If an optimization makes the agent sound smarter but does not move the number, cut it. It was decoration.
Which number to pick
The number must satisfy three conditions. First, it must be measurable with data you already have. If you need a new tracking system, the number is too ambitious. Start with what exists.
Second, it must connect to revenue or cost in a way you can state in one sentence. "The agent improved engagement scores" is not a sentence a founder can take to a board meeting. "The agent increased repeat purchase rate from 14 percent to 19 percent" is.
Third, it must be resistant to gaming. If the number can be moved by changing a definition or excluding a segment, it is the wrong number. Pick something that only moves because the underlying behavior changed.
Examples of good single numbers:
- Repeat purchase rate within 90 days
- Percentage of leads that receive a personal follow-up within 4 hours
- Dormant customer reactivation rate
- Average revenue per retained customer
- Time from lead to qualified pipeline stage
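To show how precisely one of these numbers needs to be defined, here is a minimal Python sketch of the first one, repeat purchase rate within 90 days. It assumes a hypothetical `Order` record (customer ID plus order date) taken from whatever order export you already have. The field names, and the rule that leaves out customers whose 90-day window has not closed yet, are assumptions for illustration, not a standard definition. Writing the definition down as code also helps with the gaming test: nobody can quietly change what counts.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Order:
    # Hypothetical record shape; map these fields to your own order export.
    customer_id: str
    order_date: date


def repeat_purchase_rate(orders: list[Order], as_of: date, window_days: int = 90) -> float:
    """Share of eligible customers who ordered again within `window_days`
    of their first order. A customer is eligible only once that window
    has fully closed by `as_of`."""
    dates_by_customer: dict[str, list[date]] = {}
    for order in orders:
        dates_by_customer.setdefault(order.customer_id, []).append(order.order_date)

    window = timedelta(days=window_days)
    eligible = 0
    repeated = 0
    for dates in dates_by_customer.values():
        dates.sort()
        first = dates[0]
        # Skip customers whose window is still open: they have not had the
        # full period to come back, so counting them would understate the rate.
        if first + window > as_of:
            continue
        eligible += 1
        # Any other order on or after the first, up to the end of the window, is a repeat.
        if any(d <= first + window for d in dates[1:]):
            repeated += 1
    return repeated / eligible if eligible else 0.0


orders = [
    Order("a", date(2024, 1, 1)), Order("a", date(2024, 2, 15)),  # repeat at day 45
    Order("b", date(2024, 1, 10)),                                 # never returned
    Order("c", date(2024, 1, 5)), Order("c", date(2024, 6, 1)),    # returned after day 90
    Order("d", date(2024, 5, 20)),                                 # window still open
]
print(repeat_purchase_rate(orders, as_of=date(2024, 6, 30)))  # 1 of 3 eligible customers
```

Customer "d" is left out rather than counted as a non-repeat, so a burst of new buyers cannot drag the number down before they have had a fair chance to return.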
Examples of bad single numbers: "engagement," "satisfaction score," "agent utilization rate," "number of tasks completed."
The Monday morning check
Pick the number. Write it down. Every Monday morning, look at exactly two things: the number this week, and the number for the week before you deployed the agent. That second number is your baseline.
Do not look at the chart of the last six months. Do not check three related metrics "just for context." Two numbers. Now and before.
If the number moved in the right direction, do not change anything. Not the prompt. Not the model. Not the workflow. The most common mistake after a good week is over-optimization. The agent is working. Let it work.
If the number did not move, ask one question: was the failure in the agent output or in the human workflow around it? Most agent "failures" are actually workflow failures. The agent produced the right output but nobody acted on it. Or the output went to the wrong person. Or the data the agent needed was stale. Fix the workflow before you touch the agent.
If the number moved in the wrong direction, stop the agent immediately. Do not tweak the prompt and "see if next week is better." An agent that is actively making things worse should not be running. Diagnose offline, fix, redeploy.
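The three branches above amount to a small decision rule. This Python sketch encodes it, assuming you already have this week's value and the pre-deployment baseline as plain numbers. The `min_change` noise band is a hypothetical parameter that is not part of the original advice: it decides how small a change still counts as "did not move", and the default of 0.0 treats any difference as movement. The `higher_is_better` flag covers numbers such as time from lead to qualified pipeline stage, where going down is the good direction.

```python
from enum import Enum


class Action(Enum):
    HOLD = "Moved the right way: change nothing this week."
    FIX_WORKFLOW = "Did not move: check the workflow around the agent first."
    STOP_AGENT = "Moved the wrong way: stop the agent, diagnose offline, redeploy."


def monday_check(
    baseline: float,
    this_week: float,
    higher_is_better: bool = True,
    min_change: float = 0.0,
) -> Action:
    """Compare this week's number to the pre-deployment baseline.

    `min_change` is an assumed noise band: a change whose size is at most
    `min_change` is treated as "did not move".
    """
    delta = this_week - baseline
    if not higher_is_better:
        # For numbers like time from lead to qualified pipeline stage,
        # a decrease is the improvement, so flip the sign.
        delta = -delta
    if delta > min_change:
        return Action.HOLD
    if delta < -min_change:
        return Action.STOP_AGENT
    return Action.FIX_WORKFLOW


print(monday_check(baseline=0.14, this_week=0.19).value)
```

Note that a flat week routes to the workflow check, not to the agent. That ordering mirrors the advice above: fix the workflow before you touch the prompt or the model.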
What I learned from ignoring my own advice
Last fall I had an agent running follow-up sequences for a services pipeline. I picked the number: qualified leads that received a response within 24 hours. Good number. Measurable. Connected to revenue.
Then I got curious. I added a second number: average response quality score. Then a third: agent suggestion acceptance rate. Then a fourth: time spent per lead by the human reviewer.
Within three weeks I was looking at the dashboard instead of the business. The original number had improved by 40 percent and I barely noticed because I was distracted by the other three, which were noisy and inconclusive.
I killed the extra metrics. Went back to one number. The clarity returned immediately. The lesson stuck: evaluation is not about comprehensiveness. It is about attention. You can only pay attention to one thing at a time. Make it the thing that matters.
The exception: when you need a second number
There is one case where a second number is warranted. If your single number is a lagging indicator that takes months to move, you need a leading indicator to track in the meantime.
For example, if your number is "repeat purchase rate within 90 days," you cannot wait three months to find out if the agent is working. Pick a leading indicator: "percentage of first-time buyers who open and click the post-purchase email sequence." That number moves within days and predicts the 90-day number.
One leading indicator. Not five. And kill it once the lagging indicator starts reporting. You do not need both.
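A minimal sketch of the leading indicator from the example, plus a rough check for when the 90-day number can take over. It assumes hypothetical `EmailEvent` records tagged `"open"` or `"click"` for the post-purchase sequence, and a set of first-time buyer IDs. Real email platforms export events in their own formats, so treat these shapes as placeholders. The retirement check assumes the lagging number starts to reflect the agent once a full 90-day window has passed since deployment.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class EmailEvent:
    # Hypothetical event shape for the post-purchase sequence.
    customer_id: str
    event: str  # assumed to be "open" or "click"


def open_and_click_rate(first_time_buyers: set[str], events: list[EmailEvent]) -> float:
    """Share of first-time buyers with at least one open AND at least one click."""
    opened: set[str] = set()
    clicked: set[str] = set()
    for e in events:
        if e.customer_id not in first_time_buyers:
            continue  # ignore repeat customers and unknown IDs
        if e.event == "open":
            opened.add(e.customer_id)
        elif e.event == "click":
            clicked.add(e.customer_id)
    if not first_time_buyers:
        return 0.0
    return len(opened & clicked) / len(first_time_buyers)


def lagging_indicator_ready(deployed_on: date, as_of: date, window_days: int = 90) -> bool:
    """True once buyers from the first post-deployment day have had a full
    window, so the 90-day number starts to reflect the agent."""
    return as_of >= deployed_on + timedelta(days=window_days)


buyers = {"a", "b", "c", "d"}
events = [
    EmailEvent("a", "open"), EmailEvent("a", "click"),
    EmailEvent("b", "open"),
    EmailEvent("c", "click"),
    EmailEvent("z", "open"), EmailEvent("z", "click"),  # not a first-time buyer
]
print(open_and_click_rate(buyers, events))  # only "a" opened and clicked: 0.25
```

Once `lagging_indicator_ready` returns True, drop the leading indicator and go back to watching one number.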
The test to run right now
If you have an agent running today, do this: look at whatever you are currently measuring. Count the metrics. If there are more than two, pick one and write it on a sticky note. Put the sticky note on your monitor. For the next four Mondays, look at nothing else.
If the number moved by the end of the month, you have a working agent and a working evaluation system. If it did not move, you learned something useful: either the agent is not working, or you picked the wrong number. Either way, you know more than you did with seven metrics and a dashboard.
If you want to see what a custom AI growth operator could return for your firm, the Growth Audit Call maps your pipeline, follow-up, and reactivation gaps in a single conversation. No dashboards. One clear picture.
Enjoyed this? Get more like it in your inbox.
Subscribe to The Agentic Dispatch →