The biggest risk of AI isn’t that it can make mistakes. It’s that it produces answers so fast, and packages them so convincingly, that we stop noticing when it does.
Generating an answer, even to an extremely complex question, has become effortless. Consider a straightforward workplace analytics task: a business leader highlights a growing risk and asks what trends we are seeing in our marketing engagement data. With modern AI tools, all you really have to do to generate a response is copy-paste the email thread into a coding assistant and provide some basic guidance on the data sources. Within 15 minutes it could be done, with a clean executive summary, crisp charts, and a coherent narrative.
The problem is you’d have approximately no idea whether any of it is believable. Do you send it right away? Do you review the code line-by-line, eating back the precious time you tried to save in the first place?
And before you say, “Well, just train AI to evaluate itself,” ask yourself whether that would really make you trust the output more. Your answer is probably no, and it’s worth working through exactly why that is. What’s needed is an evaluation framework that builds confidence in the process while still delivering efficiency gains. For now, there are a few reasons why the critical role of evaluator needs to be owned by a human being:
People are willing to trust people, but not always AI
Imagine I handled the workplace task above without using AI – manually writing the code, interpreting the results, making the charts, and drafting the executive summary. And let’s say the business leader strongly disagreed with my interpretation and recommendations. For human-human interactions, this is a familiar scenario. We can schedule a meeting, discuss how we arrived at different interpretations of the results, and agree on next steps that we can both accept.
The same scenario is much more complicated if I’d simply shared the AI output directly. Consider a simple case of an error in the analysis. If I’d made the error, we have clear processes in place: it can be reflected in my performance evaluation and reputation, I can be asked to take a training course or receive mentorship, or I could be reassigned to a different type of work. If I violated a company code or broke a law, I could be fired, fined, or sent to jail.
But if the model made the error, accountability is unclear. Sure, we could retrain the model, but how do we know fixing this issue wouldn’t create five more? Even more concerning, if we ask the same thing again using slightly different language, we might not even get the same answer. And the reputation that will be damaged isn’t AI’s – it’s that of the human who decided to ship the erroneous response.
We simply do not have a framework for holding AI accountable without effectively treating a human, at some level, as the owner. And if something can’t be held accountable, it is going to be hard to trust. Ultimately we understand the range of moral, legal, and ethical frameworks that other humans generally live by, and this makes them trustworthy in a way that machines are not yet.
The better AI gets, the harder evaluation becomes
To state the obvious, humans are not machines. But as AI improves and becomes better and better at modeling human behavior, the line will become increasingly blurred. AI failures will look increasingly human and persuasive, eroding the feedback loop in which people catch and correct errors, and ultimately degrading AI performance.
But hold on, you might say. If the errors are getting harder to detect, isn’t that just a different way of saying performance is improving, not degrading? Not necessarily. Behavioral economists have spent decades identifying cognitive biases and heuristics that lead humans to make decisions that feel right but are not optimal – even in extremely high-stakes conditions. The list of biases is long. As AI mimics humans more and more, will its own cognitive biases emerge?
The example that most worries me is confirmation bias, which is the psychological tendency for humans to selectively filter information in a way that supports their preexisting beliefs. Put simply, we like hearing something that confirms we were right a lot more than hearing something that challenges our beliefs. In a world where every chat with AI leans into this bias by telling you “that is a very meaningful thought and one you should take seriously”, it will become increasingly hard to catch enough errors to meaningfully improve the system.
Ironically, the very human behaviors we turned to machines to escape through mathematical rigor and automation could end up baked into the machines themselves — laundered through an algorithm, and handed back to us with more confidence than ever. Which means that evaluating AI isn’t just about checking the math. It’s about recognizing when a model has learned to reflect our own blind spots back at us, and having a disciplined evaluation framework to keep AI objective.
Someone has to tell AI what “good” looks like – and it’s not easy
There is a common fallacy, I think, that if we can simply create enough metrics, AI will be able to deduce the proper way to balance them and produce an objectively optimal output. But this ignores the fundamental limitation that all these metrics exist to measure something. And those things are ultimately human, tied to the goals we have, the moral code we want to live by, and the values that drive our decisions.
Take an example from baseball. If you watch a lot of baseball, like I do, you’ve seen an explosion of metrics in recent years. A century ago you might know a hitter’s average, home runs, and RBI; now we keep track of who’s leading in exit velocity. But do we know how important hitting the ball hard is to actually winning a game? And could focusing on a metric like exit velocity actually lead to unintended negative consequences like tons of strikeouts?
“When a measure becomes a target, it ceases to be a good measure” – this is Goodhart’s Law. In theory we can ask AI to consider an ever-growing list of metrics to represent and approximate the real goals we are trying to achieve. But we will always be faced with the fact that once you set the metrics, the system can be gamed.
And the risk of this with AI is high because it’s extremely good at gaming metrics. Fundamentally, the technology works by looking for predictive associations across huge sets of data points. It’s all too easy to envision how it can find unintended associations and exploit them, ultimately driving metrics that diverge more and more from the real goals we wanted to achieve.
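To make the Goodhart’s Law point concrete, here is a toy Python sketch built on the exit velocity example. Every formula and number in it is an assumption I made up for illustration, not real baseball data: a hypothetical hitter chooses how hard to swing, swinging harder raises exit velocity (the proxy metric) but lowers the contact rate, and the “real goal” is a stand-in for expected value per swing.

```python
# Toy illustration of Goodhart's Law. All formulas and constants below are
# made-up assumptions for illustration only, not real baseball data.

def exit_velocity(effort: float) -> float:
    """Proxy metric (mph): swinging harder always raises exit velocity."""
    return 80.0 + 30.0 * effort

def contact_rate(effort: float) -> float:
    """Hidden cost: swinging harder means missing the ball more often."""
    return 0.95 - 0.7 * effort

def expected_value(effort: float) -> float:
    """Stand-in for the real goal: chance of contact times value of the hit."""
    value_per_contact = 0.2 + 0.8 * effort  # harder-hit balls are worth more
    return contact_rate(effort) * value_per_contact

def best_effort(objective) -> float:
    """Grid-search swing effort in [0, 1] for the value that maximizes objective."""
    candidates = [i / 100 for i in range(101)]
    return max(candidates, key=objective)

# An optimizer that is only told about the proxy metric...
proxy_choice = best_effort(exit_velocity)
# ...versus one that is told about the (assumed) real goal.
goal_choice = best_effort(expected_value)

for label, effort in [("optimize proxy", proxy_choice), ("optimize goal", goal_choice)]:
    print(
        f"{label}: effort={effort:.2f}, "
        f"exit velocity={exit_velocity(effort):.1f} mph, "
        f"expected value={expected_value(effort):.3f}"
    )
```

Under these assumed numbers, the optimizer that only sees exit velocity picks the hardest possible swing and leads the league in the metric, while scoring worse on the stand-in for the real goal than the more moderate swing does. The sketch is deliberately simple; the point is only that a system rewarded on the proxy has no reason to notice the cost it was never told to measure.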
What does it all mean for the value of human judgment in an AI-powered world?
The time and effort it takes to generate content with AI is collapsing toward zero. And what it actually creates is becoming increasingly compelling. Its arguments are persuasive, its organization is sophisticated, and its graphics are fancy. It’s easy to feel right now like anything we can do, it can do better.
But at the end of the day, the consumer of all that generated content is still a human being. And humans are likely to stay in the driver’s seat for some time because we’re simply not ready to afford AI the same level of trust. We don’t have the right accountability framework and we don’t know how to tell it what to do in a way that will hold up over the long term. I think that working as an evaluator at the AI-human interface will remain a valuable human contribution for a long time to come.
I’m not saying that AI won’t be the evaluator in many cases. As the volume of what is produced increases by orders of magnitude, there simply won’t be time for humans to comb through everything. But for major decisions that involve large risk/reward tradeoffs, carry legal or moral implications, or lack a clear answer, we’re going to want a human in charge.
The shift from maker to evaluator isn’t a demotion. It’s a different skill set, and an undervalued one. Thinking Empirically is about building exactly that: the habit of asking the right questions, stress-testing the answers you get, and learning when to trust what’s in front of you. Those skills matter more than ever.