This picks up from my previous article.
When I began this series, I set out to explore whether a Mixture of Experts (MoE) architecture could form the foundation of a viable AI therapy tool. After working with several software clients who were trying to build exactly that, I've come to a conclusion I didn't expect to reach: we aren't there yet, and anything shipped today without acknowledging that is being built without forethought for the consequences.
This is my closing article on the subject. Not because the core ideas were wrong, but because the problems run deeper than architecture.
The Sycophancy Problem
The central issue isn't compute, model size, or even architecture. It's sycophancy. In everyday life, we all do a certain amount of people-pleasing to smooth social interactions. An LLM, however, lacks the social awareness to know when and how to employ it. Instead, it simply mirrors the language it was trained on.
If you've used a modern AI chatbot for any length of time, you've probably noticed its tendency to agree with you, validate you, and mirror your framing back to you. That's annoying in a productivity tool, and it is already drawing people into delusions. How do you ensure that the model resists not only its own delusions but also the delusions a user brings into the conversation?
Consider a user who comes to a therapy chatbot in a state of catastrophizing. They're convinced that a bad day at work means their career is over, or that a conflict with a friend means they are fundamentally unlovable. A sycophantic response would generally follow the form:
"That [issue] sounds really hard. Big things can feel very overwhelming and you're right to be frustrated."
It's warm. It feels supportive. And yet it does absolutely nothing to challenge the distortion. In fact, it reinforces it. Sometimes supporting someone means breaking them out of a distorted version of reality that isn't serving them.
What a good therapeutic response actually looks like is almost the opposite. It grounds the user in the specific reality of the moment — asking what led up to it, what people actually said, what else was going on in the environment. The goal is to get the user to step outside the catastrophized version of events and examine it from multiple angles.
The reason models fall into this pattern runs all the way down to the training data. The source material available for training these models is almost entirely one of two things: either it is sycophantic in origin (the supportive, affirming language that dominates online mental health spaces), or it is overtly clinical (language that belongs in a diagnostic manual, not a conversation). Neither is appropriate. What's needed is something in between: a model that can hold a warm, human conversation while also being grounded enough to push back on a user's defense mechanisms when needed.
But tell me, fellow AI developer, where's that training data when those conversations are hidden deep behind confidentiality?
What The Client Work Taught Me
My client engagements on this problem typically started optimistically. The conversations were about architecture: what should these expert models actually be? How should they be classified? What data should they be trained on? How do we run this on the hardware budget the client had in mind? That budget, it turned out, was almost always unrealistic given what this work actually requires.
The projects ran into walls. For me personally, the conclusion was less a single dramatic failure than a slow realization: these models never matured to the point where I was satisfied, from an engineering standpoint, that they performed better than the general frontier models already available. And those frontier models, with far more resources behind them, were already visibly struggling with sycophancy when pushed into therapeutic contexts.
Worse, people were already using those general models for therapy at scale, despite explicit warnings not to. The consequences have been serious: there are documented cases of AI-induced psychosis and, tragically, of people who have ended their lives in connection with these interactions. The models are powerful enough to affect people's moods, their relationships, and their grip on reality. That has to be the starting point for any ethical conversation about building in this space, not an ethics check bolted on at the end.
The Bar We Should Be Holding Ourselves To
Think about how we handle therapeutic medications. Powerful tools that affect mood and mental state are not handed out without a licensed, specialized human in the loop: someone who can assess, adjust, intervene, or force discontinuation. It may not be a perfect analogy, but people are strongly affected by the relationships they form with their chatbots, and those relationships can become addictive and life-ruining when no one is observing them and correcting bad responses. These AI tools are already demonstrably affecting people in comparable ways, and we have no equivalent safeguard structure in place.
Knowing when to push back on a patient's defense mechanisms is something many experienced human therapists find difficult. It requires reading the room, knowing the person's history, understanding what they can handle in a given moment. The rules are soft and human and contextual. Before we can expect machines to navigate that well, we need to be able to articulate clearly what the rules even are.
So should AI have any role in therapy at all? In my opinion, yes, but a specific and limited one. The most defensible application right now is augmenting existing therapeutic practices: taking the journaling exercises, reflection prompts, and homework assignments that therapists already assign and making them more interactive and responsive. It keeps a human practitioner in the loop and doesn't ask the model to do what it isn't ready to do. It also gives users, on their own time, the interactivity and intimacy that people already turn to chatbots for.
But even that application needs hard checks on sycophancy built in from the ground up. It can't be an afterthought.
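To make "hard checks on sycophancy" slightly more concrete, here is a minimal sketch of where such a gate could sit: every draft reply is checked before it reaches the user, and a reply that validates the user's framing without any grounding question (asking what led up to it, what people actually said, what else was going on) is blocked for regeneration or human review. Everything here is hypothetical: the phrase lists, `check_reply`, and `SycophancyVerdict` are illustrative names, and keyword matching is only a placeholder for whatever classifier and clinical review a real system would need.

```python
import re
from dataclasses import dataclass

# Hypothetical phrase lists for illustration only. A real check would need
# clinically reviewed criteria and a trained classifier, not keywords.
VALIDATION_PATTERNS = [
    r"\byou'?re right\b",
    r"\bthat sounds (really )?hard\b",
    r"\bcompletely understandable\b",
    r"\byou have every right\b",
]
GROUNDING_PATTERNS = [
    r"\bwhat (led up to|happened)\b",
    r"\bwhat (exactly )?did (they|he|she) (actually )?say\b",
    r"\bwhat else was going on\b",
    r"\bis there another way to look at\b",
]


@dataclass
class SycophancyVerdict:
    validates: bool  # reply affirms the user's framing
    grounds: bool    # reply asks the user to examine the specifics

    @property
    def blocked(self) -> bool:
        # Block replies that only validate and never ground the user.
        return self.validates and not self.grounds


def check_reply(reply: str) -> SycophancyVerdict:
    """Classify a draft reply before it is shown to the user."""
    text = reply.lower()
    validates = any(re.search(p, text) for p in VALIDATION_PATTERNS)
    grounds = any(re.search(p, text) for p in GROUNDING_PATTERNS)
    return SycophancyVerdict(validates, grounds)


verdict = check_reply(
    "That sounds really hard. Big things can feel very overwhelming "
    "and you're right to be frustrated."
)
print(verdict.blocked)
```

Run against the sycophantic example from earlier, the gate blocks the reply; adding a grounding question such as "What led up to it?" lets it through. The point of the sketch is placement rather than detection quality: the check runs on every reply and can refuse to pass one along, instead of being a tone adjustment applied after the fact.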
The business case for counseling chatbots seems obvious. The real challenge is finding the market. Right now, building an app that integrates an LLM for counseling means designing a whole mental health app from the ground up. The LLM is not handling every detail of the interaction at once: the app stores the history and assigns the reflection work, and the LLM handles individual interactions case by case using that stored history. That requires at least a traditional relational database plus a way to compact that history into a vector database the LLM can draw on. That is a very different project than fine-tuning an LLM to be a general counselor.
Beyond project creep, there are already competing apps rooted in mindfulness and self-care, such as Yuna and BetterMe, that have taken the path I outlined. So while this may look like an unserved market to outside disruptors, it is difficult for structural reasons, and there are already established players in it.
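The split between the app, which owns the history, and the LLM, which only sees a relevant slice of it per interaction, can be sketched roughly as follows. This is a minimal illustration under loud assumptions: SQLite stands in for the relational database, a hashed bag-of-words vector stands in for a real embedding model, a plain dictionary stands in for the vector database, and every name (`ReflectionStore`, `embed`, `build_context`) is hypothetical. "Compaction" is reduced here to embedding each entry; a real system might summarize entries first. No LLM is called; the output is just the context an app might pass to one.

```python
import math
import re
import sqlite3
import zlib
from collections import Counter

DIM = 1024  # size of the toy hashed embedding (an assumption for illustration)


def embed(text: str) -> list[float]:
    """Stand-in for a real embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for word, count in Counter(re.findall(r"[a-z']+", text.lower())).items():
        vec[zlib.crc32(word.encode()) % DIM] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class ReflectionStore:
    """Hypothetical store: SQLite is the system of record, vectors are a derived index."""

    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE entries (id INTEGER PRIMARY KEY, user_id TEXT, "
            "assignment TEXT, body TEXT)"
        )
        self.vectors: dict[int, list[float]] = {}  # entry id -> embedding

    def add_entry(self, user_id: str, assignment: str, body: str) -> int:
        # Persist the reflection the app assigned, then index it for retrieval.
        cur = self.db.execute(
            "INSERT INTO entries (user_id, assignment, body) VALUES (?, ?, ?)",
            (user_id, assignment, body),
        )
        self.db.commit()
        entry_id = cur.lastrowid
        self.vectors[entry_id] = embed(body)
        return entry_id

    def relevant_history(self, user_id: str, message: str, k: int = 2) -> list[str]:
        # Only this user's entries, ranked by cosine similarity to the message.
        query = embed(message)
        rows = self.db.execute(
            "SELECT id, body FROM entries WHERE user_id = ?", (user_id,)
        ).fetchall()
        ranked = sorted(
            rows,
            key=lambda row: sum(a * b for a, b in zip(query, self.vectors[row[0]])),
            reverse=True,
        )
        return [body for _, body in ranked[:k]]


def build_context(store: ReflectionStore, user_id: str, message: str, k: int = 2) -> str:
    """Assemble the slice of history the LLM would see for this one interaction."""
    history = store.relevant_history(user_id, message, k)
    lines = ["Relevant reflection history:"] + [f"- {h}" for h in history]
    lines.append(f"User message: {message}")
    return "\n".join(lines)


store = ReflectionStore()
store.add_entry("u1", "thought record",
                "Argument with my manager about the deadline left me convinced I'd be fired")
store.add_entry("u1", "gratitude walk", "Went for a walk and felt calmer about my sister")
print(build_context(store, "u1", "the deadline argument with my manager", k=1))
```

Keeping SQLite as the system of record and treating the vectors as a derived index means the index can be rebuilt, or swapped for a real vector database, without touching the stored history itself.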
A Note To Developers
If you're excited about building in this space — and the potential here is real, so that excitement is understandable — I'd ask you to sit with one question before you ship:
Is this tool more qualified to help users than what already exists, and have I proven that?
The principle of least harm should govern deployment decisions. A tool that hasn't demonstrated it can outperform existing options, and hasn't been stress-tested against the failure modes specific to mental health contexts, shouldn't be in front of vulnerable users. Full stop. Optimism about what the architecture could eventually do is not a substitute for evidence of what it does now.
The MoE concept is still sound. The vision of specialized expert models working in concert, overseen by a human practitioner, guided by principled data selection and hard sycophancy constraints, is still the right direction. But direction isn't destination. We have a long way to go, and the people who will be most affected by getting it wrong deserve better than a product built without forethought.
