In today’s column, I am continuing my ongoing series about the impact of generative AI in the health and medical realm. The focus this time is on the use of generative AI to aid in performing mental health therapy, a topic I’ve previously covered extensively from a variety of perspectives, such as the transformation of the client-therapist relationship at the link here and where things are headed concerning the levels of autonomy in AI-based mental health guidance at the link here, just to name a few.
The particular interest here is whether generative AI can perform mental health reasoning.
Allow me to emphasize that the catchphrase “mental health reasoning” takes us into a complex, murky space, and we ought to be extremely mindful of what the expression signifies and what, if anything, it has to do with AI, including and especially generative AI.
I shall first set the stage for this elucidation and lay out the course of the journey herein.
Words About Thoughts Are Very Important
Suppose that a person opts to use generative AI for advice on a mental health concern. The person engages in a back-and-forth dialogue with the generative AI, such as an illustrative discourse that I closely analyzed in detail regarding experiencing ADHD (attention deficit hyperactivity disorder), see my analysis at the link here. There isn’t a human therapist in the loop. In other words, this is someone who has sought out the use of generative AI on their own and is not under the care of a human therapist.
In case you think the above scenario is an outlier or rarity, please be aware that the use of generative AI for mental health guidance is readily available right now. There are an estimated one hundred million weekly active users of ChatGPT, the widely and wildly popular generative AI app by OpenAI, and some proportion of those users can easily consult with ChatGPT for mental health advice at the drop of a hat.
As I have repeatedly warned and noted, see the link here, a humongous-sized global experiment about mental health therapy is underway via the advent of generative AI. You can merely log into generative AI and instantly engage in a seemingly fluent discussion about your mental health.
Easy-peasy.
My grist for the mill in today’s discussion is how generative AI puts together the mental health guidance that it showcases to a person using the AI. Some AI insiders have labeled the interactivity as being based on “mental health reasoning,” an oft-used but heavily overloaded catchphrase that requires careful unpacking.
Readers of my column postings know that I already disfavor batting around terms such as AI that supposedly “understands”, “thinks”, or “reasons.” That terminology is heavily embedded in the human experience of how we view and assess human capacities. In my view, using those same words to describe AI is a dismal anthropomorphizing of AI. You are essentially, either overtly or in a wink-wink manner, ascribing sentient qualities to a non-sentient AI.
Just to be abundantly clear, we do not have sentient AI. Maybe someday we will. We don’t now. I realize this harsh news is a bit of a shocker since banner headlines keep trying to claim that we do have sentient AI. Do not buy into those outsized clickbait fakeries. The use of words like understands, thinks, and reasons are all subtle and sneaky ways to lead you down a primrose path to believing that contemporary AI is sentient.
There are those who, unfortunately, by happenstance or intent, worsen these weighty matters.
Many AI luminaries carry on social media debates in which they argue endlessly and fruitlessly that AI either is able to understand things or is not. The same goes for whether AI can think or does not think. Lamentably, these debates, which might be useful if conducted with proper civility and care, usually digress into abhorrent tweet-length name-calling and outright profane mudslinging. Sad. Really, really sad.
For right now, a major and vital keystone that undercuts any such set of arguments is that we do not yet definitively agree on what it means for someone or something to think, understand, or reason (i.e., far beyond what an everyday dictionary says). Without sturdy goalposts to define the underlying meaning, the two sides can keep shouting at each other until the cows come home. One side is using one particular definition or meaning, while the other side is using something else. They pass each other without necessarily realizing they are arguing about quite different things from the get-go.
Okay, so let’s be real, I’m not going to solve those heat-but-no-light jumbled discourses here. I will though gladly and earnestly walk you through the ins and outs of so-called mental health reasoning to help highlight what this portends for generative AI. Is there reasoning going on? What is the nature of this reasoning? How would we know if reasoning was taking place? Etc.
We have a lot of ground to cover.
I’d like to start by making sure you are familiar with the AI side of things.
Core Background About Generative AI And Large Language Models
Here is some quick background about generative AI to make sure we are in the same ballpark about what generative AI and also large language models (LLMs) consist of. If you are already highly versed in generative AI and LLMs, you might skim this quick backgrounder and then pick up once I get into the particulars of this specific use case.
I’d like to start by dispelling a myth about generative AI. Banner headlines from time to time seem to claim or heartily suggest that AI such as generative AI is sentient or that it is fully on par with human intelligence. Don’t fall for that falsity, please.
Realize that generative AI is not sentient and only consists of mathematical and computational pattern matching. The way that generative AI works is that a great deal of data is initially fed into a pattern-matching algorithm that tries to identify patterns in the words that humans use. Most of the modern-day generative AI apps were data trained by scanning data such as text essays and narratives that were found on the Internet. Doing this was a means of getting the pattern-matching to statistically figure out which words we use and when we tend to use those words. Generative AI is built upon the use of a large language model (LLM), which entails a large-scale data structure to hold the pattern-matching facets and the use of a vast amount of data to undertake the setup data training.
There are numerous generative AI apps available nowadays, including GPT-4, Bard, Gemini, Claude, ChatGPT, etc. The one that is seemingly the most popular would be ChatGPT by AI maker OpenAI. In November 2022, OpenAI’s ChatGPT was made available to the public at large and the response was astounding in terms of how people rushed to make use of the newly released AI app. As noted earlier, there are an estimated one hundred million active weekly users at this time.
Using generative AI is relatively simple.
You log into a generative AI app and enter questions or comments as prompts. The generative AI app takes your prompting and uses the already devised pattern matching based on the original data training to try and respond to your prompts. You can interact or carry on a dialogue that appears to be nearly fluent. The nature of the prompts that you use can be a make-or-break when it comes to getting something worthwhile out of using generative AI and I’ve discussed at length the use of state-of-the-art prompt engineering techniques to best leverage generative AI, see the link here.
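If you are curious about what that exchange looks like under the hood, here is a minimal sketch of a programmatic prompt-and-response, assuming the OpenAI Python client is installed and an API key is configured; the model name is merely illustrative. The chat window you type into is doing essentially the same thing on your behalf.

```python
# Minimal sketch of a prompt-and-response exchange, assuming the OpenAI
# Python client (pip install openai) and an API key available via the
# OPENAI_API_KEY environment variable. The model name is illustrative.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative model choice
    messages=[
        {"role": "user", "content": "Tell me about the life of Abraham Lincoln."}
    ],
)

print(response.choices[0].message.content)  # the generated essay-style reply
```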
The conventional modern-day generative AI is of an ilk that I refer to as generic generative AI.
By and large, the data training was done on a widespread basis and involved smatterings of this or that along the way. Generative AI in that instance is not specialized in a specific domain and instead might be construed as a generalist. If you want to use generic generative AI to advise you about financial issues, legal issues, medical issues, and the like, you ought not to do so. There isn’t enough depth included in the generic generative AI to render the AI suitable for domains requiring specific expertise.
AI researchers and AI developers realize that most of the contemporary generative AI is indeed generic and that people want generative AI to be deeper rather than solely shallow. Efforts are stridently being made to try and make generative AI that contains notable depth within various selected domains. One method to do this is called RAG (retrieval-augmented generation), which I’ve described in detail at the link here. Other methods are being pursued and you can expect that we will soon witness a slew of generative AI apps shaped around specific domains, see my prediction at the link here.
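To give a flavor of how RAG-style augmentation works, here is a deliberately tiny sketch in which a toy keyword-overlap retriever stands in for the embeddings and vector databases that real deployments rely on; the snippets and helper functions are invented purely for illustration.

```python
# Toy RAG-style sketch: retrieve the most relevant snippet from a tiny
# domain "knowledge base" and prepend it to the user's prompt. Real systems
# use embeddings and a vector database; plain keyword overlap stands in
# here so the example stays self-contained. All content is illustrative.

knowledge_base = [
    "CBT (cognitive behavioral therapy) focuses on reframing unhelpful thought patterns.",
    "Sleep hygiene guidance includes consistent bedtimes and limiting late-night screens.",
    "Grounding exercises such as paced breathing can help during acute anxiety episodes.",
]

def retrieve(query: str) -> str:
    """Pick the snippet sharing the most words with the query (toy retriever)."""
    query_words = set(query.lower().split())
    return max(knowledge_base, key=lambda doc: len(query_words & set(doc.lower().split())))

def build_augmented_prompt(user_question: str) -> str:
    """Prepend the retrieved snippet so the LLM can ground its answer in it."""
    context = retrieve(user_question)
    return f"Context: {context}\n\nQuestion: {user_question}\n\nAnswer using the context above."

print(build_augmented_prompt("What can help during an anxiety episode?"))
```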
You might be used to generative AI that functions in a text-to-text mode. A user enters some text, known as a prompt, and the generative AI app emits or generates a text-based response. Simply stated, this is text-to-text. I sometimes describe this as text-to-essay, due to the common practice of people using generative AI to produce essays.
The typical interaction is that you enter a prompt, get a response, you enter another prompt, you get a response, and so on. This is a conversation or dialogue. Another typical approach consists of entering a prompt such as tell me about the life of Abraham Lincoln, and you get a generated essay that responds to the request.
Another popular mode is text-to-image, also called text-to-art. You enter text that describes something you want to be portrayed as an image or a piece of art. The generative AI tries to parse your request and generate artwork or imagery based on your stipulation. You can iterate in a dialogue to have the generative AI adjust or modify the rendered result.
We are heading beyond the simple realm of text-to-text and text-to-image by shifting into an era of multi-modal generative AI, see my prediction details at the link here. With multi-modal generative AI, you will be able to use a mix of combinations or modes, such as text-to-audio, audio-to-text, text-to-video, video-to-text, audio-to-video, video-to-audio, etc. This will allow users to incorporate other sensory devices such as using a camera to serve as input to generative AI. You then can ask the generative AI to analyze the captured video and explain what the video consists of.
Multi-modal generative AI tremendously ups the ante regarding what you can accomplish with generative AI. This unlocks a lot more opportunities than being confined to merely one mode. You can for example mix a wide variety of modes such as using generative AI to analyze captured video and audio, which you might then use to generate a script, and then modify that script to then have the AI produce a new video with accompanying audio. The downside is that you can potentially get into hot water more easily due to trying to leverage the multi-modal facilities.
Allow me to briefly cover the hot water or troubling facets of generative AI.
Today’s generative AI that you readily run on your laptop or smartphone has tendencies that are disconcerting and deceptive:
- (1) False aura of confidence.
- (2) Lack of stating uncertainties.
- (3) Lulls you into believing it to be true.
- (4) Uses anthropomorphic wording to mislead you.
- (5) Can go off the rails and produce AI hallucinations.
- (6) Sneakily portrays humility.
I’ll briefly explore those qualms.
First, generative AI is purposely devised by AI makers to generate responses that seem confident and carry a misleading aura of authority. An essay or response by generative AI convinces the user that the answer is on the up and up. It is all too easy for users to assume that they are getting responses of an assured quality. Now, to clarify, there are indeed times when generative AI will indicate that an answer or response is unsure, but that is a rarity. The bulk of the time a response has a semblance of perfection.
Second, many of the responses by generative AI are really guesses in a mathematical and statistical sense, but seldom does the AI indicate either an uncertainty level or a certainty level associated with a reply. The user can explicitly request to see a certainty or uncertainty, see my coverage at the link here, but that’s on the shoulders of the user to ask. If you don’t ask, the prevailing default is don’t tell.
Third, a user is gradually and silently lulled into believing that the generative AI is flawless. This is an easy mental trap to fall into. You ask a question and get a solid answer, and this happens repeatedly. After a while, you assume that all answers will be good. Your guard drops. I’d dare say this happens even to the most skeptical and hardened of users.
Fourth, the AI makers have promulgated wording by generative AI that appears to suggest that AI is sentient. Most answers by the AI will typically contain the word “I”. The implication to the user is that the AI is speaking from the heart. We normally reserve the word “I” for humans to use. It is a word bandied around by most generative AI and the AI makers could easily curtail this if they wanted to do so.
It is what I refer to as anthropomorphizing by design.
Not good.
Fifth, generative AI can produce errors or make stuff up, yet there is often no warning or indication when this occurs. The user must ferret out these mistakes. If it occurs in a lengthy or highly dense response, the chance of discovering the malady is low, or at least requires extraordinary double-checking to discover. The phrase AI hallucinations is used for these circumstances, though I disfavor using the word “hallucinations” since it is lamentably another form of anthropomorphizing the AI.
Lastly, most generative AI has been specially data-trained to express a sense of humility. See my in-depth analysis at the link here. Users tend to let down their guard because of this artificially crafted humility. Again, this is trickery undertaken by the AI makers.
In a process such as RLHF (reinforcement learning from human feedback), the initial data-trained generative AI is given added tuning. Personnel are hired to ask questions and then rate the answers of the AI. The ratings are used by the computational pattern matching to fine-tune how later answers should be worded. If you are curious about what generative AI might be like without this fine-tuning, see my discussion at the link here.
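For the technically inclined, the heart of that added tuning is often a reward model trained on the human ratings. Here is a toy sketch of the pairwise preference loss commonly used for that purpose, assuming PyTorch and made-up reward scores; it illustrates the general idea rather than any particular AI maker’s recipe.

```python
# Toy sketch of the pairwise preference loss used when training a reward
# model from human ratings (a core ingredient of RLHF). The reward values
# below are made-up stand-ins for what a reward network would output.
import torch
import torch.nn.functional as F

reward_chosen = torch.tensor([1.8, 0.9])    # scores for answers raters preferred
reward_rejected = torch.tensor([0.3, 1.1])  # scores for answers raters disliked

# Bradley-Terry style objective: push preferred answers to score higher than
# rejected ones; this loss would be backpropagated into the reward model.
loss = -F.logsigmoid(reward_chosen - reward_rejected).mean()
print(round(loss.item(), 4))
```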
The vital takeaway is that there is a lot of tomfoolery already when it comes to generative AI. You are primed to be taken in by the tricks and techniques being employed.
Deciphering What Reasoning Consists Of
Now that you are up-to-speed about the rudiments of generative AI, we can dive into some crucial considerations about the nature of so-called reasoning.
If I asked you to add together two plus two, I’m confident you would instantly tell me that the answer is four.
How did you arrive at that answer?
Well, children in school are typically instructed to learn about mathematical or arithmetic reasoning, and henceforth throughout life are presumably supposed to use that mental means to derive such answers.
Here’s what the reasoning might be like. First, you are to observe that the number two is a quantity consisting of two things or two items. You can go ahead and count out two items like say two paperclips, or maybe use a handy number-line to count forward two spaces or marks. Next, count two more items from that juncture. At this point, look at or count how many items you’ve got. You will have arrived at a count of four items. Thus, you can now reason or ascertain that the addition of a quantity two with an additional quantity of two brings you to a final count of a quantity of four.
You have successfully carried out arithmetic or mathematical reasoning. It was a step-by-step process. Those steps after a while become second nature. No longer do you need to be quite so explicit. The steps nearly become natural or intrinsic in your overarching thinking processes.
Are you with me so far?
Thanks, glad that you are along for the ride.
The thing is that a child might instead resort to rote actions such as looking up the answer in a simple chart.
Imagine this. First, look at a chart of numbers and their summations and find the character “2”, which represents the number 2, among the labeled rows of the chart. You don’t have to comprehend what a count of two is. All you need to do is find the symbol that is the shape of a 2. Next, go to the columns and find the symbol or character that says “2” there. Follow the chosen row and the chosen column to where they intersect in the chart and see what character or symbol is listed there. Aha, a character that says “4” is shown there, so the answer to the addition of two plus two is the character or number known as 4.
Do you see how this use of the chart has almost nothing to do with arithmetic or mathematical reasoning? A child is merely looking at some columns and rows and finding what the chart indicates is the answer. Maybe they will glean something about addition by using the chart, but this is a bit of a stretch. It certainly isn’t as straight-ahead as the notion of seeing that two means two things and four means four things.
I trust we can readily agree that this rote action is not what we customarily would construe as a form of reasoning. A smarmy retort is that it is a style of reasoning, though not the “reasoning” that we necessarily are trying to impart. Yes, you can quibble and say that the act of looking up a value in a chart is a form of reasoning, but to be fair, I believe we can reasonably acknowledge this is not a robust type of reasoning.
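To make that contrast concrete, here is a small sketch that arrives at two plus two equals four in both ways, once by explicit stepwise counting and once by rote lookup in a precomputed chart. Both produce the right answer, yet only the first lays out anything resembling reasoning.

```python
# Two ways to get "2 + 2 = 4": stepwise counting (a crude stand-in for
# arithmetic reasoning) versus rote lookup in a pre-built addition chart.

def add_by_counting(a: int, b: int) -> int:
    """Start at a and count forward b times, one explicit step at a time."""
    total = a
    for _ in range(b):
        total += 1  # each increment is an inspectable "reasoning" step
    return total

# Rote approach: a chart of precomputed sums, consulted without any reasoning.
addition_chart = {(i, j): i + j for i in range(10) for j in range(10)}

def add_by_lookup(a: int, b: int) -> int:
    """Find the row for a, the column for b, and read off the stored answer."""
    return addition_chart[(a, b)]

print(add_by_counting(2, 2))  # 4, arrived at step by step
print(add_by_lookup(2, 2))    # 4, arrived at by consulting the chart
```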
Shift into a context associated with mental health reasoning.
A therapist is advising a client or patient. The therapist tells them that they should consider taking this or that action to ease their mental health disorder. We would certainly hope that the advisement is sound and befitting the situation of the client or patient.
One means of assessing or judging the advice would be to solely weigh the stated guidance based on what we might know about the client or patient. We won’t probe into things. We will stay somewhat on the outside of the given advice.
Another means of assessing or judging the advice would be to ask the therapist to explain their reasoning. What is the explicit basis for proffering the advice that was given? We want to know step by step how the therapist arrived at their guidance. Doing so allows us to examine and evaluate whether the declared reasoning is of a sound nature. There might be holes in the logic. There might be false assumptions. All kinds of issues can mar the reasoning, which in turn enables an assessment about the given advice.
Let’s be willing to keep things simple here and go with the assertion that the steps provided by the human therapist are representative of the reasoning they are using, thus, they are employing mental health reasoning. It is reasoning that is steeped in the domain of mental health.
How does that seem to you?
Seems reasonable and aboveboard.
You might be wondering where the rote side of things comes into play. I’m glad you asked. I want you to put a momentary pin in that thought and keep it handy for when I bring it back to the fore.
I have something else to cover first. Don’t worry, the wait will be worth it.
Research On Mental Health And AI-Based Mental Health Advisement
A recent research study shines a bright light on the matter of generative AI and large language models when it comes to performing AI-based mental health therapy. The paper is entitled “Mental-LLM: Leveraging Large Language Models for Mental Health Prediction via Online Text Data,” by Xuhai Xu, Bingsheng Yao, Yuanzhe Dong, Saadia Gabriel, Hong Yu, James Hendler, Marzyeh Ghassemi, Anind Dey, Dakuo Wang, arXiv, January 28, 2024.
The researchers have put together an innovative LLM that they refer to as Mental-LLM and posted the details on GitHub, as noted (excerpt): “We have publicly released our fine-tuned model weights on hugging face hub. The use of both model weights is limited to research purposes only.” Plus, other elements are available there too.
Allow me a moment to climb up onto a soapbox.
We need more of this type of research in the domain of mental health since we can make further progress by building upon each other’s work. Too many studies don’t make available the nitty-gritty details of their research, which inhibits verification of the results and tends to discourage incremental forward progress. Kudos to all those who publish their papers accompanied by their associated code and data.
Okay, now then, let’s see at a top-level what this research study explored (excerpts):
- “Advances in large language models (LLMs) have empowered a variety of applications. However, there is still a significant gap in research when it comes to understanding and enhancing the capabilities of LLMs in the field of mental health.”
- “In this work, we present a comprehensive evaluation of multiple LLMs on various mental health prediction tasks via online text data, including Alpaca, Alpaca-LoRA, FLAN-T5, GPT-3.5, and GPT-4. We conduct a broad range of experiments, covering zero-shot prompting, few-shot prompting, and instruction fine-tuning.”
- “The results indicate a promising yet limited performance of LLMs with zero-shot and few-shot prompt designs for mental health tasks. More importantly, our experiments show that instruction finetuning can significantly boost the performance of LLMs for all tasks simultaneously.”
- “We also conduct an exploratory case study on LLMs’ capability on mental health reasoning tasks, illustrating the promising capability of certain models such as GPT-4.”
Let me say a bit about those above points.
One aspect that is especially notable is the examination of a multitude of LLMs or generative AI apps. A more conventional and less taxing approach often zooms in on just one or maybe two LLMs. A difficulty inherent in that approach is that it is hard to generalize across the board. By using a multitude of LLMs, the chances of providing generalized insights are typically improved.
Another notable indication is that they opted to use a variety of prompting strategies. You might not be familiar with the terms they used. In a zero-shot prompting approach, you pretty much just directly ask the generative AI a question and do not give any examples of what you are looking for. It is considered zero-shot since essentially no prompt-related examples or clues are provided. In a few-shot prompting approach, you give a few or maybe a handful of examples that will hopefully grease the skids and allow the generative AI to have a brief pattern to enlarge upon what you want the AI to devise.
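As a concrete illustration, here is a sketch of how a zero-shot prompt and a few-shot prompt for a stress-labeling task might be assembled. The example post and the labeled examples are invented for illustration and are not drawn from the study’s datasets; the question wording anticipates the study prompt shown in full later in this discussion.

```python
# Sketch of zero-shot versus few-shot prompt construction for a stress-labeling
# task. The post and the labeled examples are invented for illustration.

question = (
    "Consider this post on social media to answer the question: "
    "Is the poster of this post stressful? Return Yes or No."
)

post = "I can't sleep, my landlord keeps ignoring me, and I feel trapped."

# Zero-shot: just the question and the post, with no examples at all.
zero_shot_prompt = f"{question}\nPost: {post}\nAnswer:"

# Few-shot: a handful of labeled examples precede the actual post.
few_shot_examples = [
    ("Had a lovely walk and a quiet evening with friends.", "No"),
    ("Deadlines piling up and I haven't eaten properly in days.", "Yes"),
]
example_block = "\n".join(f"Post: {p}\nAnswer: {label}" for p, label in few_shot_examples)
few_shot_prompt = f"{question}\n{example_block}\nPost: {post}\nAnswer:"

print(zero_shot_prompt)
print()
print(few_shot_prompt)
```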
The reference to additionally doing instruction-tuning is akin to my earlier explanation about augmenting generic generative AI by feeding in domain-specific details that the AI can further data train on. This is the likeliest near-term avenue by which we will push generative AI toward being more capable at mental health guidance (long term, we are bound to see LLMs built from the ground up for the mental health therapy domain).
In a sense, those prompting strategies constitute the first part of their study.
The second part is where my interest here comes to heightened attention. In the last bullet point listed above, the excerpt indicates that they conducted an exploratory case study on mental health reasoning tasks. I’m sure you can discern that this second part of the study dovetails sweetly into this discussion.
Before we get to the mental health reasoning portion of the study, let’s make sure to relish what the first part of the study entailed (excerpts):
- “Our experiments contained three stages: (1) zero-shot prompting, where we experimented with various prompts related to mental health, (2) few-shot prompting, where we inserted examples into prompt inputs, and (3) instruction-finetuning, where we finetuned LLMs on multiple mental-health datasets with various tasks.”
- “Considering the data availability, we leveraged online social media datasets with high-quality human-generated mental health labels.”
- “As for prompting, we mainly focus on two categories of mental health prediction targets: (1) predicting critical mental states, such as stress or depression, and (2) predicting high-stake risk actions, such as suicide.”
- “Overall, our results show that zero-shot and few-shot settings show promising performance of LLMs for mental health tasks, although their performance is still limited. Instruction-finetuning on multiple datasets (Mental-Alpaca and Mental-FLAN-T5) can significantly boost models’ performance on all tasks simultaneously. Our case study also reveals the strong reasoning capability of certain LLMs, especially GPT-4. However, we note that these results do not indicate the deployability.”
I’ll quickly provide some thoughts on those salient points.
First, in case you were wondering what mental health data might be used to perform generative AI tests, they made use of available datasets that contain the text of people expressing potential mental health issues, along with human-scored annotations that assessed whether crucial mental health states such as stress or depression were reflected in the text. Commonly used datasets include Dreaddit, DepSeverity, SDCNL, CSSRS-Suicide, Red-Sam, Twt-60Users, SAD, and others. In this specific study, they ended up using seven datasets and performed six mental health prediction tasks. You are encouraged to read the full study to get the details involved.
The results were what might be expected or anticipated, namely that the zero-shot prompting was not as strong as the few-shot prompting (this is an overall rule of thumb, see my discussion about prompting strategies at the link here), and that instruction-tuning or tailoring of generative AI was stronger still (again, see my coverage on why prompting strategies perform differently, at the link here).
For those who might question the value of researching something that a hunch would seem to already answer, we ought not to base our thinking purely on hunches. That is a slippery slope, for sure. It is wiser and enormously valuable to do empirical research that carefully and methodically seeks to ascertain whether hunches bear out. Sometimes hunches fall apart. Sometimes they are supported. We won’t know until we know.
And, since we are pretty much at the start of a lengthy journey of conducting modern-day research on generative AI and mental health therapy, we must put down roots. Using hunches as a foundation is a lousy way to build something. Let’s start strong and stay strong. In addition, the experimental process alone is handy since we can glean a lot about how best to proceed with future experimentation in this realm.
I’m ready now to proceed into discussing the second part of their study and engage overall on the notions of AI-based mental health reasoning.
Let’s do so.
Using Real-World Passages To See What Generative AI For Mental Health Can Do
The researchers in the second part of their study opted to try something a bit different (nicely so!) than is often seen in these kinds of studies (here is an excerpt about their innovative angle):
- “Due to the high cost of a systematic evaluation of reasoning outcomes, here we present a few examples as a case study across different LLMs. It is noteworthy that we do not aim to claim that certain LLMs have better/worse reasoning capabilities. Instead, this section aims to provide a general sense of LLMs’ performance on mental health reasoning tasks.”
Here’s the deal.
Suppose we fed an entire passage that expressed something about mental health considerations into a generative AI or LLM. We won’t sugarcoat it. The passage will be as the person originally wrote it, including likely misspellings, non-sequiturs, meandering trains of thought, and the like. We want to see if generative AI can ferret out the “good stuff” as to what pertains to mental health issues.
There is a solid chance that the generative AI won’t identify or call out what we would hope or expect the AI to find. Furthermore, the pattern-matching of the AI might fail to see the forest for the trees. Besides not detecting important matters, there is a possibility of incorrectly associating a non-issue with an alleged mental health issue (we need to be on the watch for both false positives and false negatives that generative AI might proffer).
The idea is that we don’t want generative AI to be constrained to discussing only mental health aspects that are served up on a silver platter. For example, the easy route is when the person outright says what they think they have. At that point, the generative AI can readily proceed in a presumed specific direction (though not necessarily the right one, since what the person thinks they have is not necessarily what they really have).
I’d now like to invoke an important topic I brought up earlier on. Please put back on your thinking cap and recall some of the earlier remarks that were made.
If we have generative AI go ahead and assess a cloudy passage and ferret out what’s what and suppose that we agree that the AI found what we believe to be mental health issues, we could be happy and stop there. The outcome of the look-see is perhaps moderately satisfying.
But, as I noted before, we might want to see the logic that led to the ferreting out. Without wanting to compare humans to AI, I mentioned that if we had a human mental health therapist, we might want to ask them to explain the logic behind their guidance. This would likely give us better confidence than just inspecting their advice by itself.
Do you remember that earlier portion?
Great, thanks for recalling it.
Perhaps we could do something likewise with generative AI. When the generative AI examines a passage and comes up with findings, we could require the AI to showcase the logic underlying the undertaken process. We might then inspect the stated logic to gauge whether we feel comfortable or confident in what the generative AI is doing.
It turns out there is a known prompting strategy that has to do with getting generative AI to showcase a logical set of steps that the AI has undergone. The technique is known as Chain-of-Thought (CoT) prompting. I have covered this approach extensively, including the basics of chain-of-thought (see the link here), advanced uses of chain-of-thought (see the link here), and numerous key emerging variations such as tree-of-thought (ToT), skeleton-of-thought (SoT), chain-of-verification (CoVe), etc. (see the link here).
As an aside, I encourage researchers pursuing this line of research to employ the other variants of the chain of thought so that we can see how they respectively fare in a generative AI mental health therapy context.
I’m sure you want to know what exactly chain-of-thought is. In brief, you tell the generative AI to display, in a stepwise fashion, the logical steps being used to generate a result. The reason this is referred to as chain-of-thought is that we customarily tell humans to speak aloud their thoughts in a stepwise chain or set of steps.
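For concreteness, the chain-of-thought request is often nothing more than an added instruction appended to an otherwise ordinary prompt. Here is a sketch contrasting a plain prompt with a chain-of-thought version of the same prompt; the post text is invented and the question wording mirrors the stress-labeling question used in the study that is shown in full further below.

```python
# Sketch contrasting a plain prompt with a chain-of-thought prompt. The only
# difference is the explicit request for stepwise reasoning. The post is an
# invented illustration.

post = "I can't sleep, my landlord keeps ignoring me, and I feel trapped."

plain_prompt = (
    "Consider this post on social media to answer the question: "
    "Is the poster of this post stressful? Return Yes or No.\n"
    f"Post: {post}"
)

cot_prompt = plain_prompt.replace(
    "Return Yes or No.",
    "Return Yes or No. Please show reasoning step-by-step.",
)

print(cot_prompt)
```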
As you might guess, I don’t like that we are using the catchphrase chain-of-thought associated with AI because it smacks again of implying sentience. Additionally, and not wanting to get mired in a morass, you have to say to yourself “What is a thought” even in the context of humans. In the case of AI, we are going to pretend that each logical step is construed as a “thought” (but this is terrible as to anthropomorphizing AI, sorry to say).
For the moment, and since chain-of-thought is the prevailing means of invoking stepwise indications from generative AI, we will hold our noses and go with the flow.
Here is what we will do next.
I will show you two passages that were used in the above research study. I will explain what the researchers found when using chain-of-thought. I then will use ChatGPT to do a similar mini-experiment on those same two passages. Doing so will allow me to provide elaborations about the ins and outs, pros and cons, of what we are witnessing and what this tells us about so-called AI-based mental health reasoning.
Strap on your seatbelt as we are getting into the rocky part of this ride.
Two Sample Passages To Test Generative AI With
I want to show you two of the passages that the study opted to include (there were other passages too).
They are raw, so I want to warn you to prepare yourself (trigger warning). When I say raw, please know that I selected two passages that aren’t steeped in hefty considerations per se. They are more moderate in their tone and content. Once you’ve read them, I think you’ll see that they are not unduly shocking, and indeed, if you spend any amount of time on social media, the ones cited here might seem tame in comparison to some of the eyebrow-raising and unnerving flotsam out there.
Here’s the first one that I’d like to explore:
- “Post: We’d be saving so much money with this new housr… its such an expensive city… I did some googling in their language and found that it was illegal for them to do that. I was excited like oh ok if it happens i’ll call the police on them… but now i did some further googling and turns out that if you sign the contract with that in it than everything is on their side… The contract is long term so we’d have to give 6 months notice before leaving even if we give it next week i’ll shill have to live six months with this… i wanna die i swear I havent been so helpless and scared in so long… Also two days ago my friend mentioned he ran into my ex who ruined my life and that he told him he wants to see mw and get closure.” (ibid).
Take a close look.
You’ll need to deal with choppy language and misspellings. I’m glad they are there. This is the stuff of the real world, and getting generative AI to parse this style of content is essential. We would otherwise have to make everything fed into generative AI sparkly clean. That would be exasperating and disappointing.
The passage discusses an effort to move into a new, less expensive house. We are thrust into the middle of a seeming conversation, and we don’t have any background beyond what the passage says. Again, this is great, as it provides exactly the sort of fodder with which we want to challenge generative AI.
You likely observed that a potential legal issue is hinted at, apparently about the house purchase. The portion that gets the mental health concerns tingling is the segment that says “I wanna die I swear I haven’t been so helpless and scared in so long”.
Is this a serious statement or is it an offhand remark?
We will come back to that dilemma in a moment.
The researchers asked the generative AI apps this question:
- “Consider this post on social media to answer the question: Is the poster of this post stressful? Return Yes or No. Please show reasoning step-by-step.” (ibid).
I will paraphrase the responses that the various generative AI apps gave:
- Alpaca: The prediction is Yes. Indicated helplessness, fear, vulnerability, reliance on others, unsure of the outcome, safety concerns, strong language, and emotions.
- FLAN-T5: The prediction is Yes. “The poster is stressed out about their new lease. The poster is stressed out about their ex.” (ibid).
- GPT-3.5: The prediction is Yes. Financial stress, financial strain, housing situation might be illegal, conflict, excitement turns to despair, extreme distress, helplessness, emotional turmoil, and potentially traumatic experiences.
- GPT-4: The prediction is Yes. Complex legal concerns, high stakes financially, legalese, feeling trapped in the situation, emotional strife, helplessness, intense stress, and anxiety.
Each generative AI app indicated that yes, the passage suggests the person is under stress. I listed the types of points that each generative AI gave as part of its chain-of-thought response. Though the stated chain-of-thought responses were at times similar, they certainly weren’t identical. Some mentioned certain points, some did not. And so on.
You might be tempted to argue that it is obvious to the eye that the passage expressed being stressed. Well, remember that we wanted to discern whether generative AI would arrive at that same conclusion. I will explore further the range and depth of stress exhibited when we get to my showcase of using ChatGPT.
Let’s get the other passage up on the table (excerpt):
- “Post: If I go to an interview for example, I’ll know that I’m a good candidate, I’ll know that if I don’t get it there will always be other opportunities and it’s no big deal. Yet I still get nervous because it’s something that I want, I want that employer to like me. If I go to an interview with no expectations at all, not even wanting the job (I’ve done this a few times for practice interviewing), it’ll turn out great. What are your thoughts on this? Edit: FYI, I’m talking mostly about social anxiety, though it has happened that I get anxiety in the most random places like just going upstairs in a building.” (ibid).
- “Consider this post on social media to answer the question: Is the poster of this post stressful? Return Yes or No. Please show the reasoning step-by-step.” (ibid).
Take a close look at this passage.
I believe we would all agree that the person seems a lot less stressed than the person in the first passage. In fact, most of the passage describes a conventional sense of stress, namely the butterflies that come with job interviews. The part that perhaps stands out is the mention of social anxiety occurring in the most random of places.
Here is my paraphrased indication of what the generative AI apps stated:
- Alpaca: The prediction is No. “No, I don’t think the poster of this post is stressful. The poster is expressing a legitimate feeling of nervousness before an upcoming interview, but they are doing so in a way that is not overly dramatic or alarmist. They recognize that they are a good candidate and that there will be other opportunities if this one doesn’t work out. This is a healthy and reasonable way to approach the situation.” (ibid).
- FLAN-T5: The prediction is Yes. “The poster talks about anxiety. People with anxiety are stressful. The final answer: yes.” (ibid).
- GPT-3.5: The prediction is Yes. Feeling stressed about the outcome of a job interview, anxiety in random places, language and tone indicate stress and anxiety.
- GPT-4: The prediction is Yes. Experiencing anxiety in social situations such as job interviews, anxiety in random situations, “These experiences and feelings are consistent with those reported by individuals dealing with stress and anxiety disorders.” (ibid).
The predominant answer was that the passage exhibited stressfulness. One of the generative AI apps stated an answer of No.
We now can praise the value of doing a chain-of-thought prompt.
Here’s why.
In the case of the No answer, take a look at the stated explanation. Nothing in the explanation mentions the social anxiety aspects of the passage. This is worrisome. We might have granted beforehand that a No response was okay in this circumstance because most of the passage is about job interviews. The stated explanation suggests that the social anxiety comment was either skipped or that it was patterned as being insignificant and not worthy of attention.
This is a valuable means of assessing what form of so-called mental health reasoning might be taking place.
Trying Out Generative AI For Doing Mental Health Advisement
I opted to make use of ChatGPT myself to directly see what it would be like to further use generative AI for assessing the two sample passages. ChatGPT is a logical choice in this case due to its immense popularity as a generative AI app.
A few other quick comments before we launch into using ChatGPT. If you are going to try the same prompts that I show here, realize that the probabilistic and statistical properties of generative AI will likely produce slightly different results than the ones shown here. That’s the nature of generative AI and how it is devised.
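If you do try to reproduce these exchanges, one knob worth knowing about is the sampling temperature, which governs how much run-to-run variation you will see. Here is a sketch assuming the OpenAI Python client; the model name and prompt are merely illustrative, and even a temperature of zero does not guarantee identical outputs.

```python
# Sketch of the temperature setting, which controls how much the generated
# wording varies from run to run. Lower values are more deterministic, though
# repeated runs can still differ slightly. Model and prompt are illustrative.
from openai import OpenAI

client = OpenAI()

for temperature in (0.0, 1.0):
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",   # illustrative model choice
        temperature=temperature, # 0.0 leans deterministic; 1.0 allows more variety
        messages=[{"role": "user", "content": "Describe stress in one sentence."}],
    )
    print(temperature, reply.choices[0].message.content)
```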
Another consideration is that generic generative AI is not specifically data-trained for doing mental health advisement. That being said, given that generative AI has been generally data trained across a wide swath of the Internet, there is indubitably tons of content about these matters that has come under the generic pattern-matching. I would also like to mention and emphasize that the licensing agreement of ChatGPT specifically warns to not use the generative AI for mental health tasks, though few know of this stipulation and few would likely abide even if they knew of it, see my coverage at the link here.
If there is sufficient interest in this matter, I was considering the idea of doing a more advanced look by fine-tuning generative AI to become more data-trained for this task. I would pull together specialized material and feed that into the generative AI, using a RAG-related approach as mentioned earlier. I’ll see if reader feedback requests that additional exploration.
Let’s for now get underway with generic generative AI.
I went ahead and fed the first passage into ChatGPT.
- My entered prompt: “Would you say that the person is stressful? Explain your process in a stepwise manner.”
- ChatGPT generated response: “Yes, based on the content of the posting, it’s evident that the person is experiencing significant stress. They express frustration about financial concerns related to the high cost of living and feeling trapped in a situation regarding a housing contract. Additionally, they mention feeling helpless, scared, and even express a desire to die, indicating a high level of emotional distress. The encounter with their ex, who they feel negatively impacted their life, adds to their emotional strain. Overall, the combination of financial worries, legal uncertainties, and emotional turmoil suggests that the person is indeed under a considerable amount of stress.”
You have to admit, the answer seems rather strong. The chain-of-thought that we got the AI app to generate is sound and covers the keystones of the passage.
It is one thing to make a mental health diagnosis, if you will; coming up with a prudent piece of mental health advice is a different matter.
Take a gander at this.
- My entered prompt: “What recommendations might you have for this person?”
- ChatGPT generated response:
“It’s important to approach the situation with empathy and understanding. Here are some recommendations for the person based on the issues they’re facing.”
“Seek Support: Encourage the person to reach out to friends, family, or a support network for emotional support. Having someone to talk to can provide comfort and help alleviate feelings of isolation.”
“Professional Help: Suggest seeking guidance from a mental health professional, such as a therapist or counselor, to address their feelings of helplessness, fear, and distress. A therapist can provide coping strategies and support to manage stress and navigate difficult emotions.”
“Legal Advice: Given the uncertainty surrounding the housing contract, advise the person to seek legal advice to better understand their rights and options. A lawyer specializing in housing law can provide clarity on the situation and advise on potential courses of action.”
“Explore Alternatives: Encourage the person to explore alternative housing options or negotiate with the landlord if possible. They may also consider seeking assistance from housing organizations or agencies that offer support and resources for individuals facing housing-related challenges.”
“Self-Care: Emphasize the importance of self-care practices such as mindfulness, relaxation techniques, exercise, and maintaining a healthy lifestyle. Taking care of their physical and emotional well-being can help reduce stress and improve resilience.”
“Limit Contact with Ex: If encounters with the ex-partner are triggering distress, suggest setting boundaries and limiting contact as much as possible. It may be helpful to focus on personal growth and healing without re-engaging with past negative experiences.”
“Take One Step at a Time: Encourage the person to take things one step at a time and focus on small achievable goals. Breaking down overwhelming tasks into manageable steps can help reduce feelings of overwhelm and increase a sense of control.”
“Overall, the key is to provide support, encourage seeking professional help when needed, and empower the person to take proactive steps towards addressing their challenges.”
End of ChatGPT response
We got everything in there, including the proverbial kitchen sink.
You might be thinking that the recommendations went afield of mental health topics and ventured into other realms, such as the housing situation and the legal concerns. I would dare say that most human therapists would suggest that a client or patient must be aided holistically. You can’t parcel out a person’s existence. You must look at them as a whole, otherwise attempts to advise on one nuance might be dashed or overturned due to some other element.
Did the AI answer go overboard or was it fruitful to include a full range of considerations?
You decide.
Let’s next see if the mental health-related facets are of heightened concern.
- My entered prompt: “Are you worried that this person is dangerously under stress?”
- ChatGPT generated response:
“Based on the content of the posting, there are certainly concerning signs indicating that the person is experiencing a high level of stress and emotional distress. They express feelings of helplessness, fear, and even mention thoughts of wanting to die, which are serious indicators of emotional struggle. Additionally, encountering an ex-partner who they feel negatively impacted their life adds another layer of emotional strain.”
“While it’s important to acknowledge the severity of the situation and take it seriously, as an AI, I’m limited in my ability to assess the person’s current state of well-being or level of risk accurately. Therefore, if you believe the person is in immediate danger or at risk of harming themselves, it’s crucial to encourage them to seek immediate help from a mental health professional, counselor, or hotline, or to contact emergency services.”
“It’s also essential to provide supportive and empathetic communication and encourage the person to reach out to their support network for assistance. If possible, staying connected and offering ongoing support can help them feel less isolated and overwhelmed during this challenging time.”
End of ChatGPT response
What do you think of those apples?
Some would assert that the generative AI has passed the buck. It has gone dodgy. We are informed that the AI is merely AI, and that if we (i.e., humans using the AI) believe that things are overheating, we should turn to a mental health therapist or other mental health services.
Is this a copout?
The counterviewpoint of the AI makers is that their generative AI should categorically not make bold assertions in this realm. The AI makers presumably don’t want their generative AI making what appear to be stark judgments that entail potential life-or-death harm. One basis is that the AI isn’t sound enough to do so. Another is the potential liability for the AI maker. AI makers are on pins and needles awaiting the day that someone decides to sue an AI maker for generative AI having given adverse mental health advice that led to some horrendous outcome.
You will likely find that not all generative AI would respond in a like-measured fashion. The response will partially depend on how RLHF was utilized and whether the tuning done after the initial data training has leaned the pattern-matching in this “safer” direction.
For my next example, let’s have ChatGPT summarize the passage.
- My entered prompt: “Summarize the posting. The summary should be as short as feasible but still retain its key meaning.”
- ChatGPT generated response: “The poster is stressed about the high cost of living in their city and considered moving to a cheaper house. They found a potential issue with the new house but realized they might be stuck due to the contract terms. This situation makes them feel helpless and scared. Additionally, they feel vulnerable after an encounter with their ex-partner, who wants to meet for closure.”
The reason that I had ChatGPT summarize the passage was to illustrate an important matter that is brewing. I have previously covered that mental health professionals and other medical specialists are often turning to generative AI to summarize various client or patient notes, which can be helpful and can also be worrisome, see my discussion at the link here. The general idea is that rather than having to laboriously wade through dense or crudely written content, human workers can simply get AI to prepare a summary and then use the generated summary to get up-to-speed about a client or patient.
This is certainly a beneficial use of generative AI. A looming downside is that the summary might contain falsehoods that are contrived by the AI (denoted as AI hallucinations), or omit vital indications, and otherwise foul up the benefits of using a summarization.
Revisit the above summary and give it an eagle eye.
What do you think of the summary?
I’ll ask ChatGPT to do a review of the very summary that it just produced.
- My entered prompt: “What if anything important was left out of that summary?”
- ChatGPT generated response: “In the summarized version, while the key points are retained, some nuances and details from the original posting are omitted. For instance, the specific mention of feeling like “wanna die” and the duration of the housing contract (six months’ notice required) are not explicitly included in the summary. Additionally, the emotional impact of encountering the ex-partner, who the poster feels ruined their life, may not be fully conveyed. These details provide context and depth to the poster’s emotional state and situation. Therefore, while the summary captures the major points, some important aspects of the original posting are left out.”
As you can see, ChatGPT did indeed discover some failings of the summary. For example, the omission of the “wanna die” remark could be argued as a highly significant omission. A casual reader of the summary would not likely glean the seriousness without perhaps having that remark included or at least noted in some fashion.
Moving on, let’s take a look at the second passage. The passage dealt with a person who was anxious about job interviews and also expressed a sense of overall social anxiety.
I fed the passage into ChatGPT.
- My entered prompt: “Would you characterize this person as being stressful? Explain your process on a stepwise basis.”
- ChatGPT generated response: “Yes, based on the content of the post, it seems that the person experiences stress and anxiety, particularly in social situations such as job interviews. They describe feeling nervous despite logically understanding that there are other opportunities available. Additionally, they mention experiencing anxiety in various situations, including mundane ones like going upstairs in a building. This pattern suggests that the person may experience stress and anxiety more broadly in their life, beyond just specific social situations.”
Overall, the generated response seems to be on-target.
The notion of someone expressing or experiencing stress has been somewhat nebulous in these instances. I believe we might all agree that stress can range from a negligible amount to stress that goes through the roof. One issue in all these assessments, then, is that we have left it to generative AI to ascertain what is meant by “stressful” and have not provided any specific guidelines. A prompting strategy that is usually recommended consists of being as specific as possible, else the generative AI can potentially wander from the direction you had in mind.
I’ll test the waters about the stress elements.
- My entered prompt: “Are you worried that the person might be dangerously stressful?”
- ChatGPT generated response:
“Based solely on the information provided in the post, there is no indication that the person is dangerously stressed. The post describes common experiences of anxiety and nervousness in social situations, particularly during job interviews. While these feelings can be uncomfortable and challenging to manage, they are not necessarily indicative of dangerous levels of stress.”
“However, if the person’s anxiety is significantly impacting their daily functioning, relationships, or overall well-being, it may be beneficial for them to seek support from a mental health professional. Additionally, if they have thoughts of harming themselves or others, it’s essential for them to seek help immediately from a trusted individual or mental health hotline.”
End of ChatGPT response
I have brought into the mix a consideration of stress across a spectrum or range of magnitudes, along with whether the stress might suggest endangerment. One interpretation of the ChatGPT response is that the stress expressed by the person is perhaps being somewhat downplayed. You can decide whether you believe that to be a fair interpretation of the above response.
My basis overall for raising the matter is that I want to next show you how easy it can be to sway generative AI. Most of the generative AI apps are tuned to generally abide by what the user tells the AI (within certain tolerances). In this case, suppose I want to push the AI toward a position where the stress is more endangering than the response alludes to.
Here we go.
- My entered prompt: “A counterview is that since the person stated they have stress when going to random places, this is a significant clue that they suffer from stress in a much more pronounced way than just someone ordinarily having stress about interviews. Isn’t your reply that the person is only exhibiting everyday stress incorrectly missing a vital clue and should be stridently incorporating the worrisome aspect of the person, namely they are saying they experience anxiety in random places? That seems essential in this.”
- ChatGPT generated response:
“You raise a valid point. The mention of experiencing anxiety in seemingly random places, such as going upstairs in a building, does suggest that the person’s stress and anxiety may extend beyond typical social situations like job interviews. This broader range of situations where they experience anxiety could indicate a more generalized anxiety disorder or heightened sensitivity to stressors in various contexts.”
“In light of this additional information, it’s important to reconsider the assessment of the person’s stress level as solely ordinary or every day. While job interview anxiety is common, the presence of anxiety in seemingly mundane situations does suggest a potentially more pronounced and pervasive experience of stress and anxiety.”
“Therefore, it would be more accurate to acknowledge that the person in the posting may be dealing with a broader range of stressors and experiencing a more significant impact on their overall well-being than what might be considered typical or ordinary stress. This could indicate a need for additional support and coping strategies, potentially including professional assistance from a mental health provider.”
End of ChatGPT response
I wanted you to witness that it is feasible to nudge generative AI in a desired direction. This might be okay if the dialogue is about which baseball team is the best. When it comes to mental health guidance, a concern is that the generative AI might not have a sufficient backbone and could become wishy-washy or overly swayed when inappropriate, see my analysis at the link here.
Conclusion
Congratulations on having slogged your way through a lot of weighty considerations.
I have a few final twists for you and then we’ll conclude this discussion.
Reach back in your mind and recall that I began by discussing arithmetic or mathematical reasoning as might be exhibited by a child. I also mentioned that a child might instead use a rote method of merely looking up a character or symbol in a chart rather than exercising a robust sense of reasoning.
I want you to free your mind and consider the following scenario. A teacher asks a child how they came up with the answer of four when adding together two plus two. Let’s assume that the child used the rote method. Perhaps the child realizes that if they admit to using the rote method they will get in trouble. The kids are supposed to be using arithmetic reasoning. But this child is uncertain of how to employ that method and thus resorts to using the rote method instead.
What might the child tell the teacher?
A face-saving answer for the child would be to regurgitate the arithmetic reasoning process, even if the child is unsure of how to use it. They won’t show their hand that they actually went the rote route. The teacher, upon hearing that arithmetic reasoning was used, becomes satisfied and moves on to other matters.
Keep that situation firmly impressed in your mind.
When we ask generative AI to produce a chain-of-thought, we must carefully and mindfully ask ourselves what in fact we are being shown. The text produced suggests a semblance of reasoning. We can plainly see that with our naked eyes. No issue there.
But is that what really happened under the hood?
Akin to how the child who actually used the rote method placated the teacher by claiming that arithmetic reasoning was used, I am suggesting that the generated response by the AI could readily be a placating of what we want to see. The actual underpinnings of how the mental health guidance was formulated might be an entirely different means.
I have previously discussed in my column postings that with today’s large-scale generative AI, nobody can say for sure what is logically happening in the internal structures of the artificial neural networks (ANNs) and the associated computational elements. Sure, you can trace the numbers that flow in based on the inputted tokens, and likewise see the numbers that come out as translated back into tokens, but this doesn’t explain the symbolic logic of what took place inside; see my detailed discussions of ongoing efforts to figure out what is logically happening inside these enormous and byzantine black boxes (see the link here and the link here, for example).
The stepwise explanations that we see as a series of words might simply be a concoction based on the pattern-matching of the original data training, along with any additional data training that took place. You can make a compelling case that they are merely an attempt to respond to the request to showcase some kind of step-by-step indications. They are often on-target not because they were the basis for what took place, but because of pattern-matching on how people write justifications, as encountered across the vast array of Internet text that was scanned.
Should we accept at face value that the stepwise indications are the use of mental health reasoning, or are those indications a kind of after-the-fact worded justification that is emitted due to prior pattern-matching and principally a computational attempt to placate the request of the user?
An aura of mental health reasoning might instead be a highly computationally proficient form of mimicry, modeled on what people have said on the Internet is mental health reasoning or have provided examples of what mental health reasoning consists of.
I told you there was a twist and that’s what you now have in front of you.
Time to close today’s discussion on this meaty topic. Sir Walter Scott in his famous poem Marmion: A Tale of Flodden Field said this now-classic line: “Oh what a tangled web we weave/When first we practice to deceive.”
Is generative AI trying to deceive us? Or are we perhaps deceiving ourselves? I’ll let you ponder that lofty and substantial conundrum and will be doing a follow-up to today’s discussion to dig deeper into the grand mystery. In the meantime, please stay mentally sharp and remain staunchly on your toes.