Developing a ‘Smart’ Recovery
Monitoring and Support System
John J. Curtin, Ph.D.
University of Wisconsin-Madison
April 9, 2025
About a decade ago I was approaching the middle of my career. I had developed a successful basic clinical science research program as a psychophysiologist running experiments to understand the effects of drugs and drug withdrawal on stress.
The work was intellectually stimulating, we were publishing it in good outlets, and we were getting grants, but my heart was increasingly not in it.
I’d become a clinical psychologist to help people struggling with alcohol and other SUDs.
My paternal grandmother died of complications secondary to alcoholism.
My dad has struggled with his use of alcohol for his entire adult life, and during periods when he lost control, it affected all of us.
My cousin, Stephen, has a severe substance use disorder and has been incarcerated several times for drug-related offenses. He has had periods of stability, but they have always ended in another relapse.
My Aunt Cathy and Stephen’s brother, Colin, had reached out to me on numerous occasions to ask what could be done to help Stephen.
And it was those conversations that really got me thinking about how I could re-direct my research program to help people like Stephen and my dad and my grandmother.
Precision Mental Health for Continuing Care
It was around that time that my colleague, Dave Gustafson, reached out to me. Dave directs a center at UW that develops digital therapeutics for substance use disorders. These are essentially smartphone apps that provide ongoing continuing care for patients during their recovery. He had just completed a large randomized controlled trial demonstrating that his app meaningfully decreased heavy drinking days and increased abstinence rates over the first year of recovery.
However, he also noticed many of the people who had relapsed hadn’t used the app in the days leading up to that relapse. And others who had relapsed hadn’t used the specific supports in the app that he would have thought would be most effective for them.
Precision Mental Health for Continuing Care
“Could you predict not only who might be at greatest risk for relapse … but precisely when that relapse might occur … and how best to intervene to prevent it?”
Dave knew that we were exploring the factors that motivated alcohol and other drug use and he asked us a simple question:
“Could you predict not only who might be at greatest risk for relapse but precisely when that relapse might occur and how best to intervene to prevent it?”
… because if we could develop a system to do this, he could embed it into his app to guide people to the most effective supports at the most critical moments in their recovery.
Precision Mental Health for Continuing Care
Precision mental health requires us to provide the right interventions and supports to the right people at the right time, every time
SUD continuing care requires
Long-term monitoring
Ongoing lifestyle adjustments and support
These questions that Dave was asking are at the heart of what we now call precision mental health. How can we provide the right interventions and supports to the right people at the right time, every time?
And this focus on the right time is particularly important for recovery from substance use disorders. Substance use disorders are chronic relapsing conditions and therefore successful recovery requires lifelong monitoring and support to prevent relapse.
And, critically, the optimal supports for any specific individual can change month to month, day to day, and even from moment to moment.
Precision Mental Health for Continuing Care
Precision mental health requires us to provide the right interventions and supports to the right people at the right time, every time
SUD continuing care requires
Long-term monitoring
Ongoing lifestyle adjustments and support
A “Smart” Recovery Monitoring and Support System can provide temporally precise, dynamic, personalized continuing care by combining:
Sensing
Artificial Intelligence/Machine learning
And it goes without saying that this is hard, and it's why lapses and relapses are so common for so many people in recovery.
But we believed that we could harness and combine two technologies that were emerging at that time, personal sensing and artificial intelligence algorithms, to develop a smart recovery monitoring and support system that could both predict lapses before they occurred and provide personalized support and recommendations to patients about how to prevent those lapses from occurring.
And what I’d like to do today is tell you a bit more about how we are doing this, what we have learned so far, and where we are going next with this system
Model Output: Lapses
Lapses
are clearly defined,
have a temporally precise onset, and
can serve as an early warning sign for relapse (precede and predict)
“Abstinence violation effects” can increase relapse risk
Even a single lapse can result in overdose and/or death for some drugs
[PAUSE]
OK - so how do we develop this Recovery Monitoring and Support System?
To start, we have begun to develop risk prediction models that focus both on predicting and explaining future lapses
Our focus is on future lapses, rather than other clinically meaningful outcomes like substance use-related problems or full-blown relapse, for several reasons.
To start, lapses are
Clearly defined and have a temporally precise onset
They can serve as an early warning sign for relapse because they both precede and predict it
Lapses are also important targets for intervention because we know that maladaptive thoughts and feelings following a lapse - often called abstinence violation effects - can start a downward spiral that leads to relapse by itself if not addressed
And sadly, for some drugs, even a single lapse can result in overdose and death
So for these reasons, we are developing risk models that predict the probability of a future lapse
Lapse Prediction for AUD
151 individuals with moderate to severe AUD
Early in recovery (1-8 weeks)
Committed to abstinence throughout study
Followed with sensing for up to 3 months
Ecological Momentary Assessments
Contextualized Geolocation
Contextualized Smartphone Communications
(also sensed physiology, sleep, coarse self-report)
So let me transition now to describing the progress we have made so far to develop this recovery monitoring and support system for SUD
As the first step, in 2020 we completed a first NIAAA-funded project where we recruited 151 participants who were in early recovery from a moderate to severe alcohol use disorder.
These participants were committed to abstinence at the start of the study and we followed them for up to three months, using our three sensing methods and also recording any lapses back to alcohol use.
[PAUSE]
We are in the early stages of model building at this point and I will focus today primarily on results from preliminary models using only EMA.
However, we are actively working with GPS and cellular communications as well and I will give you a clear sense of how we are developing models with those signals too.
I’ll also give you more detail about how we think we can implement these models for clinical benefits.
Participant Characteristics
Let me start highlighting the characteristics of the sample we are using to develop and evaluate the AUD lapse models.
We had reasonable diversity across many participant characteristics including age, sex at birth, marital status, education, and income.
However, given the methods we used for recruiting, we have very little racial and ethnic diversity in the sample. The sample is predominantly White and non-Hispanic.
I’ll return to this later, both when we evaluate issues of algorithmic fairness and when I talk about another, larger NIDA-funded project where we are correcting this issue of racial and ethnic representation in our models.
Participant Characteristics
All participants met criteria for moderate to severe AUD
Reported abstinence goals
I also want to highlight that this is a sample with clinically meaningful alcohol problems, consistent with their pursuit of abstinence goals.
On the right, I am showing you a histogram of the DSM-5 AUD symptom counts to confirm that everyone reported 4 or more symptoms of alcohol use disorder consistent with a moderate to severe presentation
Ecological Momentary Assessments
Current/Recent Experiences
Craving
Emotional state
Recent past alcohol use
Recent risky situations
Recent stressful events
Recent pleasant event
Future Expectations
Risky situations
Stressful events
Abstinence Confidence
So let me tell you a bit more about the ecological momentary assessments (or EMAs) that we used as part of our sensing methods. These EMAs are brief surveys that participants completed on their smartphones. They take 20-30 seconds to complete and we collected them several times per day.
On each EMA, participants reported the date and time of any lapses back to alcohol use that they hadn’t previously reported. These lapse reports are used for the lapse outcomes that we train our models to predict. And these lapses were also confirmed by study staff during lab visits using a follow-back procedure.
All of the EMAs also asked participants about their current craving, emotional state, recent risky situations, and recent stressful and pleasant events since their last EMA.
And on the first EMA each day, they also reported any future risky situations and stressful events that they expected in the next week and their confidence that they would remain abstinent.
Modeling: Feature Engineering
Features based on recent past experiences (12, 24, 48, 72, 168 hours)
Min, max, and median response (all items)
History (count) of past lapses (item 1) and completed EMAs (compliance)
Raw scores and change scores (from baseline/all past responses)
We used the raw responses to the EMAs to engineer about 300 features to use in our models to predict future lapses
We formed features by aggregating EMA items over various past time periods ranging from 12 - 168 hours in the past relative to the window we want to predict into.
We calculated mins, maxes and medians for the EMA items in these time periods
We also calculated counts of past lapses and counts of past missing EMAs to index engagement with our monitoring system.
And we included these scores both in raw form and as change from baseline for the participant based on all their previous responses since the start of the study.
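To make this feature engineering concrete, here is a minimal sketch in Python. The EMA records (timestamp, craving rating, lapse flag), item names, and window choices are all hypothetical illustrations, not our actual pipeline:

```python
from datetime import datetime, timedelta

# Hypothetical EMA records for one participant: (timestamp, craving 0-10, lapse flag)
emas = [
    (datetime(2020, 1, 1, 8),  2, False),
    (datetime(2020, 1, 1, 20), 5, False),
    (datetime(2020, 1, 2, 8),  7, True),
    (datetime(2020, 1, 2, 20), 4, False),
]

def ema_features(emas, prediction_time, windows=(12, 24, 48)):
    """Aggregate EMA items over trailing windows ending at prediction_time."""
    feats = {}
    for hours in windows:
        start = prediction_time - timedelta(hours=hours)
        vals = [c for t, c, _ in emas if start <= t < prediction_time]
        feats[f"craving_min_{hours}h"] = min(vals) if vals else None
        feats[f"craving_max_{hours}h"] = max(vals) if vals else None
        # Simple median (upper middle for even counts)
        feats[f"craving_med_{hours}h"] = sorted(vals)[len(vals) // 2] if vals else None
    # Counts of past lapses and completed EMAs index risk history and compliance
    feats["n_past_lapses"] = sum(lapse for t, _, lapse in emas if t < prediction_time)
    feats["n_past_emas"] = sum(1 for t, _, _ in emas if t < prediction_time)
    # Change score: most recent craving relative to the participant's running mean
    past = [c for t, c, _ in emas if t < prediction_time]
    feats["craving_change"] = past[-1] - sum(past) / len(past) if past else None
    return feats

f = ema_features(emas, datetime(2020, 1, 3, 0))
```

The same pattern extends to the other EMA items and to the longer 72- and 168-hour windows.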
Modeling: Predictions
Predict hour-by-hour probability of future lapse
[PAUSE]
OK, now that we’ve talked about sensing and features, I want to transition to talk about how we train models to use these features to predict lapses.
For our purposes today I won't dive deep into the machine learning methods, but let me highlight a few high-level details.
We used the features that I just described to make predictions about the hour-by-hour probability of a future lapse. We are developing separate models for three future lapse windows – lapses in the next week, lapses in the next day, and lapses in the next hour.
Modeling: Predictions
Predict hour-by-hour probability of future lapse
For example, if I was in recovery from an AUD, I could use these models to generate the probability that I would lapse after I leave the presentation today at 7pm. One model would generate the probability I would lapse at some point between 7 pm today and 7 pm next Wednesday, the second would predict the probability of a lapse between 7 pm today and 7 pm tomorrow and the third and most temporally precise model would provide the probability of a lapse in the next hour, between 7pm today and 8pm today.
And of course, all of the models would only use data collected prior to 7 pm today so that they are “predicting”, in the full sense of the word, into the future and not just demonstrating an association.
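A minimal sketch of how these rolling outcome windows could be labeled, assuming a hypothetical list of self-reported lapse timestamps (the window names are illustrative):

```python
from datetime import datetime, timedelta

def lapse_labels(prediction_time, lapse_times):
    """Binary outcome for each future window starting at prediction_time."""
    windows = {"next_hour": timedelta(hours=1),
               "next_day": timedelta(days=1),
               "next_week": timedelta(weeks=1)}
    return {name: any(prediction_time <= t < prediction_time + width
                      for t in lapse_times)
            for name, width in windows.items()}

# A lapse at 11 pm tonight is a positive outcome for the day and week
# windows, but not for the next-hour window starting at 7 pm.
labels = lapse_labels(datetime(2025, 4, 9, 19),       # 7 pm today
                      [datetime(2025, 4, 9, 23)])     # 11 pm today
```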
Modeling: Algorithms and Resampling
XGBoost - Boosted decision trees
Also considered:
ElasticNet GLM (e.g., LASSO, ridge regression)
Random Forest
KNN
Using grouped (by participant), nested, repeated k-fold CV
30 “held-out” test sets
New participants and observations not used for training
We are evaluating machine learning model configurations that differ by common statistical algorithms that I can talk more about later if there is interest.
And we are rigorously evaluating the performance of these models using grouped, nested, repeated k-fold cross validation.
For our purposes, what this means is that we evaluate model performance in 30 separate held-out test sets, and each of these sets contains new observations from new participants that were not used to train the models.
And again, this is consistent with what we mean by prediction. We don’t care how our models perform with the participants we used to train them. We want to know how well the models will work when we implement them with new people in the future.
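The key idea behind grouping — every observation from a given participant lands on the same side of each split — can be sketched as follows. This is a simplified stand-in for the nested, repeated cross-validation we actually use:

```python
import random

def grouped_kfold(participant_ids, k=3, seed=0):
    """Assign whole participants (not observations) to folds so that each
    held-out fold contains only participants unseen during training."""
    ids = sorted(set(participant_ids))
    rng = random.Random(seed)
    rng.shuffle(ids)
    fold_of = {pid: i % k for i, pid in enumerate(ids)}
    # Each split: (train observation indices, test observation indices)
    splits = []
    for fold in range(k):
        test = [i for i, p in enumerate(participant_ids) if fold_of[p] == fold]
        train = [i for i, p in enumerate(participant_ids) if fold_of[p] != fold]
        splits.append((train, test))
    return splits

# Two observations each from six hypothetical participants
obs_participants = ["a", "a", "b", "b", "c", "c", "d", "d", "e", "e", "f", "f"]
splits = grouped_kfold(obs_participants, k=3)

# No participant appears in both train and test of any split
leak_free = all(
    {obs_participants[i] for i in tr}.isdisjoint({obs_participants[i] for i in te})
    for tr, te in splits
)
```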
Predicted Lapse Probabilities: Next Week Model
Model predicts probability of lapse in next week for “new” observations in test sets
Can panel predictions by ground truth (i.e., true lapse vs. no-lapse observations)
Want high probabilities for true lapses and low probabilities for true no lapses
OK, let's begin to explore how well we can do.
Let's start with the model that provides the coarsest level of temporal specificity – one week – and let me take a moment to make the predictions that this machine learning model provides more concrete for you.
On the right, you are looking at histograms of the lapse probability predictions that the model makes for all the weeks for all the patients in the held-out test sets.
I’ve paneled these histograms by whether a lapse did or did not happen in reality for each predicted week. The top panel is for weeks with lapses and the bottom panel is for weeks with no lapses.
Ideally, you want the predicted probabilities to be very high for weeks when there was a lapse and very low for weeks when there was no lapse.
And this is exactly what we see for the one week lapse window model
Understanding the Models
[PAUSE]
Of course, if we want to implement these models in a real world system, we need to understand how they work and what features are driving the predictions. And in recent years, the field of interpretable AI has made big strides in developing tools to help us look under the hood, so to speak, of these models to better understand them.
One of the more promising of these tools is SHAP or Shapley Additive Explanations. SHAP is a method for interpreting the output of machine learning models that is based on cooperative game theory. It provides a principled way to assign each feature or category of features an importance value for a particular prediction.
We can use this approach to understand why the model makes a specific prediction for a specific participant at a specific moment in time. And I will do this a bit later when we talk about how to make personalized support recommendations.
But we can also use SHAP values to understand the global feature importance of each feature across all participants and observations for any of our models, so lets take a look at this first.
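As a toy illustration of the aggregation step (the SHAP values and feature names below are invented, not our actual results), global importance is simply the mean absolute SHAP value per feature across all predictions:

```python
# Hypothetical per-observation SHAP values (rows: observations, cols: features)
shap_values = [
    [0.30, -0.10, 0.02],
    [-0.25, 0.05, -0.01],
    [0.35, -0.15, 0.03],
]
features = ["past_lapses", "craving", "age"]

# Global importance: mean absolute SHAP value per feature across observations
importance = {
    f: sum(abs(row[j]) for row in shap_values) / len(shap_values)
    for j, f in enumerate(features)
}
ranked = sorted(importance, key=importance.get, reverse=True)
```

In practice we compute these values with a SHAP explainer over the fitted model; the averaging shown here is what produces the bar widths in the plot.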
Understanding the Models: Next Hour Model
All EMA items impact lapse probability
[PAUSE]
The plot on the right shows feature categories and their associated importance as indexed by mean absolute SHAP values across all the predictions made by the Next Hour Model.
The bar width shows the relative importance of each feature, globally across all participants and observations. From this we see a few important characteristics of the NEXT HOUR model.
First, all of the EMA items affect predictions about lapse probability across observations.
As you might expect, history of past lapses has a big influence on the probability of a future lapse. But self reported abstinence efficacy, craving, emotional state, history of stressful events, and other features from the EMA all make meaningful contributions to lapse probability in the next hour across observations.
Understanding the Models: Next Hour Model
All EMA items impact lapse probability
Lapse day and lapse hour are useful
We also see that we can use lapse day and lapse hour to make predictions of the lapse probability in the next hour.
Not surprisingly, people are more likely to lapse on weekends and during evening hours and the hour level model can use that information to improve its lapse predictions.
These features likely contribute to the superior performance of the hour model relative to the day and week interval prediction models.
Understanding the Models: Next Hour Model
All EMA items impact lapse probability
Lapse day and lapse hour are useful
Demographics not particularly important
And finally, we see that demographics were not particularly important for predicting lapses. In other words, the frequency of observed lapses did not differ meaningfully by these demographic characteristics.
This shouldn’t be too surprising either, because these demographic characteristics are stable for an individual over time and therefore can’t explain changes in lapse probability within an individual over time.
Interim Summary and Next Steps
Very strong overall performance
Temporally precise models for immediate future lapse risk
EMA risk features are interpretable and sensible
So let's pause here for a quick recap of where we are so far.
We have models that predict exceptionally well
These models have a high degree of temporal specificity, even down to hour level resolution
The risk features from EMA map sensibly onto known lapse risks and we have interventions and supports designed to address many of these risks
So, obviously, we are really excited about the potential capabilities of a recovery monitoring and support system that includes these models to provide personalized continuing care.
BUT, we still have some very important work to do with respect to several issues that are critical to resolve before we can implement these models effectively and without causing potential harm.
Next Steps: Algorithmic Fairness
To start, our models have some serious, but unfortunately not unexpected, problems given what I told you earlier about some of the demographic limitations of our training data.
I’ve already shown you that our models perform exceptionally well when evaluated across the full sample. However, when we evaluate model performance, it is critical that we look at performance in subgroups that experience health disparities. And too often, these analyses are not done or reported.
It's only very recently that we have begun to take this seriously, and we must. If we hope to use our system to address existing disparities in SUD outcomes, then our models must perform well for all groups, regardless of their privilege; otherwise, the use of these models may exacerbate rather than reduce existing mental healthcare disparities.
Next Steps: Algorithmic Fairness
As one example of this issue, the data we collected to train these models from this first NIAAA grant did not include much racial or ethnic diversity among the participants. We collected those data at a time when we weren’t yet thinking as carefully about issues of representation as we try to do now.
Next Steps: Algorithmic Fairness
Given that, we were dismayed but not surprised to find that those models perform substantially worse (with auROCs that are .19 units lower) when predicting lapses for anyone who wasn’t White and non-Hispanic.
Algorithmic Fairness
And these fairness issues were not limited to just race and ethnicity. We see worse performance for participants with incomes below the poverty line and, to a lesser extent, for female participants.
This is obviously unacceptable and we are working now to correct this.
Next Steps: Algorithmic Fairness
NIDA project recruited ~ 400 patients in recovery from Opioid Use Disorder
National sample (size; diversity: demographics, location)
More variation in stage of recovery (1 – 6 months at start)
Sensing for 12 months
For example, we have just recently completed data collection for a NIDA funded project that collected a more racially diverse sample using nationwide recruiting techniques.
This sample also includes much needed geographic diversity because the factors that predict lapse in urban settings may be different from those that predict lapse in rural settings.
I should also note that in this project, we also administered EMAs only once per day to reduce measurement burden given that we tracked participants for up to a full year.
Next Steps: Algorithmic Fairness
Excellent performance: auROC ~ 0.94
[PAUSE]
We have just begun to train models using EMA features from this new dataset, so these next results and data figures should be considered preliminary. But we are excited to see that even with only one EMA per day, we are now getting the best performance we have seen to date when predicting lapses in the next day.
An auROC of .94
Next Steps: Algorithmic Fairness
Excellent performance: auROC ~ 0.94
And more importantly, this next day prediction model appears much fairer with respect to its performance in all three of the subgroups where our original models were deficient.
[PAUSE]
But let me explicitly say that issues related to fairness and the potential to exacerbate health disparities through the use of a sensing and AI based monitoring and support system are much more complicated than what I’ve had time to present here.
I’d be happy to dive a bit deeper into this during the discussion period if people are interested to engage more with this.
Next Steps: Sensing Geolocation and Communications
[PAUSE]
As our monitoring and support system continues to mature, we will also want a richer, broader set of lapse risk features so that we can distinguish better between different situations that require different supports. We can do this by engineering features from our location and communication signals, which tap into different experiences than what we measure by EMA.
And as an added benefit, the use of passive sensing rather than EMA, which requires active input from the user, may also lower the patient burden of using these systems long term.
Let’s take a look at what we can get from geolocation and communications signals to provide you with some intuition about how we think this will work.
Next Steps: Sensing Geolocation and Communications
Here is a wide view of my moment-by-moment location detected by a GPS app over a month when we were first experimenting with this sensing method. The app recorded the paths that I traveled, with movement by car in green and running in blue.
The red dots indicate places that I stopped to visit for at least a few minutes.
And although not displayed here, the app recorded the days and exact times that I was at each of these locations.
From these data, you can immediately see that I am a runner, with long runs leaving from downtown Madison and frequent trail runs on the weekends in the county and state parks to the west and northwest.
Next Steps: Sensing Geolocation and Communications
Zooming in to the Madison isthmus, these data show that I drove my children halfway around the lake each morning to their elementary school. And from these data we might be able to detect those stressful mornings when getting my young kids dressed and fed didn’t go as planned and we were late, sometimes very late, to school!
The app recorded my daily running commute through downtown Madison to and from my office. From this, we can observe my long days at the office and also those days that I skipped out.
Looking at the red dots indicating the places I visit, the app can detect the restaurants, bars, and coffee shops where I eat, drink and socialize. We can use public map data to identify these places and make inferences about what I do there.
…Imagine my text messages…
In addition to geolocation, we also collected my smartphone communications logs and even the content of my text messages.
And no such luck, I don’t plan to show you my actual text messages!
[PAUSE]
But imagine what we could learn about me from the patterns of my communications - Who I was calling, when I made those calls, and even the content of what I sent and received by text message.
Context is Critical
We believe we can improve the predictive strength of these geolocation and communication signals even further by identifying the specific people and places that make us happy or sad or stressed, those that we perceive support our mental health and recovery and those who undermine it.
Context is Critical
For example, consider what this brief text message thread between a hypothetical patient and their drinking buddy implies about the probability that they might lapse back to drinking in the coming hours.
[PAUSE]
… And how would your prediction change if this wasn’t their drinking buddy but instead their mom, who was a big supporter of their recovery?
This interpersonal context matters!!!
Context is Critical
We gather this contextual information quickly by asking a few key questions about the people and places we interact with frequently over the first couple of months that we record these signals. And we can identify these frequent contacts and locations directly from these signals.
In our current projects, we target people and places that we interact with at least twice a month or more for more detailed follow-up to gather context. And it turns out that this really isn’t that burdensome. Most of us are creatures of habit and if we set a threshold for 2x monthly interactions, we typically only have 10-30 people and places that meet this threshold. And it’s the same people and places each month so we can build this context up when the person first starts to use the system and after that it only needs to be updated occasionally when we go somewhere new or make a new friend.
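The identification step is simple frequency counting over the sensed logs. A sketch with a hypothetical month of communication events (names invented):

```python
from collections import Counter

# Hypothetical one-month communication log: who each call/text involved
log = ["mom", "mom", "boss", "sponsor", "sponsor", "sponsor",
       "plumber", "mom", "boss"]

# People contacted at least twice this month get the brief context survey;
# one-off contacts (the plumber) are skipped to keep burden low
counts = Counter(log)
frequent = sorted(p for p, n in counts.items() if n >= 2)
```

The same counting applies to stopped-at locations from the geolocation signal.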
Contextualized Geolocation
Location type (e.g., home, home of friend, bar, restaurant, liquor store, work, health care, AA/recovery meeting, gym/fitness center)
Is alcohol available at this location?
Have you drunk alcohol at this location?
Is your experience at this location generally pleasant, unpleasant, mixed, or neutral?
This location is (high risk, moderate risk, low risk, no risk) for my recovery
HIGHLIGHT THE CONTEXT INFO WE ARE COLLECTING
NOTE THAT SOME CAN COME FROM PUBLIC DATA
Contextualized Communications
Have you drunk alcohol with this person?
What is their drinking status (e.g., drinker, non-drinker)?
Would you expect them to drink in your presence?
Are they currently in recovery from alcohol or other substances?
Do they know about your recovery goals, and if so, are they supportive?
Are your experiences with them typically pleasant, unpleasant, mixed, or neutral?
TALK FIRST ABOUT COMMUNICATIONS CONTEXT
THEN TALK ABOUT PRELIMINARY ANALYSES FROM CLAIRE, COCO, AND KENDRA
Next Steps: Clinical Uses
[PAUSE]
OK, now that you have a sense of how we have developed our prediction models and their capabilities, I’d like to spend the remaining time unpacking how we are thinking about implementing and evaluating this system for patients.
Next Steps: Clinical Uses
Do NOT provide model output to clinicians
Clinicians are over-burdened
Not ready for new data streams
When we started this work, we believed we were building this system to inform clinicians about their caseload.
Today’s digital therapeutics have clinician dashboards built into them and we, perhaps naively, thought clinicians could use this information to prioritize their resources to patients who had the greatest need.
But as we talked to clinicians, it became very clear that they do not want any more info at this point. Post-pandemic, they are barely keeping their heads above water and are definitely not ready to add new systems and data streams in place. This may change in the future, but we’ve moved away from this idea for now.
In contrast, our work with participants suggested that they did see potential value in monitoring their recovery using our monitoring and support system. So we have pivoted to considering what information might be most useful to provide directly to them.
Next Steps: Clinical Uses
Do NOT predict class labels (lapse vs. no-lapse)
Iatrogenic effects?
Information loss
Lets start first with how we should NOT use this system with individuals with SUDs.
I have been intentionally focusing on the lapse probabilities that are natively output from our prediction models. However, it's common for others to use these models to predict formal class labels. In other words, to specifically predict whether a lapse will happen or not. Basically, a threshold is set, for example, 0.5, and if the probability of a lapse exceeds that threshold, the model predicts that a lapse will occur. Otherwise, it predicts that no lapse will occur.
But there are several reasons that we DO NOT want to predict dichotomous class labels.
Next Steps: Clinical Uses
Do NOT predict class labels (lapse vs. no-lapse)
Iatrogenic effects?
Information loss
To start, there may be concerns about possible iatrogenic effects associated with telling a person that they are going to lapse. Or at least this is a risk that should be carefully considered if we provide these blunt dichotomous predictions to individuals.
Second, these days, I think we have all come to understand that taking scores that are natively quantitative, like probabilities, and artificially dichotomizing them results in a substantial loss of potentially valuable information.
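A two-line example makes that information loss concrete: with hypothetical predicted probabilities for two patients, thresholding at 0.5 erases a large and clinically meaningful difference in risk:

```python
threshold = 0.5

# Hypothetical predicted lapse probabilities for two patients
probs = {"patient_a": 0.51, "patient_b": 0.99}

# Dichotomizing collapses both to the same "lapse" label,
# discarding the difference in risk between them
labels = {patient: int(p >= threshold) for patient, p in probs.items()}
```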
Next Steps: Clinical Uses
DO use lapse probability
auROCs range from 0.90 - 0.94
Instead of using the models to predict class labels, we believe that there is potentially high value in using the original lapse probabilities directly output by the model.
I’ve already shown you that these probabilities can discriminate very well between lapse and no-lapse observations, correctly assigning a higher probability to lapses more than 90% of the time.
Next Steps: Clinical Uses
DO use lapse probability
auROCs range from 0.90 - 0.94
Probabilities are calibrated and ordinal
Provides fine gradations of relative risk for clinical decision-making
And critically, these probabilities are very well calibrated and at least ordinal in their relationship with the true probability that a lapse will occur.
On the right, I am showing you a simple calibration plot. On the x-axis, I’ve binned predicted lapse probabilities into bin widths of 10 percent and for each of these bins, I display the actual observed probability of lapses for observations in that bin.
If the probabilities were perfectly calibrated, the bin means would all fall on the dotted line with the bin from 0 - .1 having an observed probability of .05, the bin from .1 - .2 having a probability of .15, and so on. And this is essentially what we see for our models.
Given this, we believe that the lapse probabilities can provide precise, fine gradations of risk for clinical decision making.
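The calibration check just described amounts to binning predicted probabilities and comparing each bin to the observed lapse rate. A self-contained sketch with toy numbers (not our study data):

```python
def calibration_bins(probs, outcomes, width=0.1):
    """Bin predicted probabilities and compute the observed lapse rate per bin."""
    n_bins = int(round(1 / width))
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p / width), n_bins - 1)   # p == 1.0 falls in the top bin
        bins[idx].append(y)
    # (bin lower edge, observed lapse rate) per bin; None for empty bins
    return [(i * width, sum(b) / len(b)) if b else (i * width, None)
            for i, b in enumerate(bins)]

# Toy predictions: well-calibrated probabilities track observed rates
probs = [0.05, 0.05, 0.05, 0.05, 0.95, 0.95]
outcomes = [0, 0, 0, 0, 1, 1]
cal = calibration_bins(probs, outcomes)
```

Plotting the observed rate per bin against the bin midpoints produces the calibration plot on the slide.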
Next Steps: Personalized Daily Support Recommendations
SHAP values from the NEXT DAY model can identify the most important risk features for a specific individual on each day
These features can be used to personalize daily support recommendations
But we can get much more than just lapse probabilities from these models.
Previously I showed you how we can use global SHAP values to determine which features are important in the aggregate across all people and timepoints.
We can also use SHAP values to understand which features contributed most strongly to any single prediction for a specific person at a specific moment in time.
This allows us to understand not only WHEN a lapse might occur but also WHY and therefore potentially how best to intervene.
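For a linear model, local SHAP values have a closed form, coefficient times the feature's deviation from its average, which makes the idea easy to sketch without the full `shap` library. The feature names, weights, and values below are purely illustrative:

```python
import numpy as np

def linear_shap(coefs, x, background_means):
    """Exact SHAP values for a linear model: each feature's contribution
    to this one prediction, relative to the average prediction."""
    return np.asarray(coefs) * (np.asarray(x) - np.asarray(background_means))

# Hypothetical day-30 observation for one participant
feature_names = ["past_craving", "recent_lapses", "stressors"]
coefs = np.array([0.8, 0.5, 0.3])   # illustrative model weights
x = np.array([0.9, 0.1, 0.2])       # today's feature values
means = np.array([0.3, 0.1, 0.2])   # feature averages in the training data

contribs = linear_shap(coefs, x, means)
top = feature_names[int(np.argmax(np.abs(contribs)))]
```

Here the elevated craving score accounts for essentially all of this person's elevated prediction, which is exactly the information a support recommendation can be keyed to.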
Next Steps: Personalized Daily Support Recommendations
For example, in this first plot, I am showing you SHAP values for someone who would be predicted to have a high lapse probability on day 30 of their recovery because they have been reporting high craving. For that person, we could recommend urge surfing techniques or remind them that distracting activities can help get them through short periods of craving that day.
Next Steps: Personalized Daily Support Recommendations
In contrast, a second person might have similarly high lapse probability on day 30 of their recovery, but instead because they have lapsed a few times in recent weeks. They could be encouraged and assisted to complete activities designed to increase their motivation for abstinence.
Next Steps: Personalized Daily Support Recommendations
And at a later point in time, on day 70, that same person may have improved their abstinence motivation but now be at increased risk for a lapse because of a string of recent past and anticipated stressors.
Now they could be provided with guided stress reduction or relaxation techniques that they could use each day.
In this way, we can provide personalized recommendations to patients that are tailored to their unique risk profile at that moment in time.
Next Steps: Personalized Daily Support Recommendations
SHAP values from the NEXT DAY model can identify the most important risk features for a specific individual on each day
These features can be used to personalize daily support recommendations
We can also eventually learn which interventions are best for which risk
As a starting point, the mapping between important local risk features and specific interventions or supports can be created using clinical domain expertise. In other words: what would a clinician tell their patient to do in those circumstances? We can simply hard-code these clinically derived mappings between risk features and support recommendations in our monitoring and support system, and this is what we are doing currently.
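A hard-coded clinical mapping of this kind can be as simple as a lookup table. The feature names and message text here are hypothetical stand-ins, not the system's actual content:

```python
# Hypothetical clinician-derived mapping from the most important local
# risk feature to a support recommendation.
SUPPORT_MAP = {
    "past_craving": "Try urge surfing or a distracting activity today.",
    "recent_lapses": "Complete an abstinence-motivation exercise.",
    "stressors": "Use a guided relaxation or stress-reduction exercise.",
}

def recommend(top_risk_feature, fallback="Check in with your support network."):
    """Return the clinically derived recommendation for today's top risk."""
    return SUPPORT_MAP.get(top_risk_feature, fallback)
```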
However, with enough training data, reinforcement learning can also be applied to this problem to allow the model to learn the best intervention to recommend given a set of risk features to reduce subsequent lapse probability.
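One simple version of that learning problem is a contextual bandit. The sketch below uses an epsilon-greedy rule, with reward coded 1 when no lapse followed the recommendation; this is an illustration of the general approach, not the planned implementation:

```python
import random
from collections import defaultdict

class EpsilonGreedyRecommender:
    """Minimal sketch of learning which support works best for each
    risk context from observed outcomes."""

    def __init__(self, supports, epsilon=0.1, seed=0):
        self.supports = list(supports)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(lambda: defaultdict(int))
        self.values = defaultdict(lambda: defaultdict(float))

    def choose(self, risk_context):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.supports)          # explore
        vals = self.values[risk_context]
        return max(self.supports, key=lambda s: vals[s])   # exploit

    def update(self, risk_context, support, reward):
        # incremental mean of rewards for this context/support pair
        self.counts[risk_context][support] += 1
        n = self.counts[risk_context][support]
        self.values[risk_context][support] += (reward - self.values[risk_context][support]) / n
```

In practice this would sit on top of the clinician-derived mapping, gradually shifting recommendations toward whatever the outcome data support.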
Next Steps: Advance Warning
Previous models only predict immediate future
Advance warning needed for some types of support
[PAUSE]
Now, you might have noticed that up until this point, I’ve only talked about predictions for the immediate future - the next week, the next day, or the next hour. And in these past few examples, I’ve been talking about using one of those immediate models - the next day model - to provide personalized support recommendations that the patient can implement that next day.
However, for some types of support, people may need some advance warning to put those supports into place. For example, if you need to meet with your AA sponsor or get in to see your therapist, you will need some lead time to schedule those meetings or appointments.
Next Steps: Advance Warning
Previous models only predict immediate future
Advance warning needed for some types of support
Can lag model up to two weeks into the future
Performance drops but still remains good
Given this, we have begun to explore various methods for predicting lapse probabilities further into the future.
As a first step, we took the model which predicted lapses in the next week and lagged it by different periods, from one day up to two weeks into the future. For example, the two-week-lagged model predicts the probability of a lapse during a one-week window that begins two weeks from now.
And as you can see from the plot on the right, this approach does result in a drop in performance as we move further into the future. But even with a two-week lag, we still have an auROC of .85, which is potentially clinically useful.
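Constructing the targets for a lagged model amounts to shifting the one-week label window. Here is a sketch assuming a daily 0/1 lapse series per participant; this is an illustration, not the project's actual pipeline:

```python
import numpy as np

def lagged_week_labels(daily_lapses, lag_days):
    """For each day t, label whether any lapse occurs during the one-week
    window starting lag_days from now: days [t + lag, t + lag + 6]."""
    lapses = np.asarray(daily_lapses, dtype=int)
    n = len(lapses)
    labels = np.full(n, -1)  # -1 where the full window is not yet observed
    for t in range(n):
        start, end = t + lag_days, t + lag_days + 7
        if end <= n:
            labels[t] = int(lapses[start:end].any())
    return labels
```

With `lag_days=0` this reproduces the original next-week labels; increasing the lag pushes the prediction window further out while keeping its one-week width.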
And we are currently exploring other methods for predicting lapse probabilities further into the future that may perform better than this simple lagging approach. I'd be happy to talk more about those during the discussion period if anyone is interested.
Next Steps: Optimize System Feedback to Patients
Sensing EMA and geolocation for four months
Model updated each night for next day
Lapse probability predictions
Important risk features
Risk relevant support recommendations
Participants receive daily messages containing varying combinations of these components
Measure trust, engagement, and clinical outcomes
And that brings us to perhaps the most important next step to implementing our Smart Recovery Monitoring and Support System.
Just because our prediction models perform well doesn't guarantee that they will provide meaningful clinical benefits.
We need to be able to provide feedback from these models to patients such that they trust the system and its feedback, find it useful, and engage with it over time in a way that improves their recovery. Questions about what information to provide, how to present it, and when to present it are all critical to the success of this system.
And we were just awarded another grant from the NIAAA to do exactly this.
Next Steps: Optimize System Feedback to Patients
Sensing EMA and geolocation for four months
Model updated each night for next day
Lapse probability predictions
Important risk features
Risk relevant support recommendations
Participants receive daily messages containing varying combinations of these components
Measure trust, engagement, and clinical outcomes
In this project, we have embedded a prediction model that uses inputs from both EMA and geolocation within our Smart Recovery Monitoring and Support System.
Participants will use this system for 4 months. Each day, the model will
make daily lapse probability predictions for each participant
identify current personalized lapse risks contributing to that prediction,
and map those risks to behavioral and support recommendations that are specific to each person each day.
We can then manipulate what information we include in daily messages from the system to participants to increase their trust and engagement with the system, as well as to formally evaluate its clinical benefits.
Obviously, we are very excited to get started on this project because it will bring us one step closer to providing meaningful support to individuals in recovery.
Thanks for your time. I am eager to hear your reactions to all of this and I’d be happy to answer any questions.
CRediTs
Acknowledgements (recent projects first)
Acknowledgements (alphabetized)