van wickle

ABS 019: Large Language Models to Infer Depression in Patients with Neurological Conditions: A Proof-of-Concept Study

Maya Julian-Kwong ¹, Shane Poole, BS ¹, Kyra Henderson, BA ¹, Jaeleene Wijangco, BS ¹, Nikki Sisodia, BS ¹, Jeffrey Gelfand, MD ¹, Chu-Yueh Guo, MD ¹, Riley Bove, MD, MMSc ¹

¹ University of California San Francisco Weill Institute for Neurosciences, Neurology, San Francisco, CA, USA

The Van Wickle Journal (2026) Volume 2, ABS019

Introduction: AI-based large language models (LLMs) are rapidly growing in capability and use, expanding into numerous academic and professional fields due to their ability to analyze human-generated unstructured data. Over the past few years, applications of these LLMs to the medical field have included automated patient interactions, aids for medical education, analysis of data and participants for clinical trials, and extraction of discrete variables from unstructured electronic health record (EHR) data for use in clinical research and care. In healthcare settings such as those in the United States, discrete data may be scattered across EHRs if patients receive care for different conditions in different health systems. Therefore, the ability to extract informative data from narrative notes could yield large amounts of discrete data that could contribute to diagnosis and help expand recognition of undiagnosed mental health disorders.

In the care of individuals with multiple sclerosis (MS), treatment of depression represents a vexing problem. Depression is common in people with MS and contributes to worsening MS disability, yet it tends to be under-ascertained in clinical practice, emphasizing the need for novel diagnostic approaches. Many discrete measures can be extracted from EHRs to indicate a patient’s mental health, such as ICD-9 codes for depression or prescriptions for antidepressants; however, these measures may be incomplete or unavailable across healthcare systems.

The objective of this study was to use an LLM to develop a prompt capable of inferring whether a patient is depressed from a single clinical note written by their MS neurologist. After initial cross-sectional validation of the prompt, we hypothesized that a longitudinal comparison of depression ascertained by the LLM prompt with the neurologist’s impression could provide an early proof of concept of its relevance for depression detection. With ongoing refinements, the ultimate use of such a prompt would be as a clinical alert system to screen for a patient’s depressed mood, promoting timely recognition, diagnosis, and early treatment.

Methods: This single-center retrospective study analyzed prospectively collected EHR notes. In Phase I, an institutionally secure ChatGPT-4 prompt was iteratively refined to infer the presence of depression from the neurologist’s note. Its output was compared with manual annotation of the neurologist’s impression (depression: present, absent, no mention) and with patient-reported outcomes (PROs): the Hospital Anxiety and Depression Scale (HADS-D) or the Patient Health Questionnaire-9 (PHQ-9). In Phase II, a longitudinal analysis compared the timing of depression detection by the prompt and by the neurologist across 5 years of notes for 250 patients.
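The Phase II timing comparison can be illustrated with a minimal sketch. It assumes each note carries a visit date plus two boolean labels (LLM-inferred and neurologist-documented depression); the `Note` type, the field names, the 365.25-day year, and the handling of patients with no positive note are all hypothetical choices for illustration, not the study's actual pipeline.

```python
from datetime import date
from typing import NamedTuple, Optional, Sequence


class Note(NamedTuple):
    """One clinical note (hypothetical structure, for illustration only)."""
    visit: date
    llm_depressed: bool     # depression inferred by the LLM prompt
    neuro_depressed: bool   # depression documented by the neurologist


def first_positive(notes: Sequence[Note], field: str) -> Optional[date]:
    """Return the earliest visit date on which `field` is True, or None."""
    dates = [n.visit for n in notes if getattr(n, field)]
    return min(dates) if dates else None


def llm_lead_years(notes: Sequence[Note]) -> Optional[float]:
    """Years by which the LLM's first detection precedes the neurologist's.

    Positive: the LLM flagged depression first. Negative: the neurologist did.
    None: at least one of the two never flagged depression (an assumption;
    the source does not say how such patients were handled).
    """
    llm = first_positive(notes, "llm_depressed")
    neuro = first_positive(notes, "neuro_depressed")
    if llm is None or neuro is None:
        return None
    return (neuro - llm).days / 365.25


# Made-up example patient, not study data.
example = [
    Note(date(2020, 1, 1), llm_depressed=True, neuro_depressed=False),
    Note(date(2021, 1, 1), llm_depressed=True, neuro_depressed=False),
    Note(date(2022, 1, 1), llm_depressed=True, neuro_depressed=True),
]
lead = llm_lead_years(example)
```

For this made-up patient the LLM flags depression about two years before the neurologist documents it.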

Results: In Phase I (n=278 adults with MS), the LLM prompt detected depression in 60.4% of notes (168/278). When compared with the neurologist’s impression in the clinical notes, the prompt achieved high sensitivity (97.3%) and an accuracy of 84.4%. Specificity was more modest (68.3%): when neurologists did not mention depression, the prompt inferred depression based on symptoms, history, and medications. When PROs and the neurologist’s impression disagreed, the prompt aligned with PROs 61.9% of the time. In Phase II, the LLM inferred depression earlier than the neurologist in 18.8% of patients, on average 2.45 years (SD 1.54) earlier.
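For readers less familiar with the reported metrics, the following sketch shows how sensitivity, specificity, and accuracy are computed from a 2x2 confusion matrix, taking the neurologist's impression as the reference. The cell counts are invented for illustration because the abstract does not report them; only the 168/278 detection rate comes from the source.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """2x2 confusion matrix; the neurologist's impression is the reference."""
    tp: int  # prompt positive, neurologist positive
    fp: int  # prompt positive, neurologist negative or no mention
    tn: int  # prompt negative, neurologist negative or no mention
    fn: int  # prompt negative, neurologist positive


def sensitivity(c: Confusion) -> float:
    """Share of reference-positive notes that the prompt also flags."""
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    """Share of reference-negative notes that the prompt also clears."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: Confusion) -> float:
    """Share of all notes on which prompt and reference agree."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


# Hypothetical counts chosen for round numbers; NOT the study's data.
demo = Confusion(tp=90, fp=30, tn=70, fn=10)
demo_sensitivity = sensitivity(demo)
demo_specificity = specificity(demo)
demo_accuracy = accuracy(demo)

# The one count pair the abstract does report: notes flagged by the prompt.
detection_rate_pct = round(100 * 168 / 278, 1)
```

A pattern of high sensitivity with lower specificity, as reported above, corresponds to few false negatives but a larger share of false positives among reference-negative notes.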

Discussion: The prompt was highly sensitive to neurologist documentation of depression in clinical notes; it also inferred both present and treated depression from other note components. Potential applications include quality improvement initiatives that aim to improve depression care at the cohort level.

Computational Applications, ABS 019

April 04th, 2026
