Data Science & AI Projects and Publications
Projects
Project Title: CVC Tracking and Predictive Tool
Project Team:
Principal Investigator: Dr. Joseph Miller
Study Team: Dr. Molly Easterlin, Dr. Srikumar Nair, Melody Li, Mohit Mehra
Synopsis: The CVC (Central Venous Catheter) project has three aims. Aim 1 establishes a comprehensive retrospective tracking system for central venous catheters placed at CHLA over the last five years, capturing a wide array of patient, device, technical, and clinical data. Aim 2 monitors a defined set of complication-associated outcome metrics within the tracked patient population, including reasons for catheter removal, mispositioning, required interventions, and thrombotic events. Aim 3 develops a predictive tool that identifies which of the factors collected in Aim 1 contribute to the outcomes in Aim 2, with the stretch goal of creating an electronic tool for prospective comparison of central venous access options in patients requiring such procedures.
Project Title: Down Syndrome Regression Disorder (DSRD)
Project Team: Dr. Jonathan Santoro, Dr. Natasha Lepore, Dr. Ramon Durazo-Arvizu, Dr. Joaquin Espinosa, Dr. Graham Noblit
Synopsis: Down Syndrome Regression Disorder (DSRD) is a late-onset neuropsychiatric disorder that significantly reduces patients’ quality of life. Evidence indicates that the presence of certain neurodiagnostic abnormalities may predict response to specific forms of treatment; however, no validated therapeutic algorithms yet exist for DSRD. This project collects new neurodiagnostic and therapeutic response data, permitting a better mechanistic understanding of DSRD that will in turn guide Phase I-III clinical trials and ultimately help generate personalized treatment plans for individuals with DSRD. We use clustering techniques to reveal DSRD subtypes driven by distinct physiological mechanisms, and regularization and dimensionality reduction techniques to identify which diagnostic tests help predict both the severity of DSRD and treatment outcomes.
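As a rough illustration of the clustering step mentioned above, the sketch below runs plain k-means on made-up two-dimensional points. The synopsis does not say which clustering algorithm, features, or number of clusters the project uses, so k-means, the function names (`sq_dist`, `kmeans`), the farthest-first initialisation, and the synthetic data are all assumptions chosen for illustration only.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=100):
    """Plain k-means (Lloyd's algorithm) on a list of numeric tuples."""
    # Farthest-first initialisation: deterministic, spreads the starting centroids.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Synthetic stand-ins for per-patient measurements (NOT real data):
# two well-separated groups of ten points each.
group_a = [(0.1 * i, 0.1 * (i % 3)) for i in range(10)]
group_b = [(5 + 0.1 * i, 5 + 0.1 * (i % 3)) for i in range(10)]
labels, centroids = kmeans(group_a + group_b, k=2)
print(labels)
```

On real data the number of clusters is not known in advance, so a study of this kind would typically compare several values of k and validate the resulting groups against clinical knowledge.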
Project Title: Clinical Aid using EEG data
Project Team:
Principal Investigator: Sahana Nagabhushan Kalburgi
Study Team: Delara Aryan, Melody Li, Mohit Mehra, Stephan Erberich
Synopsis: This project has three aims. Aim 1 evaluates various EEG analysis toolboxes on research EEG data to determine the most reliable option for extended use. By comparing functionality, feature extraction, and results across toolboxes, it seeks to establish a robust framework for processing EEG data efficiently and accurately. Aim 2 focuses on gaining access to clinical EEG data from CHLA and creating an EEG datalake. Aim 3 focuses on developing a clinical aid tool that identifies unusual patterns indicative of neurological disorders at earlier stages by applying the processing pipeline established in Aim 1 to the clinical EEG data obtained in Aim 2.
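To make "feature extraction" concrete, the sketch below computes band power, one commonly used EEG feature, with a naive discrete Fourier transform. The synopsis does not name the toolboxes or features under comparison, so the feature choice, the `band_power` helper, the band edges (conventions vary between sources), and the synthetic 10 Hz test signal are all assumptions.

```python
import cmath
import math


def band_power(signal, fs, low, high):
    """Sum of squared DFT magnitudes (scaled by n**2) for bins with low <= f < high."""
    n = len(signal)
    power = 0.0
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        freq = k * fs / n
        if low <= freq < high:
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal)
            )
            power += abs(coeff) ** 2 / n ** 2
    return power


# Synthetic one-second "recording": a pure 10 Hz sine sampled at 128 Hz (NOT real EEG).
fs = 128
signal = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]

alpha = band_power(signal, fs, 8, 13)   # assumed alpha band edges, in Hz
beta = band_power(signal, fs, 13, 30)   # assumed beta band edges, in Hz
print(alpha > beta)
```

A production pipeline would use an FFT and a windowed estimator such as Welch's method rather than this O(n²) loop; the loop is kept here only so the arithmetic is visible.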
Project Title: Kubernetes Cluster
Project Team: Dr. Eamon Doyle, Dr. Stephan Erberich
Synopsis: To support data science endeavors at CHLA, we maintain a 17-node (2 edge nodes, 15 workers) Kubernetes cluster. This system provides CPU and GPU computing resources on each node and a nearly 1-petabyte distributed file system for redundant, high-performance data delivery across the cluster and to satellite nodes. User management integrated with the IS-supported Active Directory provides a seamless interface for users to upload data, execute containerized workloads, and download results.
Project Title: Large Language Model Applications at CHLA
Project Team: Dr. Graham Noblit, Dr. Stephan Erberich, Mohit Mehra
Synopsis: Clinical notes and reports within a patient’s electronic health record (EHR) contain valuable information that permits clinicians and researchers to better recruit for clinical trials, phenotype disease, and produce more fine-grained diagnostic and prognostic models; however, this information is inaccessible because it is locked away in free text. Modern natural language processing techniques use large language models, such as OpenAI’s ChatGPT, to process these clinical notes, both extracting the information contained within them and classifying documents directly. We compare a variety of approaches for extracting information from clinical reports, with a specific focus on techniques that avoid generating expensive labels for model training, including text generation; using self-supervised training to create models capable of searching through clinical reports based on their semantic content; and compressing the knowledge in larger, more unwieldy, slower models into smaller, quicker models.
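One common way to compress a larger model into a smaller one is knowledge distillation, in which the small "student" model is trained to match the softened output distribution of the large "teacher". The synopsis does not say which compression method the team uses, so the sketch below is an assumption: it shows only the standard temperature-scaled distillation loss, with hypothetical function names and made-up logits.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened outputs, scaled by T**2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Made-up logits for a three-class document classifier (NOT real model output).
teacher = [4.0, 1.0, 0.5]
good_student = [3.8, 1.1, 0.4]
poor_student = [0.5, 4.0, 1.0]

loss_same = distillation_loss(teacher, teacher)
loss_good = distillation_loss(teacher, good_student)
loss_poor = distillation_loss(teacher, poor_student)
print(loss_good < loss_poor)
```

A higher temperature flattens both distributions, so the student also learns from the teacher's relative preferences among the classes it rejects, which is information a hard label would discard.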
Project Title: Quantitative Susceptibility Mapping
Project Team:
Principal Investigators: Dr. John Wood, Dr. Benita Tamrazi
Study Team: Dr. Matthew Borzage, Hashem Zamanian
Synopsis: The QSM project is a comprehensive pipeline designed for Quantitative Susceptibility Mapping analysis. The project involves retrieving raw phase and magnitude images from the PACS (Picture Archiving and Communication System), followed by preprocessing, neural network inference, and post-processing. Notably, a UNet deep learning model has been trained for the neural network inference step. The main functionalities of the pipeline include brain extraction, phase unwrapping, background field removal, neural network inference, and the subsequent saving of QSM DICOM files. Furthermore, the QSM pipeline produces a comprehensive report to present susceptibility values and their corresponding Regions of Interest (ROIs) for diagnostic purposes.
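Of the pipeline stages listed above, phase unwrapping is the easiest to show in a few lines. MRI phase is only measured modulo 2π, so values outside (-π, π] appear "wrapped" and must be restored before susceptibility can be estimated. The sketch below is a one-dimensional illustration using the classic sequential (Itoh) method on synthetic data; the synopsis does not state which unwrapping algorithm the QSM pipeline uses, and real data are three-dimensional and noisy, so the function names and the approach are assumptions.

```python
import math


def wrap(phase):
    """Map any phase value into the interval [-pi, pi)."""
    return (phase + math.pi) % (2 * math.pi) - math.pi


def unwrap_1d(wrapped):
    """Sequential unwrapping: accumulate wrapped differences between neighbours.

    Valid only when the true phase changes by less than pi between samples.
    """
    out = [wrapped[0]]
    for prev, cur in zip(wrapped, wrapped[1:]):
        out.append(out[-1] + wrap(cur - prev))
    return out


# Synthetic smooth phase ramp (NOT scanner data), 0.5 rad per sample.
true_phase = [0.5 * i for i in range(30)]
measured = [wrap(p) for p in true_phase]   # what a scanner would report
recovered = unwrap_1d(measured)

max_error = max(abs(r - t) for r, t in zip(recovered, true_phase))
print(max_error < 1e-9)
```

The sequential method fails wherever noise or a sharp tissue boundary makes neighbouring samples differ by more than π, which is one reason practical pipelines rely on more robust multi-dimensional methods.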
Project Title: Rod Clustering Analysis
Project Team:
Principal Investigator: Dr. David Cobrinik
Study Team: Bhavana Bhat, Kevin Stachelek, Dr. Graham Noblit
Synopsis: A recent analysis of fetal retinal cells by the Cobrinik lab produced an unexpected result: the presence of two distinct clusters of rod cells as defined by scRNA gene expression data. Rod cells are the photoreceptor cells of the retina primarily responsible for night vision. The presence of two clusters is surprising because, to date, multiple types of rod cells have not been discussed in the literature. This project seeks to validate this result and identify its cause. The multiple clusters of rod cells may reflect developmental differences and corresponding heterogeneous gene expression, an artifact of scRNA analysis, or some other explanation.
Publications
How to cite the data science team in publications:
“The data science and ML/AI work was performed by the Data Science, AI and Biomedical Informatics Program at The Saban Research Institute, Children's Hospital Los Angeles.”