Prediction of post-transplant hospitalization among kidney transplant recipients with clinical notes and electronic healthcare record data
Michael Arenson1, Julien Hogan1, Liyan Xu2, Raymond Lynch1, Hana Lee1, Jinho Choi2, Jimeng Sun3, Andrew Adams1, Rachel Patzer1,4.
1Division of Transplantation, Emory University School of Medicine, Atlanta, GA, United States; 2Department of Computer Science, Emory University, Atlanta, GA, United States; 3College of Computing, Georgia Institute of Technology, Atlanta, GA, United States; 4Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, United States
Background: Rehospitalization after kidney transplant is costly to patients and healthcare systems and is associated with poor outcomes. Prediction models have previously been used to identify patients at risk of rehospitalization with limited success. Few studies have examined the inclusion of free-text data from clinical notes in the electronic medical record (EMR) and natural language processing (NLP) techniques to enhance the prediction of rehospitalization.
Methods: We conducted a retrospective, observational study of first-time kidney transplant recipients at a large, urban hospital in the Southeastern United States between January 2005 and December 2015, with the aim of incorporating EMR clinical notes into predictive models of 30-day rehospitalization (30DR) after kidney transplant. We used both structured EMR data and unstructured data (i.e., clinical notes). Eight types of clinical notes were mined with NLP techniques for possible new predictive features of 30DR, and these features were included in predictive models built with unsupervised machine-learning approaches and text mining using Term Frequency-Inverse Document Frequency (TF-IDF) methods. We built several predictive models, including a model using structured data only and models combining structured data with clinical notes. The area under the receiver operating characteristic curve (c-statistic) was used to determine and compare model accuracy, and 5-fold cross-validation was used to test model performance.
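As an illustration of the TF-IDF weighting named in the Methods, the following is a minimal sketch that assumes the plain formulation (term frequency within a note multiplied by the log of the inverse document frequency). The function name `tf_idf`, the whitespace tokenizer, and the toy notes are hypothetical and are not the study's pipeline or data.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document using plain TF-IDF.

    Assumption: tf = count / document length, idf = ln(N / document frequency).
    Library implementations often add smoothing and normalization.
    """
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy notes, not study data.
notes = [
    "patient stable creatinine stable",
    "patient reports fever",
    "patient creatinine rising",
]
scores = tf_idf(notes)
```

In this toy corpus, a word that occurs in every note ("patient") receives a weight of zero, while a word repeated within a single note and absent elsewhere ("stable") receives the highest weight in that note.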
Results: Among 2,060 kidney transplant recipients, 30.7% were readmitted within 30 days. The mean age was 51 years, and 47% were Black or African American. TF-IDF identified words that appeared most frequently within a given clinical note but least frequently across all other documents (Figure). Predictive models had similar performance when using structured EMR data only (c-statistic: 0.6821; 95% CI: 0.6644, 0.6998) and when combining structured data with progress notes (c-statistic: 0.6902; 95% CI: 0.6699, 0.7105). Predictive models built with clinical notes alone performed worse than models using structured data. The notes that improved model performance the most were the more clinically oriented ones, including progress notes, consultation notes, and discharge summaries (Table).
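For readers less familiar with the c-statistic reported above, the following minimal sketch computes it directly from its definition: the proportion of (readmitted, not readmitted) patient pairs in which the readmitted patient received the higher predicted risk, with ties counted as one half. The function name `c_statistic` and the toy values are hypothetical. The sketch does not reproduce the study's models, cross-validation, or confidence intervals.

```python
def c_statistic(labels, scores):
    """Concordance (area under the ROC curve) for binary labels.

    labels: 1 for patients with the outcome, 0 for patients without it.
    scores: predicted risks, in the same order as labels.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one patient in each outcome group")

    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0   # model ranks the pair correctly
            elif p == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(positives) * len(negatives))


# Hypothetical toy values, not study data.
readmitted = [1, 0, 1, 0, 0]
predicted_risk = [0.8, 0.3, 0.4, 0.4, 0.1]
auc = c_statistic(readmitted, predicted_risk)
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which gives a scale for reading the values of roughly 0.68 to 0.69 reported here.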
Conclusions: Future multi-center studies should use more advanced NLP techniques to create novel predictors from social worker notes and other notes that are non-medical but may carry important predictive information. Researchers should also consider pooling data from multiple institutions to increase sample size.
Reducing Disparities among Kidney Transplant Recipients (R01MD011682), funded by the National Institute on Minority Health and Health Disparities (NIMHD).