My statistical research focuses on the development and evaluation of risk prediction models and on the analysis of high-dimensional data. During my doctoral studies, I developed novel statistical methodology that uses machine-learning techniques to quantify a subject’s risk of disease based on a large number of genetic markers as well as environmental and clinical predictors. As a postdoctoral research fellow at the Fred Hutchinson Cancer Research Center, I collaborated with geneticists and epidemiologists to analyze large volumes of genomic and epidemiological data collected by multiple institutions to study colorectal cancer pathogenesis. I have also collaborated with researchers at Harvard Medical School and Brigham and Women’s Hospital in Boston, MA, on analyses of diverse types of high-dimensional data, such as electronic medical records, as well as multiple large genome-wide association studies. My connections and collaborations with clinicians and scientists in the Knight Cardiovascular Institute and the Knight Cancer Institute have fostered my interest in factors contributing to cardiovascular disease, stroke and cancer, as well as in disease prevention and treatment strategies.
Statistical Methodology for Genetic Risk Prediction and High-Dimensional Data
Analysis of high-dimensional data often seeks to identify a subset of important features and to assess their effects on outcomes. Traditional inference procedures based on standard regression methods often fail when the number of features is large relative to the sample size. In recent years, regularization methods have emerged as promising tools for analyzing high-dimensional data, but statistical inference for these models is challenging because penalized estimates are shrunk toward zero and may be set exactly to zero, so their sampling distributions are non-standard. In my graduate studies I developed methods for estimating the distribution of regularized regression coefficients. These methods, justified by asymptotic theory, provide a simple way to estimate the covariance matrix and confidence regions of the estimates, and thereby enable accurate inference for these commonly used models.

In addition to inference problems, I have worked on methods for risk prediction and classification based on a large number of predictors. One setting in which such models are useful is the burgeoning field of genomics. Genetic studies of complex traits have uncovered only a small number of risk markers, which explain a small fraction of heritability and add little to disease risk prediction. Standard single-marker methods may lack power to select informative markers or to estimate their effects. Most existing methods also do not account for non-linearity, which can reduce prediction accuracy when the underlying effects are non-linear. With my collaborators, I have developed risk prediction models that bridge high-dimensional statistical methodology with powerful and flexible machine learning models. Our models relate genetic markers to disease risk by taking advantage of known gene-set structure, providing a prediction framework that is flexible enough for many types of genomic studies.
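To make the inference problem concrete, here is a minimal plain-Python sketch of one generic way to approximate the distribution of regularized regression coefficients: perturbation resampling, in which each subject's contribution to the loss is multiplied by an independent Exp(1) weight and the model is refit many times. It is an illustration under stated assumptions, not a reproduction of the published method: it assumes a linear model without intercept, uses an adaptive-lasso penalty (naive resampling is known to be unreliable for the plain lasso when some true coefficients are zero), and the function names `weighted_lasso`, `adaptive_lasso`, `perturbation_draws`, `perturbation_covariance` and `wald_interval` are hypothetical.

```python
import math
import random


def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0); returns 0.0 when t is infinite."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, penalties, w=None, n_iter=500, tol=1e-10):
    """Coordinate descent for
    (1/2n) sum_i w_i (y_i - x_i'b)^2 + sum_j penalties[j] * |b_j|  (no intercept).
    Assumes no column of X is identically zero."""
    n, p = len(X), len(X[0])
    if w is None:
        w = [1.0] * n
    beta = [0.0] * p
    resid = list(y)  # residuals y - X beta, starting from beta = 0
    a = [sum(w[i] * X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # weighted inner product of feature j with its partial residual
            z = sum(w[i] * X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(z, penalties[j]) / a[j]  # closed-form 1-D solution
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def adaptive_lasso(X, y, lam, w=None):
    """Adaptive lasso: penalty lam / |b_init_j|, with b_init the (weighted) least squares fit."""
    p = len(X[0])
    b_init = weighted_lasso(X, y, [0.0] * p, w)  # zero penalty gives least squares
    penalties = [lam / abs(b) if b != 0.0 else math.inf for b in b_init]
    return weighted_lasso(X, y, penalties, w)


def perturbation_draws(X, y, lam, n_draws=200, seed=0):
    """Refit with i.i.d. Exp(1) subject weights (mean 1, variance 1); the spread of
    the refitted estimates approximates the sampling variability of the original fit."""
    rng = random.Random(seed)
    n = len(X)
    return [adaptive_lasso(X, y, lam, [rng.expovariate(1.0) for _ in range(n)])
            for _ in range(n_draws)]


def perturbation_covariance(draws):
    """Empirical covariance matrix of the perturbed coefficient vectors."""
    m, p = len(draws), len(draws[0])
    mean = [sum(d[j] for d in draws) / m for j in range(p)]
    return [[sum((d[j] - mean[j]) * (d[k] - mean[k]) for d in draws) / (m - 1)
             for k in range(p)] for j in range(p)]


def wald_interval(estimate, cov, j, z=1.96):
    """Normal-approximation 95% interval for coefficient j."""
    se = math.sqrt(cov[j][j])
    return estimate[j] - z * se, estimate[j] + z * se
```

Coordinate descent is used here because each one-dimensional penalized subproblem has a closed-form soft-thresholding solution, which keeps the many refits cheap. A typical call is `beta_hat = adaptive_lasso(X, y, lam=0.05)` followed by `cov = perturbation_covariance(perturbation_draws(X, y, lam=0.05))`.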
Prediction and Classification with Electronic Medical Records
Electronic medical record (EMR) data marts provide a rich and vast source of data from which to characterize disease progression in populations and individuals. Harnessing these data in a meaningful way is challenging, and building accurate prediction models from them is therefore difficult. My collaborators at Harvard Medical School and Brigham and Women’s Hospital in Boston, MA, have developed natural language processing (NLP) methods to extract informative features from EMR data. Together with these collaborators and my PhD advisor, Professor Tianxi Cai, I developed statistical methods that combine NLP terms with codified and clinical data to accurately predict disease risk. We used regularized regression and resampling techniques to build the models. These models were successfully validated in larger data marts so that they could be used to identify eligible cohorts for future epidemiological studies.
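As an illustration of pairing regularized regression with resampling for EMR-based classification, here is a minimal plain-Python sketch; it is not the models described above. It assumes each patient is represented by counts of NLP-extracted terms and codified entries (for example billing codes), log-transformed with `log1p`; fits an L1-penalized logistic regression by proximal gradient descent; and uses K-fold cross-validation to estimate the AUC of the resulting risk score. The featurization and all function names (`log_count_features`, `fit_l1_logistic`, `risk_score`, `cv_auc`) are hypothetical choices made for the example.

```python
import math
import random


def log_count_features(nlp_counts, code_counts):
    """Hypothetical featurization: log(1 + count) for NLP term and code counts."""
    return [math.log1p(c) for c in list(nlp_counts) + list(code_counts)]


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def soft_threshold(z, t):
    """Proximal operator of t * |b|."""
    return z - t if z > t else z + t if z < -t else 0.0


def fit_l1_logistic(X, y, lam, step=0.25, n_iter=300):
    """Proximal gradient (ISTA) for
    (1/n) sum_i [log(1 + exp(eta_i)) - y_i * eta_i] + lam * sum_j |b_j|,
    with eta_i = b0 + x_i'b and an unpenalized intercept b0.
    The step must not exceed 1/L, where L is the Lipschitz constant of the loss gradient."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            err = sigmoid(b0 + sum(b * x for b, x in zip(beta, xi))) - yi
            g0 += err / n
            for j in range(p):
                g[j] += err * xi[j] / n
        b0 -= step * g0  # plain gradient step for the intercept
        # gradient step followed by soft-thresholding for the penalized coefficients
        beta = [soft_threshold(b - step * gj, step * lam) for b, gj in zip(beta, g)]
    return b0, beta


def risk_score(b0, beta, x):
    """Linear predictor; the predicted probability is sigmoid(risk_score)."""
    return b0 + sum(b * v for b, v in zip(beta, x))


def auc(scores, labels):
    """Mann-Whitney AUC: P(case score > control score), ties counted as 1/2."""
    cases = [s for s, lab in zip(scores, labels) if lab == 1]
    controls = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if c > d else 0.5 if c == d else 0.0
               for c in cases for d in controls)
    return wins / (len(cases) * len(controls))


def cv_auc(X, y, lam, k=5, seed=0):
    """K-fold cross-validated AUC: each patient is scored by a model fit without them."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    scores = [0.0] * len(X)
    for f in range(k):
        held_out = set(idx[f::k])
        train = [i for i in idx if i not in held_out]
        b0, beta = fit_l1_logistic([X[i] for i in train], [y[i] for i in train], lam)
        for i in held_out:
            scores[i] = risk_score(b0, beta, X[i])
    return auc(scores, y)
```

Pooling the out-of-fold scores before computing a single AUC is one of several reasonable conventions; averaging per-fold AUCs, or using the bootstrap instead of cross-validation, would fit the same outline.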
Statistical Genetics and Genetic Epidemiology
The field of genetics and genomics is growing rapidly. Answering scientific questions of interest requires interdisciplinary teams that include geneticists, epidemiologists, and biostatisticians. As a member of such teams, I have studied genetic phenomena such as genetic instability and gene-environment interactions. I have collaborated with research teams at Harvard and the Fred Hutchinson Cancer Research Center to study these topics in several large genomic data sets. My main contributions were providing statistical guidance on study design, performing data analyses, and conducting simulation studies to address the research questions of interest.
Ongoing Research Support
R21-NS094833-01, DHHS/NIH/IGNITE Stenzel-Poore (PI)
Development of Poly ICLC for neuroprotection against ischemic brain injury
This proposal seeks to advance Poly ICLC, a compound that has shown efficacy as a prophylactic treatment in mice, through our novel nonhuman primate stroke model, allowing for eventual translation to patients at risk of ischemic brain injury.
R01-HL123762, NIH/NHLBI Lindner (PI)
Augmentation of Tissue Perfusion in PAD with Ultrasound-mediated Cavitation
The major goal of this project is to explore how ultrasound-mediated cavitation of microbubble contrast agents can augment tissue perfusion, and to assess the mechanism by which flow increases.
14NSBRI1-0025, NSBRI Lindner (PI)
Biomarker assessment for identifying heightened risk for cardiovascular complications during long-duration space missions
With the proposed work we will explore new paradigms for: (a) predicting risk for developing atherosclerotic complications in astronauts, (b) monitoring in-flight changes in risk profile that may occur in the setting of deep space exploration, and (c) identifying endothelial susceptibility to the detrimental effects of space radiation. I oversee data collection and processing and coordinate with other statisticians and bioinformaticians on data management and statistical analyses.
U01 CA185094, NIH/NCI Peters (PI)
Colorectal Tumor Risk Prediction in the PLCO Trial
The study will use detailed risk-factor information for colorectal cancer (CRC), including genome-wide genetic data as well as lifestyle and environmental risk factors, to build a comprehensive risk-prediction model using data from a large CRC consortium and to validate it in the PLCO trial as an independent validation study. As Co-I, I participate in developing the models and provide statistical guidance for the study.
P30 CA69533-16, NIH/NCI Druker (PI)
OHSU Knight Cancer Institute
To support the Cancer Institute at Oregon Health & Science University, its programs, shared resources, and administration. Shared resources include cancer pathology, flow cytometry, molecular biology, transgenic/gene knockout, gene array, clinical research management, biostatistics and informatics. The Institute’s resources foster interdisciplinary coordination and collaboration among cancer research faculty at OHSU in basic, clinical, and population research.
Completed Research Support
T32 AI007358-23 Pagano (PI)
Biostatistics/Epidemiology Training Grants in AIDS
T32 AI007358-22 Pagano (PI)
Biostatistics/Epidemiology Training Grants in AIDS