I am a Senior Researcher at Microsoft Research and an Affiliate Assistant Professor at the University of Washington. My work focuses on explainable artificial intelligence and its application to problems in medicine and healthcare. This work has led to broadly applicable methods and tools for interpreting complex machine learning models that are now used in banking, logistics, manufacturing, cloud services, economics, sports, and other areas. I completed my Ph.D. at the Paul G. Allen School of Computer Science & Engineering of the University of Washington, working with Su-In Lee.
PhD in Computer Science, 2019
University of Washington
MS in Computer Science, 2008
Colorado State University
BS in Computer Science, 2005
Colorado State University
A popular package that uses SHAP values (theoretically grounded feature attributions) to explain the output of any machine learning model. It is actively used by thousands of data scientists representing a diverse set of organizations, including startups, non-profits, major tech companies, NBA teams, banks, and medical providers. It has high-speed algorithm integrations with XGBoost, LightGBM, CatBoost, scikit-learn, TensorFlow, and PyTorch.
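As a minimal sketch of typical usage (the synthetic data and feature names below are illustrative, not from any real dataset), the package pairs an explainer with a fitted model and returns one attribution per feature per prediction:

```python
# Minimal sketch of typical shap usage with an XGBoost model.
# The synthetic dataset and feature names are illustrative only.
import numpy as np
import pandas as pd
import shap
import xgboost

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 4)),
                 columns=["age", "bmi", "bp", "glucose"])
y = X["age"] * 0.5 + X["glucose"] ** 2 + rng.normal(size=500)

model = xgboost.XGBRegressor(n_estimators=100).fit(X, y)

# TreeExplainer computes exact SHAP values for tree ensembles
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Each row of attributions plus the base value sums to that row's prediction
print(shap_values[0], explainer.expected_value)

# Global view: which features matter most across the dataset
shap.summary_plot(shap_values, X)
```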
Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Prescience is a machine-learning-based system that predicts the risk of hypoxaemia and provides explanations of the risk factors in real time during general anaesthesia. The system improved the performance of anaesthesiologists by providing interpretable hypoxaemia risks and contributing factors.
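The sketch below illustrates only the general pattern of pairing a streaming risk score with a per-prediction explanation; the toy model, feature names, and loop are hypothetical stand-ins, not the actual Prescience pipeline:

```python
# Hedged sketch: a risk score plus a per-prediction explanation for each
# new observation. Features and labels are toy stand-ins, not Prescience.
import numpy as np
import shap
import xgboost

rng = np.random.default_rng(1)
X_train = rng.normal(size=(1000, 3))          # stand-in for intraoperative signals
y_train = (X_train[:, 0] < -0.5).astype(int)  # toy hypoxaemia label

model = xgboost.XGBClassifier(n_estimators=50).fit(X_train, y_train)
explainer = shap.TreeExplainer(model)

feature_names = ["SpO2", "FiO2", "tidal_volume"]  # illustrative only
for minute in range(3):                            # stand-in for a live stream
    x_now = rng.normal(size=(1, 3))
    risk = model.predict_proba(x_now)[0, 1]
    phi = explainer.shap_values(x_now)[0]          # per-feature attributions
    top = feature_names[int(np.argmax(np.abs(phi)))]
    print(f"t={minute}: risk={risk:.2f}, top factor={top}")
```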
Consistent individualized feature attribution for tree ensembles. While computing the classic Shapley values from game theory is NP-hard in general, we show how to compute them exactly in low-order polynomial time for tree ensembles. This enables us to provide explanations of individual machine learning predictions that come with strong theoretical guarantees and no sampling variability.
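For context, the Shapley value assigns feature $i$ its average marginal contribution to a set function $v$ over all subsets $S$ of the remaining features $N \setminus \{i\}$, which naively requires summing over exponentially many subsets:

$$
\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N|-|S|-1)!}{|N|!} \left[ v(S \cup \{i\}) - v(S) \right]
$$

The tree algorithm exploits the structure of the ensemble to collapse this sum, avoiding the exponential enumeration entirely.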
Learning the human chromatin network from all ENCODE ChIP-seq data. A cell's epigenome arises from interactions among regulatory factors (transcription factors and histone modifications) co-localized at particular genomic regions. We developed ChromNet to infer a network of these interactions, the chromatin network, by estimating conditional-dependence relationships among a large number of ChIP-seq data sets from the ENCODE project.
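ChromNet's published method is a group graphical model that also handles redundant data sets; the sketch below shows only the simpler core idea of reading conditional-dependence strengths off the inverse correlation matrix, with random data standing in for binned ChIP-seq signal tracks:

```python
# Hedged sketch: conditional-dependence edges from the inverse correlation
# matrix. Random data stands in for binned ChIP-seq tracks; ChromNet's
# published group graphical model extends this basic idea.
import numpy as np

rng = np.random.default_rng(2)
n_bins, n_factors = 10_000, 5
X = rng.normal(size=(n_bins, n_factors))  # rows: genomic bins, cols: factors
X[:, 1] += X[:, 0]                        # factor 1 co-localizes with factor 0

R = np.corrcoef(X, rowvar=False)          # correlations between factors
P = np.linalg.inv(R)                      # precision (inverse correlation)

# Normalize to partial correlations: dependence of i and j given all others
D = np.diag(1.0 / np.sqrt(np.diag(P)))
partial_corr = -D @ P @ D
np.fill_diagonal(partial_corr, 1.0)

print(np.round(partial_corr, 2))          # strong (0,1) edge, weak elsewhere
```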
Understanding why a model makes a certain prediction can be as crucial as the prediction’s accuracy in many applications. However, the highest accuracy for large modern datasets is often achieved by complex models that even experts struggle to interpret, such as ensemble or deep learning models, creating a tension between accuracy and interpretability. In response, various methods have recently been proposed to help users interpret the predictions of complex models, but it is often unclear how these methods are related or when one method is preferable to another. To address this problem, we presented a unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations).
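At its core, SHAP represents any explanation as an additive feature attribution model: for a binary vector $z' \in \{0,1\}^M$ indicating which of $M$ simplified input features are present,

$$
g(z') = \phi_0 + \sum_{i=1}^{M} \phi_i z'_i,
$$

where $\phi_0$ is the base value and $\phi_i$ is the attribution credited to feature $i$. Shapley values are the unique attributions in this class that satisfy local accuracy, missingness, and consistency, which is what lets SHAP unify previously disparate explanation methods.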