Selected Projects

Explain any machine learning model

A popular package that uses SHAP values (theoretically grounded feature attributions) to explain the output of any machine learning model. It is actively used by thousands of data scientists representing a diverse set of organizations, including startups, non-profits, major tech companies, NBA teams, banks, and medical providers. It has high-speed algorithm integrations with XGBoost, LightGBM, CatBoost, scikit-learn, TensorFlow, and PyTorch.


Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Prescience is a machine-learning-based system that predicts the risk of hypoxaemia and provides explanations of the risk factors in real time during general anaesthesia. The system improved the performance of anaesthesiologists by providing interpretable hypoxaemia risks and contributing factors.

Exact game theoretic explanations for trees

Consistent individualized feature attribution for tree ensembles. While computing the classic Shapley values from game theory is NP-hard in general, we show how to compute them exactly in low-order polynomial time for tree ensembles. This enables us to provide explanations of individual machine learning predictions that come with strong theoretical guarantees and no sampling variability.
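
To make the quantity concrete, here is a minimal brute-force sketch of Shapley values for a single decision tree. It enumerates every feature subset, so it runs in exponential time and is not the polynomial-time algorithm described above. It assumes the expected prediction given a subset S of known features is estimated by following the tree on features in S and, at splits on unknown features, averaging both branches weighted by their (hypothetical) training "cover". The tree, its numbers, and the function names are illustrative only.

```python
from itertools import combinations
from math import factorial

# Hypothetical tree: internal nodes carry a split; leaves carry a value.
# "cover" is the number of training samples that reached each node (made up here).
TREE = {
    "feature": 0, "threshold": 0.5, "cover": 100,
    "left": {"value": 1.0, "cover": 60},
    "right": {
        "feature": 1, "threshold": 0.5, "cover": 40,
        "left": {"value": 2.0, "cover": 10},
        "right": {"value": 5.0, "cover": 30},
    },
}


def expected_value(node, x, subset):
    """Estimate E[f(x) | x_S]: follow known features, cover-average unknown ones."""
    if "value" in node:
        return node["value"]
    if node["feature"] in subset:
        go_left = x[node["feature"]] <= node["threshold"]
        return expected_value(node["left"] if go_left else node["right"], x, subset)
    left, right = node["left"], node["right"]
    return (left["cover"] * expected_value(left, x, subset)
            + right["cover"] * expected_value(right, x, subset)) / node["cover"]


def shapley_values(tree, x, n_features):
    """Classic Shapley values by enumerating all subsets (exponential time)."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(n_features):
            # Shapley weight |S|! (M - |S| - 1)! / M!
            weight = factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
            for s in combinations(others, size):
                s = set(s)
                phi[i] += weight * (expected_value(tree, x, s | {i})
                                    - expected_value(tree, x, s))
    return phi
```

For x = [1.0, 1.0], `shapley_values(TREE, x, 2)` attributes 2.175 to feature 0 and 0.525 to feature 1; together they account for the gap between the prediction (5.0) and the cover-weighted mean prediction (2.3).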


Learning the human chromatin network from all ENCODE ChIP-seq data. A cell’s epigenome arises from interactions among regulatory factors—transcription factors and histone modifications—co-localized at particular genomic regions. We developed ChromNet to infer a network of these interactions, the chromatin network, by inferring conditional-dependence relationships among a large number of ChIP-seq data sets from the ENCODE project.
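
As a rough illustration of what "conditional dependence" means here, the sketch below uses a generic Gaussian technique: two datasets are conditionally dependent given all the others when their partial correlation, read off the inverse correlation matrix, is nonzero. This is not ChromNet's exact estimator. The function name and the `ridge` regularizer (with its default) are assumptions added for numerical stability when many datasets are highly correlated.

```python
import numpy as np


def partial_correlations(data, ridge=1e-3):
    """Partial correlation of every pair of columns given all other columns.

    data: array of shape (n_samples, n_variables), e.g. one column per
    ChIP-seq dataset and one row per genomic bin (hypothetical layout).
    """
    corr = np.corrcoef(data, rowvar=False)
    # Hypothetical ridge term keeps the inverse well conditioned.
    precision = np.linalg.inv(corr + ridge * np.eye(corr.shape[0]))
    d = np.sqrt(np.diag(precision))
    # rho_ij = -P_ij / sqrt(P_ii * P_jj)
    pc = -precision / np.outer(d, d)
    np.fill_diagonal(pc, 1.0)
    return pc
```

On data generated as a chain x1 → x2 → x3, the marginal correlation between x1 and x3 is strong, but their partial correlation given x2 is near zero, so only the x1-x2 and x2-x3 edges would appear in the network.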

Unifying Explanation Methods

Understanding why a model makes a certain prediction can be as crucial as the prediction’s accuracy in many applications. However, the highest accuracy for large modern datasets is often achieved by complex models that even experts struggle to interpret, such as ensemble or deep learning models, creating a tension between accuracy and interpretability. In response, various methods have recently been proposed to help users interpret the predictions of complex models, but it is often unclear how these methods are related and when one method is preferable over another. To address this problem, we presented a unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations).
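
SHAP values are the Shapley values of a function that reports the model's expected output when only a subset of features is known. The sketch below computes them exactly by enumerating subsets, so it is practical only for a handful of features. It assumes the "interventional" form of that expectation (unknown features are filled in from a background dataset); the names `coalition_value` and `exact_shap_values` are hypothetical and are not the `shap` package's API.

```python
from itertools import combinations
from math import factorial

import numpy as np


def coalition_value(model, x, background, subset):
    """Mean model output when features in `subset` are fixed to x and the rest come from background."""
    z = background.copy()
    idx = list(subset)
    if idx:
        z[:, idx] = x[idx]
    return model(z).mean()


def exact_shap_values(model, x, background):
    """Exact Shapley values of coalition_value for one input x (exponential time)."""
    m = x.shape[0]
    phi = np.zeros(m)
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for size in range(m):
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for s in combinations(others, size):
                phi[i] += weight * (coalition_value(model, x, background, s + (i,))
                                    - coalition_value(model, x, background, s))
    return phi
```

The attributions always sum to f(x) minus the mean background prediction (the local accuracy property), and for a linear model f(z) = w·z + b each one reduces to w_i (x_i minus the background mean of feature i).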

Publications and Patents

Explainable AI for Trees: From Local Explanations to Global Understanding. Submitted for review, 2019.


AIControl: Replacing matched control experiments with machine learning improves ChIP-seq peak identification. To appear in Nucleic Acids Research, 2019.

Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nature Biomedical Engineering, volume 2, pages 749–760, 2018.  (selected to be the cover article)

Consistent Individualized Feature Attribution for Tree Ensembles. arXiv, 2018.  

A machine learning approach to integrate big data for precision medicine in acute myeloid leukemia. Nature Communications, 2018.


A unified approach to interpreting model predictions. NeurIPS, 2017.  (selected for oral presentation)

Anesthesiologist-level forecasting of hypoxemia with only SpO2 data using deep learning. NeurIPS Workshop ML4H: Machine Learning for Health, 2017.  


Hybrid Gradient Boosting Trees and Neural Networks for Forecasting Operating Room Data. NeurIPS Workshop ML4H: Machine Learning for Health, 2017.  


An unexpected unity among methods for interpreting model predictions. NeurIPS Workshop on Interpretable Machine Learning in Complex Systems, 2016.  (best paper award)

CloudControl: Leveraging many public ChIP-seq control experiments to better remove background noise. In Proceedings of the 7th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM, 2016.


ChromNet: Learning the human chromatin network from all ENCODE ChIP-seq data. Genome Biology, 2016.  (F1000Prime recommended)

Method for Lossy Compression of Point Clouds with Pointwise Error Constraints. U.S. Patent US8811758, 2014.