I am a fifth-year PhD candidate studying computational biology and machine learning in the Paul G. Allen School of Computer Science and Engineering at the University of Washington with Su-In Lee. I love working on the application of machine learning to personalized health. Over the next few decades, I believe automated data analysis will lead to significant advances in our understanding and treatment of health and disease. Before UW, I had the opportunity to study graph theory at Colorado State University and to lead research projects at Numerica for several years.

My current work focuses on actionable machine learning, both in basic biology and in predictive medicine in the hospital. In both areas, a combination of interpretable models and transparent visualizations of the learned structure is important. This has led to our development of broadly applicable methods and tools for interpreting complex machine learning models.

## Open source software

• SHAP – A unified approach to explain the output of any machine learning model. Under certain assumptions, it can be shown to be the optimal linear explanation of any model’s prediction. It includes an implementation of an exact polynomial-time algorithm for tree models such as random forests and gradient boosted trees, making it particularly useful for these types of models.
• ChromNet.jl – A network learning method that ingests BAM/BED files and other pre-processed data bundles (such as the one provided for all human ENCODE ChIP-seq data).
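For readers new to SHAP, the additive explanation model and the classic Shapley value formula it builds on can be summarized as follows. This is a sketch in the notation of the NIPS 2017 paper as I understand it: $F$ is the set of all $M$ input features, $z'_i \in \{0,1\}$ marks whether feature $i$ is present, and $f_S(x_S) = E[f(x) \mid x_S]$ is the model's expected output given only the features in $S$.

$$
g(z') = \phi_0 + \sum_{i=1}^{M} \phi_i z'_i,
\qquad
\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(M - |S| - 1)!}{M!}\,\bigl[f_{S \cup \{i\}}(x_{S \cup \{i\}}) - f_S(x_S)\bigr]
$$

Each $\phi_i$ averages feature $i$'s marginal contribution over all orderings of the features, and with $\phi_0 = f_\emptyset = E[f(x)]$ the attributions sum to $f(x) - E[f(x)]$.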

For a full list of open source packages, see GitHub.

## Websites

• ChromNet – An online network visualization of the chromatin network estimated from ENCODE ChIP-seq data, or from custom networks uploaded by users.

## Publications

### Previous work

## Comments

1. Eddie Herman says:

Hi Scott,
Can you direct me towards the supplementary proof of Theorem 2 up to 10 dimensions in your paper:

> S. Lundberg, S. Lee, “A unified approach to interpreting model predictions,” NIPS 2017 (selected for oral presentation)

It is not attached to the arXiv paper.

Thanks,
Eddie Herman

2. Eitan Anzenberg says:

Hi Scott, we met at NIPS 2017. I have a question about SHAP. Is it possible to run the model-agnostic SHAP without a background sample?

Good question. The issue is that since SHAP values rely on conditional expectations, there needs to be some definition of the background input feature distribution, even if that is just a single reference input defined by the user. Without any definition of the input feature distribution, it would be impossible to know whether a feature value was increasing or decreasing the model output (since increasing and decreasing are always relative to something else).
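To make the role of the reference concrete, here is a minimal brute-force sketch (not the shap library's implementation) that computes exact Shapley values for a small model, assuming the simplest possible background: a single user-chosen reference input whose values stand in for any "missing" feature. The function name `shapley_values` and the toy model are hypothetical illustrations, and because it enumerates every subset it is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, reference):
    """Exact Shapley values of f at x relative to a single reference input.

    A feature that is "missing" from a coalition takes its value from
    `reference`, so the result always explains f(x) - f(reference).
    For each feature it enumerates every subset of the other features,
    so it is only practical for a handful of features.
    """
    m = len(x)

    def value(subset):
        # Features in `subset` come from x; all others come from the reference.
        return f([x[j] if j in subset else reference[j] for j in range(m)])

    phi = [0.0] * m
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for size in range(m):
            # Shapley weight |S|! (M - |S| - 1)! / M! for subsets of this size.
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

With `f = lambda z: z[0] * z[1] + 2 * z[2]`, `x = [1, 2, 3]` and an all-zeros reference, the attributions come out (up to floating-point rounding) as `[1, 1, 6]`, summing to f(x) - f(reference) = 8. Passing `reference = x` instead makes every attribution zero, because nothing differs from the baseline: the same input receives different explanations depending on what it is compared against.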

3. Zihao Zhou says:

Hello,

I saw your post “Interpretable Machine Learning with XGBoost” online. I am still quite confused about the SHAP contribution dependence plot. I made a similar plot in R as well. I just have a hard time understanding the change in log odds. It feels like a partial dependence plot, but it is not quite the same.

Thanks
Zihao Zhou

Hey! Would you mind posting this question as a GitHub issue? I am trying to keep all the discussion in one place 🙂

4. Ujjwal says:

Hi Scott

Can you provide an explanation of how the Tree SHAP algorithm reduces the time complexity?

Thanks

If you could highlight what is unclear about the description in the arXiv paper, that would be helpful. You can also find a Python implementation on GitHub that is meant to make it easy to play with.

5. Olive Gu says:

Hi Scott,

I have a question from reading your paper “Consistent Individualized Feature Attribution for Tree Ensembles”. I am trying to understand how E[f(x)|x_S] is defined. In the text, you define E[f(x)|x_S] as “the expected value of the function conditioned on a subset S of the input features”. Based on this sentence, I assume E[f(x)|x_S] is the mean prediction over the observations in the training sample that match the conditioning set S. If I understood it correctly, then how do you calculate E[f(x)|x_S] for an instance whose x_S has never appeared in the training data set?

Hey! Could you post this as a GitHub issue? That’s where I have been keeping all the discussions on SHAP. Thanks.
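One way to see how a tree model can answer this for feature combinations never seen in training: the Tree SHAP paper estimates E[f(x)|x_S] from the tree structure itself rather than by matching training rows. When a split uses a feature in S, follow the branch x would take; when it uses a feature outside S, average both branches, weighted by how many training samples went each way. Below is a minimal recursive sketch of that idea as I understand it, for a single tree stored in a hypothetical dict format (the keys `value`, `feature`, `threshold`, `left`, `right`, `cover` and the go-left-if-`x[feature] < threshold` rule are assumptions, not any library's format). It handles one subset S at a time; the polynomial-time Tree SHAP algorithm avoids repeating this work for every subset.

```python
def expected_value(node, x, s):
    """Estimate E[f(x) | x_S] for a single decision tree.

    node: hypothetical dict-based tree. Leaves are {"value": v, "cover": n};
          internal nodes are {"feature": j, "threshold": t, "left": ...,
          "right": ..., "cover": n}, where "cover" counts the training
          samples that reached the node.
    x:    the instance being explained (only features in s are looked at).
    s:    set of feature indices that are conditioned on.
    """
    if "value" in node:  # leaf: return its prediction
        return node["value"]
    left, right = node["left"], node["right"]
    if node["feature"] in s:
        # Known feature: follow the branch this instance would take.
        branch = left if x[node["feature"]] < node["threshold"] else right
        return expected_value(branch, x, s)
    # Unknown feature: mix both branches, weighted by their share of training samples.
    return (left["cover"] * expected_value(left, x, s)
            + right["cover"] * expected_value(right, x, s)) / node["cover"]


# A toy tree fit on 10 training samples (hypothetical numbers).
tree = {
    "feature": 0, "threshold": 0.5, "cover": 10,
    "left": {"value": 1.0, "cover": 4},
    "right": {
        "feature": 1, "threshold": 0.5, "cover": 6,
        "left": {"value": 2.0, "cover": 3},
        "right": {"value": 8.0, "cover": 3},
    },
}
```

For `x = [1.0, 1.0]`, conditioning on nothing gives the cover-weighted average of all leaves, (4·1 + 3·2 + 3·8)/10 = 3.4; conditioning on feature 0 only gives 5.0; conditioning on both features gives the leaf value 8.0. None of these requires a training row with exactly these feature values.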

6. Hey Scott,

I have some questions regarding SHAP and its calculation. Specifically, is there any way I can calculate the feature contributions for each individual observation’s prediction? For example, which feature makes a given sample’s prediction very high or very low?

If you could send me an email, I could send more details. I would appreciate your response.

Hey! Could you post this as a GitHub issue and reference one of the examples there?

7. Hi Scott,

I have a question about Corollary 1 (Linear SHAP) on page 6 of your paper “A Unified Approach to Interpreting Model Predictions.” Is the formula for phi(i) correct? Is phi(i) the SHAP value for an individual observation in that case? If it is, shouldn’t the notation be phi(i) (f, x) = w(j) (x(i) – E[x(j)])? Or was phi(i) really phi(j), that is, the SHAP value for feature j? In that case, is the notation phi(j) (f, x) = w(j) (x(j) – E[x(j)]), with each element of the formula being a vector rather than an individual observation?

Thank you,

Missed this comment when you first sent it. Yes, thanks! That is a mistake which I have now noted in an errata doc linked to from my paper list.
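For concreteness, here is a minimal sketch of Linear SHAP written with a single index j for the feature throughout, phi_j(f, x) = w_j (x_j - E[x_j]), for one observation x of a linear model f(x) = Σ_j w_j x_j + b. It assumes independent features and estimates E[x_j] by the column means of a small, made-up dataset; the function and variable names are illustrative only.

```python
def linear_shap(weights, x, feature_means):
    """SHAP values for one observation x of a linear model f(x) = sum_j w_j * x_j + b.

    Assumes independent features, so phi_j = w_j * (x_j - E[x_j]); here j always
    indexes a feature, and x is a single observation (one value per feature).
    """
    return [w * (xj - mean) for w, xj, mean in zip(weights, x, feature_means)]


def predict(weights, bias, x):
    """The linear model itself: f(x) = sum_j w_j * x_j + b."""
    return sum(w * xj for w, xj in zip(weights, x)) + bias


# Made-up example: 2 features, 3 background rows.
weights, bias = [2.0, -1.0], 0.5
data = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
means = [sum(col) / len(col) for col in zip(*data)]  # E[x_j] for each feature j

x = data[2]
phi = linear_shap(weights, x, means)
# The attributions add up to f(x) minus the expected output, which for a
# linear model equals f evaluated at the feature means.
assert abs(sum(phi) - (predict(weights, bias, x) - predict(weights, bias, means))) < 1e-9
```

Here the means are `[3.0, 2.0]` and `phi` is `[4.0, 2.0]`: the first feature pushes the prediction up by 4 relative to the average, and the second (below its mean, with a negative weight) pushes it up by 2.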

8. Liat says:

Hi Scott,
I am working with data that has 5000 features, and the SHAP values are all zeros. With 1000 features I get nice results.
Do you know why this happens? I am using Kernel SHAP.
Thanks,
Liat

Missed this in my WP comment section. My best guess is that the L1 penalty is not set correctly. By default it uses BIC, but with 5k features the penalty might just be getting turned up too high. I would try setting it manually and/or increasing the number of samples.
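The penalty matters because Kernel SHAP estimates SHAP values with a weighted linear regression over feature coalitions. With only a few features, every coalition can be enumerated and no penalty is needed; with 5000 features, the 2^5000 coalitions can only be sampled, which is where regularization and the number of samples come in. Below is a minimal, exact, pure-Python sketch for the small case, not the shap library's implementation. It assumes a single reference input for "missing" features, uses the Shapley kernel weight (M - 1) / (C(M, |z|) · |z| · (M - |z|)) from the NIPS 2017 paper, enforces phi0 = f(reference) and sum(phi) = f(x) - f(reference) exactly by eliminating the last coefficient, and applies no L1 penalty. The function names are hypothetical.

```python
from itertools import combinations
from math import comb


def solve(a, b):
    """Solve the square linear system a @ beta = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def kernel_shap_exact(f, x, reference):
    """Exact Kernel SHAP by enumerating all coalitions (only feasible for small M).

    Returns (phi0, phi), where phi0 = f(reference) and sum(phi) = f(x) - phi0.
    """
    m = len(x)

    def value(subset):
        # Features in `subset` come from x; the rest come from the reference input.
        return f([x[j] if j in subset else reference[j] for j in range(m)])

    phi0 = value(set())
    delta = value(set(range(m))) - phi0  # total to distribute across features
    k = m - 1  # free coefficients after eliminating the last one
    xtwx = [[0.0] * k for _ in range(k)]
    xtwy = [0.0] * k
    # The empty and full coalitions have infinite weight; they are handled by the
    # constraints above, so only proper, non-empty coalitions enter the regression.
    for size in range(1, m):
        weight = (m - 1) / (comb(m, size) * size * (m - size))  # Shapley kernel
        for s in combinations(range(m), size):
            z = [1.0 if j in s else 0.0 for j in range(m)]
            # Substitute phi[last] = delta - sum(phi[0..k-1]) into the linear model.
            row = [z[j] - z[-1] for j in range(k)]
            y = value(set(s)) - phi0 - delta * z[-1]
            for a in range(k):
                xtwy[a] += weight * row[a] * y
                for b in range(k):
                    xtwx[a][b] += weight * row[a] * row[b]
    beta = solve(xtwx, xtwy)  # weighted least squares via the normal equations
    return phi0, beta + [delta - sum(beta)]
```

For `f = lambda z: z[0] * z[1] + 2 * z[2]` at `x = [1, 2, 3]` with an all-zeros reference, this returns phi0 = 0 and attributions matching the exact Shapley values `[1, 1, 6]` up to floating-point error. The shap library's `KernelExplainer` instead samples coalitions, which is why the sample count and the regularization setting matter so much at 5000 features.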

9. Emrecan says:

Hello Scott,

I have been wondering: is there any way to use SHAP’s DeepExplainer without the Keras API?
Since I built a custom autoencoder architecture with TensorFlow, the only types that I can provide are either the or .
In summary, I cannot provide the expected “model” type.
Is it too early to expect workarounds for alternative approaches? :))

Just getting to some page comments I found (late) here on WP. The short answer is yes: pass a pair of TF tensors, the first being the input and the second being the output. If you have more questions, post a GitHub issue.

10. Keewon Lee says:

Hi Scott,

I am a BME student at Case Western Reserve University, and I am trying to do a journal review of your latest work, “Explainable machine-learning predictions for the prevention of hypoxaemia during surgery,” for one of my health informatics classes, but I still do not have access to the publication. I am wondering if I could gain access to the PDF file for this article. It would be greatly appreciated (also on behalf of my peers) if I could receive the published PDF version of this work from you! Thanks!

-Keewon