# About

I am a fifth-year PhD candidate studying computational biology and machine learning in the Paul G. Allen School of Computer Science and Engineering at the University of Washington with Su-In Lee. I love working on the application of machine learning to genomics and personalized health. Over the next few decades I believe these fields will have a large impact on our daily lives, and that much of this impact will be made possible by automated data analysis. Before UW I had the opportunity to study graph theory at Colorado State University and to lead research projects at Numerica for several years.

My current work focuses on actionable machine learning in both basic biology and predictive medicine in the hospital. In both areas, a combination of interpretable models and transparent visualizations of the learned structure is important.

Check out my publications and blog posts for more details on my work.

## Open source software

• SHAP – Explains the output of any machine learning model using expectations and Shapley values. Under certain assumptions it can be shown to be the optimal linear explanation of any model’s prediction.
• ChromNet.jl – A network learning method that ingests BAM/BED files and other pre-processed data bundles (such as the one provided for all human ENCODE ChIP-seq data).
• SimplePlot.jl – A wrapper for Julia plotting based on Matplotlib. It allows natural layer-based compositing and simple keyword-parameter distribution to make simple plots simple and complex plots understandable.

For a full list of open source packages, see GitHub.
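To illustrate the claim above that SHAP yields an optimal linear explanation, here is a minimal sketch (not the SHAP package's own API; the model, weights, and background sample below are hypothetical): for a linear model, the exact Shapley attribution has a closed form, the feature's weight times its deviation from the background mean.

```python
def linear_shap(w, x, background):
    """Exact Shapley values for the linear model f(v) = sum(w_i * v_i),
    relative to a background sample of inputs.

    For linear models the Shapley value reduces to the closed form
    phi_i = w_i * (x_i - E[x_i]), where the expectation is taken over
    the background distribution."""
    means = [sum(col) / len(col) for col in zip(*background)]
    return [wi * (xi - mi) for wi, xi, mi in zip(w, x, means)]

# Hypothetical model weights, input, and background sample.
w = [2.0, -1.0]
x = [3.0, 1.0]
background = [[1.0, 0.0], [1.0, 2.0]]
phi = linear_shap(w, x, background)
```

The attributions sum to f(x) minus the expected model output over the background (the "efficiency" property), which is what makes them a faithful linear account of the prediction.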

## Websites

• ChromNet – An online network visualization of the chromatin network estimated from ENCODE ChIP-seq data, or custom networks that users upload.

## 4 thoughts on “About”

1. Eddie Herman says:

Hi Scott,
Can you direct me toward the supplementary proof of Theorem 2 up to 10 dimensions in your paper:
"S. Lundberg, S. Lee, 'A unified approach to interpreting model predictions,' NIPS 2017 (selected for oral presentation)"
It is not attached to the arXiv paper.

Thanks,
Eddie Herman

2. Eitan Anzenberg says:

Hi Scott, we met at NIPS 2017. I have a question about SHAP. Is it possible to run the model-agnostic SHAP without a background sample?

1. admin says:

Good question. Since SHAP values rely on conditional expectations, there needs to be some definition of the background input feature distribution, even if that is just a single reference input defined by the user. Without any definition of the input feature distribution it would be impossible to know whether a feature value was increasing or decreasing the model output, since "increasing" and "decreasing" are always relative to something else.
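To make the role of the reference concrete, here is a minimal, self-contained sketch (not the SHAP package itself, whose estimators are far more efficient): exact Shapley values computed by brute-force subset enumeration, where "missing" features are filled in from a single user-defined background reference. The toy model below is hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, background):
    """Exact Shapley values for f at input x, using a single background
    reference input to stand in for 'absent' features."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                present = set(S)
                # Features in the coalition come from x; the rest come
                # from the background reference.
                with_i = [x[j] if j in present or j == i else background[j]
                          for j in range(n)]
                without_i = [x[j] if j in present else background[j]
                             for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi

# Hypothetical linear model and an all-zeros reference input.
f = lambda v: 2.0 * v[0] + 3.0 * v[1]
phi = shapley_values(f, x=[1.0, 1.0], background=[0.0, 0.0])
```

Relative to the all-zeros reference, both features here push the output up; swap in a different reference and the very same input could receive negative attributions, which is exactly why some background must be specified.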