research overview

for statisticians

The Statistical Diversity Lab (est. 2018) is my research group in the Department of Biostatistics at the University of Washington. We apply our statistical training to develop methods for the analysis of biodiversity data, with a particular emphasis on the microbiome. Microbes play a critical role in a wide variety of human diseases and environmental outcomes, and manipulating the microbiome can be as easy as washing your hands or taking a probiotic. We believe that the microbiome is the most exciting frontier of human and environmental health research, and that's why it's the focus of our methods, theory and collaborative work.

My research background is in boundary value problems, hierarchical modeling, statistical inference on non-Euclidean metric spaces, discrete data models, and applied probability. The more recent work of my research group focuses on modeling relative abundance, diversity, taxonomic/phylogenetic uncertainty and batch effects, and developing statistical inference procedures for amplicon, whole-genome and metabolomic datasets. We believe that there are many unused or underused data structures in microbial sequencing studies that have the potential to greatly improve microbiome modeling. We develop and sustain long-term collaborative relationships with outstanding microbial ecologists to keep our methodological and theoretical work relevant. Methodological development for microbial ecology demands expertise in compositional data analysis, networked data, high-dimensional data, constrained estimation, and missing data, and we love learning from our colleagues in statistics, probability and applied math to expand our skills in these areas.


We care about producing high-quality software and making our papers reproducible. Please check out our GitHub page to find data and code for our papers. If you cannot find the resources you need to reproduce our results, contact us!

~ Amy Willis, Ph.D., Assistant Professor

for biologists

The Statistical Diversity Lab (est. 2018) is the research group of Amy Willis, Ph.D., an Assistant Professor in the Department of Biostatistics at the University of Washington. We aim to bring our skills in statistics and computational biology to improve the way that microbial ecologists analyse microbiome data. We want to develop tools that enable biological understanding while ensuring that the findings of our methods are reproducible.

The methods that are typically used to analyse microbiome data were borrowed from ecology. In classical ecology experiments, a scientist would portion off a section of a rainforest and count the number of frogs in the section. The species of each frog could be easily identified, most species of frog were probably represented in the portion, the scientist could return the next day and find a fairly similar result, and if a new frog showed up it could be brought back to the lab to be studied more closely. In contrast, a microbial ecologist cannot directly observe the microscopic organisms in an ecosystem. She has to take a sample, amplify the genetic information (which changes the sample), sequence that genetic information (incurring errors along the way), determine which microbes probably contributed that genetic information (possibly getting some of that wrong)... and that's just to construct the data for analysis! Most microbes cannot be grown in the lab, and so validating surprising findings can be challenging or even impossible.

Classical ecology tools (such as analysing biodiversity and community composition) didn't require complex error modelling. However, using these tools for microbiome data is fraught with problems. Instead of samples of 100 or 200 specimens, high-throughput sequencing generates hundreds of thousands or millions of sequences. Because the classical tools do not model the errors introduced along the way, the standard errors they produce at these sample sizes are almost always close to zero. The result is that ecologists see incredibly small p-values, but replicating the results of an experiment is uncommon. To deal with this, permutation-based testing is typically used, but since observations are not independent and test statistics are highly correlated, permutation testing may be no better than parametric approaches.
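As a rough illustration of why naive standard errors collapse at sequencing scale, the sketch below computes the textbook standard error of an estimated proportion, sqrt(p(1-p)/n), for a classical ecology sample and for a sequencing run. It assumes every observation is an independent draw with no measurement error, which is exactly the assumption that microbiome data violate. The function name and the example values are ours, chosen only for illustration.

```python
import math


def naive_standard_error(proportion: float, n_observations: int) -> float:
    """Standard error of an estimated proportion, assuming independent,
    error-free draws (an illustrative assumption, not a realistic model
    for sequencing data)."""
    return math.sqrt(proportion * (1.0 - proportion) / n_observations)


if __name__ == "__main__":
    # Hypothetical taxon making up 10% of the community.
    p = 0.10
    # 200 specimens (classical ecology) vs. 1,000,000 sequences.
    for n in (200, 1_000_000):
        print(f"n = {n:>9,}: naive SE = {naive_standard_error(p, n):.6f}")
```

Under these assumptions the standard error falls from about 0.021 with 200 specimens to about 0.0003 with a million sequences, even though nothing about amplification or sequencing error has been accounted for.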

We therefore see a pressing need for new statistical methods to answer the questions asked by biologists and clinicians studying microbiomes. By understanding the data generating mechanism and building this understanding into statistical methods, we aim to make the results of microbiome experiments consistent across independent experiments. Coupled with methodological development, we see outreach and collaboration as an important part of this mission.

We care about producing high-quality software and making our papers reproducible. We generally post data and code on our GitHub page, but if you cannot find the resources you need to reproduce our results, contact us.

for the public

Coming soon...


Department of Biostatistics
F-657, Box 357232
Health Sciences Building, 1959 NE Pacific St
University of Washington, Seattle, WA 98195
