I'm a data scientist focusing on computational inference and statistics, with an emphasis on infrastructures for scientific discovery and the integration of computation and data into social research ecosystems.

Several research questions interest me:

When computation is used in research, it becomes part of the methods used to derive a result. What information is needed to verify and replicate data science and computational findings? How should these steps be made available to the community? How can datasets and software be repurposed to catalyze new discoveries?

What steps must research communities take to leverage data and computational tools while maintaining (or improving) the reproducibility and interpretability of results, along with other values such as inclusivity and equity?

What characteristics of tools and computational environments enable data science? We have an opportunity to think about data science as a life cycle, from experimental design and databases through algorithms and methodology to the identification and dissemination of scientific findings, and to design tools and environments that support reliable scientific investigation and inference. In other words, such tools would enable the science aspect of data science in silico.