In this chapter we elucidate four main themes. The first is that modern data analyses, including "Big Data" analyses, often rely on data from different sources, which can present challenges in constructing statistical models that can make effective use of all of the data. The second theme is that although data analysis is usually centralized, frequently the final outcome is to provide information or allow decision-making for individuals. Third, data analyses often have multiple uses by design: the outcomes of the analysis are intended to be used by more than one person or group, for more than one purpose. Finally, issues of privacy and confidentiality can cause problems in more subtle ways than are usually considered; we will illustrate this point by discussing a case in which there is substantial and effective political opposition to simply acknowledging the geographic distribution of a health hazard.

A researcher analyzes some data and learns something important. What happens next? What does it take for the results to make a difference in people{\textquoteright}s lives? In this chapter we tell a story - a true story - about a statistical analysis that should have changed government policy, but didn{\textquoteright}t. The project was a research success that did not make its way into policy, and we think it provides some useful insights into the interplay between locally-collected data, statistical analysis, and individual decision making.

We describe a general approach using Bayesian analysis for the estimation of parameters in physiological pharmacokinetic models. The chief statistical difficulty in estimation with these models is that any physiological model that is even approximately realistic will have a large number of parameters, often comparable to the number of observations in a typical pharmacokinetic experiment (e.g., 28 measurements and 15 parameters for each subject). In addition, the parameters are generally poorly identified, akin to the well-known ill-conditioned problem of estimating a mixture of declining exponentials. Our modeling includes (a) hierarchical population modeling, which allows partial pooling of information among different experimental subjects; (b) a pharmacokinetic model including compartments for well-perfused tissues, poorly perfused tissues, fat, and the liver; and (c) informative prior distributions for population parameters, which is possible because the parameters represent real physiological variables. We discuss how to estimate the models using Bayesian posterior simulation, a method that automatically includes the uncertainty inherent in estimating such a large number of parameters. We also discuss how to check model fit and sensitivity to the prior distribution using posterior predictive simulation. We illustrate the application to the toxicokinetics of tetrachloroethylene (perchloroethylene [PERC]), the problem that motivated this work.
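To make the ill-conditioning of a mixture of declining exponentials concrete, the Python sketch below measures how nearly collinear the two basis curves exp(-k1 t) and exp(-k2 t) become when their rate constants are close. It is an illustration only, not part of the authors' analysis, and it covers only the linear amplitudes; the nonlinear rate constants are harder still to separate. The sampling times, the rate constants, and the helper names exp_mixture_design and design_condition_number are all hypothetical.

```python
import numpy as np

def exp_mixture_design(t, k1, k2):
    # Columns are the basis curves exp(-k1*t) and exp(-k2*t); the amplitudes
    # A1 and A2 enter linearly, so the fitted curve is
    # A1*exp(-k1*t) + A2*exp(-k2*t).
    return np.column_stack([np.exp(-k1 * t), np.exp(-k2 * t)])

def design_condition_number(t, k1, k2):
    # Ratio of largest to smallest singular value of the design matrix.
    # Large values mean small changes in the data can swing the amplitude
    # estimates widely, i.e. the amplitudes are poorly identified.
    return np.linalg.cond(exp_mixture_design(t, k1, k2))

# Hypothetical design: 28 sampling times (hours) spread over two days.
t = np.linspace(0.5, 48.0, 28)
well_separated = design_condition_number(t, 1.0, 0.05)
nearly_equal = design_condition_number(t, 0.105, 0.10)
print(well_separated, nearly_equal)
```

With these inputs the condition number for nearly equal rates comes out many times larger than for well-separated rates, which is the sense in which such parameters are poorly identified.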

}, keywords = {bayesian methods, hierarchical models, informative prior distributions, markov chain simulation, pharmacokinetics, posterior predictive checks, sensitivity analysis, tetrachloroethylene, toxicokinetics}, doi = {10.2307/2291566}, author = {Andrew Gelman and Fr{\'e}d{\'e}ric Y. Bois and Jiming Jiang} } @article {11029, title = {Population toxicokinetics of tetrachloroethylene}, journal = {Archives of Toxicology}, volume = {70}, year = {1996}, pages = {347-355}, abstract = {In assessing the distribution and metabolism of toxic compounds in the body, measurements are not always feasible for ethical or technical reasons. Computer modeling offers a reasonable alternative, but the variability and complexity of biological systems pose unique challenges in model building and adjustment. Recent tools from population pharmacokinetics, Bayesian statistical inference, and physiological modeling can be brought together to solve these problems. As an example, we modeled the distribution and metabolism of tetrachloroethylene (PERC) in humans. We derive statistical distributions for the parameters of a physiological model of PERC, on the basis of data from Monster et al. (1979). The model adequately fits both prior physiological information and experimental data. An estimate of the relationship between PERC exposure and fraction metabolized is obtained. Our median population estimate for the fraction of inhaled tetrachloroethylene that is metabolized, at exposure levels exceeding current occupational standards, is 1.5\% [95\% confidence interval (0.52\%, 4.1\%)]. At levels approaching ambient inhalation exposure (0.001 ppm), the median estimate of the fraction metabolized is much higher, at 36\% [95\% confidence interval (15\%, 58\%)]. This disproportionality should be taken into account when deriving safe exposure limits for tetrachloroethylene and deserves to be verified by further experiments.
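One mechanism that can produce a fraction metabolized that falls as exposure rises is saturable (Michaelis-Menten) metabolism. The Python sketch below is a deliberately crude one-compartment steady-state caricature, not the physiological model fitted in the paper. It assumes the compound enters at a constant rate, leaves by first-order exhalation, and is metabolized at rate vmax*C/(km + C). The function names and all parameter values are hypothetical, in arbitrary units.

```python
import math

def steady_state_concentration(intake_rate, k_exhale, vmax, km):
    # Solve intake_rate = k_exhale*C + vmax*C/(km + C) for C > 0.
    # Multiplying through by (km + C) gives the quadratic
    #   k_exhale*C**2 + b*C - intake_rate*km = 0,
    # with b = k_exhale*km + vmax - intake_rate.
    b = k_exhale * km + vmax - intake_rate
    root = math.sqrt(b * b + 4.0 * k_exhale * intake_rate * km)
    if b >= 0.0:
        # Algebraically equivalent form of the positive root that avoids
        # cancellation when b is positive.
        return 2.0 * intake_rate * km / (b + root)
    return (-b + root) / (2.0 * k_exhale)

def fraction_metabolized(intake_rate, k_exhale, vmax, km):
    # Share of the steady-state intake that is removed by metabolism.
    c = steady_state_concentration(intake_rate, k_exhale, vmax, km)
    return vmax * c / (km + c) / intake_rate

# Hypothetical parameters in arbitrary units.
for rate in [0.001, 0.1, 1.0, 10.0, 100.0]:
    print(rate, fraction_metabolized(rate, k_exhale=1.0, vmax=1.0, km=1.0))
```

In this toy model the fraction metabolized approaches (vmax/km)/(k_exhale + vmax/km), here 0.5, at low intake and falls toward zero as metabolism saturates at high intake.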

}, keywords = {human metabolism, pharmacokinetics, population toxicokinetics, tetrachloroethylene}, author = {Fr{\'e}d{\'e}ric Y. Bois and Andrew Gelman and Jiming Jiang and Don Maszle and Lauren Zeise and George Alexeeff} } @article {10963, title = {Bayesian Prediction of Mean Indoor Radon Concentrations for Minnesota Counties}, journal = {Health Physics}, year = {1995}, month = {12/1996}, abstract = {Past efforts to identify areas with higher than average indoor radon concentrations by examining the statistical relationship between local mean concentrations and physical parameters such as the soil radium concentration have been hampered by the variation in local means caused by the small number of homes monitored in most areas. In this paper, indoor radon data from a survey in Minnesota are analyzed to minimize the effect of finite sample size within counties, to determine the true county-to-county variation of indoor radon concentrations in the state, and to find the extent to which this variation is explained by the variation in surficial radium concentration among counties. The analysis uses hierarchical modeling, in which some parameters of interest (such as county geometric mean radon concentrations) are assumed to be drawn from a single population, for which the distributional parameters are estimated from the data. Extensions of this technique, known as random effects regression and mixed effects regression, are used to determine the relationship between predictive variables and indoor radon concentrations; the results are used to refine the predictions of each county{\textquoteright}s radon levels, resulting in a great decrease in uncertainty. The true county-to-county variation of geometric mean radon levels is found to be substantially less than the county-to-county variation of the observed geometric means, much of which is due to the small sample size in each county. 
The variation in the logarithm of surficial radium content is shown to explain approximately 80\% of the variation of the logarithm of geometric mean radon concentration among counties. The influences of housing and measurement factors, such as whether the monitored home has a basement and whether the measurement was made in a basement, are also discussed. The statistical method can be used to predict mean radon concentrations, or applied to other geographically distributed environmental parameters.
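The partial pooling behind these county estimates can be sketched with a normal-normal hierarchical model. This sketch assumes the within-county standard deviation sigma_y and the between-county standard deviation tau are known. It is not the paper's fitted model, which estimates these quantities from the data and includes predictors such as surficial radium and basement indicators. The function names and the numbers in the usage lines are hypothetical. The second function gives a rough method-of-moments estimate of the true between-county variance by subtracting the average sampling variance from the observed variance of the county means.

```python
import numpy as np

def partial_pool(county_means, county_sizes, sigma_y, mu, tau):
    # Normal-normal model with known sigma_y (within-county sd of log radon)
    # and tau (between-county sd of true log geometric means). Each county
    # estimate is a precision-weighted average of its own sample mean and
    # the population mean mu, so small counties are pulled harder toward mu.
    ybar = np.asarray(county_means, dtype=float)
    n = np.asarray(county_sizes, dtype=float)
    data_precision = n / sigma_y**2
    prior_precision = 1.0 / tau**2
    return (data_precision * ybar + prior_precision * mu) / (data_precision + prior_precision)

def between_county_variance(county_means, county_sizes, sigma_y):
    # Observed variance of the county means minus the average sampling
    # variance sigma_y**2 / n_j, truncated at zero.
    ybar = np.asarray(county_means, dtype=float)
    n = np.asarray(county_sizes, dtype=float)
    excess = np.var(ybar, ddof=1) - np.mean(sigma_y**2 / n)
    return max(excess, 0.0)

# Hypothetical log-radon county means and numbers of homes measured.
means = [0.8, 1.5, 2.4, 1.2]
sizes = [2, 50, 3, 20]
print(partial_pool(means, sizes, sigma_y=0.8, mu=1.3, tau=0.3))
print(between_county_variance(means, sizes, sigma_y=0.8))
```

In the usage lines the county with two homes moves most of the way toward mu, while the county with fifty homes stays close to its own sample mean. The moment estimate is smaller than the raw variance of the county means, mirroring the abstract's point that much of the observed spread is sampling noise.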

}, author = {Phillip N. Price and Anthony V. Nero and Andrew Gelman} }