Policymakers are encouraging the development of standardized and consistent methods to quantify the electric load impacts of demand response programs. For load impacts, an essential part of the analysis is the estimation of the baseline load profile. In this paper, we present a statistical evaluation of the performance of several different models used to calculate baselines for commercial buildings participating in a demand response program in California. In our approach, we use the model to estimate baseline loads for a large set of proxy event days for which the actual load data are also available. Measures of the accuracy and bias of different models, the importance of weather effects, and the effect of applying morning adjustment factors (which use data from the day of the event to adjust the estimated baseline) are presented. Our results suggest that (1) the accuracy of baseline load models can be improved substantially by applying a morning adjustment, (2) the characterization of building loads by variability and weather sensitivity is a useful indicator of which types of baseline models will perform well, and (3) models that incorporate temperature either improve the accuracy of the model fit or do not change it.