Building energy simulation is widely used to help design energy-efficient building envelopes and HVAC systems, to develop building energy codes and demonstrate compliance with them, and to implement building energy rating programs. However, large discrepancies exist between simulation results from different building energy modeling programs (BEMPs), which leads many users and stakeholders to lack confidence in the results from BEMPs and in building simulation methods. This paper compared the building thermal load modeling capabilities and simulation results of three BEMPs: EnergyPlus, DeST and DOE-2.1E. Test cases, based upon the ASHRAE Standard 140 tests, were designed to isolate and evaluate the key influencing factors responsible for the discrepancies in results between EnergyPlus and DeST. These factors included the load algorithms and some of the default input parameters. It was concluded that there is little difference between the results from EnergyPlus and DeST if the input values are the same or equivalent, despite the many differences between their heat balance algorithms. DOE-2.1E can produce large errors when adjacent zones have very different conditions, or when a zone is conditioned part-time while adjacent zones are unconditioned. This was due to the lack of a strict zonal heat balance routine in DOE-2.1E and its steady-state handling of heat flow through interior walls and partitions. This comparison study did not produce another test suite, but rather a methodology for designing tests that can identify and isolate the key influencing factors that drive building thermal loads, and a process for carrying them out.