Description of augmented Energy Information Administration generation unit dataset



This page is now obsolete. Please see the E4ST website instead.



The data file available here is the 2011 Energy Information Administration (EIA) generation unit dataset (GeneratorsY2011.xls, available from the EIA website), augmented with the additional columns described below. All EIA datasets mentioned below are available from the same website, and all are from the same year (2011) as that dataset.

Flue ID. This column identifies the flue(s) associated with the boiler(s) that are associated with the generation unit, according to the EIA’s “EnviroAssoc” dataset. It excludes flues whose association is only “theoretical.” The EIA does not report an associated boiler for some generation units.
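To make the flue linkage concrete, here is a minimal Python sketch of chaining generator-to-boiler and boiler-to-flue associations while skipping “theoretical” links. The record layout, field order, sample values, and the function name `flue_ids_by_generator` are hypothetical; the actual EnviroAssoc files use their own columns. The sketch also assumes a theoretical link can appear at either step.

```python
from collections import defaultdict

# Hypothetical, simplified association records; the real EnviroAssoc
# layout and column names may differ.
boiler_generator = [
    # (plant_id, boiler_id, generator_id, association_type)
    (3, "B1", "G1", "actual"),
    (3, "B2", "G1", "theoretical"),
    (3, "B3", "G2", "actual"),
]
boiler_flue = [
    # (plant_id, boiler_id, flue_id, association_type)
    (3, "B1", "F1", "actual"),
    (3, "B2", "F2", "actual"),
    (3, "B3", "F3", "theoretical"),
]


def flue_ids_by_generator(boiler_generator, boiler_flue):
    """Map (plant_id, generator_id) -> sorted flue IDs, skipping theoretical links."""
    # Collect the non-theoretical flues attached to each boiler.
    flues_by_boiler = defaultdict(set)
    for plant, boiler, flue, kind in boiler_flue:
        if kind != "theoretical":
            flues_by_boiler[(plant, boiler)].add(flue)

    # Walk each non-theoretical generator-boiler link and gather its flues.
    result = defaultdict(set)
    for plant, boiler, gen, kind in boiler_generator:
        if kind == "theoretical":
            continue
        result[(plant, gen)] |= flues_by_boiler.get((plant, boiler), set())
    return {key: sorted(flues) for key, flues in result.items()}
```

With these sample records, generator G1 keeps flue F1 (its link to boiler B2 is theoretical and is dropped), while G2 ends up with an empty list because its only flue link is theoretical. Generators with no reported boiler simply do not appear in the result.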

Various characteristics of the boiler(s) associated with the generation unit, from the EIA’s “EnviroEquip” dataset. Most of the characteristics we report relate to the unit’s emissions.

Various characteristics of the generation unit from the Environmental Protection Agency’s (EPA’s) Air Markets Program annual summary data for each unit, available from the EPA. Plants are easy to match because the EIA and EPA use the same plant numbering system, but matching units within plants was quite challenging, for several reasons: the unit names often bear no resemblance to each other in the two datasets; the EPA dataset omits many of the units in the EIA list; many units are grouped differently in the two datasets; the EPA does not report generation capacity; and the two datasets sometimes report different fuel types for the same unit. Through experimentation, we therefore developed and coded an algorithm that identifies which EPA unit corresponds to which EIA unit, based on fuel type, unit type, and similarity scores derived from maximum generation and from unit names.

We calculate the maximum generation of most EPA units from a separate dataset that reports each unit’s fuel use and gross generation in each hour of the year. We found that capacity estimates based on the Air Markets Program annual summary data are extremely inaccurate in some cases, perhaps partly because some units can draw much higher heat input during upramping than they would use in steady-state operation.

Based on extensive checking, we judge the resulting matchup of EIA and EPA generation units to be imperfect but highly accurate: roughly 95% of matches appear correct. In most cases where a match seems incorrect, the EIA unit was matched with an EPA unit of the same vintage, type, and approximate capacity as the correct match, or with the best available proxy when the correct unit does not appear in the EPA dataset.
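The matching code itself is not published on this page, so the following is only a minimal Python sketch of the general idea under stated assumptions: candidate EPA units are restricted to the same plant, fuel type, and unit type; each EPA unit’s capacity is approximated by its highest hourly gross generation; and candidates are ranked by a weighted sum of a capacity-similarity ratio and a name-similarity ratio from `difflib.SequenceMatcher`. The class names, the weights `w_cap` and `w_name`, and the scoring formula are hypothetical illustrations, not the authors’ algorithm. Unlike the real procedure, this sketch treats fuel type as a hard filter and does not handle units that are grouped differently in the two datasets.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional


@dataclass
class EiaUnit:  # hypothetical record for an EIA generation unit
    plant_id: int
    name: str
    fuel: str
    unit_type: str
    capacity_mw: float


@dataclass
class EpaUnit:  # hypothetical record for an EPA unit
    plant_id: int
    name: str
    fuel: str
    unit_type: str
    hourly_gross_mw: List[float]  # gross generation in each hour of the year


def max_generation(unit: EpaUnit) -> float:
    """Approximate capacity as the highest observed hourly gross generation."""
    return max(unit.hourly_gross_mw, default=0.0)


def capacity_similarity(a: float, b: float) -> float:
    """1.0 when the two values are equal, falling toward 0 as they diverge."""
    if a <= 0 or b <= 0:
        return 0.0
    return min(a, b) / max(a, b)


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_epa_match(eia: EiaUnit, epa_units: List[EpaUnit],
                   w_cap: float = 0.7, w_name: float = 0.3) -> Optional[EpaUnit]:
    """Return the highest-scoring same-plant EPA candidate, or None."""
    candidates = [
        u for u in epa_units
        if u.plant_id == eia.plant_id
        and u.fuel == eia.fuel
        and u.unit_type == eia.unit_type
    ]
    if not candidates:
        return None

    def score(u: EpaUnit) -> float:
        return (w_cap * capacity_similarity(eia.capacity_mw, max_generation(u))
                + w_name * name_similarity(eia.name, u.name))

    return max(candidates, key=score)
```

For example, with an EIA unit named “Unit 2” rated at 300 MW and two same-plant coal steam candidates named “1” and “2” whose hourly gross generation peaks at 150 MW and 290 MW, `best_epa_match` returns the candidate named “2”, because both its peak generation and its name are closer.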

“nox,” “so2,” and “hr.” These are the estimated nitrogen oxide emission rate, sulfur dioxide emission rate, and heat rate of the generation unit, respectively. For EIA generation units that we were able to match with EPA units, we calculate them from EPA annual or hourly data. For other EIA generation units, we estimate these values by regression, based on the known characteristics of each unit and on the characteristics and rates of the units whose rates are known. We ran a separate set of regressions for each combination of fuel type and generator (“prime mover”) type, except for combinations with too few units to support separate regressions; for those, we used a single regression in which the combination of fuel type and generator type was an explanatory variable. We constrained every estimated rate to lie between the lowest and highest rates reported by the EPA for units with that combination of fuel type and generator type. For some combinations with no EPA rates at all, we used rates from the literature. For a small number of generators with unusual combinations of fuel type and generator type, we have not included rates.
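The two ways of obtaining rates can be sketched in Python as follows. For matched units, rates are ratios of EPA totals; the units used here (pounds, MMBtu, and MWh, giving lb/MMBtu and Btu/kWh) are an assumption, since the text above does not state them. For unmatched units, the sketch fits a one-variable least-squares line within each fuel type and prime mover group and clamps each prediction to that group’s observed minimum and maximum rate. The single explanatory variable (an online year in the example), the `min_points` threshold, and all function names are hypothetical; the actual regressions use the units’ known characteristics, and where this sketch returns `None` for a sparse group, the method described above pools such groups into one regression instead.

```python
from collections import defaultdict


def emission_rate(mass_lb, heat_input_mmbtu):
    """Emission rate in lb/MMBtu (units assumed for illustration)."""
    return mass_lb / heat_input_mmbtu


def heat_rate(heat_input_mmbtu, gross_mwh):
    """Heat rate in Btu/kWh: (MMBtu * 1e6 Btu) / (MWh * 1e3 kWh)."""
    return heat_input_mmbtu * 1000.0 / gross_mwh


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def impute_rates(known, unknown, min_points=3):
    """
    known:   (fuel, prime_mover, x, rate) for units with EPA-derived rates
    unknown: (fuel, prime_mover, x) for units needing an estimate
    Returns one clamped estimate per unknown unit, or None if its group
    has too few known rates to fit a line.
    """
    groups = defaultdict(list)
    for fuel, prime_mover, x, rate in known:
        groups[(fuel, prime_mover)].append((x, rate))

    estimates = []
    for fuel, prime_mover, x in unknown:
        points = groups.get((fuel, prime_mover), [])
        xs = [p[0] for p in points]
        if len(points) < min_points or len(set(xs)) < 2:
            estimates.append(None)
            continue
        rates = [p[1] for p in points]
        a, b = fit_line(xs, rates)
        # Keep the estimate within the range of rates observed in the group.
        estimates.append(min(max(a + b * x, min(rates)), max(rates)))
    return estimates
```

As a hypothetical illustration, with three known coal steam units (online years 1960, 1980, and 2000; heat rates 10,500, 10,000, and 9,500 Btu/kWh), a unit from 1970 is estimated at 10,250 Btu/kWh, units from 1940 and 2020 are clamped to 10,500 and 9,500, and a gas turbine with no known group members gets `None`.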
The “Read me.xls” file provides additional information about these columns.

Citation information
If you use or refer to these data, please cite the following paper:

Daniel L. Shawhan, John T. Taber, Di Shi, Ray D. Zimmerman, Jubo Yan, Charles M. Marquet, Yingying Qi, Biao Mao, Richard E. Schuler, William D. Schulze, and Daniel J. Tylavsky, “Does a Realistic Model of the Electricity Grid Matter? Estimating the Impacts of the Regional Greenhouse Gas Initiative,” Resource and Energy Economics (2013).

We gratefully acknowledge the US Department of Energy's Center for Electric Reliability Technology Solutions for funding this work.