Several factors make it difficult to share utilities' energy data: the data raise privacy concerns, they have substantial business value, and the volume of raw data can be quite large. As a result, the data sets behind studies that used utility information have not been released, and researchers looking to build on that work cannot obtain them. Researchers on this project, the Reference Energy Disaggregation Data Set (REDD), were unable to identify a single independent study that used a common data set prior to the initial release of REDD.
The project collected such a data set, standardized the collection process, and made REDD free and publicly available. With common evaluation metrics, scientists can use it to develop algorithms that separate an aggregate, or whole-home, energy signal into the contributions of its component appliances and electronics, as well as to develop other energy-related analytics. REDD has been downloaded by numerous academic researchers and commercial developers. The related paper has been cited by 85 other published studies.
The data were acquired from approximately 40 homes in the Boston and San Francisco metropolitan areas. In California, monitoring devices were installed in 30 homes over eighteen months. All of the data were collected for 48 different circuit breakers, with the collection period for each home typically ranging from two to four weeks. The researchers developed a process for data collection that details recruitment, consultation, hardware specifications, equipment installation and removal, data evaluation, and a one-hour exit interview with residents who participated.
In each home, researchers monitored whole-home voltage and current at high frequency (16 kHz) to record the actual AC waveforms of the aggregate electricity signals. Because the raw 16 kHz data are quite large and consist mainly of repetitions of identical waveforms, the researchers report only those times at which the waveforms change. They also monitored per-circuit electrical power at approximately one measurement every three seconds. Circuits are labeled with human-readable descriptions, with some of the major loads present on the circuits listed. The investigators also monitored per-plug electrical power consumption at medium frequency (often once a second, though some homes have only once-a-minute monitoring) for about 20 select plug loads in each home. In all cases the appliance being monitored is labeled in the data set.
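The idea of reporting a waveform only when it changes can be illustrated with a minimal sketch. This is not the REDD storage format or the researchers' code; the function names, the per-sample tolerance `tol`, and the representation of one AC cycle as a list of samples are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Waveform = Sequence[float]  # samples of one AC cycle (hypothetical representation)


def waveform_changed(prev: Waveform, curr: Waveform, tol: float) -> bool:
    """Return True if any sample differs by more than tol (assumed criterion)."""
    if len(prev) != len(curr):
        return True
    return any(abs(a - b) > tol for a, b in zip(prev, curr))


def compress_cycles(cycles: Sequence[Waveform],
                    tol: float = 0.05) -> List[Tuple[int, Waveform]]:
    """Keep (cycle_index, waveform) only where the waveform differs
    from the most recently kept waveform."""
    kept: List[Tuple[int, Waveform]] = []
    for i, cycle in enumerate(cycles):
        if not kept or waveform_changed(kept[-1][1], cycle, tol):
            kept.append((i, cycle))
    return kept


def expand_cycles(kept: Sequence[Tuple[int, Waveform]],
                  n_cycles: int) -> List[Waveform]:
    """Rebuild the full sequence by repeating each kept waveform
    until the index of the next recorded change."""
    if not kept:
        return []
    out: List[Waveform] = []
    k = 0
    for i in range(n_cycles):
        if k + 1 < len(kept) and kept[k + 1][0] <= i:
            k += 1
        out.append(kept[k][1])
    return out


# Toy data: two distinct cycle shapes, four samples each.
a = [0.0, 1.0, 0.0, -1.0]
b = [0.0, 2.0, 0.0, -2.0]
cycles = [a, a, a, b, b, a]
kept = compress_cycles(cycles)
restored = expand_cycles(kept, len(cycles))
```

Each cycle is compared against the last kept waveform rather than the immediately preceding cycle, so slow drift cannot accumulate past the tolerance without being recorded.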
The data give a detailed snapshot of how energy was used in each house over the monitoring period. The whole-home signal provides high-frequency data in which devices can be identified by their waveforms, while the per-circuit and per-plug signals provide the ground truth as to what actually occurred in the home.
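To make the role of the ground truth concrete, the following sketch scores a disaggregation algorithm's per-appliance power estimates against the measured per-circuit or per-plug readings. The metric shown (one minus the total absolute error divided by twice the total measured energy) is one possible choice and is an assumption here, as are the function name, the dictionary layout, and the sample values.

```python
from typing import Dict, Sequence


def disaggregation_accuracy(truth: Dict[str, Sequence[float]],
                            estimate: Dict[str, Sequence[float]]) -> float:
    """Score estimated per-appliance power against measured ground truth.

    Returns 1.0 for a perfect match. Appliances missing from `estimate`
    are treated as estimated at zero power for every time step.
    """
    total_abs_error = 0.0
    total_energy = 0.0
    for name, true_series in truth.items():
        est_series = estimate.get(name, [0.0] * len(true_series))
        if len(est_series) != len(true_series):
            raise ValueError(f"length mismatch for {name!r}")
        total_abs_error += sum(abs(e - t) for e, t in zip(est_series, true_series))
        total_energy += sum(true_series)
    if total_energy == 0:
        raise ValueError("ground truth contains no energy to normalize by")
    # The factor of 2 reflects that misassigned power is counted twice:
    # once where it is missing and once where it is wrongly credited.
    return 1.0 - total_abs_error / (2.0 * total_energy)


# Hypothetical readings in watts, one value per time step.
truth = {"refrigerator": [100.0, 100.0, 0.0, 0.0],
         "oven": [0.0, 0.0, 2000.0, 2000.0]}
guess = {"refrigerator": [100.0, 100.0, 0.0, 0.0],
         "oven": [0.0, 0.0, 1000.0, 2000.0]}

perfect_score = disaggregation_accuracy(truth, truth)
guess_score = disaggregation_accuracy(truth, guess)
```

Normalizing by the total measured energy keeps the score comparable across homes with very different consumption levels.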
Publications and Presentations
REDD is available at: http://redd.csail.mit.edu. Those interested need to email Zico Kolter (firstname.lastname@example.org) to receive a username and password. Researchers can download an initial version of the data, containing several weeks of power data for 30 different homes and high-frequency current/voltage data for the main power supply of two of these homes. The data and the hardware used to collect them are described more thoroughly in the "Readme" on the main page and in the links provided below.
REDD: A public data set for energy disaggregation research (893KB PDF)
Proceedings of the SustKDD workshop on Data Mining Applications in Sustainability
Kolter, J. Z., & Johnson, M. J. (2011)
The energy reduction motivational interview: Pilot research (2.2MB PDF)
Behavior, Energy & Climate Change Conference
Chadwick, S.J., Plano, L., Flora, J., Armel, C. (2012)
A second paper, "REDD 2.0: The Expanded Reference Energy Disaggregation Data Set" is in the process of being published.
Overall, this work aims to spur the development of algorithms, more standardized testing of those algorithms, and the collection of new data sets. Such a database could also provide a foundation for open competitions to seed innovation. Finally, in the future, these types of data collection procedures could be tightly integrated into commercial devices.
In addition to providing data for algorithm developers and testers, researchers anticipate that this database and protocol will allow professionals in other geographic regions and climate zones to collect and store data, facilitating the continued advancement of disaggregation algorithms. One lesson to date is relevant to continued work: even in homogeneous suburban areas, home energy systems, appliance stock, and consumption patterns are extremely diverse.