Can data science actually be beneficial for energy modelers?

Can data science actually be beneficial for energy modelers, or is it mostly hype?

– Hesitantly Hyped

Dear Hesitantly Hyped,

The data science process involves data collection, data cleaning, exploratory data analysis, model building, and model deployment. It draws on subject matter expertise, computer science, and math and statistics, including machine learning. Machine learning allows computers to learn from historical data, group data with similar characteristics, and make predictions, borrowing methods from fields such as statistics. Data engineers make sure that reliable, quality data is curated and prepared for analysis. Data analysts explore the data and extract meaningful insights for the business problem at hand. Machine learning engineers select or build classification or prediction models; train, test, and cross-validate them; and deploy them in the cloud or on a local machine. The boundaries between these roles are blurry in practice, and the work is often collectively called data science. Much of the time is actually spent gathering, filtering, cleaning, and transforming data, a process called ‘data wrangling’.
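To make the wrangling step concrete, here is a minimal sketch in plain Python. The meter readings, the field names (`timestamp`, `kwh`), and the cleaning rules are all assumptions made up for illustration; a real project would set its own rules and would likely use a dataframe library.

```python
from datetime import datetime

# Hypothetical raw export from a building meter (values invented for illustration).
raw_readings = [
    {"timestamp": "2022-01-01 00:00", "kwh": "12.5"},
    {"timestamp": "2022-01-01 01:00", "kwh": "11.8"},
    {"timestamp": "2022-01-01 01:00", "kwh": "11.8"},  # duplicate row
    {"timestamp": "2022-01-01 02:00", "kwh": ""},      # missing value
    {"timestamp": "2022-01-01 03:00", "kwh": "-4.0"},  # implausible negative reading
    {"timestamp": "2022-01-01 04:00", "kwh": "12.1"},
]


def wrangle(rows):
    """Filter, clean, and transform raw rows into typed records."""
    seen = set()
    clean = []
    for row in rows:
        key = row["timestamp"]
        if key in seen:
            continue  # filtering: keep only the first row per timestamp
        seen.add(key)
        try:
            kwh = float(row["kwh"])  # transforming: text to number
        except ValueError:
            continue  # cleaning: skip blank or non-numeric values
        if kwh < 0:
            continue  # cleaning: assumed rule that consumption cannot be negative
        clean.append(
            {"timestamp": datetime.strptime(key, "%Y-%m-%d %H:%M"), "kwh": kwh}
        )
    return clean


clean_readings = wrangle(raw_readings)
print(len(raw_readings), "raw rows ->", len(clean_readings), "clean rows")
```

Even in this toy case, half of the rows are discarded before any modeling starts, which is why wrangling tends to dominate the schedule.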

So, how does this data science process benefit energy modelers? It helps us find new inputs for energy modeling, allows us to use the most up-to-date data, and reduces the performance gap between the designed model and the actual building energy consumption. We can typically do this with data science techniques such as regression, classification, clustering, and segmentation of data. Sometimes more advanced machine learning techniques, such as deep learning (e.g., computer vision or natural language processing), are used. Data can come from BMS/BAS, IoT devices, or unconventional sources (e.g., images, audio, video, geolocations, maps, graphs, demographics, socio-economic data, or research papers).
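As a small illustration of the simplest technique in that list, the sketch below fits a one-variable least-squares regression in plain Python. The choice of heating degree days as the predictor and every number in it are assumptions invented for the example, not measurements from any building.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical monthly data: heating degree days vs. metered energy use (MWh).
degree_days = [100, 200, 300, 400]
energy_mwh = [30.0, 50.0, 70.0, 90.0]

slope, intercept = fit_line(degree_days, energy_mwh)


def predict(hdd):
    """Estimate energy use for a month with the given degree days."""
    return intercept + slope * hdd


print(f"slope={slope:.3f} MWh per degree day, base load={intercept:.1f} MWh")
```

In this made-up case the intercept can be read as a weather-independent base load and the slope as the weather sensitivity, which is the kind of data-derived input that could be compared against a design model.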

Recent research based in the UK looked at satellite imagery of nighttime light intensity, the volume of Twitter posts, and country-wide demographic data, and identified the combination of the three as a strong predictor of building electricity consumption. This is one example of how unconventional data sources can be combined with machine learning to benefit energy modelers.

Additional examples of building-energy-related data science explorations include those hosted by ASHRAE, Women in Data Science (WiDS), and RTEM. For example, in the WiDS Datathon 2022, participants were tasked with predicting the energy consumption of buildings using facility type, year built, floor area, location, weather, and Energy Star rating. The dataset was composed of approximately 10,000 observations, which included the site EUI of buildings collected over 7 years. The top winning solutions identified facility type, floor area, and Energy Star rating as some of the most relevant variables for accurate prediction, which aligns with what typical energy modeling approaches would lead us to expect. However, they did not rely on building physics to reach this result. Instead, they used approaches ranging from simple regression to combinations of multiple decision-tree-based algorithms. Notice that the input dataset contained no information about building orientation, window-to-wall ratio, building envelope, occupant density, lighting power density, or mechanical and electrical equipment types.
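To give a feel for what a decision-tree-based algorithm does, here is a minimal sketch of its smallest building block: a single split (a "stump") chosen to minimize squared error. This is not the winning solutions' code; the ratings and EUI values are invented for illustration, and real tree ensembles combine many deeper trees over many features.

```python
def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared errors around the mean of the values."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def fit_stump(xs, ys):
    """Find the threshold on x that minimizes the total squared error of y."""
    best = None
    candidates = sorted(set(xs))
    # Try a split halfway between each pair of neighboring feature values.
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        error = sse(left) + sse(right)
        if best is None or error < best["error"]:
            best = {
                "threshold": threshold,
                "left": mean(left),    # prediction for x <= threshold
                "right": mean(right),  # prediction for x > threshold
                "error": error,
            }
    return best


def predict(stump, x):
    return stump["left"] if x <= stump["threshold"] else stump["right"]


# Hypothetical buildings: Energy Star rating vs. site EUI (values invented).
ratings = [20, 35, 40, 60, 75, 90]
site_eui = [140.0, 130.0, 125.0, 80.0, 70.0, 60.0]

stump = fit_stump(ratings, site_eui)
print("split at rating", stump["threshold"])
print("predicted EUI for a rating of 85:", predict(stump, 85))
```

Nothing in this procedure knows about envelopes or equipment; it only looks for the split in the data that best separates high-EUI from low-EUI buildings.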

What does this tell us about energy modeling? Data science has the potential to make equally good or even better predictions of building or site energy consumption without knowing much about the physical characteristics of the building. That’s why large amounts of funding are being allocated to research projects that combine building physics and data science to develop energy models that suggest the best retrofit options for different building archetypes, based on historical building performance data at the urban scale (see example: University of Victoria).

However, there are limitations to using data science for energy modeling, too. We may get excited about receiving a lot of historical sensor data from hundreds of buildings, but the data collection processes are often not carefully designed. We may end up with lots of holes (i.e., missing or misclassified data) in the dataset, which makes it difficult to derive a generalized model from it. Time is also wasted when everyone who attempts to use the data has to repeat the same preprocessing. If we are smart about this, I expect that in the future a public entity will support the gathering of “quality, continuous” data from buildings and provide access to the cleaned data for efficiency and maximum benefit.
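A quick way to see how serious those holes are is to measure them before modeling. The sketch below, in plain Python, reports the share of missing readings and the longest consecutive gap; the hourly values are invented, and treating `None` as "missing" is an assumption about how the data was exported.

```python
def gap_report(values):
    """Return the share of missing readings and the longest consecutive gap."""
    missing = 0
    longest = 0
    current = 0
    for value in values:
        if value is None:
            missing += 1
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # a valid reading ends the current gap
    share = missing / len(values) if values else 0.0
    return {"missing_share": share, "longest_gap": longest}


# Hypothetical hourly sensor readings (kWh); None marks a missing value.
hourly_kwh = [10.2, None, None, 9.8, 10.1, None, 10.4, 10.0]

report = gap_report(hourly_kwh)
print(f"{report['missing_share']:.1%} missing, "
      f"longest gap: {report['longest_gap']} hours")
```

A report like this, run once by the data provider rather than separately by every user, is one small example of the duplicated effort a shared, cleaned dataset could remove.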

In my assessment, data science can be beneficial for energy modelers and will hopefully become one of the tools in their toolbox. With quality data and data science techniques, our energy models can go beyond standards and assumptions for model inputs and use more accurate approximations of reality.