The DeepSolar Project

DeepSolar is a deep learning framework that analyzes satellite imagery to identify the GPS locations and sizes of solar photovoltaic (PV) panels. Leveraging its high accuracy and scalability, DeepSolar constructed a comprehensive high-fidelity solar deployment database for the contiguous U.S.

We demonstrated its value by discovering that residential solar deployment density peaks at a population density of 1,000 capita/mile², increases with annual household income until it levels off at about $150K, and is inversely correlated with the Gini index, a measure of income inequality. We also uncovered a solar radiation threshold (4.5 kWh/m²/day) above which solar deployment is “triggered”. Furthermore, we built an accurate machine learning-based predictive model that estimates solar deployment density at the census-tract level. We offer DeepSolar as a publicly available database for researchers, utilities, solar developers and policymakers to further uncover solar deployment patterns, build comprehensive economic and behavioral models, and ultimately support the adoption and management of solar electricity.

The work was published in Joule in December 2018.


Solar photovoltaic (PV) adoption is growing rapidly worldwide thanks to falling costs and environmental benefits. With the deep penetration of solar energy resources, the U.S. electric grid is also transforming into a cleaner energy network. However, a complete database of the accurate locations and sizes of PV installations, especially distributed rooftop/residential solar panels, is still unavailable in the U.S., which makes power grid monitoring and operation difficult. Moreover, current socioeconomic analyses of solar adoption in the U.S. are based on data from specific regions or limited groups of residents. Given the heterogeneity across the nation, conclusions drawn from such local samples can mislead policymakers and solar companies.

Recent attempts to build a large-scale solar project database, such as the OpenPV Project, have relied on voluntary surveys and self-reports, which are often outdated and low-resolution (zip-code level), with no guarantee of completeness or absence of duplication. Meanwhile, high-resolution satellite imagery that covers the majority of the U.S. and is updated annually offers a rich source of data on solar installations, and recent breakthroughs in deep learning enable automatic and accurate image classification and segmentation. Combining satellite imagery and deep learning, we aimed to develop a framework that automatically constructs, maintains, and updates the solar installation database and brings a new level of visibility to renewable energy deployment.

DeepSolar Model

The DeepSolar model incorporates both image classification and semantic segmentation in a single Convolutional Neural Network. Classification localizes the solar panels, and segmentation estimates their sizes. The classification branch is based on Google Inception V3, which is pretrained on ImageNet and then fine-tuned on our dataset of 360K images. The output of the classification branch is a class label, either "positive" (containing a solar panel) or "negative" (not containing a solar panel). The precision and recall of classification are both around 90% for residential and non-residential areas.
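For readers less familiar with the two metrics quoted above, the following minimal sketch shows how precision and recall are computed from "positive"/"negative" labels. The function name and the toy labels are illustrative assumptions and are not taken from the DeepSolar code or data.

```python
def precision_recall(labels, predictions):
    """Return (precision, recall) for binary labels, where 1 means "positive"."""
    pairs = list(zip(labels, predictions))
    true_pos = sum(1 for y, p in pairs if y and p)
    false_pos = sum(1 for y, p in pairs if not y and p)
    false_neg = sum(1 for y, p in pairs if y and not p)
    # Precision: fraction of tiles predicted "positive" that really contain a panel.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    # Recall: fraction of tiles that contain a panel and were predicted "positive".
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Toy example (made-up labels, not DeepSolar results).
toy_labels = [1, 1, 1, 0, 0]
toy_predictions = [1, 1, 0, 1, 0]
toy_precision, toy_recall = precision_recall(toy_labels, toy_predictions)
```

High precision means few false alarms in the database, while high recall means few missed installations; both matter when the output is used to count solar systems.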

The segmentation branch is executed only if an image is classified as "positive". Segmentation does not need another forward pass. Instead, it leverages the intermediate results from the classification branch and generates Class Activation Maps (CAMs) by aggregating the feature maps learned by the convolutional layers. The segmentation results are then obtained by applying a threshold to the CAMs. The following figure shows examples of original satellite images, the corresponding CAMs, and the segmentation results. This segmentation method never uses ground truth segmentation masks for training; it only requires ground truth class labels ("positive" or "negative") to minimize the classification error. It is therefore "semi-supervised", which is quite useful when ground truth segmentation labels are extremely expensive to obtain, as in our case.
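The CAM-and-threshold step can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the aggregation is taken to be a weighted sum of feature maps, as in the standard CAM formulation, and the min-max normalization, the 0.5 threshold, and the per-cell ground area are hypothetical choices that are not specified in the text above.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of K feature maps (each H x W) for one class."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * fmap[i][j]
    return cam


def segment(cam, threshold=0.5):
    """Min-max normalize the CAM to [0, 1] and keep cells above the threshold.

    Both the normalization and the threshold value are assumptions.
    """
    flat = [value for row in cam for value in row]
    low, high = min(flat), max(flat)
    span = (high - low) or 1.0  # avoid division by zero on a constant map
    return [[(value - low) / span > threshold for value in row] for row in cam]


def panel_area(mask, cell_area_m2):
    """Estimated panel area: number of masked cells times the area of one cell."""
    return sum(cell for row in mask for cell in row) * cell_area_m2


# Toy example: two 2 x 2 feature maps with hypothetical "positive" class weights.
toy_maps = [[[1.0, 0.0], [0.0, 0.0]],
            [[0.0, 0.0], [0.0, 1.0]]]
toy_cam = class_activation_map(toy_maps, [2.0, 1.0])
toy_mask = segment(toy_cam)
toy_area = panel_area(toy_mask, cell_area_m2=4.0)
```

Because the feature maps are already computed during classification, producing the mask costs only the weighted sum and the threshold, which is why no second forward pass is needed.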

To train the segmentation branch, we proposed a "greedy layerwise training" scheme that greedily extracts features and improves segmentation quality. In our case, training is decomposed into two steps. In each step, we greedily train a newly added convolutional layer in the segmentation branch to minimize the classification error. See our paper and supplemental information for details of the DeepSolar model.

DeepSolar Database

Leveraging the DeepSolar model, we have constructed a comprehensive solar installation database covering the 48 contiguous U.S. states. The database includes location, size, and type (residential/non-residential) information for each recorded solar power system. The dataset will be continuously updated to generate a time history of solar installations and to extend coverage to all of North America, including the non-contiguous U.S. states. Such a database can provide a valuable resource for grid monitoring and operation, socioeconomic analysis of solar adoption, and energy policy making.
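As a rough illustration of how records carrying these three attributes might be handled, the sketch below defines a hypothetical record type and a simple aggregation by system type. The class name, field names, units, and sample values are assumptions made for this example and do not describe the actual schema of the released database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SolarSystem:
    """One detected solar power system (hypothetical schema)."""
    latitude: float
    longitude: float
    area_m2: float      # estimated panel area from segmentation
    residential: bool   # True for residential, False for non-residential


def total_area_by_type(systems):
    """Sum estimated panel area separately for residential and non-residential systems."""
    totals = {"residential": 0.0, "non-residential": 0.0}
    for system in systems:
        key = "residential" if system.residential else "non-residential"
        totals[key] += system.area_m2
    return totals


# Made-up sample records for illustration only.
sample_systems = [
    SolarSystem(37.43, -122.17, 20.0, True),
    SolarSystem(37.44, -122.16, 30.0, True),
    SolarSystem(37.40, -122.10, 500.0, False),
]
sample_totals = total_area_by_type(sample_systems)
```

The same pattern extends to grouping by a geographic unit such as a census tract, provided each record is first mapped to that unit.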
