Accessing the U.S. Wind Turbine Database API for Location Data Visualization


Contact: Chris Garrity | U.S. Geological Survey | cgarrity@usgs.gov
Database: U.S. Wind Turbine Database | API Access: USWTDB API

The United States Wind Turbine Database (USWTDB) provides the locations of land-based and offshore wind turbines in the United States, corresponding wind project information, and turbine technical specifications. Wind turbine records are collected and compiled from various public and private sources, digitized and position-verified from aerial imagery, and quality checked. The USWTDB is available for download in a variety of tabular and geospatial file formats to meet a range of user/software needs. In the following examples, we'll be accessing the wind turbine data through the USWTDB API. Accessing raw data through an API lets users stay in sync with the database without the need to download static versions of the data. Learn more about the USWTDB and USWTDB API at https://eerscmap.usgs.gov/uswtdb/.

The following Jupyter Notebook examples are targeted at users who are new to Jupyter and notebook environments in general. A notebook integrates code and code output into a single document that combines visualizations, narrative text, mathematical equations, and other media types. This type of workflow promotes iterative and efficient development, making notebooks an increasingly popular choice for contemporary data science and analysis. Throughout this notebook, we'll provide detailed narrative text for each step, tailored to those just starting development in the Jupyter Notebook environment. Learn more about Project Jupyter.


Dependencies Used in These Examples

The examples in this notebook require the installation of two additional Python packages. These packages can easily be installed using pip, the standard package manager for Python*. pip allows you to install and manage packages that are not part of the Python standard library. The Python installer includes pip by default, so it should be ready for you to use.

pip install mapboxgl

mapboxgl allows you to build Mapbox GL JS data-driven visualizations natively in Jupyter Notebooks. Mapbox GL JS is a high-performance, interactive, WebGL-based data visualization tool that leverages Mapbox Vector Tiles. Learn more about the Mapbox platform at https://www.mapbox.com/maps/.

pip install pandas

pandas provides high-performance, easy-to-use data structures and data analysis tools for the Python programming language. In the following examples we will be leveraging pandas.DataFrame, generally the most commonly used pandas object. A dataframe is a 2-dimensional labeled data structure with columns of potentially different types. You can think of a dataframe like a traditional spreadsheet or SQL table, composed of rows and columns supporting a variety of tabular data types. pandas provides several methods for reading data in different formats. In these examples, we’ll request our source data through the USWTDB API which returns raw data in JSON format using standard http protocols. Learn more about pandas https://pandas.pydata.org/pandas-docs/stable/index.html.
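To make the dataframe concept above concrete, here's a minimal sketch that builds a pandas.DataFrame from a couple of hand-written, hypothetical turbine-like records (values invented for illustration, not pulled from the API):

```python
import pandas as pd

# Hypothetical turbine-like records, mimicking the shape of an API response.
records = [
    {"case_id": 3009410, "t_manu": "Vestas", "t_cap": 95},
    {"case_id": 1234567, "t_manu": "GE Wind", "t_cap": 1500},
]
df = pd.DataFrame(records)

# Each column can hold a different type, like a spreadsheet or SQL table
print(df.head())
```

`pd.read_json`, which we use below, produces the same kind of DataFrame directly from a JSON response.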

* `Conda` is another widely-used packaging tool/installer that, unlike `pip`, handles library dependencies beyond strictly Python packages. The `conda` package and environment manager is included in all versions of Anaconda and Miniconda. Those using `conda` can simply swap `pip` with `conda` for the installs above. Learn more about `conda` at https://docs.conda.io/en/latest/

Handle the Notebook Imports

Python provides a flexible framework for importing modules and specific members of a module. In the examples in this notebook, we'll import pandas and give it the alias pd. We'll also import the viz and utils submodules from the mapboxgl package, as well as operating system dependent functionality via the os import.

In [34]:
import os

import pandas as pd
from mapboxgl.utils import *
from mapboxgl.viz import *
from IPython.display import IFrame

Example 1 - Create a Clustered Turbine Location Map of the Conterminous U.S.

Map clustering algorithms typically find map markers (points) that are near each other and denote them with a cluster symbol representing the overall density of aggregated map markers. By default, the new symbols are labeled with the number of map markers they contain. We can apply symbol scaling and custom color ramps to the rendered cluster symbols to better help us visualize the density of the dataset in our map window. As we zoom in, the algorithm re-calibrates clustering on the fly based on the number of markers in our map view. Map clustering can be a powerful visualization tool when mapping large numbers of markers and helps users visualize patterns of points without the traditional issues of marker overlap. In this example, we'll build a simple cluster map to visualize the locations of turbines at a national scale throughout the United States. Due to the location proximity inherent in the dataset (i.e., turbines typically occur in groups, or 'wind farms', throughout the country), a cluster map becomes a useful tool to help us visualize the overall density of turbines when zoomed out at a national level.
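The aggregation step behind clustering can be sketched with a toy grid-binning approach: snap each point to a coarse grid cell and count the points per cell. Real map clustering in Mapbox GL is hierarchical and recomputed per zoom level; this only illustrates the counting idea, and all coordinates below are hypothetical:

```python
from collections import Counter

# Hypothetical (longitude, latitude) pairs; the first three are close together.
points = [
    (-118.368, 35.076),
    (-118.368, 35.071),
    (-118.352, 35.089),
    (-101.470, 35.230),
]

cell_size = 0.1  # degrees; a coarser grid merges more points per cluster
clusters = Counter(
    (round(lon / cell_size), round(lat / cell_size)) for lon, lat in points
)

# Each cluster is labeled with the number of points it aggregates
for cell, count in clusters.items():
    print(cell, count)
```

The three nearby points fall into one grid cell (a cluster of 3), while the distant point forms its own cluster of 1.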

Step 1. Add a National Geologic Map Database Vector Tile Basemap Service

The mapboxgl package leverages a public Mapbox token to access Mapbox-hosted basemap styles. To avoid requiring users of this notebook to have a Mapbox account, we'll add a USGS-hosted vector tile basemap from the National Geologic Map Database (NGMDB) to our notebook. We can call custom vector tile styles using the style parameter when we generate our map visualization and simply omit the token parameter. Below, we'll pass two styles from the NGMDB as variables to use in our notebook exercises. For those with an existing Mapbox account, swap in your own Mapbox token and Mapbox style. Learn more about the NGMDB at https://ngmdb.usgs.gov.

In [35]:
# NGMDB monochrome style designed to provide a basemap that highlights the data overlay.
ngmdbLight = "https://ngmdb-tiles.usgs.gov/styles/ngmdb-light/style.json"

# NGMDB full-color style that contains standard cartographic basemap layers, contour lines, and hillshading.
ngmdbBasemap = "https://ngmdb-tiles.usgs.gov/styles/ngmdb-tv/style.json"

Step 2. Connect to the U.S. Wind Turbine Database API and Preview the Response

As noted previously, the USWTDB API allows for programmatic access to the U.S. Wind Turbine Database maintained by the USGS and partner agencies. The USWTDB API was created to extend USWTDB visibility, expand its user base, and support more productive internal workflows. The API supports filtering table rows by appending specific attributes, the filter operator, and the filter value to the request. Filters exclude table rows using simple operators that compare against specified key values. Applying filters to the request allows for more efficient, faster API responses because unneeded data is withheld by the server prior to the API return. This is particularly useful when users are interested in a subset of data from the USWTDB. See additional USWTDB API filter operations at https://eerscmap.usgs.gov/uswtdb/api-doc/#operators.
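As a sketch of how these filters compose, a request URL is just the root endpoint plus attribute=operator.value pairs joined with ampersands. The specific filter combination below is illustrative:

```python
# Root USWTDB API endpoint from this notebook
base_url = "https://eersc.usgs.gov/api/uswtdb/v1/turbines"

# Each filter is an attribute paired with an operator.value expression
filters = {
    "t_state": "eq.CA",         # only turbines in California
    "t_cap": "gt.0",            # capacity strictly greater than 0 kW
    "select": "case_id,t_cap",  # limit the attributes in the response
}

# Join the filter pairs with '&' and append them to the root endpoint
query = "&".join(f"{key}={value}" for key, value in filters.items())
request_url = f"{base_url}?{query}"
print(request_url)
```

The resulting string can be passed straight to pd.read_json, as we do in the cells below.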

In the first example, we'll make a customized http request to the API and return turbines that (1) have a capacity greater than 0 kW to exclude any zero or null capacity values and (2) limit the turbine attributes in the response to case_id (unique ID), t_manu (manufacturer), t_cap (capacity), xlong (longitude), and ylat (latitude). This is done by appending URL parameters ?&t_cap=gt.0&select=case_id,t_manu,t_cap,xlong,ylat to the root level API https://eersc.usgs.gov/api/uswtdb/v1/turbines/. Once a successful request is made, we'll parse the JSON response and preview the first 5 records of the pandas.DataFrame.

Note: There are many more attributes related to the USWTDB that can be leveraged in the API request. Feel free to experiment with the URL parameters to build your own custom maps using the USWTDB.

In [36]:
# Call the USWTDB API and apply custom URL parameters to the request. Parameters allow us to filter the data return.
data_url = "https://eersc.usgs.gov/api/uswtdb/v1/turbines?&t_cap=gt.0&select=case_id,t_manu,t_cap,xlong,ylat"

# Parse the JSON response from the API return and populate the dataframe
dfClusterMap = pd.read_json(data_url)

# Preview the first five records of our dataframe based on the custom URL parameters in the API request
dfClusterMap.head(5)
Out[36]:
case_id t_manu t_cap xlong ylat
0 3009410 Vestas 95 -118.36809 35.07589
1 3072663 Vestas 95 -118.36820 35.07570
2 3072662 Vestas 95 -118.36839 35.07563
3 3072695 Vestas 95 -118.36441 35.07744
4 3073403 Vestas 95 -118.35222 35.08899

Step 3. Create a GeoJSON Object from the Dataframe

The mapboxgl package supports both vector tile sources and the GeoJSON format for rendering map visualizations. GeoJSON is a common, open standard, geospatial data interchange format based on JSON. It's designed for representing geographical features, along with their non-spatial attributes and spatial extents. Learn more about the GeoJSON format.
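For reference, here is a minimal hand-built GeoJSON FeatureCollection showing the structure df_to_geojson produces: each dataframe row becomes a Feature with a Point geometry (longitude first, then latitude) and the selected columns as properties. The single feature below mirrors the first record previewed earlier, with coordinates trimmed to three decimal places:

```python
import json

feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON coordinates are ordered [longitude, latitude]
                "coordinates": [-118.368, 35.076],
            },
            "properties": {"case_id": 3009410, "t_manu": "Vestas", "t_cap": 95},
        }
    ],
}

# GeoJSON is plain JSON, so it serializes directly
print(json.dumps(feature_collection)[:60])
```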

The conversion from our pandas.DataFrame to GeoJSON is handled by df_to_geojson *. There are a variety of parameters we can pass to the function, but for the scope of this example, we'll stick to (1) passing the dataframe columns (attributes) to include in our GeoJSON object, (2) defining the precision of the turbine latitude/longitude values, and (3) mapping the names of the dataframe columns to the required latitude and longitude parameters of the function.

* There are a variety of other geospatial extensions, like `GeoPandas`, that make working with geospatial data in Python easier. GeoPandas extends the datatypes used by pandas to allow spatial operations on geometric types. Learn more about the `GeoPandas` project at https://geopandas.org/index.html.

In [37]:
# Create GeoJSON object with selected attributes. Define coordinates to three decimal places. Map required lat lon.
turbineClusterGeoJson = df_to_geojson(
    dfClusterMap,
    properties=["case_id", "t_cap", "t_manu"],
    precision=3,
    lat="ylat",
    lon="xlong",
)

Step 4. Build the Cluster Map Visualization of Turbine Locations in the Conterminous U.S.

To create the turbine cluster map, we'll create color 'stops' (or cutoffs) based on the density of the turbine locations (proximity to one another). For our cluster map, we'll apply a 6-step diverging color ramp from ColorBrewer (diverging color schemes highlight the largest and smallest ranges) and create stops for proximity bins with counts of 10, 50, 100, 500, 1000, and 5000. Next, we'll define the sizes of our cluster markers. Finally, we'll call ClusteredCircleViz and apply our color ramp along with some custom parameters for our visualization. In the code cell below, we provide a brief explanation for each custom parameter used for rendering our cluster map. An exhaustive list of parameters can be found in the `mapboxgl-jupyter` documentation.

In [44]:
# Define our color stops based on a 6-step divergent color ramp
turbine_color_stops = create_color_stops(
    [10, 50, 100, 500, 1000, 5000], colors="Spectral"
)

# Define the radius (sizes) of the cluster markers
turbine_radius_stops = [[1, 5], [10, 10], [1000, 15], [5000, 20]]

# Define the parameters for our cluster map
# Call our NGMDB style as basemap, set the max zoom level for clusters to show
# Set cluster label size and cluster symbol opacity
# Handle initial zoom/center of visualization
turbineClustersMap = ClusteredCircleViz(
    turbineClusterGeoJson,
    color_stops=turbine_color_stops,
    radius_stops=turbine_radius_stops,
    style=ngmdbLight,
    cluster_maxzoom=10,
    label_size=10,
    opacity=0.6,
    center=(-95, 40),
    zoom=3.25,
)

# Render the cluster map visualization
turbineClustersMap.show()

Example 2 - Create a Graduated Symbol Map of Wind Turbines - San Francisco, California

Graduated symbols show a quantitative difference between mapped elements by varying the size of the map markers. Attribute values are classified into ranges, and each range is assigned a symbol size representing it. Symbol size is an effective way to represent differences in magnitude of a selected attribute, because larger markers are naturally associated with a greater amount of something. Using graduated symbols gives you control over the size of each symbol in its respective bin range; unlike proportional symbols, graduated symbols are not scaled directly to the absolute minimum and maximum of the attribute values.
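The binning idea can be sketched by performing the stop lookup by hand. The stops below mirror the turbine_radius_bins defined later in this example; note this uses a simple step lookup for illustration, while the rendered map interpolates between stops when radius_function_type='interpolate':

```python
from bisect import bisect_right

# [threshold, radius] pairs: each radius applies from its threshold upward
radius_stops = [[0, 0], [1000, 3], [2000, 6], [3000, 9], [4000, 12]]

def radius_for(capacity_kw):
    """Return the marker radius for a turbine capacity (in kW) via step lookup."""
    thresholds = [stop[0] for stop in radius_stops]
    index = bisect_right(thresholds, capacity_kw) - 1
    return radius_stops[max(index, 0)][1]

print(radius_for(225))   # small legacy turbine -> 0
print(radius_for(1500))  # mid-range turbine   -> 3
print(radius_for(3000))  # modern 3 MW turbine -> 9
```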

In this example, we'll call the USWTDB API with some advanced parameters to fine tune the data returned to the dataframe. We'll then render a graduated symbol map of a wind farm north of San Francisco with symbol sizes based on turbine capacity values (in kW). We'll also apply a color scheme to our graduated symbol map based on turbine height (in m), effectively visualizing the relationship between turbine capacity and turbine height.

Before we build the map visualization, we'll preview USWTDB data for California in some simple plots using the pandas.plotting module. The plots will help us visualize relationships between a variety of turbine attributes like installation year, capacity, and total height.

Step 1. Call the USWTDB API with New Parameters and Preview the Data in Plots

In this example, we'll make another customized http request to the API and return turbines that (1) are located in California, (2) have t_cap (capacity) values that are not null, and (3) limit the turbine attributes in the response to t_cap (which we will cast as "Capacity"). This is done by appending URL parameters ?&t_state=eq.CA&t_cap=not.is.null&select=Capacity:t_cap to the root level API https://eersc.usgs.gov/api/uswtdb/v1/turbines/. Once a successful request is made, we'll parse the JSON response and generate a histogram showing the frequency of turbine capacities for wind turbines in California.

Note: There are other operators related to the USWTDB that can be leveraged in the API request. Feel free to experiment with other URL operators to build your own custom plots using the USWTDB.

In [39]:
# Call the USWTDB API and apply custom URL parameters to the request
caCapHist_url = 'https://eersc.usgs.gov/api/uswtdb/v1/turbines?&t_state=eq.CA&t_cap=not.is.null&select=Capacity:t_cap'

# Parse the JSON response from the API return and populate the dataframe
capHist = pd.read_json(caCapHist_url)

#Display the number of turbines in our API return
display(capHist.count())

#Preview the first 5 records of the return. Data should only include the single attribute "Capacity" as defined by our API request
display(capHist.head(5))

# Generate a histogram showing frequencies of wind turbine capacities. Include number of bins, and size of the plot
capHist.plot.hist(bins=10,
                  figsize=(20,5))
Capacity    4968
dtype: int64
Capacity
0 95
1 95
2 95
3 95
4 95
Out[39]:
<AxesSubplot:ylabel='Frequency'>

From our output, we see that the number of turbines returned from this API request is 4,968 (first output line). Based on our histogram with a bin level of bins=10, we see that the majority of turbines in California have capacities of less than 500 kW. The next largest frequency appears to be turbines with a capacity between 1500-1800 kW. We can increase the number of bins and rerun the cell to further refine capacity ranges in the histogram return. Try running the cell with a bin level of bins=40.

Next, let's generate a scatter plot to help us visualize relationships between turbine installation year, turbine capacity, and turbine total height. We'll make another customized request to the USWTDB API for turbines that (1) are located in California, (2) have t_cap (capacity) and t_ttlh (total height) values that are not null, and (3) limit the turbine attributes in the response to p_year (year the turbine project was completed), t_manu (turbine manufacturer), p_name (name of the turbine project), t_ttlh (which we will cast as "Height"), and t_cap (which we will cast as "Capacity"). We will also add xlong (longitude) and ylat (latitude) so we can use the request to generate our graduated symbol map later. All this is done by appending URL parameters ?&t_state=eq.CA&t_cap=not.is.null&t_ttlh=not.is.null&select=p_year,t_manu,p_name,Capacity:t_cap,Height:t_ttlh,xlong,ylat to the root level API https://eersc.usgs.gov/api/uswtdb/v1/turbines/.

In [40]:
# Call the USWTDB API and apply custom URL parameters to the request.
caTurbines_url = 'https://eersc.usgs.gov/api/uswtdb/v1/turbines?&t_state=eq.CA&t_cap=not.is.null&t_ttlh=not.is.null&select=p_year,t_manu,p_name,Capacity:t_cap,Height:t_ttlh,xlong,ylat'

# Parse the JSON response from the API return and populate the dataframe
caTurbines = pd.read_json(caTurbines_url)

# Display the number of turbines in our API return
display(caTurbines.count())

# Preview the first 5 records of the return. Data should only include the attributes defined by our API request
display(caTurbines.head(5))

# Generate a scatter plot with x-axis=year, y-axis=capacity, colorized (c) by height using 'viridis' matplotlib colormap
caTurbines.plot.scatter(x='p_year',
                         y='Capacity',
                         c='Height',
                         colormap='viridis',
                         figsize=(20,5),
                         sharex=False)
p_year      4099
t_manu      4099
p_name      4099
Capacity    4099
Height      4099
xlong       4099
ylat        4099
dtype: int64
p_year t_manu p_name Capacity Height xlong ylat
0 1995 Vestas Alta Mesa 225 53.5 -116.66619 33.94590
1 1995 Vestas Alta Mesa 225 53.5 -116.65779 33.93980
2 1995 Vestas Alta Mesa 225 53.5 -116.65701 33.93803
3 2008 Vestas Alite Wind Farm 3000 125.0 -118.33849 35.03659
4 2008 Vestas Alite Wind Farm 3000 125.0 -118.34409 35.03590
Out[40]:
<AxesSubplot:xlabel='p_year', ylabel='Capacity'>

From our output, we see that the number of turbines returned based on our new API request is 4,099. Based on our scatter plot, we see there is a general trend of increasing capacity and height over time for wind turbines in California. If we wanted to see if this general trend was the same at a national level, we would simply remove the parameter &t_state=eq.CA from our API request and re-run the cell. This is a simple example highlighting the efficiency of data delivery through an API. We only request the data we need for our analysis by modifying and/or appending simple URL parameters to the root level API endpoint. We also stay in sync with the latest version of the data because we're pulling from the source, not a static flat-file that may be out of date.

Step 2. Create a GeoJSON Object from the Dataframe

Just like before, we'll create a GeoJSON object from the dataframe we just defined. We'll start by (1) passing the dataframe columns (attributes) to be added to our GeoJSON object, (2) defining the precision of the turbine latitude/longitude values, and (3) mapping the names of the dataframe columns to the required latitude and longitude parameters of the function*.

*As seen in examples above, we could simply cast lat:ylat and lon:xlong in the API request to avoid having to map lat='ylat', lon='xlong' in df_to_geojson.
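A sketch of that cast-in-request alternative, assembled from the parameters used in this example (the URL below is illustrative): renaming the coordinate columns in the select clause means df_to_geojson can fall back to its default lat/lon column names.

```python
# Cast ylat/xlong to lat/lon directly in the select clause of the request
alt_url = (
    "https://eersc.usgs.gov/api/uswtdb/v1/turbines"
    "?&t_state=eq.CA&t_cap=not.is.null&t_ttlh=not.is.null"
    "&select=p_name,t_manu,Capacity:t_cap,Height:t_ttlh,lat:ylat,lon:xlong"
)
print(alt_url)
```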

In [41]:
# Create GeoJSON object from our 'caTurbines' dataframe
turbineGradSymGeoJson = df_to_geojson(caTurbines,
                          properties=['p_name','Capacity','t_manu', 'Height'], 
                          precision=3,lat='ylat', lon='xlong')

Step 3. Build the Graduated Symbol Map Using Multiple Turbine Attributes

Let's say we wanted to show a graduated symbol map with marker sizes based on turbine capacity, and a color ramp representing turbine height (where hotter colors depict greater turbine height). To create the map, we'll first create color 'stops' (or cutoffs) based on the turbine height ranges in the database. In this example, we'll hard code RGB values (cool to hot) at each of our five defined stops. Next, we'll define the sizes of our markers based on turbine capacity. Again, we'll define five stops representing a range of turbine capacities. Finally, we'll call GraduatedCircleViz and apply our color and radius bins, along with some custom parameters for our visualization. In the code cell below, we provide a brief explanation for each custom parameter used for rendering our graduated symbol map. An exhaustive list of parameters can be found in the `mapboxgl-jupyter` documentation.

In [45]:
# Assign color breaks based on turbine height ranges (in m)
# Assign color breaks based on turbine height ranges (in m)
turbine_height_color_bins = [[25, 'rgb(43,131,186)'],
            [50, 'rgb(171,221,164)'],
            [100, 'rgb(255,255,191)'],
            [120, 'rgb(253,174,97)'],
            [180, 'rgb(215,25,28)']]

# Assign marker radius size based on turbine capacity ranges (in kW)
turbine_radius_bins = [[0, 0],
                       [1000, 3],
                       [2000, 6],
                       [3000, 9],
                       [4000, 12]]

# Define the parameters for our graduated symbol map 
# Call our NGMDB style as basemap, apply our color and radius stops 
# Set symbol opacity and stroke, add scalebar and scalebar styles
# Handle initial zoom/center of visualization
turbineGradSymbolMap = GraduatedCircleViz(turbineGradSymGeoJson, 
                style=ngmdbBasemap,
                color_property='Height',
                color_function_type='interpolate',
                color_stops=turbine_height_color_bins,
                radius_property='Capacity',
                radius_stops=turbine_radius_bins, 
                radius_function_type='interpolate', 
                radius_default=1,
                opacity=0.75,                          
                stroke_color='black',
                stroke_width=0.15,
                scale=True,
                scale_unit_system='imperial',
                scale_background_color='#0000ff00',
                center=(-121.8, 38.13),
                zoom=10.75)

# Generate map labels of turbine project names and adjust label properties
turbineGradSymbolMap.label_property = "p_name"
turbineGradSymbolMap.label_size = 5

#Render the map
turbineGradSymbolMap.show() 

Step 4. Export the Notebook Maps as Standalone Web Maps

You may want to view or share the maps generated in your notebook as standalone web maps. Standalone web maps can be displayed on web and mobile devices without the need for notebook dependencies. They look exactly like the inline maps in your notebook and carry all the interactivity and control parameters defined in your code, with your data packaged directly in the HTML file. You can generate a standalone web map from mapboxgl by calling create_html() and writing the result to a file with standard Python file handling. The standalone web maps will be written to your notebook's working directory.

In [43]:
# Generate a standalone web map of the USWTDB cluster map 
with open('uswtdbClusterMap.html', 'w') as f:
    f.write(turbineClustersMap.create_html())

# Generate a standalone web map of the California USWTDB graduated symbol map    
with open('uswtdbGradSymbolMap.html', 'w') as f:
    f.write(turbineGradSymbolMap.create_html())

USWTDB Attribution and Disclaimer

The creation of the USWTDB was jointly funded by the U.S. Department of Energy (DOE) Wind Energy Technologies Office (WETO) via the Lawrence Berkeley National Laboratory (LBNL) Electricity Markets and Policy Group, the U.S. Geological Survey (USGS) Energy Resources Program, and the American Wind Energy Association (AWEA). The database is being continuously updated through collaboration among LBNL, USGS, and AWEA. Wind turbine records are collected and compiled from various public and private sources, digitized or position-verified from aerial imagery, and quality checked. Technical specifications for turbines are obtained directly from project developers and turbine manufacturers, or they are based on data obtained from public sources.

Map services and data downloaded from the U.S. Wind Turbine Database are free and in the public domain. There are no restrictions; however, we request that the following acknowledgment statement be included in products and data derived from our map services when citing, copying, or reprinting: "Map services and data are available from U.S. Wind Turbine Database, provided by the U.S. Geological Survey, American Wind Energy Association, and Lawrence Berkeley National Laboratory via https://eerscmap.usgs.gov/uswtdb"

Although this digital spatial database has been subjected to rigorous review and is substantially complete, it is released on the condition that neither the USGS, LBNL, AWEA nor the United States Government nor any agency thereof, nor any employees thereof, makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information contained within the database.