Data Access


A primary goal of the Greenhub SDK is to provide easy access to the data collected and aggregated on Greenhub.ai. Each aggregated data source is requested through a dedicated function whose parameters narrow down the data you receive, for example by country, spatial resolution, or year.

The following sections describe each data source together with its function and all supported parameters.

Important: Before using any of these functions, install the Greenhub pip package and complete the initialization successfully.


Vegetation Index data

data = gh.get_vi_data(
    country: str,
    start_year: int,
    end_year: int,                # optional
    spatial_resolution: str       # optional, default 'country'
)

This function returns all vegetation index (VI) feature data available on greenhub.ai for a specific country (e.g., US, BR) and year (start_year). By default, the data is aggregated at the country level; use the spatial_resolution parameter to choose between 'country', 'state', and 'municipality'. To request several years at once, set both start_year and end_year to define a time period. The function returns a pandas DataFrame containing the loaded and concatenated data.

All parameters:

  • country: country for which data is loaded, represented by a two-letter country code (e.g., US, AR, BR, DE)
  • start_year: year for which data is loaded; if end_year is also defined, start_year is the first year of the time interval for which data is loaded
  • end_year: if defined, end_year is the last year of the time interval, defined by start_year and end_year, for which data is loaded
  • spatial_resolution: spatial resolution at which the data is aggregated; set to 'country' by default; can be set to 'country', 'state', or 'municipality'
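The parameter rules above can be checked on the client side before any request is sent. The following is a minimal sketch, not part of the Greenhub SDK: build_vi_request and requested_years are hypothetical helpers, and both the inclusive start_year..end_year interval and the check for a two-letter code are assumptions read from the parameter descriptions.

```python
from typing import Optional

# Spatial resolutions documented for gh.get_vi_data.
SPATIAL_RESOLUTIONS = ("country", "state", "municipality")


def build_vi_request(
    country: str,
    start_year: int,
    end_year: Optional[int] = None,
    spatial_resolution: str = "country",
) -> dict:
    """Hypothetical helper: validate arguments and return keyword arguments for gh.get_vi_data."""
    # Countries are identified by two-letter codes such as "US" or "BR".
    if len(country) != 2 or not country.isalpha():
        raise ValueError(f"expected a two-letter country code, got {country!r}")
    if spatial_resolution not in SPATIAL_RESOLUTIONS:
        raise ValueError(f"spatial_resolution must be one of {SPATIAL_RESOLUTIONS}")
    # Assumption: end_year closes an inclusive interval that starts at start_year.
    if end_year is not None and end_year < start_year:
        raise ValueError("end_year must not be earlier than start_year")
    kwargs = {
        "country": country,
        "start_year": start_year,
        "spatial_resolution": spatial_resolution,
    }
    # end_year is optional, so it is only passed when a time period is requested.
    if end_year is not None:
        kwargs["end_year"] = end_year
    return kwargs


def requested_years(start_year: int, end_year: Optional[int] = None) -> list:
    """Years covered by a request: only start_year, or start_year..end_year inclusive."""
    last = start_year if end_year is None else end_year
    return list(range(start_year, last + 1))
```

Assuming get_vi_data accepts its documented parameter names as keyword arguments, a state-level request for 2020 through 2022 would then read `data = gh.get_vi_data(**build_vi_request("BR", 2020, 2022, "state"))`.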


Climate data

data = gh.get_climate_data(
    country: str,
    start_year: int,
    end_year: int,                # optional
    spatial_resolution: str,      # optional, default 'country'
    time_resolution: str          # optional, default 'monthly'
)

This function returns all climate feature data available on greenhub.ai for a specific country (e.g., US, BR) and year (start_year). By default, the climate data is aggregated at the country level and on a monthly basis. Use the spatial_resolution parameter to choose between 'country', 'state', and 'municipality', and set the time_resolution parameter to 'monthly', 'weekly', or 'daily'. To request several years at once, set both start_year and end_year to define a time period. The function returns a pandas DataFrame containing the loaded and concatenated data.

All parameters:

  • country: country for which data is loaded, represented by a two-letter country code (e.g., US, AR, BR, DE)
  • start_year: year for which data is loaded; if end_year is also defined, start_year is the first year of the time interval for which data is loaded
  • end_year: if defined, end_year is the last year of the time interval, defined by start_year and end_year, for which data is loaded
  • spatial_resolution: spatial resolution at which the data is aggregated; set to 'country' by default; can be set to 'country', 'state', or 'municipality'
  • time_resolution: time resolution at which the data is aggregated; set to 'monthly' by default; can be set to 'monthly', 'weekly', or 'daily'
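Because get_climate_data combines two independent resolution options, a thin wrapper that validates both before calling the SDK can catch typos early. This is a minimal sketch under stated assumptions: fetch_climate is a hypothetical helper, `client` stands for the initialized `gh` module, and the SDK is assumed to accept the documented parameter names as keyword arguments. Passing the client in rather than importing it lets the wrapper be exercised with a stand-in object.

```python
from typing import Any, Optional

# Options documented for gh.get_climate_data.
SPATIAL_RESOLUTIONS = ("country", "state", "municipality")
TIME_RESOLUTIONS = ("monthly", "weekly", "daily")


def fetch_climate(
    client: Any,
    country: str,
    start_year: int,
    end_year: Optional[int] = None,
    spatial_resolution: str = "country",
    time_resolution: str = "monthly",
):
    """Hypothetical wrapper: validate options, then call client.get_climate_data.

    `client` is expected to be the initialized Greenhub module (gh); any object
    with a compatible get_climate_data method works as well.
    """
    if spatial_resolution not in SPATIAL_RESOLUTIONS:
        raise ValueError(f"unsupported spatial_resolution {spatial_resolution!r}")
    if time_resolution not in TIME_RESOLUTIONS:
        raise ValueError(f"unsupported time_resolution {time_resolution!r}")
    kwargs = {
        "country": country,
        "start_year": start_year,
        "spatial_resolution": spatial_resolution,
        "time_resolution": time_resolution,
    }
    # Only pass end_year when a multi-year period is requested.
    if end_year is not None:
        kwargs["end_year"] = end_year
    return client.get_climate_data(**kwargs)
```

For example, `fetch_climate(gh, "US", 2019, 2021, spatial_resolution="state")` would request monthly state-level data for three years, and `fetch_climate(gh, "BR", 2022, time_resolution="daily")` daily country-level data for one year.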


180d Forecast data

data = gh.get_180d_forecast_data(
    country: str,
    start_year: int,
    start_month: int,
    spatial_resolution: str,     # optional, default 'country'
    time_aggregation: str        # optional, default 'monthly'
)

This function returns the 180-day climate forecast data on greenhub.ai for a specific country (e.g., US, BR), year (start_year), and month (start_month). The forecast covers the 180 days following the selected start_year and start_month. For example, if you request data for January 2023, you receive forecast data for month 1 (January) through month 6 (June), i.e., the next 180 days (6 months).

By default, the forecast data is aggregated at the country level and on a monthly basis. Use the spatial_resolution parameter to choose between 'country', 'state', and 'municipality', and set the time_aggregation parameter to 'monthly', 'weekly', or 'daily'. The function returns a pandas DataFrame containing the loaded and concatenated data.

All parameters:

  • country: country for which data is loaded, represented by a two-letter country code (e.g., US, AR, BR, DE)
  • start_year: year in which the 180-day forecast was made; together with start_month, it determines the point in time at which the forecast was made
  • start_month: month in which the 180-day forecast was made (e.g., 1 for January); together with start_year, it determines the point in time at which the forecast was made
  • spatial_resolution: spatial resolution at which the data is aggregated; set to 'country' by default; can be set to 'country', 'state', or 'municipality'
  • time_aggregation: time resolution at which the data is aggregated; set to 'monthly' by default; can be set to 'monthly', 'weekly', or 'daily'
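The forecast window can be written out as a list of (year, month) pairs, which is handy for checking which months a given request should cover. A minimal sketch, assuming, as in the January 2023 example, that the month in which the forecast was made counts as the first month of the window and that the window rolls over into the next year for late start months; forecast_months is a hypothetical helper, not an SDK function, and six calendar months only approximate 180 days.

```python
def forecast_months(start_year: int, start_month: int, n_months: int = 6) -> list:
    """Hypothetical helper: (year, month) pairs covered by a forecast made in start_year/start_month.

    The start month counts as the first month, matching the January 2023 -> January..June example.
    """
    if not 1 <= start_month <= 12:
        raise ValueError("start_month must be between 1 and 12")
    months = []
    for offset in range(n_months):
        # Zero-based month index counted from January of start_year.
        index = start_month - 1 + offset
        months.append((start_year + index // 12, index % 12 + 1))
    return months
```

With this reading, a forecast made in November 2023 spans November 2023 through April 2024.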


Soil data

data = gh.get_soil_data(
    country: str,
    spatial_resolution: str,     # optional, default 'country'
    layer: str                   # optional, default None (all layers)
)

This function returns all soil feature data available on greenhub.ai for a specific country (e.g., US, BR). By default, the data is aggregated at the country level; use the spatial_resolution parameter to choose between 'country', 'state', and 'municipality'.

By default, all 7 depth layers are returned. To load a single layer, set the layer parameter to its code, for example 'D5'. Layers 'D1' through 'D5' cover 20 cm increments down to 100 cm ('D1': 0-20 cm, 'D2': 20-40 cm, ..., 'D5': 80-100 cm), and layers 'D6' and 'D7' cover 50 cm increments down to 200 cm ('D6': 100-150 cm, 'D7': 150-200 cm). The function returns a pandas DataFrame containing the loaded and concatenated data.

All parameters:

  • country: country for which data is loaded, represented by a two-letter country code (e.g., US, AR, BR, DE)
  • spatial_resolution: spatial resolution at which the data is aggregated; set to 'country' by default; can be set to 'country', 'state', or 'municipality'
  • layer: layer for which data is loaded; can be set to 'D1' through 'D7'; set to None by default, which loads all layers
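Since layer accepts a single code, covering a depth range such as the top 30 cm means knowing which layers overlap it. A minimal sketch based on the layer depths listed above; SOIL_LAYERS and layers_for_depth are hypothetical names, not SDK constants or functions.

```python
# Depth range (top, bottom) in cm of each soil layer, as listed above.
SOIL_LAYERS = {
    "D1": (0, 20),
    "D2": (20, 40),
    "D3": (40, 60),
    "D4": (60, 80),
    "D5": (80, 100),
    "D6": (100, 150),
    "D7": (150, 200),
}


def layers_for_depth(top_cm: float, bottom_cm: float) -> list:
    """Hypothetical helper: layer codes that overlap the depth interval [top_cm, bottom_cm)."""
    return [
        code
        for code, (top, bottom) in SOIL_LAYERS.items()
        # Two intervals overlap when each one starts before the other ends.
        if top < bottom_cm and bottom > top_cm
    ]
```

For the top 30 cm this yields ['D1', 'D2']; each code could then be passed to the SDK, e.g. `[gh.get_soil_data("US", spatial_resolution="state", layer=code) for code in layers_for_depth(0, 30)]`, assuming keyword arguments are accepted. Omitting layer altogether loads all layers at once.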


Historical Yield data

data = gh.get_historical_yield_data(
    crop: str,
    year: int,
    country: str,
    spatial_resolution: str    # optional, default 'country'
)

This function returns all historical yield data available on greenhub.ai for a specific crop (e.g., Soybeans, Wheat-Winter, Wheat), year (2000 or later), and country (e.g., US, BR). By default, the data is returned at the country level; use the spatial_resolution parameter to choose between 'country', 'state', and 'municipality'. The function returns a pandas DataFrame containing the loaded and concatenated data.

All parameters:

  • crop: crop for which data is loaded (e.g., Soybeans, Wheat-Winter, Wheat)
  • year: year for which data is loaded (2000 or later)
  • country: country for which data is loaded, represented by a two-letter country code (e.g., US, AR, BR, DE)
  • spatial_resolution: spatial resolution at which the data is aggregated; set to 'country' by default; can be set to 'country', 'state', or 'municipality'
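Unlike the VI and climate functions, get_historical_yield_data takes a single year, so a multi-year yield series requires one call per year. A minimal sketch, assuming `client` is the initialized `gh` module and that the documented parameter names work as keyword arguments; fetch_yield_series is a hypothetical helper that returns one frame per year.

```python
from typing import Any, Dict, Iterable


def fetch_yield_series(
    client: Any,
    crop: str,
    country: str,
    years: Iterable[int],
    spatial_resolution: str = "country",
) -> Dict[int, Any]:
    """Hypothetical helper: call get_historical_yield_data once per year.

    Returns a dict mapping each year to the frame returned for it.
    """
    years = list(years)
    # Historical yield data is available from 2000 onward, so reject earlier
    # years before any request is made.
    too_early = [year for year in years if year < 2000]
    if too_early:
        raise ValueError(f"no historical yield data before 2000: {too_early}")
    results = {}
    for year in years:
        results[year] = client.get_historical_yield_data(
            crop=crop,
            year=year,
            country=country,
            spatial_resolution=spatial_resolution,
        )
    return results
```

With pandas imported as pd, the per-year frames can be combined into one DataFrame, e.g. `pd.concat(fetch_yield_series(gh, "Soybeans", "BR", range(2015, 2021), "state").values(), ignore_index=True)`.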

Data Sources and Formats

Each function returns a pandas DataFrame with the columns listed below for its data source:

VI

Column          Description
FPAR            Fraction of absorbed photosynthetically active radiation values
EVI             Enhanced Vegetation Index values
NDVI            Normalized Difference Vegetation Index values
Municipality    Municipality
State           State
CountryCode     ISO 3166-1 alpha-2 country code
Year            Year
Month           Month
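A common first step with monthly VI data is to find each region's peak greenness per year. The sketch below works on plain row dictionaries (for example from `data.to_dict("records")`) so that it only needs the standard library; peak_ndvi is a hypothetical helper, and using the yearly NDVI maximum as the indicator is an illustrative choice, not something the SDK prescribes.

```python
import math
from typing import Dict, Iterable, Mapping, Tuple


def peak_ndvi(
    rows: Iterable[Mapping], region_column: str = "State"
) -> Dict[Tuple[str, int], float]:
    """Hypothetical helper: highest NDVI value per (region, year).

    Missing values (NaN) are skipped, as pandas' max() does by default.
    """
    peaks: Dict[Tuple[str, int], float] = {}
    for row in rows:
        value = float(row["NDVI"])
        if math.isnan(value):
            continue
        key = (row[region_column], int(row["Year"]))
        # Keep the largest value seen so far for this region and year.
        if key not in peaks or value > peaks[key]:
            peaks[key] = value
    return peaks
```

On a state-level DataFrame, `data.groupby(["State", "Year"])["NDVI"].max()` computes the same values directly in pandas.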

Climate

Column                      Description
Municipality                Municipality
State                       State
CountryCode                 ISO 3166-1 alpha-2 country code
Spatial Resolution          Municipality, State, or Country
Min Temperature [K]         Minimum temperature in K
Max Temperature [K]         Maximum temperature in K
Mean Temperature [K]        Mean temperature in K
Total Precipitation [mm]    Total precipitation in mm
Year                        Year
Month                       Month
Day                         Day
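Temperatures in the climate data are given in kelvin; converting to degrees Celsius only requires subtracting 273.15. A minimal sketch on row dictionaries; with_celsius is a hypothetical helper that keeps the original kelvin columns and adds a "[°C]" column next to each one, and the "[°C]" naming is an illustrative choice.

```python
KELVIN_OFFSET = 273.15

# Temperature columns of the climate frame, as listed above.
TEMPERATURE_COLUMNS = (
    "Min Temperature [K]",
    "Max Temperature [K]",
    "Mean Temperature [K]",
)


def kelvin_to_celsius(kelvin: float) -> float:
    """Convert a temperature from kelvin to degrees Celsius."""
    return kelvin - KELVIN_OFFSET


def with_celsius(row: dict) -> dict:
    """Hypothetical helper: copy of a climate row with added '[°C]' columns."""
    converted = dict(row)
    for column in TEMPERATURE_COLUMNS:
        # Only convert the temperature columns that are present in this row.
        if column in row:
            converted[column.replace("[K]", "[°C]")] = kelvin_to_celsius(row[column])
    return converted
```

On a DataFrame the same conversion is a vectorized subtraction, e.g. `data["Mean Temperature [°C]"] = data["Mean Temperature [K]"] - 273.15`.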

Forecast

Column                 Description
time                   Timestamp
temperature_max        Forecast maximum temperature in K
temperature_min        Forecast minimum temperature in K
temperature_mean       Forecast mean temperature in K
precipitation_total    Forecast total precipitation in mm
country                Country name
state                  State
municipality           Municipality
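The forecast frame names the same quantities differently from the climate frame. To compare forecasts with reanalysis climate data, a rename is enough; the mapping below is derived from the two column tables and is an assumption, not an SDK constant. The country column is deliberately left unmapped because the forecast frame holds a country name, whereas the climate frame holds an ISO code. Comparisons are only meaningful when both frames were requested with the same spatial and time resolution.

```python
# Hypothetical mapping from forecast column names to the climate column names listed above.
FORECAST_TO_CLIMATE = {
    "temperature_max": "Max Temperature [K]",
    "temperature_min": "Min Temperature [K]",
    "temperature_mean": "Mean Temperature [K]",
    "precipitation_total": "Total Precipitation [mm]",
    "state": "State",
    "municipality": "Municipality",
}


def rename_forecast_row(row: dict) -> dict:
    """Rename forecast columns to their climate equivalents; other columns are kept as-is."""
    return {FORECAST_TO_CLIMATE.get(name, name): value for name, value in row.items()}
```

On a DataFrame, `forecast.rename(columns=FORECAST_TO_CLIMATE)` applies the same mapping.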

Soil

Column                Description
Drain                 Dominant FAO soil drainage class
TopDep                Depth of the top of the layer (cm)
BotDep                Depth of the bottom of the layer (cm)
CFRAG                 Coarse fragments (vol. %, > 2 mm), mean
SDTO                  Sand (mass %), mean
STPC                  Silt (mass %)
CLPC                  Clay (mass %)
BULK                  Bulk density (kg dm⁻³, equivalent to g cm⁻³)
TAWC                  Available water capacity (cm m⁻¹, -33 to -1500 kPa)
CECS                  Cation exchange capacity of the soil (cmolc kg⁻¹ of fine earth fraction)
BSAT                  Base saturation as a percentage of soil CEC (CECS)
ESP                   Exchangeable sodium percentage
CECc                  CEC of the clay fraction, corrected for the contribution of organic matter (cmolc kg⁻¹)
PHAQ                  pH measured in water
TCEQ                  Total carbonate equivalent (g C kg⁻¹)
GYPS                  Gypsum content (g kg⁻¹)
ELCO                  Electrical conductivity (dS m⁻¹)
ORGC                  Organic carbon content (g kg⁻¹)
TOTN                  Total nitrogen (g kg⁻¹)
CNrt                  C/N ratio
ECEC                  Effective CEC (cmolc kg⁻¹)
ALSA                  Aluminum saturation (as % of ECEC)
TP-FAO90-soil-unit    Aggregated proportion of FAO90-soil-unit in map unit
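Soil properties are reported per layer, bounded by TopDep and BotDep. To summarize a property over a depth interval, for example clay content (CLPC) in the top 30 cm, each layer can be weighted by the thickness that overlaps the interval. A minimal sketch on row dictionaries describing a single location or region (filter the frame first); depth_weighted_mean is a hypothetical helper, and thickness weighting is one common choice, not a method prescribed by Greenhub.

```python
from typing import Iterable, Mapping


def depth_weighted_mean(
    rows: Iterable[Mapping], column: str, top_cm: float = 0.0, bottom_cm: float = 100.0
) -> float:
    """Hypothetical helper: thickness-weighted mean of a soil property over [top_cm, bottom_cm].

    Each row is one layer with 'TopDep' and 'BotDep' in cm; only the part of a
    layer that overlaps the requested interval contributes to the weight.
    """
    total = 0.0
    weight = 0.0
    for row in rows:
        # Thickness of the overlap between this layer and the requested interval.
        overlap = min(row["BotDep"], bottom_cm) - max(row["TopDep"], top_cm)
        if overlap > 0:
            total += float(row[column]) * overlap
            weight += overlap
    if weight == 0:
        raise ValueError("no layer overlaps the requested depth interval")
    return total / weight
```

For layers 0-20 cm (20 % clay) and 20-40 cm (30 % clay), the mean over the top 30 cm is (20 × 20 + 30 × 10) / 30 ≈ 23.3 %.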

Historical

Column               Description
Year                 Year
Crop                 Crop
SpatialResolution    Municipality, State, or Country
Country              ISO 3166-1 alpha-2 country code
State                State
Municipality         Municipality
Value                Yield in t/ha

References

Data Category       Source
VI                  https://developers.google.com/earth-engine/datasets/catalog/MODIS_061_MOD13A2#description
                    https://developers.google.com/earth-engine/datasets/catalog/MODIS_061_MCD15A3H#code-editor-javascript
Climate             https://cds.climate.copernicus.eu/datasets/reanalysis-era5-single-levels?tab=overview
Forecast            https://cds.climate.copernicus.eu/datasets/seasonal-original-single-levels?tab=overview
Soil                https://data.isric.org/geonetwork/srv/api/records/dc7b283a-8f19-45e1-aaed-e9bd515119bc
Historical Yield    Respective sources for each country