Data Access

A primary goal of the Greenhub SDK is to provide easy access to the data collected and aggregated on Greenhub.ai. Each aggregated data source is requested through its own dedicated function. The parameters of each function narrow the request further, for example by country, spatial resolution, or year.

In the following sections, each individual data source will be described, and the corresponding function will be explained, including all parameters that can be used.

Important: Before using the following functionalities, the Greenhub pip package must be installed, and the initialization must be successfully completed.

Vegetation Index data

data = gh.get_vi_data(
    country: str,
    start_year: int,
    end_year: int,                # optional
    spatial_resolution: str       # optional
)

This function returns all available VI feature data on greenhub.ai for a specific country (e.g., US, BR) and year (start_year). By default, only VI data aggregated at the country level is returned; the spatial_resolution parameter selects between 'country', 'state', and 'municipality'. Multiple years can be requested at once by setting both start_year and end_year, which together define the time period. The loaded data is concatenated and returned as a pandas dataframe.

All parameters:

country: country for which data is loaded, represented by a two-letter country code (e.g. US, AR, BR, DE)
start_year: year for which data is loaded; if end_year is also defined, then start_year is the first year of the time interval for which data is loaded
end_year: if defined, end_year is the last year of the time interval, defined by start_year and end_year, for which data is loaded
spatial_resolution: spatial resolution on which the data is aggregated; set to 'country' by default; can be set to 'country', 'state', or 'municipality'
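The parameter rules above can be checked before a request is sent. The following is a minimal sketch and not part of the SDK: build_vi_request is a hypothetical helper that validates the arguments and returns them as a dictionary. It assumes that gh.get_vi_data accepts the documented parameter names as keyword arguments and that end_year must not be earlier than start_year; neither is confirmed by this page.

```python
from typing import Optional

# Allowed values taken from the parameter list above.
SPATIAL_RESOLUTIONS = ("country", "state", "municipality")


def build_vi_request(country: str, start_year: int,
                     end_year: Optional[int] = None,
                     spatial_resolution: str = "country") -> dict:
    """Validate the arguments and return them as keyword arguments (hypothetical helper)."""
    if len(country) != 2 or not country.isalpha():
        raise ValueError(f"expected a two-letter country code, got {country!r}")
    if spatial_resolution not in SPATIAL_RESOLUTIONS:
        raise ValueError(f"unknown spatial_resolution: {spatial_resolution!r}")
    # Assumption: a time period must not end before it starts.
    if end_year is not None and end_year < start_year:
        raise ValueError("end_year must not be earlier than start_year")

    kwargs = {
        "country": country,
        "start_year": start_year,
        "spatial_resolution": spatial_resolution,
    }
    if end_year is not None:
        kwargs["end_year"] = end_year  # only included when a time period is requested
    return kwargs
```

With the SDK installed and initialized, the result could then be passed on as gh.get_vi_data(**build_vi_request('BR', 2018, 2020, 'state')).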

Climate data

data = gh.get_climate_data(
    country: str,
    start_year: int,
    end_year: int,                # optional
    spatial_resolution: str,      # optional
    time_resolution: str          # optional
)

This function returns all available climate feature data on greenhub.ai for a specific country (e.g., US, BR) and year (start_year). By default, the climate data is aggregated at the country level and on a monthly basis. The spatial_resolution parameter selects between 'country', 'state', and 'municipality', and the time_resolution parameter can be set to 'monthly', 'weekly', or 'daily'. Multiple years can be requested at once by setting both start_year and end_year, which together define the time period. The loaded data is concatenated and returned as a pandas dataframe.

All parameters:

country: country for which data is loaded, represented by a two-letter country code (e.g. US, AR, BR, DE)
start_year: year for which data is loaded; if end_year is also defined, then start_year is the first year of the time interval for which data is loaded
end_year: if defined, end_year is the last year of the time interval, defined by start_year and end_year, for which data is loaded
spatial_resolution: spatial resolution on which the data is aggregated; set to 'country' by default; can be set to 'country', 'state', or 'municipality'
time_resolution: time resolution on which the data is aggregated; set to 'monthly' by default; can be set to 'monthly', 'weekly', or 'daily'

180d Forecast data

data = gh.get_180d_forecast_data(
    country: str,
    start_year: int,
    start_month: int,
    spatial_resolution: str,     # optional
    time_aggregation: str        # optional
)

This function returns the 180-day climate forecast data on greenhub.ai for a specific country (e.g., US, BR), year (start_year), and month (start_month). The forecast covers the 180 days that begin at the selected start_year and start_month. For example, a request for January 2023 returns forecast data from month 1 (January) to month 6 (June), covering the next 180 days (6 months).
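The January-to-June example above can be reproduced for any start month. The sketch below is illustrative only: forecast_months is a hypothetical helper, not an SDK function. It approximates the 180 days as six calendar months, as the text does, and assumes that a forecast starting late in the year simply continues into the following year.

```python
def forecast_months(start_year: int, start_month: int, n_months: int = 6):
    """Return the (year, month) pairs covered by a forecast starting at the given month."""
    if not 1 <= start_month <= 12:
        raise ValueError("start_month must be between 1 and 12")
    months = []
    for offset in range(n_months):
        # Zero-based month count, measured from January of start_year.
        index = (start_month - 1) + offset
        months.append((start_year + index // 12, index % 12 + 1))
    return months


print(forecast_months(2023, 1))
# [(2023, 1), (2023, 2), (2023, 3), (2023, 4), (2023, 5), (2023, 6)]
```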

By default, the forecast data is aggregated at the country level and on a monthly basis. The spatial_resolution parameter selects between 'country', 'state', and 'municipality', and the time_aggregation parameter can be set to 'monthly', 'weekly', or 'daily'. The loaded data is concatenated and returned as a pandas dataframe.

All parameters:

country: country for which data is loaded, represented by a two-letter country code (e.g. US, AR, BR, DE)
start_year: year of the point in time at which the 180-day prediction was made; used together with start_month
start_month: month of the point in time at which the 180-day prediction was made; used together with start_year
spatial_resolution: spatial resolution on which the data is aggregated; set to 'country' by default; can be set to 'country', 'state', or 'municipality'
time_aggregation: time resolution on which the data is aggregated; set to 'monthly' by default; can be set to 'monthly', 'weekly', or 'daily'

Soil data

data = gh.get_soil_data(
    country: str,
    spatial_resolution: str,     # optional
    layer: str                   # optional
)

This function returns all available soil feature data on greenhub.ai for a specific country (e.g., US, BR). By default, only data aggregated at the country level is returned; the spatial_resolution parameter selects between 'country', 'state', and 'municipality'. All 7 layers are returned by default. To load a single layer only, set the layer parameter to that layer, for example 'D5'. The 7 layers 'D1' through 'D7' correspond to depth layers in 20 cm increments down to 100 cm ('D1': 0-20 cm, 'D2': 20-40 cm, ..., 'D5': 80-100 cm), followed by layers in 50 cm increments down to 200 cm ('D6': 100-150 cm, 'D7': 150-200 cm). The loaded data is concatenated and returned as a pandas dataframe.

All parameters:

country: country for which data is loaded, represented by a two-letter country code (e.g. US, AR, BR, DE)
spatial_resolution: spatial resolution on which the data is aggregated; set to 'country' by default; can be set to 'country', 'state', or 'municipality'
layer: layer for which data is loaded; can be set to 'D1' up to 'D7'; set to None by default, which loads all layers
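The layer scheme described above can be written down as a lookup table, which is handy when choosing a value for the layer parameter. This is a small sketch based only on the depth ranges given in the text; LAYER_DEPTHS_CM and layer_for_depth are hypothetical names, and treating the top of each layer as inclusive and the bottom as exclusive is an assumption.

```python
# Depth range in cm (top, bottom) for each layer, as described above.
LAYER_DEPTHS_CM = {
    "D1": (0, 20),
    "D2": (20, 40),
    "D3": (40, 60),
    "D4": (60, 80),
    "D5": (80, 100),
    "D6": (100, 150),
    "D7": (150, 200),
}


def layer_for_depth(depth_cm: float) -> str:
    """Return the layer whose range contains depth_cm (top inclusive, bottom exclusive)."""
    for layer, (top, bottom) in LAYER_DEPTHS_CM.items():
        if top <= depth_cm < bottom:
            return layer
    raise ValueError(f"depth {depth_cm} cm is outside the 0-200 cm range")


print(layer_for_depth(90))   # D5
print(layer_for_depth(120))  # D6
```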

Historical Yield data

data = gh.get_historical_yield_data(
    crop: str,
    year: int,
    country: str,
    spatial_resolution: str    # optional
)

This function returns all available historical yield data on greenhub.ai for a specific crop (e.g., Soybeans, Wheat-Winter, Wheat), year (from 2000 onward), and country (e.g., US, BR). By default, only historical yield data at the country level is returned; the spatial_resolution parameter selects between 'country', 'state', and 'municipality'. The loaded data is concatenated and returned as a pandas dataframe.

All parameters:

crop: crop for which data is loaded
year: year for which data is loaded
country: country for which data is loaded, represented by a two-letter country code (e.g. US, AR, BR, DE)
spatial_resolution: spatial resolution on which the data is aggregated; set to 'country' by default; can be set to 'country', 'state', or 'municipality'
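Unlike the VI and climate functions, this function takes a single year rather than a start_year and end_year. One way to gather several years is to call it once per year, as sketched below. collect_yearly and fake_fetch are hypothetical names introduced for illustration; the stand-in fake_fetch replaces gh.get_historical_yield_data so that the sketch runs without the SDK, and passing the arguments positionally in the documented order is an assumption.

```python
from typing import Callable, Iterable, List


def collect_yearly(fetch: Callable, crop: str, years: Iterable[int],
                   country: str, spatial_resolution: str = "country") -> List:
    """Call fetch once per year and return the results in request order."""
    results = []
    for year in years:
        # Argument order follows the signature shown above.
        results.append(fetch(crop, year, country, spatial_resolution))
    return results


# Stand-in for gh.get_historical_yield_data (hypothetical, returns a plain dict).
def fake_fetch(crop, year, country, spatial_resolution):
    return {"Crop": crop, "Year": year, "Country": country}


frames = collect_yearly(fake_fetch, "Soybeans", range(2018, 2021), "BR")
print([frame["Year"] for frame in frames])  # [2018, 2019, 2020]
```

With the real SDK, gh.get_historical_yield_data would be passed as fetch, and the resulting dataframes could be combined with pandas.concat(frames, ignore_index=True).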

Data Sources and Formats

The dataframe returned for each data source has the following columns:

VI

Column	Description
FPAR	Fraction of absorbed photosynthetically active radiation values
EVI	Enhanced Vegetation Index values
NDVI	Normalized Difference Vegetation Index values
Municipality	Municipality
State	State
CountryCode	ISO 3166-1 alpha-2 country code
Year	Year
Month	Month

Climate

Column	Description
Municpality	Municipality
State	State
CountryCode	ISO 3166-1 alpha-2 country code
Spatial Resolution	Municipality, State or Country
Min Temperature [K]	Minimum temperature in K
Max Temperature [K]	Maximum temperature in K
Mean Temperature [K]	Mean temperature in K
Total Precipitation [mm]	Total precipitation in mm
Year	Year
Month	Month
Day	Day
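Because the temperature columns are reported in kelvin, a conversion to degrees Celsius is often the first processing step. The sketch below works on a single row represented as a plain dictionary so that it runs without pandas or the SDK; kelvin_to_celsius and row_to_celsius are hypothetical helpers, and renaming the columns to a [°C] suffix is an illustrative choice rather than an SDK convention.

```python
KELVIN_OFFSET = 273.15

# Temperature columns from the Climate table above.
TEMPERATURE_COLUMNS = [
    "Min Temperature [K]",
    "Max Temperature [K]",
    "Mean Temperature [K]",
]


def kelvin_to_celsius(kelvin: float) -> float:
    """Convert a temperature from kelvin to degrees Celsius."""
    return kelvin - KELVIN_OFFSET


def row_to_celsius(row: dict) -> dict:
    """Return a copy of row with the temperature columns converted and renamed to [°C]."""
    converted = dict(row)
    for column in TEMPERATURE_COLUMNS:
        if column in converted:
            new_name = column.replace("[K]", "[°C]")
            converted[new_name] = kelvin_to_celsius(converted.pop(column))
    return converted
```

On a dataframe returned by the SDK, the same conversion could be applied per column, for example data['Mean Temperature [K]'] - 273.15.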

Forecast

Column	Description
time	Timestamp
temperature_max	Forecast maximum temperature in K
temperature_min	Forecast minimum temperature in K
temperature_mean	Forecast mean temperature in K
precipitation_total	Forecast total precipitation in mm
country	Country Name
state	State
municipality	Municipality

Soil

Column	Description
Drain	Dominant FAO soil drainage class
TopDep	Depth of the top of the layer (cm)
BotDep	Depth of the bottom of the layer (cm)
CFRAG	Coarse fragments (vol. % > 2mm), mean
SDTO	Sand (mass %), mean
STPC	Silt (mass %)
CLPC	Clay (mass %)
BULK	Bulk density (kg dm⁻³, g cm⁻³)
TAWC	Available water capacity (cm m⁻¹, -33 to 1500 kPa)
CECS	Effective CEC (cmolc kg⁻¹ of fine earth fraction)
BSAT	Base saturation as a percentage of CEC soil
ESP	Exchangeable sodium percentage
CECc	CECclay, corrected for contribution of organic matter (cmolc kg⁻¹)
PHAQ	pH measured in water
TCEQ	Total carbonate equivalent (g C kg⁻¹)
GYPS	Gypsum content (g kg⁻¹)
ELCO	Electrical conductivity (dS m⁻¹)
ORGC	Organic carbon content (g kg⁻¹)
TOTN	Total nitrogen (g kg⁻¹)
CNrt	C/N ratio
ECEC	Effective CEC (cmolc kg⁻¹)
ALSA	Aluminum saturation (as % of ECEC)
TP-FAO90-soil-unit	Aggregated proportion of FAO90-soil-unit in map unit

Historical

Column	Description
Year	Year
Crop	Crop
SpatialResolution	Municipality, State or Country
Country	ISO 3166-1 alpha-2 country code
State	State
Municipality	Municipality
Value	Yield in t/ha

References

Data Category	Source
VI	https://developers.google.com/earth-engine/datasets/catalog/MODIS_061_MOD13A2#description and https://developers.google.com/earth-engine/datasets/catalog/MODIS_061_MCD15A3H#code-editor-javascript
Climate	https://cds.climate.copernicus.eu/datasets/reanalysis-era5-single-levels?tab=overview
Forecast	https://cds.climate.copernicus.eu/datasets/seasonal-original-single-levels?tab=overview
Soil	https://data.isric.org/geonetwork/srv/api/records/dc7b283a-8f19-45e1-aaed-e9bd515119bc
Historical Yield	respective sources for each country