[1-3] Prepare and Upload Your Model




The run method

Prior to uploading the model to Greenhub.ai, it must be packaged in a standardized way so that the Greenhub.ai system understands how to run the model.

To give the model creator as much freedom as possible in how their model is structured and executed, we only expect one specific file containing one function, so that our system knows how to access the model: a run(year: int, month: int) method in a file named run.py.

Our system calls this method in every month in which the model is to be executed. For example, for an 'End of Season' model with the 'Prediction Month' set to October, the model is called once a year, in October. To run the model for the year 2020, the run method is called as follows: run(year=2020, month=10).

If the model is configured as an 'In Season' model, it can be called multiple times a year, as defined by the 'Model Execution Months' range. For example, if the prediction month is set to October and the 'Model Execution Months' range runs from June to August, the model is called three times a year, in June, July, and August, but not in October. In 2020, the run method is therefore called three times, as follows:

  • in June: run(year=2020, month=6)
  • in July: run(year=2020, month=7)
  • in August: run(year=2020, month=8)
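The scheduling rules above can be sketched as a small helper that lists the (year, month) pairs for which run() would be called in a given year. The function name planned_calls and the model-type strings are hypothetical and not part of Greenhub.ai; the sketch also assumes that an 'In Season' execution range does not wrap around the turn of the year.

```python
from typing import List, Optional, Tuple


def planned_calls(
    year: int,
    model_type: str,
    prediction_month: int,
    first_month: Optional[int] = None,
    last_month: Optional[int] = None,
) -> List[Tuple[int, int]]:
    """List the (year, month) pairs for which run() would be called in `year`.

    Hypothetical helper mirroring the rules described above. Assumes an
    'In Season' execution range lies within a single calendar year.
    """
    if model_type == "End of Season":
        # One call per year, in the prediction month.
        return [(year, prediction_month)]
    if model_type == "In Season":
        if first_month is None or last_month is None or first_month > last_month:
            raise ValueError("expected an execution range such as June (6) to August (8)")
        # One call per month of the execution range; in this sketch the
        # prediction month does not affect when an 'In Season' model is called.
        return [(year, month) for month in range(first_month, last_month + 1)]
    raise ValueError(f"unknown model type: {model_type!r}")


if __name__ == "__main__":
    print(planned_calls(2020, "End of Season", prediction_month=10))
    # [(2020, 10)]
    print(planned_calls(2020, "In Season", prediction_month=10, first_month=6, last_month=8))
    # [(2020, 6), (2020, 7), (2020, 8)]
```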

A more detailed, visual explanation of the difference between 'In Season' and 'End of Season' can be found here.

Therefore, the run method in run.py must return predictions specific to the year and month passed in. Note that if, during the model upload, the model is also configured to run for past years (for example, run(year=2008, month=8)), it should only use data that was available at that time, so that the simulated past predictions are as accurate and reliable as possible.
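One way to respect this when a model reads its own historical data is to cut every table off at the execution date. The sketch below assumes a bundled table with a date column (the column name `date` and the helper name are hypothetical) and conservatively treats anything dated in the execution month itself as not yet available; data fetched through the Greenhub.ai SDK is not covered here.

```python
import pandas as pd


def data_available_at(
    df: pd.DataFrame, year: int, month: int, date_column: str = "date"
) -> pd.DataFrame:
    """Return only the rows dated before the first day of the execution month.

    Hypothetical helper: cutting off at the start of the month is a
    conservative assumption, not a Greenhub.ai rule.
    """
    cutoff = pd.Timestamp(year=year, month=month, day=1)
    dates = pd.to_datetime(df[date_column])
    return df[dates < cutoff]


if __name__ == "__main__":
    # Toy data, made up for illustration.
    weather = pd.DataFrame(
        {
            "date": ["2008-06-15", "2008-07-31", "2008-08-10", "2009-01-01"],
            "rain_mm": [12.0, 3.5, 8.1, 0.0],
        }
    )
    # For run(year=2008, month=8) only the June and July rows remain.
    print(data_available_at(weather, year=2008, month=8))
```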


Inside the run method, the model creator has full flexibility: the Greenhub.ai Python SDK can be used to fetch feature data for the model invocation, an ONNX model can be loaded and executed, additional data can be read from bundled CSV files, and more. The only restriction is that no internet access is allowed while the run method executes, except through the Greenhub.ai SDK package.
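Assuming the uploaded archive is extracted with run.py and its companion files side by side, bundled files are most robustly located relative to run.py itself rather than the current working directory, which this page does not specify. A minimal sketch for a CSV file, where the helper name load_bundled_csv is hypothetical:

```python
from pathlib import Path
from typing import Optional, Union

import pandas as pd


def load_bundled_csv(
    filename: str, base_dir: Optional[Union[str, Path]] = None
) -> pd.DataFrame:
    """Read a CSV file that is shipped in the same zip archive as run.py.

    Hypothetical helper. By default the path is resolved relative to the
    directory containing this file, so it does not depend on the working
    directory the model is started from.
    """
    if base_dir is None:
        base_dir = Path(__file__).resolve().parent
    return pd.read_csv(Path(base_dir) / filename)


# Inside run(), for example:
#     yields = load_bundled_csv("historical_yields.csv")  # hypothetical file name
```

An ONNX file can be located the same way and, if the onnxruntime package is available in the execution environment (this page does not say), opened with onnxruntime.InferenceSession(str(path)).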


This covers what the run method looks like, which parameters it expects, and how much flexibility it offers for executing the model. But what should the run method return? The answer is simple: the predictions generated for the given year and month, returned as a pandas dataframe. The structure of the dataframe depends on the model's 'spatial resolution':

Spatial Resolution: Country

In this case, only a single prediction is expected, so the returned pandas dataframe contains a single row in the following format:

Example:

country  predictedYield
US       18.43
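Building this single-row dataframe takes one pandas call; the helper name country_output is only for illustration.

```python
import pandas as pd


def country_output(country: str, predicted_yield: float) -> pd.DataFrame:
    """Wrap a single national prediction in the expected one-row format."""
    return pd.DataFrame({"country": [country], "predictedYield": [predicted_yield]})


output = country_output("US", 18.43)
print(output)
```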


Spatial Resolution: State

The dataframe should contain one row per state for which a prediction was generated, with the state name and the prediction.

Example:

state          predictedYield
MISSOURI       5.43
TENNESSEE      4.67
WEST VIRGINIA  7.12
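If the model produces its predictions as a mapping from state name to yield, the dataframe can be built directly from that mapping. A sketch with a hypothetical helper name; the dictionary's insertion order becomes the row order:

```python
from typing import Dict

import pandas as pd


def state_output(predictions: Dict[str, float]) -> pd.DataFrame:
    """Turn {state name: predicted yield} into one row per state."""
    return pd.DataFrame(
        {
            "state": list(predictions.keys()),
            "predictedYield": list(predictions.values()),
        }
    )


output = state_output({"MISSOURI": 5.43, "TENNESSEE": 4.67, "WEST VIRGINIA": 7.12})
print(output)
```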


Spatial Resolution: Municipality

Here, we require not only the prediction and state columns but also a separate column containing the name of the municipality.

Example:

state          municipality  predictedYield
MISSOURI       BELTON        2.35
MISSOURI       HANNIBAL      4.92
WEST VIRGINIA  PARKERSBURG   5.22
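Before uploading, a quick local check that the returned dataframe has the columns shown in these examples can catch mistakes early. The column lists below are taken from the examples on this page; the check_output helper itself is hypothetical and is not the validation Greenhub.ai performs.

```python
import pandas as pd

# Columns shown in the examples above for each spatial resolution.
REQUIRED_COLUMNS = {
    "country": ["country", "predictedYield"],
    "state": ["state", "predictedYield"],
    "municipality": ["state", "municipality", "predictedYield"],
}


def check_output(df: pd.DataFrame, resolution: str) -> None:
    """Raise if `df` does not match the format expected for `resolution`."""
    if not isinstance(df, pd.DataFrame):
        raise TypeError("run() must return a pandas dataframe")
    missing = [col for col in REQUIRED_COLUMNS[resolution] if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns for {resolution!r}: {missing}")
    if resolution == "country" and len(df) != 1:
        raise ValueError("a country-level model should return exactly one row")


municipalities = pd.DataFrame(
    {
        "state": ["MISSOURI", "MISSOURI", "WEST VIRGINIA"],
        "municipality": ["BELTON", "HANNIBAL", "PARKERSBURG"],
        "predictedYield": [2.35, 4.92, 5.22],
    }
)
check_output(municipalities, "municipality")  # passes silently
```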


Note: only the pandas dataframe needs to be returned; all other necessary information is already known from the model creation form filled out in step 0.


The following skeleton summarizes how a run method can be structured:

import greenhub as gh
import pandas as pd

def run(year: int, month: int) -> pd.DataFrame:

    # Initialize GreenHub SDK
    # TODO: You can find your API key in your Account dialog at the top right of the GreenHub page
    gh.initialize("YOUR_API_KEY")

    # Fetch and setup feature vector
    features = ...  # TODO

    # Load model
    model = ...  # TODO

    # Run model
    prediction = ...  # TODO

    # Format to expected GreenHub output
    output = ...  # TODO

    return output
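To illustrate how the TODO steps might be filled in, here is a toy, state-level version that runs without the SDK. Everything in it is hypothetical: load_features stands in for the feature-fetching step (it is not a Greenhub.ai SDK function and returns made-up values), and a hand-written linear formula stands in for a trained model such as an ONNX file.

```python
import pandas as pd

# Hypothetical coefficients standing in for a trained model.
COEFFICIENTS = {"intercept": 1.0, "rain_mm": 0.01, "mean_temp_c": 0.05}


def load_features(year: int, month: int) -> pd.DataFrame:
    """Hypothetical stand-in for the feature-fetching step (made-up values).

    A real model would fetch features for the given year and month with the
    Greenhub.ai SDK or read them from bundled files.
    """
    return pd.DataFrame(
        {
            "state": ["MISSOURI", "TENNESSEE"],
            "rain_mm": [300.0, 280.0],
            "mean_temp_c": [20.0, 23.0],
        }
    )


def run(year: int, month: int) -> pd.DataFrame:
    # Fetch and set up the feature vector
    features = load_features(year, month)

    # Run the (toy) model
    prediction = (
        COEFFICIENTS["intercept"]
        + COEFFICIENTS["rain_mm"] * features["rain_mm"]
        + COEFFICIENTS["mean_temp_c"] * features["mean_temp_c"]
    )

    # Format to the expected output for spatial resolution "state"
    return pd.DataFrame({"state": features["state"], "predictedYield": prediction})
```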

Tip: A boilerplate template that provides a basic structure for the run() method can be downloaded from the 'Model Creation' page in step 1 under 'Download Template'.



Package the model

Once the run method in run.py has been created and tested, everything the run method needs (CSV files, ONNX files, other Python files, etc.), including run.py itself, should be zipped together for upload to Greenhub.ai. Note that you should zip the files themselves, not the folder containing them, so that run.py sits at the top level of the archive. You can then select the zip file for upload in step 3 on the 'Model Creation' page.
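If you prefer to script this step, Python's standard zipfile module can store each file with a path relative to the model folder, which keeps run.py at the top level of the archive instead of inside a folder. A sketch, where package_model and the folder and file names are hypothetical:

```python
import zipfile
from pathlib import Path
from typing import Union


def package_model(model_dir: Union[str, Path], zip_path: Union[str, Path]) -> Path:
    """Zip the contents of `model_dir` (not the folder itself) into `zip_path`."""
    model_dir = Path(model_dir)
    zip_path = Path(zip_path)
    if not (model_dir / "run.py").is_file():
        raise FileNotFoundError(f"no run.py found in {model_dir}")

    files = sorted(p for p in model_dir.rglob("*") if p.is_file())
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            if path.resolve() == zip_path.resolve():
                continue  # skip an older archive lying inside model_dir
            # arcname relative to model_dir keeps run.py at the archive root
            zf.write(path, arcname=path.relative_to(model_dir).as_posix())
    return zip_path


# Example: package_model("my_model", "my_model.zip")
```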

Note: if you want to be certain that your run method is in the correct format before uploading the model, feel free to use the greenhub command-line tool to test it in advance.
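Independently of that tool, a quick local smoke test can import run.py and call it the way the platform would. The smoke_test helper below is hypothetical and only checks that run() returns a pandas dataframe; it does not reproduce the platform's environment or its no-internet restriction.

```python
import importlib.util
import sys
from pathlib import Path
from typing import Union

import pandas as pd


def smoke_test(model_dir: Union[str, Path], year: int, month: int) -> pd.DataFrame:
    """Import run.py from `model_dir`, call run(year=..., month=...) and check the result."""
    model_dir = Path(model_dir).resolve()
    spec = importlib.util.spec_from_file_location("run", model_dir / "run.py")
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {model_dir / 'run.py'}")
    module = importlib.util.module_from_spec(spec)

    # Let run.py import helper modules shipped next to it.
    sys.path.insert(0, str(model_dir))
    try:
        spec.loader.exec_module(module)
        result = module.run(year=year, month=month)
    finally:
        sys.path.remove(str(model_dir))

    if not isinstance(result, pd.DataFrame):
        raise TypeError(f"run() returned {type(result).__name__}, expected a pandas dataframe")
    return result


# Example: print(smoke_test("my_model", year=2020, month=10))
```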



Important points to keep in mind

Here's a quick summary of the most important points to keep in mind:

  • A run method must exist in a file named run.py and must accept the parameters year: int and month: int.
  • The year parameter indicates the year in which the model is executed, while the month parameter specifies the current month in which the model generates its predictions (the execution timing depends on whether it is an 'In Season' or 'End of Season' model).
  • Everything invoked or loaded within the run method must be included in the zip file for upload.
  • Internet access is not allowed during the execution of the run method, except through the Greenhub.ai SDK.
  • The run method must return a pandas dataframe containing all predictions. Besides the predictions, a model with spatial resolution 'state' requires an additional column with the state of each prediction, while a model with spatial resolution 'municipality' requires both a state column and a municipality column.