[1-3] Prepare and Upload Your Model
The run method
Before a model can be uploaded to Greenhub.ai, it must be packaged in a standardized way so that the Greenhub.ai system knows how to run it.
To give the model creator as much freedom as possible in how the model is structured and executed, we expect only one specific file with one function so that our system knows how to access the model: a run(year=int, month=int) method in a file named run.py.
This method is called by our system in every month in which the model is to be executed. For example, if it is an 'End of Season' model with the 'Prediction Month' set to October, the model will be called every year in October. If the model is to be run for the year 2020, the run method will be called as follows: run(year=2020, month=10).
If the model is configured as an 'In Season' model, it can be called multiple times a year, as defined by the 'Model Execution Months' range. For example, if the 'Prediction Month' is set to October and the 'Model Execution Months' range is set from June to August, the model will be called three times during the year, from June to August, but not in October. In 2020, the run method will be called a total of three times as follows:
- in June: run(year=2020, month=6)
- in July: run(year=2020, month=7)
- in August: run(year=2020, month=8)
A more detailed, visual explanation of the difference between 'In Season' and 'End of Season' can be found here.
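The calling pattern described above can be sketched as a small helper that lists the (year, month) pairs for which run would be invoked. This is only an illustration of the rule in the text: the function name scheduled_calls and its parameters are hypothetical and are not part of the Greenhub.ai platform or SDK.

```python
from typing import Iterable, List, Optional, Tuple


def scheduled_calls(
    year: int,
    model_type: str,
    prediction_month: int,
    execution_months: Optional[Iterable[int]] = None,
) -> List[Tuple[int, int]]:
    """Return the (year, month) pairs for which run() would be called.

    Hypothetical helper that mirrors the rule described in the text;
    it is not the platform's actual scheduler.
    """
    if model_type == "End of Season":
        # Called once per year, in the prediction month itself.
        months = [prediction_month]
    elif model_type == "In Season":
        if execution_months is None:
            raise ValueError("An 'In Season' model needs execution months")
        # Called once per execution month; the prediction month is not added.
        months = list(execution_months)
    else:
        raise ValueError(f"Unknown model type: {model_type!r}")
    return [(year, month) for month in months]


end_of_season = scheduled_calls(2020, "End of Season", prediction_month=10)
in_season = scheduled_calls(
    2020, "In Season", prediction_month=10, execution_months=range(6, 9)
)
```

Here end_of_season contains the single October call, while in_season contains the June, July, and August calls.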
Therefore, the run method in the run.py file must return predictions specific to the passed year and month parameters. Note that if the model is later set to run for past years during the model upload, for example when executing run(year=2008, month=8), it should only use data that was available at that time, so that the simulation of past predictions is as accurate and reliable as possible.
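One way to respect this rule is to filter any time-stamped input data by a cutoff derived from the year and month arguments. The following is a minimal sketch, assuming the data carries a datetime column (here called date, a hypothetical name) and assuming that "available at that time" means strictly before the first day of the execution month. The exact cutoff your model should use may differ.

```python
import pandas as pd


def available_at(
    data: pd.DataFrame, year: int, month: int, date_column: str = "date"
) -> pd.DataFrame:
    """Keep only rows dated before the first day of the execution month.

    Assumption: the cutoff is the start of the month passed to run().
    """
    cutoff = pd.Timestamp(year=year, month=month, day=1)
    return data[data[date_column] < cutoff]


# Hypothetical historical observations, for illustration only.
history = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2008-06-15", "2008-07-15", "2008-08-15", "2009-01-15"]
        ),
        "rainfall_mm": [80.0, 65.0, 40.0, 55.0],
    }
)

# For run(year=2008, month=8), only the June and July rows remain.
visible = available_at(history, year=2008, month=8)
```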
Within the run method itself, the model creator has full flexibility. The Greenhub.ai Python SDK can be used to fetch various feature data for the model invocation, an ONNX file can be loaded, additional data can be imported from a CSV file, and more. The only restriction is that no internet access is allowed during the execution of the run method, except for the use of the Greenhub SDK package.
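Because there is no internet access, any additional file has to be read from disk. A minimal sketch of loading a bundled CSV file is shown below. The helper name load_bundled_csv is hypothetical, and the sketch assumes that the uploaded files end up in the same directory as run.py at execution time, which this page does not explicitly confirm.

```python
from pathlib import Path

import pandas as pd


def load_bundled_csv(filename: str, base_dir: Path) -> pd.DataFrame:
    """Read a CSV file that was shipped together with run.py.

    Inside run.py, base_dir could be Path(__file__).resolve().parent,
    assuming the bundled files sit next to run.py when the model runs.
    """
    return pd.read_csv(Path(base_dir) / filename)
```

Resolving the path relative to run.py, rather than relying on the current working directory, keeps the lookup independent of where the process was started.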
It is now clear what the run method should look like, what parameters it expects, and how much flexibility it provides for executing the model. But what should the run method return at the end? This is actually quite simple. The predictions generated for the specific year and month should be returned in a pandas dataframe.
The structure of the dataframe depends on the model's 'spatial resolution':
Spatial Resolution: Country
In this case, only a single prediction is expected, so the pandas dataframe should be returned in the following format.
Example:
country | predictedYield |
---|---|
US | 18.43 |
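As a minimal sketch, a dataframe matching the example table above could be built as follows. The value is taken from the example and is illustrative only.

```python
import pandas as pd

# One row: the country code and its predicted yield (example value).
country_output = pd.DataFrame({"country": ["US"], "predictedYield": [18.43]})
```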
Spatial Resolution: State
For each state where a prediction was generated, a row should be returned, including the state name and the prediction.
Example:
state | predictedYield |
---|---|
MISSOURI | 5.43 |
TENNESSEE | 4.67 |
WEST VIRGINIA | 7.12 |
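If a model keeps its per-state predictions in a dictionary, the dataframe from the example above could be assembled as in the following sketch. The dictionary name and its values are illustrative assumptions.

```python
import pandas as pd

# Hypothetical model results, keyed by state name (example values).
predictions = {"MISSOURI": 5.43, "TENNESSEE": 4.67, "WEST VIRGINIA": 7.12}

# One row per state, with the column names shown in the example table.
state_output = pd.DataFrame(
    list(predictions.items()), columns=["state", "predictedYield"]
)
```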
Spatial Resolution: Municipality
Here, we require not only the prediction and state columns but also a separate column containing the name of the municipality.
Example:
state | municipality | predictedYield |
---|---|---|
MISSOURI | BELTON | 2.35 |
MISSOURI | HANNIBAL | 4.92 |
WEST VIRGINIA | PARKERSBURG | 5.22 |
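The three formats above can be checked locally before returning the dataframe. The following sketch only tests for the columns named in the examples. The helper check_output is hypothetical, and the platform may apply additional or different checks.

```python
import pandas as pd

# Columns shown in the examples for each spatial resolution.
EXPECTED_COLUMNS = {
    "country": ["country", "predictedYield"],
    "state": ["state", "predictedYield"],
    "municipality": ["state", "municipality", "predictedYield"],
}


def check_output(output: pd.DataFrame, spatial_resolution: str) -> pd.DataFrame:
    """Raise if the dataframe lacks the columns shown in the examples."""
    if not isinstance(output, pd.DataFrame):
        raise TypeError("run() is expected to return a pandas dataframe")
    missing = [
        column
        for column in EXPECTED_COLUMNS[spatial_resolution]
        if column not in output.columns
    ]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    if not pd.api.types.is_numeric_dtype(output["predictedYield"]):
        raise ValueError("predictedYield should contain numeric values")
    return output


municipality_output = pd.DataFrame(
    {
        "state": ["MISSOURI", "MISSOURI", "WEST VIRGINIA"],
        "municipality": ["BELTON", "HANNIBAL", "PARKERSBURG"],
        "predictedYield": [2.35, 4.92, 5.22],
    }
)
checked = check_output(municipality_output, "municipality")
```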
Note: only the pandas dataframe needs to be returned, as we already have all other necessary information from the model creation form in step 0.
The following is a summary of how a run method should be structured and an example of how it might be executed.
import greenhub as gh
import pandas as pd


def run(year: int, month: int):
    # Initialize GreenHub SDK
    # TODO: You can find your API key in your Account dialog at the top right of the GreenHub page
    gh.initialize("YOUR_API_KEY")

    # Fetch and setup feature vector
    features = ...  # TODO

    # Load model
    model = ...  # TODO

    # Run model
    prediction = ...  # TODO

    # Format to expected GreenHub output
    output = ...  # TODO

    return output
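To make the skeleton more concrete, here is a filled-in toy version for a model with spatial resolution 'State'. It is a sketch under strong assumptions: it does not use the Greenhub SDK at all, the baseline yields and the yearly trend are invented numbers, and the "model" is a simple linear trend that stands in for whatever your real model does.

```python
import pandas as pd

# Hypothetical constants, invented for this illustration only.
BASELINE_YIELD = {"MISSOURI": 5.0, "TENNESSEE": 4.5}
TREND_PER_YEAR = 0.02
REFERENCE_YEAR = 2000


def run(year: int, month: int) -> pd.DataFrame:
    # "Features": years elapsed since the reference year.
    # The month argument is accepted but unused in this toy example.
    years_elapsed = year - REFERENCE_YEAR

    # "Model": baseline yield plus a linear yearly trend.
    rows = [
        {
            "state": state,
            "predictedYield": round(baseline + TREND_PER_YEAR * years_elapsed, 2),
        }
        for state, baseline in BASELINE_YIELD.items()
    ]

    # Format to the state-level output structure described above.
    return pd.DataFrame(rows, columns=["state", "predictedYield"])


example_output = run(year=2020, month=10)
```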
Tip: A boilerplate template that already provides a possible basic structure for the run() method can be downloaded from our 'Model Creation' page in step 1 under 'Download Template'.
Package the model
Once the run method in the run.py file has been created and tested, everything that the run method needs (CSV files, ONNX files, other Python files, etc.), including the run.py file itself, should be zipped together for upload to Greenhub.ai.
Note that you should zip all the files at once, not the folder containing them, so that run.py sits at the top level of the archive.
After zipping, you can select the zip file for upload in Step 3 on the 'Model Creation' page.
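The difference between zipping the files and zipping the folder can be sketched with Python's standard zipfile module. The helper below is hypothetical. It stores every file relative to the model directory, so that run.py ends up at the top level of the archive rather than inside a folder. Whether subfolders are supported by the platform is not stated here, so the sketch simply preserves them as they are.

```python
import zipfile
from pathlib import Path


def zip_model_files(source_dir: Path, zip_path: Path) -> Path:
    """Zip the contents of source_dir without the enclosing folder."""
    source_dir = Path(source_dir)
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source_dir.rglob("*")):
            # Skip directories and the archive itself if it lies in source_dir.
            if path.is_file() and path.resolve() != zip_path.resolve():
                # arcname is relative to source_dir, so no parent folder
                # name is stored in the archive.
                archive.write(path, arcname=path.relative_to(source_dir).as_posix())
    return zip_path
```

For example, zip_model_files(Path("my_model"), Path("my_model.zip")) would store my_model/run.py as run.py inside the archive.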
Note: if you want to be certain that your run method is in the correct format before uploading the model, feel free to use the greenhub command-line tool to test it in advance.
Important points to keep in mind
Here's a quick summary of the most important points to keep in mind:
- A run method must exist, which expects the parameters year=int and month=int, and is located in a run.py file.
- The year parameter indicates the year in which the model is to be executed, while the month parameter specifies the current month for which the model generates its predictions (the execution timing depends on whether it’s an 'In Season' or 'End of Season' model).
- Everything invoked or loaded within the run method must be included in the zip file for upload.
- Internet access is not allowed during the execution of the run method, except for the use of the Greenhub.ai SDK.
- The run method is expected to return a pandas dataframe containing all predictions. Besides the predictions, a model with spatial resolution "state" requires an additional column with the state of each prediction, while a model with spatial resolution "municipality" requires a state column and an additional column for the municipality.