Logging Training Data

To log training data, you first need to create a model version. If you haven't done that yet, see Integrate your ML Model.

If you created the model version as part of your training code, you can use the Model object returned by aporia.create_model_version:

import aporia

apr_model = aporia.create_model_version(...)

Otherwise, you can get a reference to the model object:

import aporia

apr_model = aporia.Model(model_id="my-model", model_version="v1")

Expected Format

The log_training_set and log_test_set functions receive pandas DataFrames, in which each column corresponds to a field defined during model version creation.

NOTE: Aporia doesn't store your training data itself; it saves only aggregations of it.

For example, if our model version was defined as follows:

aporia.create_model_version(
  model_id="my-model",
  model_version="v1",
  model_type="binary",
  features={
    "Age": "numeric",
    "Annual_Premium": "numeric",
    "Previously_Insured": "boolean",
    "Vehicle_Age_< 1 Year": "boolean",
    "Vehicle_Age_> 2 Years": "boolean",
    "Vehicle_Damage_Yes": "boolean",
  },
  predictions={
    "will_buy_insurance": "boolean",
  },
)

Then the log_training_set function would receive something similar to this:

import pandas as pd

training_features = pd.DataFrame({
  "Age": [31, 20, 53],
  "Annual_Premium": [11234, 534534, 859403],
  "Previously_Insured": [False, True, True],
  "Vehicle_Age_< 1 Year": [False, True, False],
  "Vehicle_Age_> 2 Years": [True, False, True],
  "Vehicle_Damage_Yes": [True, False, False],
})

training_labels = pd.DataFrame({
  "will_buy_insurance": [True, False, True],
})
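
Since each DataFrame column has to correspond to a field defined at model version creation, a misspelled column name is an easy mistake to make. The following is a minimal sketch of a check you could run before logging. It uses only pandas. `check_columns` and `FEATURE_SCHEMA` are hypothetical names, not part of the Aporia SDK, and the schema dict simply mirrors the example model version above.

```python
import pandas as pd

# Field names copied from the example model version above.
FEATURE_SCHEMA = {
    "Age": "numeric",
    "Annual_Premium": "numeric",
    "Previously_Insured": "boolean",
    "Vehicle_Age_< 1 Year": "boolean",
    "Vehicle_Age_> 2 Years": "boolean",
    "Vehicle_Damage_Yes": "boolean",
}


def check_columns(df, schema):
    """Return (missing, unexpected) column names relative to the schema."""
    missing = [name for name in schema if name not in df.columns]
    unexpected = [name for name in df.columns if name not in schema]
    return missing, unexpected


# A frame with one deliberately misspelled column, for illustration only.
frame = pd.DataFrame({
    "Age": [31, 20, 53],
    "Annual_Premuim": [11234, 534534, 859403],  # typo on purpose
    "Previously_Insured": [False, True, True],
    "Vehicle_Age_< 1 Year": [False, True, False],
    "Vehicle_Age_> 2 Years": [True, False, True],
    "Vehicle_Damage_Yes": [True, False, False],
})

missing, unexpected = check_columns(frame, FEATURE_SCHEMA)
print("missing:", missing)
print("unexpected:", unexpected)
```

Both lists come back empty when the DataFrame matches the schema exactly. This check covers column names only, not value types.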

Logging Training Data

Use the log_training_set function to report the features and labels your model was trained on:

apr_model.log_training_set(
  features=training_features,
  labels=training_labels,
)

Logging Test Data

Use the log_test_set function to report the features, predictions and labels you used to test your model:

apr_model.log_test_set(
  features=test_features,
  predictions=test_predictions,
  labels=test_labels,
)
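
The call above assumes that `test_features`, `test_predictions` and `test_labels` already exist. As a rough sketch, they could be built for the example model version like this. The values are made up, and the assumption that all three DataFrames carry one row per test sample, in the same order, is inferred from the expected format rather than stated by the SDK.

```python
import pandas as pd

# Illustrative values only; columns follow the example model version.
test_features = pd.DataFrame({
    "Age": [44, 27],
    "Annual_Premium": [40454, 28619],
    "Previously_Insured": [False, True],
    "Vehicle_Age_< 1 Year": [False, True],
    "Vehicle_Age_> 2 Years": [True, False],
    "Vehicle_Damage_Yes": [True, False],
})

# What the model predicted for each test row.
test_predictions = pd.DataFrame({
    "will_buy_insurance": [True, False],
})

# The ground truth for the same rows, under the same field name.
test_labels = pd.DataFrame({
    "will_buy_insurance": [True, True],
})

# Assumed requirement: the three frames are row-aligned.
row_counts = {len(test_features), len(test_predictions), len(test_labels)}
if len(row_counts) != 1:
    raise ValueError("features, predictions and labels must have the same number of rows")
```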

In both log_training_set and log_test_set, it's possible to log the raw inputs that were used to generate the training features, using the raw_inputs parameter.
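
To illustrate what raw inputs might look like for the example model, the sketch below starts from un-encoded data and derives the feature columns with `pd.get_dummies`. The raw column names `Vehicle_Age` and `Vehicle_Damage` are assumptions inferred from the one-hot feature names. The final call is left commented out because its exact shape is also an assumption based only on the `raw_inputs` parameter named above. The raw input fields may also need to be declared when the model version is created, so check the SDK reference before relying on this.

```python
import pandas as pd

# Hypothetical raw inputs, before one-hot encoding.
raw_inputs = pd.DataFrame({
    "Age": [31, 20, 53],
    "Annual_Premium": [11234, 534534, 859403],
    "Previously_Insured": [False, True, True],
    "Vehicle_Age": ["> 2 Years", "< 1 Year", "> 2 Years"],
    "Vehicle_Damage": ["Yes", "No", "No"],
})

# One-hot encode the categorical columns; new columns are named "<column>_<value>".
encoded = pd.get_dummies(raw_inputs, columns=["Vehicle_Age", "Vehicle_Damage"])

# Keep only the fields defined in the example model version.
feature_columns = [
    "Age",
    "Annual_Premium",
    "Previously_Insured",
    "Vehicle_Age_< 1 Year",
    "Vehicle_Age_> 2 Years",
    "Vehicle_Damage_Yes",
]
training_features = encoded.reindex(columns=feature_columns, fill_value=False)

# Older pandas versions emit 0/1 integers from get_dummies; force booleans.
dummy_columns = [c for c in feature_columns if c.startswith("Vehicle_")]
training_features = training_features.astype({c: bool for c in dummy_columns})

# Assumed call shape (not verified against the SDK):
# apr_model.log_training_set(
#     features=training_features,
#     labels=training_labels,
#     raw_inputs=raw_inputs,
# )
```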