Why monitor data drifts?
Data drift is one of the top reasons why model accuracy degrades over time. Data drift is a change in the distribution of model input data that can lead to model performance degradation. Monitoring data drift helps you detect these model performance issues.
Causes of data drift include:
- Upstream process changes, such as a sensor being replaced that changes the units of measurement from inches to centimeters.
- Data quality issues, such as a broken sensor always reading 0.
- Natural drift in the data, such as mean temperature changing with the seasons.
- Changes in the relationships between features, or covariate shift (a change in the distribution of the input features).
For this monitor type, you can select the following detection methods:
- Anomaly Detection - The distribution of the inspected data is compared to the distribution of the data from an earlier time period.
- Compared To Segment - The distribution of the inspected data is compared to the distribution of a different data segment.
- Compared To Training - The distribution of the inspected data is compared to the distribution of the reported training data.
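The three detection methods differ only in where the baseline comes from. The sketch below illustrates that idea in plain Python. It is not the product's implementation: the `select_baseline` helper, the method identifiers, the record fields (`timestamp`, `segment`, `value`) and the 30-day lookback are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def select_baseline(records, method, inspection_start, *,
                    lookback=timedelta(days=30), segment=None,
                    training_records=None):
    """Return the baseline rows for a detection method (hypothetical helper)."""
    if method == "anomaly_detection":
        # Baseline: the time window immediately before the inspection period.
        start = inspection_start - lookback
        return [r for r in records if start <= r["timestamp"] < inspection_start]
    if method == "compared_to_segment":
        # Baseline: a different data segment within the inspection period.
        return [r for r in records
                if r["timestamp"] >= inspection_start and r["segment"] == segment]
    if method == "compared_to_training":
        # Baseline: the training data that was reported for the model.
        return list(training_records or [])
    raise ValueError(f"unknown detection method: {method!r}")


# Illustrative records; the field names are assumptions.
records = [
    {"timestamp": datetime(2024, 1, 1), "segment": "EU", "value": 1.0},
    {"timestamp": datetime(2024, 1, 20), "segment": "US", "value": 2.0},
    {"timestamp": datetime(2024, 2, 5), "segment": "EU", "value": 3.0},
    {"timestamp": datetime(2024, 2, 6), "segment": "US", "value": 4.0},
]
training = [{"timestamp": datetime(2023, 6, 1), "segment": "EU", "value": 0.5}]
inspection_start = datetime(2024, 2, 1)

earlier_period = select_baseline(records, "anomaly_detection", inspection_start)
other_segment = select_baseline(records, "compared_to_segment", inspection_start,
                                segment="US")
training_set = select_baseline(records, "compared_to_training", inspection_start,
                               training_records=training)
```

In every case the inspected data stays the same; only the reference distribution changes.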
Start by choosing the features / raw inputs you'd like to monitor. You can select as many as you want :-)
The monitor will compare the distributions of these fields between the inspection period and the baseline you chose. An alert is raised if the monitor finds a drift between these distributions.
Note that the monitor configuration may vary depending on the detection method you choose.
You can work with the monitor preview and play with the drift thresholds to make sure you get a healthy number of alerts. Note that the thresholds are different for numeric fields vs. categorical fields.
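To make the idea of comparing distributions against a threshold concrete, here is a minimal sketch. This page does not say which drift metric the monitor uses, so the sketch assumes the Jensen-Shannon distance (0 means identical distributions, 1 means no overlap). The function names, the bin count and the threshold values are hypothetical and are not product defaults.

```python
import bisect
import math
from collections import Counter

# Hypothetical thresholds, one per field type (not product defaults).
THRESHOLDS = {"numeric": 0.3, "categorical": 0.2}


def js_distance(p, q):
    """Jensen-Shannon distance (base 2) between two discrete distributions."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))


def numeric_distribution(values, edges):
    """Histogram over shared bin edges, normalised to sum to 1."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        # Clamp so that the maximum value falls into the last bin.
        idx = min(max(bisect.bisect_right(edges, v) - 1, 0), len(counts) - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def categorical_distribution(values, categories):
    """Relative frequency of each category, in a fixed category order."""
    counts = Counter(values)
    return [counts[c] / len(values) for c in categories]


def drift_score(baseline, inspected, kind, bins=10):
    """Distance between the baseline and inspected distributions of one field."""
    if kind == "numeric":
        combined = list(baseline) + list(inspected)
        lo, hi = min(combined), max(combined)
        edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]
        p = numeric_distribution(baseline, edges)
        q = numeric_distribution(inspected, edges)
    else:
        categories = sorted(set(baseline) | set(inspected))
        p = categorical_distribution(baseline, categories)
        q = categorical_distribution(inspected, categories)
    return js_distance(p, q)


def is_drifted(baseline, inspected, kind):
    """True when the score exceeds the threshold for this field type."""
    return drift_score(baseline, inspected, kind) > THRESHOLDS[kind]


# A sensor that switches from inches to centimeters.
heights_in = [60 + 0.5 * i for i in range(30)]
heights_cm = [v * 2.54 for v in heights_in]
unit_change_alert = is_drifted(heights_in, heights_cm, "numeric")

# A categorical field whose class mix changes from 50/50 to 90/10.
plan_before = ["free"] * 50 + ["paid"] * 50
plan_after = ["free"] * 90 + ["paid"] * 10
plan_alert = is_drifted(plan_before, plan_after, "categorical")
```

Raising a threshold makes the monitor less sensitive for that field type and lowering it produces more alerts, which is the trade-off you tune in the monitor preview.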
How are drifts calculated?
If you need to use other metrics, please contact us.