Data Drift

Data drift occurs when the characteristics of the data encountered by a model in production deviate significantly from those of the dataset on which the model was trained.

Basic concepts:

  • Baseline: Users can define the baseline based on a ‘Tag’ or on a segment of the data selected by ‘date’.
  • Frequency: Users can define how frequently the monitoring metrics are calculated.
  • Alerts frequency: Users can configure how frequently they are notified about alerts.
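The baseline and frequency settings can be pictured with a short sketch that does not use the SDK. Both helpers below, `select_date_range` and `split_by_window`, are hypothetical and are not part of the AryaXAI SDK. The first selects a date-based baseline segment. The second groups dated records into consecutive windows of a chosen length, where each window stands for one monitoring period whose drift would be compared against that baseline. AryaXAI's actual scheduling may work differently.

```python
from collections import defaultdict
from datetime import date, timedelta


def select_date_range(records, start, end):
    """Hypothetical helper: return the values whose date falls in [start, end)."""
    return [value for day, value in records if start <= day < end]


def split_by_window(records, start, window_days):
    """Hypothetical helper: group (date, value) records into windows of window_days days.

    Each window stands for one monitoring period. Records dated before
    `start` are ignored.
    """
    windows = defaultdict(list)
    for day, value in records:
        if day < start:
            continue
        index = (day - start).days // window_days
        windows[start + timedelta(days=index * window_days)].append(value)
    return dict(sorted(windows.items()))


records = [
    (date(2024, 1, 1), 1.0),
    (date(2024, 1, 3), 2.0),
    (date(2024, 1, 8), 3.0),
    (date(2024, 1, 15), 4.0),
]
# First week as the baseline, then weekly monitoring windows after it.
baseline = select_date_range(records, date(2024, 1, 1), date(2024, 1, 8))
weekly = split_by_window(records, date(2024, 1, 8), window_days=7)
print(baseline)  # [1.0, 2.0]
print(weekly)    # {datetime.date(2024, 1, 8): [3.0], datetime.date(2024, 1, 15): [4.0]}
```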

To get all data drift dashboards created so far:


project.get_all_dashboards("data_drift")

To retrieve any past dashboard, use the following function:


project.get_dashboard(type="data_drift", dashboard_id="")

Set up a data drift dashboard in AryaXAI:

Users can set up Data Drift monitoring and diagnosis using the AryaXAI Python SDK. Fetching the default dashboard requires no payload; creating a new one requires passing a configuration payload with the following parameters:


# Fetch the default data drift dashboard
project.get_data_drift_dashboard()

# Create a new dashboard
project.get_data_drift_dashboard(
    {
        "base_line_tag": ["Training"],
        "current_tag": ["XGBoost_default_data"],
        "stat_test_name": "psi",
        "features_to_use": ["num_op_rev_tl"],
    },
    # instance_type="small",  # choose the batch server (small, large, etc.)
)

You can also use Python's built-in help() function to list all parameters and payload options:


help(project.get_data_drift_dashboard) 

To create a data drift dashboard, the configuration must define the 'Baseline' tag, the 'Current' tag, the statistical test used to calculate the drift, the threshold for that test, and the features on which to run the drift test. You can also specify dates within these tags to restrict the period over which the drift is calculated.


config = {
    "base_line_tag": ["Training"],
    "current_tag": ["m12017"],
    "stat_test_name": "psi",
    "features_to_use": ["num_op_rev_tl"],
}

# Note: if you don't define features_to_use, the drift is calculated on all columns.
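If you build configurations programmatically, a small helper can keep the payload consistent. The sketch below is hypothetical and is not an AryaXAI SDK function. It uses only the keys shown above, and it treats "psi" as the default test, which is an assumption. When no features are given, it omits `features_to_use`, so the drift is calculated on all columns. It does not cover the threshold or date settings, because their key names are not given in this section.

```python
def _as_list(tags):
    # Accept either a single tag/feature name or an iterable of names.
    return [tags] if isinstance(tags, str) else list(tags)


def build_drift_config(base_line_tag, current_tag, stat_test_name="psi", features_to_use=None):
    """Hypothetical helper that assembles a data drift payload.

    Uses only the keys documented in this section; "psi" as the default
    statistical test is an assumption.
    """
    baseline = _as_list(base_line_tag)
    current = _as_list(current_tag)
    if not baseline or not current:
        raise ValueError("base_line_tag and current_tag must each name at least one tag")
    config = {
        "base_line_tag": baseline,
        "current_tag": current,
        "stat_test_name": stat_test_name,
    }
    # Omitting features_to_use means drift is calculated on all columns.
    if features_to_use:
        config["features_to_use"] = _as_list(features_to_use)
    return config


config = build_drift_config(["Training"], ["m12017"], features_to_use=["num_op_rev_tl"])
print(config)
```

You can pass the resulting dictionary to `project.get_data_drift_dashboard` in the same way as the literal payload.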
NOTE: If you encounter errors such as "Out of RAM", select a larger RAM size in the serverless option. Errors such as "divide by 0" may indicate that a column contains null values, which can cause the drift calculation to fail. In such cases, remove these columns and rerun the drift dashboard.
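One way to act on the "divide by 0" advice before creating a dashboard is to find the columns that contain nulls and leave them out of `features_to_use`. The sketch below assumes your data is available locally as a pandas DataFrame. The helper name `columns_without_nulls` and the sample column `income` are hypothetical.

```python
import pandas as pd


def columns_without_nulls(df: pd.DataFrame) -> list:
    """Return the names of columns that contain no null (NaN/None) values."""
    has_null = df.isna().any()  # boolean Series: True where a column has any null
    return [col for col in df.columns if not has_null[col]]


# Hypothetical sample data: "income" has a missing value.
df = pd.DataFrame({
    "num_op_rev_tl": [1, 2, 3],
    "income": [50.0, None, 70.0],
})
features_to_use = columns_without_nulls(df)
print(features_to_use)  # ['num_op_rev_tl']
```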

# default data drift report generated after uploading train and test data
project.data_drift_diagnosis()

To view the data drift report between two tags:


project.data_drift_diagnosis(['Training'],['Testing'])

Drift Metrics:

AryaXAI offers various statistical tests for analyzing data drift.
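PSI (Population Stability Index), the test selected with `"stat_test_name": "psi"` in the examples on this page, compares the binned proportions of a feature in the baseline (expected) and current (actual) data as PSI = Σ (actual_i - expected_i) · ln(actual_i / expected_i). The NumPy sketch below shows a common textbook formulation, not AryaXAI's implementation. Its equal-width binning over the baseline's range and the small `eps` added to empty bins are assumptions, and AryaXAI may bin differently. A commonly cited rule of thumb reads PSI below 0.1 as little change and above 0.25 as a significant shift.

```python
import numpy as np


def psi(baseline, current, bins=10, eps=1e-6):
    """Population Stability Index of `current` relative to `baseline` for one feature.

    Equal-width bin edges are computed from the baseline; current values
    outside the baseline range are clipped into the outermost bins.
    """
    baseline = np.asarray(baseline, dtype=float)
    current = np.asarray(current, dtype=float)
    edges = np.histogram_bin_edges(baseline, bins=bins)
    current = np.clip(current, edges[0], edges[-1])
    expected, _ = np.histogram(baseline, bins=edges)
    actual, _ = np.histogram(current, bins=edges)
    # Convert counts to proportions; eps keeps empty bins from causing log(0) or division by zero.
    expected = expected / expected.sum() + eps
    actual = actual / actual.sum() + eps
    return float(np.sum((actual - expected) * np.log(actual / expected)))


rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 5_000)
shifted = rng.normal(1.0, 1.0, 5_000)  # same spread, mean shifted by one standard deviation
print(psi(train, train))    # 0.0
print(psi(train, shifted))  # well above 0.25
```

Null values make NumPy's automatic binning fail, which is one reason to remove columns containing nulls before running a drift test.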

Available Statistical tests:

Compute selection

In addition to the configuration, you need to specify the compute instance (instance_type) on which the drift analysis should run. You must also decide whether to run the analysis in the background or to run it interactively and view the results immediately (run_in_background). If you run it in the background, the cell starts the drift analysis, and you can retrieve the results from the logs later.


dashboard = project.get_data_drift_dashboard(config, instance_type="medium", run_in_background=False)