
Data addition & settings

Data addition from GUI

After opening the project dashboard, the first step is to upload data. This can be training, testing, validation or production data, or any other data used in your project.

AryaXAI - Project dashboard
When uploading data for the first time (even if you plan to use the API), you must first upload at least one sample data file from the dashboard and define the data settings. The rest of the data can then be uploaded through the API.

To begin, select ‘Upload Data File’, which takes you to the data settings page. Then select ‘Upload file’.

Here, to classify your data, use the ‘Upload Type’ dropdown to set the upload type to 'Data' or 'Data description'. Next, use the ‘Upload Tag’ dropdown to set the tag: Training, Testing, Validation, or a custom tag of your own.

Select the file to be uploaded.

Note: You can only upload one file at a time, and the file can only be in CSV format.
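
Because only CSV files are accepted, it can be useful to confirm locally that a file parses as CSV before uploading it. The following is a minimal sketch using only the Python standard library; the function name `check_csv_text` and the sample columns are hypothetical, and the check itself (a header row plus a consistent column count) is an assumption about what a well-formed file looks like, not a statement of the platform's own validation.

```python
import csv
import io

def check_csv_text(text):
    """Return (header, row_count) if `text` parses as CSV with a consistent column count."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if not header:
        raise ValueError("file is empty or has no header row")
    row_count = 0
    for row_no, row in enumerate(reader, start=2):
        if not row:  # skip completely blank lines
            continue
        if len(row) != len(header):
            raise ValueError(
                f"row {row_no}: expected {len(header)} columns, got {len(row)}"
            )
        row_count += 1
    return header, row_count

# Hypothetical sample content standing in for a real file
sample = "Id,LotArea,SaleCondition\n1,8450,Normal\n2,9600,Abnorml\n"
header, rows = check_csv_text(sample)
print(header, rows)
```

For a file on disk, read its contents with `open(path, newline="").read()` and pass the result to the function.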

Once the upload is complete, you will be directed to ‘Project Config’ to configure the details.  

AryaXAI - Data upload

Project Config.

Project Config. holds the high-level details of the project. This configuration is used for all further operations and cannot be changed once set.

AryaXAI: Configuring project details
  1. Select the Project type, which can be a classification or regression problem.
  2. Define the ‘Unique identifier’: the identifier for every unique data point.
  3. Select the true label: the target variable you are trying to predict (e.g. for a data set from the real estate industry, the true label can be the ‘Sale price’ of a house).
  4. If your data has a predicted label, choose the label from the dropdown (this applies when you already have a model and only want to evaluate its predictions).
  5. Select the features (data points) to be excluded, so that the XAI model uses the same features as your model.

There might be multiple features within your project. You can exclude the features that might not be relevant to your project from the ‘Features exclude’ option. You can see all the features included and excluded on the right. 

Note: Your data can have some duplicate unique identifiers, which can be dropped by selecting the checkbox.
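
To see in advance whether a file contains duplicate unique identifiers, a quick local check can help. This is a minimal sketch using only the standard library; the helper name `duplicate_identifiers`, the column name `Id` and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter

def duplicate_identifiers(csv_text, id_column):
    """Return a dict mapping each repeated identifier to how many times it appears."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[id_column] for row in reader)
    return {key: n for key, n in counts.items() if n > 1}

# Hypothetical sample: identifier "2" appears twice
sample = "Id,SalePrice\n1,208500\n2,181500\n2,181500\n3,223500\n"
duplicates = duplicate_identifiers(sample, "Id")
print(duplicates)
```

An empty result means every identifier is unique and the checkbox for dropping duplicates is not needed.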

 True and Predicted label

The predicted label is required if you want the XAI model to explain the model's predictions. If the predicted label is not defined, AryaXAI will use the true label to build the XAI model.

Once the above steps are completed, select ‘Submit’.

At this point, you can see the overview of the data submitted. The total data volume, Unique features (data points) and alerts are displayed. 

Note: When defining data features, specifically the data settings, it should be noted that these become the base for explainable model training. The feature selection that is done here should align with the final features that have been used in the model. 
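
One way to keep the feature selection aligned with your model is to derive the exclusion list from the model's own feature list rather than picking columns by hand. The sketch below assumes you have that list available; the function name `features_to_exclude` and the column names are hypothetical, and it assumes the identifier and label columns do not need to be listed as excluded features.

```python
def features_to_exclude(columns, model_features, unique_identifier, true_label, pred_label=""):
    """List dataset columns that the model did not use as features."""
    reserved = {unique_identifier, true_label}
    if pred_label:
        reserved.add(pred_label)
    # Fail early if the model expects a feature the data does not contain
    missing = [f for f in model_features if f not in columns]
    if missing:
        raise ValueError(f"model features missing from data: {missing}")
    used = set(model_features)
    return [c for c in columns if c not in used and c not in reserved]

# Hypothetical dataset columns and model features
columns = ["Id", "LotArea", "YearBuilt", "Street", "SaleCondition"]
exclude = features_to_exclude(columns, ["LotArea", "YearBuilt"], "Id", "SaleCondition")
print(exclude)
```

The resulting list can then be entered under ‘Features exclude’ or passed as "feature_exclude" in the SDK config.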

The ‘Features’ section displays the data type. The platform starts analyzing the data and creates an explainability model for you.  

Note: Until the XAI model is trained, the explanations (Feature importance) show ‘nan’ as their values. In the meantime, you can still upload new files or open the case view pages. Once the model is trained, the XAI model results appear on all of these pages.

Once you submit the Project Config, you will be directed to the 'Project Summary' page. This page displays 3 tabs - Summary, Data Diagnostics and Model Diagnostics.

Data addition from API

First, get the API token for the project. It is available at Workspace > Projects > Documentation.

The project token (and Client Id) is accessible only to the Admins of the particular project. You can refresh your API token through the ‘Refresh token’ option provided beside the Client Id.

Below this, the project URL for uploading data is displayed, along with a Python script that you can run directly in your own environment.

You need to set the ‘x-api-token’ header yourself, whereas the Client Id and project name are filled in automatically.

 
headers = {
    "x-api-token": "<your project access secret token>",
}
base_format = {
    "client_id": "test_user_arya",
    "project_name": "Risk-monitoring_FW6FSKJQRE",
}
 

Next, prepare the data as a dictionary (you can upload multiple data points as a list of dictionaries).

Define the unique identifier for the data:

 
 "unique_identifier": 
 

For a single data point, the unique identifier can be passed as a string. For multiple data points, you need to pass a list of unique identifiers.

Similarly, a single data point (with one unique identifier and 3-4 columns) can be passed directly through the API. For multiple data points, however, lists of unique identifiers and column values need to be created.
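
The exact request schema is the one shown on the project's Documentation page. As one possible reading of the single-versus-multiple distinction above, the sketch below turns row dictionaries into scalar values for a single data point and into lists for several. The helper name `to_columnar` and the column names other than "unique_identifier" are hypothetical, and the layout is an assumption rather than the confirmed API format.

```python
def to_columnar(rows):
    """Turn a list of row dicts into one dict: scalars for one row, lists for several."""
    if not rows:
        raise ValueError("at least one data point is required")
    keys = list(rows[0])
    for row in rows:
        if set(row) != set(keys):
            raise ValueError("all data points must have the same columns")
    if len(rows) == 1:
        return dict(rows[0])
    return {key: [row[key] for row in rows] for key in keys}

# Hypothetical data points
one = to_columnar([{"unique_identifier": "a1", "LotArea": 8450}])
many = to_columnar([
    {"unique_identifier": "a1", "LotArea": 8450},
    {"unique_identifier": "a2", "LotArea": 9600},
])
print(one)
print(many)
```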

Finally, send the POST request:

 
import requests
resp = requests.post(url, headers=headers, json=base_format)

Every POST request returns a response acknowledging whether the data was uploaded successfully, so you stay updated on the status.

Data addition from SDK

To upload data, pass the file path and the tag.

Note: If you are uploading data for the first time, you need to pass Config as well.

Data can be uploaded to the project either directly with a file or by passing the Pandas DataFrame. 

To configure the details in ‘Project config’ and upload data through our SDK, you can use the following commands:


config = {
    "project_type": "classification",  # prediction type of the project: "classification" or "regression"
    "unique_identifier": "Id",         # column that uniquely identifies each data point
    "true_label": "SaleCondition",     # target label
    "pred_label": "",                  # predicted label, if you have one
    "feature_exclude": [],             # features the AryaXAI surrogate model should not use
}

# Data is differentiated using the tag
Tag = 'Training'

# Upload the data to the project. This also builds the initial ML model.
project.upload_data('file_path', Tag, config)

Once the data is uploaded, you can also view the files and file info through the SDK.


#Check the files that are uploaded in the project.
project.files()

Additional functions:


#You can get the summary for the specific file: Missing values, Max/Min, Data type.
project.file_summary()

#To know all the settings: Data, Data Encoding & Model params
project.config()

#To list all the tags in the project
project.all_tags()


Additionally, you can also delete the uploaded file:


#Delete an uploaded file by name (uncomment to run)
#project.delete_file('file_name')

Project Summary:

As mentioned earlier, once the Project Config is complete, you will be directed to the 'Project Summary' page. This page displays 3 tabs - Summary, Data Diagnostics and Model Diagnostics.

Summary

The Summary tab displays:

          - An Overview of total data volume and Unique features

          - A data summary (Features) table, and

          - Volume

Volume

This section displays the volume graph, which provides an overview of data upload activity over time. You can plot the data activity by data label, feature name, date (a date feature or the date of creation), range and plot type.

AryaXAI - Project overview
Data Summary table

Through GUI

The section provides you with a summary of data features and displays the data type. You can easily navigate to the different data tags by selecting them from the dropdown list on the right. 

Note: If the feature table displays ‘NA’ under ‘Feature importance’, that feature is not used in the explainability model. This is controlled by the features excluded in Project Config (see the data settings).
AryaXAI - Data summary table
Note: Using the ‘Refresh Data’ option shows the latest view of the data. The loading time varies with the data volume.

Through SDK

To view the summary of data features and the data types through SDK:


# data summary
project.data_summary('tag')

# data diagnosis
project.data_diagnosis('tag')

Data Diagnostics

Data is at the core of building a good ML model. In this section, AryaXAI runs a full profile of all the datasets added and raises warnings about potential issues. You can review these warnings and decide whether to include or exclude any features.

The Data summary table in the Data Diagnostics tab displays an overview of the total data volume, unique features and warnings for any inconsistencies in the analytical data uploaded. These warnings range from missing data to high feature correlation, high cardinality, etc. 
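
To get a rough local impression of the kinds of issues these warnings cover, the sketch below counts missing values and distinct values per column using only the standard library. It is an illustration, not AryaXAI's profiling logic: the function name `profile_columns`, the tokens treated as missing and the sample data are all assumptions.

```python
import csv
import io

def profile_columns(csv_text, missing_tokens=("", "NA", "nan")):
    """Count missing values and distinct non-missing values for every column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    names = reader.fieldnames or []
    stats = {name: {"missing": 0, "values": set()} for name in names}
    for row in reader:
        for name in names:
            value = row.get(name)
            # Short rows yield None; blank or placeholder strings also count as missing
            if value is None or value.strip() in missing_tokens:
                stats[name]["missing"] += 1
            else:
                stats[name]["values"].add(value)
    return {
        name: {"missing": s["missing"], "distinct": len(s["values"])}
        for name, s in stats.items()
    }

# Hypothetical sample with one missing value in each of two columns
sample = "Id,Street,LotArea\n1,Pave,8450\n2,,9600\n3,Pave,NA\n"
report = profile_columns(sample)
print(report)
```

A column with many missing entries, or a categorical column whose distinct count approaches the row count, is a candidate for closer inspection before deciding whether to keep it as a feature.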

The details for Data observations and the warnings can be seen in the tables provided below ‘Overview’. 

Additionally, for data drift diagnosis through SDK:


# data drift diagnosis
project.data_drift_diagnosis(['tag'],['tag'])

Model Diagnostics

The ‘Model stability’ table in the Model Diagnostics tab displays the same model details as seen in the AutoML section.

‘Data Stability’ allows you to compare the data between two tags for a data drift overview.

Select the Baseline and Current tags for a detailed comparison, which displays the feature, drift detected, method, feature type, drift score, etc.
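
The drift method that AryaXAI applies is reported in the comparison table and is not specified here. Purely as an illustration of what a drift score between a baseline and a current sample can look like, the sketch below computes the Population Stability Index (PSI) for one numeric feature. The function name `psi`, the bin count and the sample values are assumptions chosen for the example.

```python
import math

def psi(baseline, current, bins=5, eps=1e-6):
    """Population Stability Index between two numeric samples, binned on the baseline range."""
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # avoid a zero width when all baseline values are equal

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp values outside the baseline range
            counts[idx] += 1
        # Floor each share at eps so the logarithm below is always defined
        return [max(c / len(values), eps) for c in counts]

    b, c = shares(baseline), shares(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))

# Hypothetical feature values: the current sample is shifted upwards
baseline = list(range(100))
current = [v + 30 for v in baseline]
same_score = psi(baseline, baseline)
shifted_score = psi(baseline, current)
print(same_score, shifted_score)
```

Identical samples give a score of zero, and the score grows as the two distributions move apart. What counts as significant drift depends on the threshold you choose.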

Data Setting

Here, the data upload details are displayed, from where you can upload data or delete uploaded data files.

AryaXAI - Data settings dashboard
Note: Choose the first file you upload carefully. Whichever category it is uploaded under (training, testing, validation or custom), this file becomes the training data for the explainability model.


Modify Data Settings

To modify the data settings, select the ‘Update config’ option in ‘Data settings’. Whenever the settings are modified, the explainability model is retrained.

AryaXAI: Modify Data settings
