
Synthetic AI

Synthetic data is one technique for aligning models, but its effectiveness depends heavily on the quality of the generated datasets. AryaXAI's 'Synthetic AI' offers advanced generative models, such as GPT-2 and CTGAN, for creating high-quality synthetic datasets.


Train Model

Through GUI

To generate synthetic data in AryaXAI, the initial step involves training a 'Synthetic model.' This model creates initial data based on the uploaded training data. Once the quality of this generated data is assessed and approved by the user, additional synthetic data can be produced.

To train the model, follow these steps:

  1. Access the 'Synthetic AI' tab in the Main menu (on the left).
  2. Switch to the 'Train Model' tab at the top.
  3. Choose your preferred model and click 'Train' to start model training. This redirects you to 'Data Configuration', where you set the 'Initial Configuration' and 'Model Parameters'.

In the 'Data Configuration' section:

      - Under 'Initial Configuration,' select the relevant data tag from the dropdown for creating synthetic data. Exclude specific features if needed and click 'Save initial configuration.'

      - Proceed to 'Model Parameters' and input details such as Batch size, Early stopping patience, Early stopping threshold, Epochs, Model type, Random state, and Tabular config.

After completing the above steps, await the 'Model Training complete' notification.

Through SDK

To view help for the train_synthetic_model method:


help(project.train_synthetic_model)

Define parameters for your synthetic model:


# Assumes `project` is an AryaXAI project object from your SDK setup.
# feature_include: features used to train the model and generate synthetic data
feature_include = ["Alley", "3SsnPorch"]  # example only; replace with your dataset's features

data_config = {
    "tags": ["Training"],                # data tag(s) used for training
    "feature_include": feature_include
}
hyper_params = {
    "epochs": 2,       # passes over the training data; more is usually better but slower (max 100 supported)
    "test_ratio": 0.2  # fraction of the data held out for testing
}

project.train_synthetic_model(
    model_name='CTGAN',                    # available models: 'CTGAN' or 'GPT2'
    data_config=data_config,
    hyper_params=hyper_params
)
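The GUI lets you exclude features, while the SDK's data_config expects the list of features to include. The helper below is a hypothetical sketch (not part of the AryaXAI SDK) that derives feature_include from a full feature list and an exclusion list, and checks the hyperparameters against the limits mentioned in the comments above (at most 100 epochs; test_ratio as a fraction). The minimum of 1 epoch is an assumption. It only builds the two dictionaries; training still goes through project.train_synthetic_model.

```python
def build_synthetic_configs(all_features, exclude=(), tags=("Training",),
                            epochs=2, test_ratio=0.2):
    """Return (data_config, hyper_params) dicts for train_synthetic_model.

    Hypothetical helper, not part of the AryaXAI SDK.
    """
    # Max of 100 epochs comes from the SDK comments; the minimum of 1 is assumed
    if not 1 <= epochs <= 100:
        raise ValueError("epochs must be between 1 and 100")
    # test_ratio is the fraction of data held out for testing
    if not 0 < test_ratio < 1:
        raise ValueError("test_ratio must be strictly between 0 and 1")

    excluded = set(exclude)
    # Keep the original feature order and drop the excluded features
    feature_include = [f for f in all_features if f not in excluded]
    if not feature_include:
        raise ValueError("no features left after exclusion")

    data_config = {"tags": list(tags), "feature_include": feature_include}
    hyper_params = {"epochs": epochs, "test_ratio": test_ratio}
    return data_config, hyper_params


# Usage (requires a `project` object from your AryaXAI SDK setup):
# all_features = [...]  # your dataset's column names
# data_config, hyper_params = build_synthetic_configs(
#     all_features, exclude=["Alley"], epochs=50
# )
# project.train_synthetic_model(model_name='CTGAN', data_config=data_config,
#                               hyper_params=hyper_params)
```

Raising early on invalid values keeps configuration mistakes local instead of surfacing as a failed training run.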

To fetch trained models:


project.synthetic_models()

Synthetic Models

Through GUI

After training, the 'Synthetic Models' tab lists the trained models and their status. For each model, the list shows its name, creator, creation date, overall quality score, Column Shapes, and Column pair trends.

In the 'Options' column within the list, selecting 'Show' unveils additional model details, including:

      - Synthetic Data Quality

      - Training: Detailed training logs and associated data tags. If model training fails, the log provides the reasons for the failure.

      - Synthetic Data generation

      - Anonymity test

Clicking on any of these sections reveals further details. Using the saved model, you can generate additional synthetic data. 

Through SDK

To select a trained model and check the quality of its synthetic data via SDK:


# Select the trained model by name (see project.synthetic_models() for available names)
model = project.synthetic_model(model_name='CTGAN_v1')

# Fetch quality metrics for the model's synthetic data
model.get_data_quality()

Synthetic Data

Through GUI

In the 'Synthetic Data' tab, you'll find the initial data generated post-model training. This list showcases the data's creation date and time, along with the following details:

   - Overall quality score: Represents the mean of Column Shapes and Column pair trends, providing an overview of data quality.
   - Column Shapes: Indicates the similarity between uploaded and synthetic data for individual columns. A higher score implies closer resemblance: a score of 1 signifies significant divergence, while a score between 5 and 7 suggests considerable similarity.
   - Column pair trends: Reflects similarity between uploaded and synthetic data for pairs of columns.
   - The PSI (Population Stability Index) plot visualizes how closely the distributions of the uploaded and synthetic data match. It is followed by the number of rows and features used to generate the synthetic data.
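PSI compares how a variable is distributed in two samples. A common formulation bins both samples with the same bin edges and sums (a_i - e_i) * ln(a_i / e_i) over the bins, where e_i and a_i are the fractions of reference and synthetic values falling in bin i. A widely used rule of thumb reads values below about 0.1 as little shift and values above about 0.25 as a significant shift. The sketch below assumes equal-width bins over the reference range and a small epsilon for empty bins; AryaXAI's binning and exact computation are not documented here, so psi and overall_quality are illustrative, hypothetical helpers. overall_quality simply takes the mean described in the list above.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index of one numeric column.

    expected: reference values (e.g. the uploaded data)
    actual:   comparison values (e.g. the synthetic data)
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant column

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp into [0, bins - 1] so values outside the reference
            # range land in the edge bins
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        n = len(values)
        # eps avoids log(0) and division by zero for empty bins
        return [max(c / n, eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def overall_quality(column_shapes, column_pair_trends):
    # Mean of the two component scores
    return (column_shapes + column_pair_trends) / 2


reference = [i / 100 for i in range(1000)]
shifted = [v + 5 for v in reference]
print(psi(reference, reference))  # identical samples give 0.0
print(psi(reference, shifted))    # a large shift gives a large PSI
```

Equal-width bins keep the sketch short; quantile bins based on the reference sample are another common choice and give each reference bin the same weight.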

Synthetic Data generation

To generate additional synthetic data rows:

  1. Visit the 'Synthetic Models' tab and choose 'Show' for your preferred model.
  2. Scroll down to locate 'Synthetic Data generation' under 'Training.'
  3. Specify the number of 'Synthetic Rows' required and click 'Generate.'

AryaXAI stores the newly generated data in the 'Synthetic Data' tab, named with the same convention plus the suffix '_1'.

Anonymity test

Anonymeter is a statistical framework for assessing privacy risks in synthetic tabular datasets. Its evaluators estimate the risk of singling out individuals, linking records, and inferring sensitive attributes, all of which could harm data donors once a synthetic dataset is published.

To perform an Anonymity test on your data:

  1. Navigate to the 'Synthetic Models' tab and opt for 'Show' for your intended model.
  2. Scroll down to locate 'Anonymity test' under 'Synthetic Data generation.'
  3. In the 'Aux Columns' dropdown, select the auxiliary columns to use for comparing data values. From the 'Control tags' dropdown, choose tags whose data was not used during training, then click 'Submit.'

On successful execution, the screen displays the Privacy Evaluation metrics. AryaXAI reports four metrics:

- Univariate: Looks at individual variables in isolation

- Multivariate: Considers the combined effect or correlation among various attributes

- Linkability: Focuses on assessing the risk of connecting or linking sensitive information across different datasets or sources

- Inference: Involves deducing or predicting sensitive details by analyzing patterns, correlations, or statistical relationships present within the data
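Control data matters because the Anonymeter approach compares two attack success rates: one against records that were used for training and one against control records that were not. Only success beyond the control baseline is attributed to leakage from the synthetic data. The sketch below implements that risk formulation, risk = (r_attack - r_control) / (1 - r_control). The function name and inputs are hypothetical, and AryaXAI's exact computation and reporting (for example, any confidence intervals) are not documented here.

```python
def privacy_risk(attack_success_rate, control_success_rate):
    """Anonymeter-style privacy risk from two attack success rates.

    attack_success_rate:  fraction of successful attacks on records that
                          were used to train the synthetic model
    control_success_rate: fraction of successful attacks on control records
                          that were NOT used for training (the baseline an
                          attacker reaches from general patterns alone)
    """
    if not 0 <= attack_success_rate <= 1:
        raise ValueError("attack_success_rate must be in [0, 1]")
    if not 0 <= control_success_rate < 1:
        raise ValueError("control_success_rate must be in [0, 1)")
    # Normalize the excess success by the room left above the baseline.
    # A value <= 0 means the attack did no better on training records
    # than on control records.
    return (attack_success_rate - control_success_rate) / (1 - control_success_rate)


print(privacy_risk(0.75, 0.5))
```

For example, an attack that succeeds on 75% of training records but already succeeds on 50% of control records yields a risk of 0.5. If the control tag pointed at training data, both rates would measure the same records and the risk would collapse toward zero, hiding any leakage.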

Through SDK

To generate Synthetic Data via SDK:


model.generate_synthetic_datapoints(1000)

To plot the Population Stability Index (PSI) for the synthetic model's data via SDK:


model.plot_psi()

To fetch existing anonymity scores for the model's synthetic data:


model.anonymity_score()

To create new anonymity scores for the model via SDK:


model.generate_anonymity_score(
    aux_columns=["Alley","3SsnPorch"],
    control_tag='Training'
)

Prompting

After the model is trained and before synthetic data is generated, you can use AryaXAI's 'Prompting' feature to set specific conditions for the data generation process.

Through GUI

To create a new prompt:

  1. Navigate to the 'Prompting' tab within Synthetic AI and click the 'Create Prompt' button located on the right.
  2. Enter the prompt name, then select the features and set a conditional operator for each.
  3. Enter the feature value for each condition, then save the prompt.

The created prompt will appear in the Prompting tab, displaying its name, creation and update details, and status. Here, you can deactivate or delete the prompt. An 'Active' status indicates that the conditions specified in the prompt will be applied during the generation of new synthetic data.

Through SDK

To list existing prompts:


project.get_synthetic_prompts()

To create a synthetic prompt:


project.create_synthetic_prompt(
    name='Grade A synths',
    expression='(grade = A)'
)
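When a prompt is active, you may want to confirm that the generated rows actually satisfy its condition. The sketch below handles only a single equality condition, accepting both the "=" form and the "==" form that appear in the examples on this page; the full AryaXAI expression grammar is not documented here. parse_equality_prompt and rows_violating_prompt are hypothetical helpers, and rows are assumed to be a list of dicts. If tag.get_datapoints() returns a pandas DataFrame (an assumption), convert it with .to_dict('records') first.

```python
import re

# One condition like "(grade = A)" or "GarageCond == Ex".
# Operators other than = / == and compound expressions are not supported.
_CONDITION = re.compile(r"^\(?\s*(\w+)\s*==?\s*([^()=]+?)\s*\)?$")


def parse_equality_prompt(expression):
    """Split a single-equality prompt expression into (feature, value)."""
    match = _CONDITION.match(expression.strip())
    if match is None:
        raise ValueError(f"unsupported expression: {expression!r}")
    return match.group(1), match.group(2)


def rows_violating_prompt(rows, expression):
    """Return the rows whose feature value differs from the prompt's value.

    Values are compared as strings; a missing feature counts as a violation.
    """
    feature, value = parse_equality_prompt(expression)
    return [row for row in rows if str(row.get(feature)) != value]


generated = [{"grade": "A"}, {"grade": "B"}, {"grade": "A"}]
print(rows_violating_prompt(generated, "(grade = A)"))
```

An empty result means every checked row matches the prompt's condition.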

Other SDK functions for synthetic models, data tags, and prompts (shown commented out):


# project.get_synthetic_model_params()
# project.get_synthetic_models()
# project.get_synthetic_model(model_name='CTGAN_v17')
# model = project.get_synthetic_model(model_name='CTGAN_v17')
# model.get_model_type()
# model.get_data_quality()
# model.plot_psi()
# model.generate_datapoints(1000)
# model.generate_anonymity_score(
#     aux_columns=["Alley","3SsnPorch"],
#     control_tag='CTGAN_v17_SyntheticData1'
# )
# model.get_anonymity_score()
# model.delete()
# model.get_tags()
# tag = model.get_tag('CTGAN_v17_SyntheticData1')
# tag.get_model_name()
# tag.view_metadata()
# tag.get_datapoints()
# tag.delete()
# project.get_synthetic_prompts()
# prompt = project.get_synthetic_prompt(prompt_id='6a5a494156d6')
# prompt.get_expression()
# prompt.get_config()
# prompt.activate()
# prompt.deactivate()
# project.get_observation_params()
# project.create_synthetic_prompt(
#     name='prompt sdk 5',
#     expression='GarageCond == Ex'
# )
