Reviewing the Risks of AI Technical Debt in Financial Services Industries (FSIs)

This paper reviews AI technical debt (TD) in FSIs and presents an empirical study of the risks involved.

Vinay Kumar

September 18, 2024


Abstract: 

AI is increasingly becoming an important catalyst for improving the efficiency and efficacy of the financial services industry. For this paper, we consider institutions that provide banking, insurance, payment, and investment services to be part of the financial services industry (FSI). The recent success of generative and predictive AI has generated enormous interest in deploying AI across FSI use cases. However, because the industry is highly regulated and lacks open-source datasets, few resources have been published on production challenges, failures, and their underlying causes. As a result, technical debt is accumulating in how AI is deployed in FSIs. In addition, a shortage of interdisciplinary skills spanning AI and the associated business risks means that traditional risk managers and auditors struggle to create risk frameworks for AI deployments. In this paper, we review AI technical debt (TD) in FSIs and empirically study the risks involved.

Approach: 

In this paper, we list the various AI technical debts that can be incurred while deploying AI in FSIs. For each, we describe its nature and provide an empirical study of the associated risks.

We follow three steps to arrive at this:

  • Step 1: Propose the TD categorisation
  • Step 2: Empirically review anti-patterns in each TD category
  • Step 3: Review the risks of each TD

Results:

Data Inconsistencies Debt:

Description: 

When the data pipelines and data formats used in training differ from those in production, ML models struggle to maintain their training-time performance in production.

Risks: 

Such inconsistencies can lead to significant performance degradation and model errors. 

Causes: 

Non-standard data capture between the systems that supply training data and those serving production, the inability of legacy systems to capture data at scale, a lack of standardisation, and frequent changes to the systems used in the pipeline.
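
As a minimal illustration of catching this debt early, the sketch below compares a production batch against the training dataset and flags missing columns, dtype mismatches, and large shifts in basic statistics. The function name and the 25% tolerance are illustrative assumptions, not part of the paper.

```python
import pandas as pd

def check_training_serving_consistency(train_df: pd.DataFrame,
                                       prod_df: pd.DataFrame,
                                       rel_tol: float = 0.25) -> list[str]:
    """Flag schema and basic statistical inconsistencies between
    the training dataset and a production batch."""
    issues = []

    # 1. Schema checks: missing/extra columns and dtype mismatches.
    missing = set(train_df.columns) - set(prod_df.columns)
    extra = set(prod_df.columns) - set(train_df.columns)
    if missing:
        issues.append(f"columns missing in production: {sorted(missing)}")
    if extra:
        issues.append(f"unexpected columns in production: {sorted(extra)}")
    for col in set(train_df.columns) & set(prod_df.columns):
        if train_df[col].dtype != prod_df[col].dtype:
            issues.append(f"dtype mismatch for '{col}': "
                          f"{train_df[col].dtype} vs {prod_df[col].dtype}")

    # 2. Simple distribution checks on shared numeric columns.
    numeric = train_df.select_dtypes("number").columns.intersection(prod_df.columns)
    for col in numeric:
        t_mean, p_mean = train_df[col].mean(), prod_df[col].mean()
        if t_mean != 0 and abs(p_mean - t_mean) / abs(t_mean) > rel_tol:
            issues.append(f"mean of '{col}' shifted by more than {rel_tol:.0%}")
    return issues
```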

Feature Omission Debt: Excluding Features Without Business Oversight

Description: 

Feature omission debt occurs when ML practitioners remove data features that are critical to the use case, so the model misses crucial information when making predictions.

Risks

The model fails to capture the realistic risks associated with the business process, weakening risk assessment in downstream decisions.

Causes

Crucial features are excluded from a model for statistical reasons or due to a lack of business oversight. The resulting models miss important information, leading to reduced accuracy and increased bias, particularly in high-stakes environments like loan underwriting.

Over-Engineering Debt:

Description

Over-engineering debt is created when raw features are heavily engineered into synthetic features that either oversimplify or overcomplicate the inputs used in modelling.

Risks

Over-engineering can hinder the model's ability to learn effectively from data and make it difficult to trace and explain model decisions, posing challenges for regulatory compliance.

Causes

When raw features are overly processed into synthetic ones, it can oversimplify or overcomplicate them, reducing their ability to capture critical data. Overcomplicated feature engineering, especially for high-cardinality categorical features, can lead to poor model learning. Practitioners often overlook categorical features due to limitations in models like XGBoost and linear regression, resulting in both feature omission and engineering debt. Techniques like label or one-hot encoding for high-cardinality features create sparse data, and reducing dimensionality by grouping values further limits information capture. Complex synthetic features also hinder explainability, making it difficult to trace feature importance back to raw data, complicating compliance and risk management. Lack of domain knowledge during feature engineering can further degrade feature quality, increasing engineering debt.
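
To make the high-cardinality point concrete, here is a small, hypothetical sketch (synthetic merchant IDs, not real FSI data) showing how one-hot encoding explodes dimensionality into a near-all-zero matrix, while grouping rare values shrinks it at the cost of discarding the long tail.

```python
import numpy as np
import pandas as pd

# Hypothetical high-cardinality categorical feature (e.g. a merchant ID).
rng = np.random.default_rng(0)
merchants = pd.Series(rng.integers(0, 2_000, size=20_000).astype(str),
                      name="merchant_id")

# One-hot encoding creates one column per distinct merchant: the design
# matrix becomes huge and is almost entirely zeros.
one_hot = pd.get_dummies(merchants, prefix="m")
sparsity = 1.0 - one_hot.to_numpy().mean()
print(f"one-hot shape: {one_hot.shape}, sparsity: {sparsity:.2%}")

# Grouping rare merchants into an 'other' bucket shrinks the matrix,
# but discards the signal carried by the long tail of categories.
counts = merchants.value_counts()
grouped = merchants.where(merchants.map(counts) >= 20, other="other")
print(f"distinct values after grouping rare merchants: {grouped.nunique()}")
```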

Insufficient Testing Debt:

Description

Insufficient testing debt arises when AI models are not adequately tested across the full complexity of real-world scenarios. 

Risks

In FSIs, where regulatory requirements demand high levels of model robustness, inadequate testing can lead to unreliable and non-compliant models.

Causes

Insufficient Testing Debt arises from several causes. Inadequate selection of test data with an unreasonable sample distribution can bias models, leading to performance issues. Lack of proper stress testing hinders the identification of gaps and the design of effective risk policies, affecting model robustness. Additionally, using outdated testing strategies for model updates, without accounting for changes in corner cases, results in models that lose relevance and accuracy after retraining. Failure to perform continuous testing and validation further exacerbates this, as models may not remain accurate over time. Moreover, insufficient adaptation of testing strategies to the dynamic nature of the business and models contributes to testing debt.
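
One way to reduce this debt is to test models on named stress segments rather than only on the aggregate test set. The sketch below assumes a scikit-learn-style binary classifier and hypothetical segment definitions supplied with business and risk input; the 0.70 AUC floor is purely illustrative.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def stress_test_model(model, X_test, y_test, segments, min_auc=0.70):
    """Evaluate a binary classifier on named stress segments, not just
    the overall test set, and report segments that fall below a floor.

    `segments` maps a segment name to a boolean mask over X_test / y_test.
    """
    failures = {}
    for name, mask in segments.items():
        if mask.sum() == 0:
            failures[name] = "no test samples in segment"
            continue
        if len(np.unique(y_test[mask])) < 2:
            failures[name] = "segment contains a single outcome class"
            continue
        scores = model.predict_proba(X_test[mask])[:, 1]
        auc = roc_auc_score(y_test[mask], scores)
        if auc < min_auc:
            failures[name] = f"AUC {auc:.3f} below floor {min_auc}"
    return failures

# Hypothetical usage: segments defined with business/risk input, e.g.
# segments = {
#     "thin_credit_file": X_test["n_tradelines"] < 3,
#     "recent_downturn":  X_test["application_year"] >= 2020,
# }
# assert not stress_test_model(model, X_test, y_test, segments)
```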

Explainability Debt: Risks of Opaque or Misunderstood ML Models

Description

Being unable to explain ML predictions, or using an inappropriate explainability method, creates explainability debt with serious downstream risks.

Risks

The opacity of AI presents significant risks for FSIs, potentially leading to a loss of control and damaging consumer trust due to a lack of transparency. This is particularly critical in FSIs, where explainability is essential for regulatory compliance and for maintaining customer confidence.

Causes

The urgency of deploying the models in production leads to insufficient attention to explainability, resulting in models being used without proper explanations.  Post-hoc techniques like SHAP and LIME can be unstable, producing inconsistent explanations due to factors such as random perturbations or correlations between features. Problems such as unrealistic data instances in SHAP values and limitations of various explanation methods (e.g., Deep Taylor Decomposition) raise concerns about the accuracy and faithfulness of explanations. Explainability methods can also be manipulated, such as through scaffolding attacks or counterfactual explanations, leading to misleading or biased interpretations. Moreover, standards are not defined in terms of the accuracy of explanations. Frequent disagreements and the use of ad hoc heuristics in resolving explanation issues contribute to the overall explainability debt, posing risks for accurate and reliable model explanations.
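
A simple way to surface the instability mentioned above is to re-run a sampling-based explainer several times on the same instance and measure how much the attributions move. The sketch below assumes the `shap` package's KernelExplainer interface; the stability score and run counts are illustrative, not a standard metric.

```python
import numpy as np
import shap  # assumes the `shap` package and its KernelExplainer interface

def explanation_stability(predict_fn, background, x, n_runs=5, nsamples=200):
    """Re-run a sampling-based explainer several times on the same instance
    and measure how much the attributions move between runs."""
    runs = []
    for _ in range(n_runs):
        explainer = shap.KernelExplainer(predict_fn, background)
        runs.append(np.asarray(explainer.shap_values(x, nsamples=nsamples)))
    runs = np.stack(runs)
    spread = runs.std(axis=0)                  # variability across runs
    scale = np.abs(runs).mean(axis=0) + 1e-12  # typical attribution size
    return float((spread / scale).max())

# Hypothetical usage with a fitted classifier `model` and training sample X_train:
# stability = explanation_stability(model.predict_proba,
#                                   shap.sample(X_train, 100), X_train[:1])
# A large value signals unstable explanations for this instance.
```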

Drift Monitoring Debt:

Description

Drift monitoring debt occurs when there is inadequate monitoring of data and model drift during production. 

Risks

As data and market conditions change over time, models that are not regularly updated and monitored can become obsolete, leading to inaccurate predictions and increased risk.

Causes

Drift monitoring debt arises from several key issues. Data drift occurs when the data used in training differs from the data in production due to evolving business or consumer behavior, regulatory changes, or market volatility. This can lead to inaccurate predictions and model degradation. Model/Concept drift occurs when the relationship between input and target variables changes, causing the model’s predictive power to decay over time. Factors such as market microstructure noise from high-frequency trading can further introduce drift in financial data streams, impacting model accuracy. 

Additionally, shifts in consumer behavior or economic conditions can degrade models like credit scoring over time. Anomalies in data, such as changes in transaction patterns, can indicate fraud or shifts in consumer behavior, requiring timely intervention. Traditional models often struggle to adapt to non-stationary data, leading to suboptimal predictions and increased operational costs. Difficulty in distinguishing genuine concept drift from noise can result in unnecessary model updates. Delays in monitoring drift can make model predictions redundant, potentially causing the complete failure of AI solutions. Addressing these issues requires adaptive modeling techniques and robust drift monitoring mechanisms while balancing accuracy with computational and operational costs.
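
A common lightweight drift check is the Population Stability Index (PSI) between the training and production distributions of a feature or model score. The sketch below is a minimal implementation; the 0.25 alert threshold is a widely used rule of thumb, not a regulatory standard.

```python
import numpy as np

def population_stability_index(expected, actual, n_bins=10):
    """PSI between a reference (training) sample and a production sample.
    Values above ~0.25 are commonly treated as significant drift."""
    # Bin edges taken from the reference distribution.
    edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range production values

    exp_counts = np.histogram(expected, bins=edges)[0].astype(float)
    act_counts = np.histogram(actual, bins=edges)[0].astype(float)

    # Convert to proportions, flooring at a small value to avoid log(0).
    exp_pct = np.clip(exp_counts / exp_counts.sum(), 1e-6, None)
    act_pct = np.clip(act_counts / act_counts.sum(), 1e-6, None)
    return float(np.sum((act_pct - exp_pct) * np.log(act_pct / exp_pct)))

# Illustrative usage with synthetic drift:
# rng = np.random.default_rng(0)
# psi = population_stability_index(rng.normal(0, 1, 10_000),
#                                  rng.normal(0.5, 1.2, 10_000))
# if psi > 0.25:
#     print(f"drift alert, PSI={psi:.2f}")
```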

Bias/Fairness Debt:

Description

Bias/Fairness Debt occurs when biased ML models are deployed or used without addressing the inherent biases in the models, resulting in unfair outcomes by perpetuating or amplifying biases present in the training data.

Risks

Biased or unfair models can lead to discriminatory practices, perpetuate inequalities, and cause financial losses or reputational damage. They also create significant operational and compliance risks for companies.

Causes

Bias or unfairness can be introduced when models are trained on historical data reflecting societal biases, leading to discriminatory outcomes. In areas such as lending and insurance, models may systematically disadvantage certain demographic groups, resulting in unfair treatment and pricing. Compliance risks arise from evolving regulations like the EU AI Act and the Algorithmic Accountability Act, which require regular algorithmic audits and impact assessments to identify and mitigate bias.

However, the lack of standardized metrics for measuring algorithmic bias complicates these efforts. Additionally, as noted in "Explainability Debt," it is possible to manipulate explainability methods to deceive auditors, making effective bias regulation even more challenging.
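
Even without standardized metrics, simple group-level checks can surface candidate issues. The sketch below computes a disparate impact ratio (approval rate per group relative to the most-favoured group) and flags groups below the commonly cited 80% rule; the column names are hypothetical.

```python
import pandas as pd

def disparate_impact_ratio(decisions: pd.Series, group: pd.Series) -> pd.Series:
    """Approval rate of each group divided by the approval rate of the
    most-favoured group. The '80% rule' flags ratios below 0.8."""
    rates = decisions.groupby(group).mean()
    return rates / rates.max()

# Hypothetical usage with a binary approvals column and a protected attribute:
# ratios = disparate_impact_ratio(df["approved"], df["applicant_group"])
# flagged = ratios[ratios < 0.8]
```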

Auditability Debt:

Description

Auditability debt occurs when AI solutions are deployed without capturing the right artefacts or following an appropriate audit framework.

Risks 

Auditability Debt poses significant risks including undermining the reliability and trustworthiness of ML systems, unchecked biases and errors due to lack of transparency, and limited effectiveness of audits because of restricted access to model artifacts. These issues can lead to regulatory compliance problems, operational risks, and potential system failures, impacting the accuracy and fairness of decision-making.

Causes

The increasing complexity and vast amount of data in ML models make auditing challenging, leading to difficulties in identifying and rectifying issues. Algorithms often operate as "black boxes," making it difficult to understand their decision-making processes. This opacity can lead to unchecked biases and errors, accumulating over time and resulting in significant audit debt. Additionally, restricted access to model artifacts and data, such as only query access and output observation, limits auditors' ability to conduct thorough evaluations and fine-tuning. 

The development of standardized auditing frameworks that provide guidelines for evaluating algorithmic systems can help ensure consistency and thoroughness in audits.
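
As a sketch of the kind of artefact capture such frameworks call for, the snippet below writes a hash-stamped release record (model file, training data, evaluation metrics) that auditors can later verify; the file layout and field names are assumptions for illustration.

```python
import hashlib
import json
import time
from pathlib import Path

def log_audit_record(model_path: str, train_data_path: str,
                     metrics: dict, out_dir: str = "audit_log") -> Path:
    """Append a hash-stamped record of a model release so that auditors
    can later verify exactly which artefacts were deployed."""
    def sha256(path: str) -> str:
        return hashlib.sha256(Path(path).read_bytes()).hexdigest()

    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "model_sha256": sha256(model_path),
        "training_data_sha256": sha256(train_data_path),
        "evaluation_metrics": metrics,
    }
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    path = out / f"release_{record['timestamp'].replace(':', '')}.json"
    path.write_text(json.dumps(record, indent=2))
    return path
```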

Model Attack Debt:

Description

AI models are vulnerable to various types of attacks, such as adversarial attacks or model poisoning. Model attack debt is the risk accumulated when these vulnerabilities are not adequately addressed, potentially compromising the integrity and security of AI systems in FSIs.

Risks

Model Attack Debt poses several risks including integrity compromise from adversarial attacks, confidentiality breaches through membership inferencing, and availability issues due to model poisoning. Additionally, model extraction attacks can allow adversaries to replicate and exploit the model’s functionality, further endangering the system’s reliability and security.

Causes

Model Attack Debt arises from several key issues. Adversarial attacks involve crafting malicious inputs to deceive models into making incorrect predictions and can be either white-box (full model access) or black-box (limited query access). Model poisoning involves manipulating training data to introduce malicious behavior, potentially creating vulnerabilities like backdoor access. Membership inferencing attacks aim to determine if specific data points were used in training, compromising data privacy, while model extraction involves creating surrogate models to enable further attacks. Federated learning environments are particularly vulnerable to these issues, making data integrity challenging to ensure. 

Addressing these vulnerabilities requires measures such as adversarial training, secure computation methods, anomaly detection, and regular model audits to maintain the security and integrity of AI solutions.
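
As a toy illustration of an evasion-style adversarial attack, the sketch below perturbs an input against the weight signs of a logistic regression to push a 'fraud' example towards a 'legitimate' score. Real attacks on non-linear models use gradient- or query-based variants of the same idea; the data here is synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Fit a toy fraud-style classifier on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(2_000, 5))
y = (X @ np.array([1.5, -2.0, 0.5, 0.0, 1.0])
     + rng.normal(scale=0.5, size=2_000) > 0).astype(int)
clf = LogisticRegression().fit(X, y)

# FGSM-style perturbation for a linear model: step against the sign of the
# weights to push a 'fraud' example towards the 'legitimate' side.
x = X[y == 1][0]
eps = 0.3
x_adv = x - eps * np.sign(clf.coef_[0])

print("original fraud score: ", clf.predict_proba(x.reshape(1, -1))[0, 1].round(3))
print("perturbed fraud score:", clf.predict_proba(x_adv.reshape(1, -1))[0, 1].round(3))
```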

Shadowy Pre-Trained Models Debt:

Description

This type of debt occurs when pre-trained models are used without sufficient documentation or understanding of the training data and sources. 

Risks

Shadowy Pre-Trained Models Debt poses several risks, particularly in generative AI and financial services use cases. When financial institutions use pre-trained models based on unauthorized data, they risk violating data privacy laws such as the GDPR, CCPA, and DPDP Act. This non-compliance can lead to substantial fines and legal repercussions. Additionally, the use of these models can breach contractual obligations and lacks the necessary accountability for financial institutions, compromising their ability to meet regulatory standards and maintain trust.

Causes

This debt arises from employing pre-trained models that rely on unauthorized data. In generative AI, it involves copyright and trademark violations, while in financial services, using such models for applications like credit scoring can breach data privacy regulations.

Delayed or Unchecked Feedback Debt:

Description

Feedback debt arises when ML models are used without any feedback in production or when the feedback is delayed or not validated correctly.

Risks

Delayed or unchecked feedback debt in ML models poses significant risks. When feedback is not provided or is delayed, models can degrade over time, leading to decreased relevance and effectiveness. This is especially problematic in industries such as loan underwriting, insurance, and health insurance, where the impact of decisions or claims may not be known for months or years. Delayed feedback can negatively affect decision-making and risk assessment, while unchecked feedback can cause models to learn and reinforce inconsequential patterns or biases, compounding errors and compromising reliability and fairness. Additionally, adversarial attacks can distort feedback data, further increasing the risk of incorrect decisions.

Causes

Delayed or unchecked feedback debt stems from various factors. A primary cause is the inherent delay in feedback due to the nature of industries like loan underwriting and insurance, where the outcomes of decisions may not be evident for extended periods. This delay complicates timely validation of model efficacy. The complexity of handling vast amounts of structured and unstructured data can also hinder real-time feedback processing. Moreover, unchecked feedback can lead to models learning erroneous patterns that are difficult to detect and correct, exacerbating errors. Adversarial attacks can further distort feedback, intensifying issues with model bias and fairness. Effective management of feedback requires improved communication among data scientists, engineers, and domain experts to ensure accurate interpretation and implementation.
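
One practical guard against unchecked feedback is to fold outcomes back into retraining only after they have matured. The sketch below filters a hypothetical outcomes table to labels older than a maturity window; the column name and the 180-day default are illustrative.

```python
import pandas as pd

def matured_feedback(outcomes: pd.DataFrame,
                     decision_col: str = "decision_date",
                     maturity_days: int = 180,
                     as_of=None) -> pd.DataFrame:
    """Return only the feedback records whose outcomes have had time to
    mature, so immature labels do not leak into retraining."""
    as_of = as_of if as_of is not None else pd.Timestamp.now(tz="UTC")
    age = as_of - pd.to_datetime(outcomes[decision_col], utc=True)
    return outcomes[age >= pd.Timedelta(days=maturity_days)]

# Hypothetical usage on a loan-outcomes table:
# train_feedback = matured_feedback(loan_outcomes, maturity_days=365)
```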

Reproducibility Debt:

Description

Reproducibility debt occurs when the results of AI models cannot be consistently replicated. 

Risks

Reproducibility debt in financial services institutions (FSIs) poses several risks. Without fully documented artifacts, ML models can lead to significant reproducibility debt, undermining consistency and stability in risk management. Key risks include the inability to reproduce model results accurately, which can compromise the reliability of decisions based on these models. The lack of standardized documentation and version control can exacerbate this issue, especially when complex models and proprietary datasets are involved. This can result in difficulties during model validation and auditing, impacting the organization's ability to ensure stability and trust in their ML processes.

Causes

Reproducibility debt arises primarily from inadequate standardization, poor documentation, and weak version control and governance frameworks. The complexity of ML models and the use of proprietary datasets, which are not always available during validation, further contribute to this issue. Time constraints and pressure to deploy models quickly often lead to incomplete documentation of critical elements such as data reprocessing steps, hyperparameter settings, and training procedures. Additionally, the complex interactions between various systems within FSIs and the lack of continuous capturing of artifacts increase the difficulty of reproducing results, thereby exacerbating reproducibility debt.
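
A lightweight mitigation is to write a run manifest alongside every training job. The sketch below records the random seed, hyperparameters, a hash of the training data, and basic environment versions; the file name and fields are assumptions, not a prescribed standard.

```python
import hashlib
import json
import platform
import random
from pathlib import Path

import numpy as np

def write_run_manifest(data_path: str, hyperparams: dict, seed: int,
                       out_path: str = "run_manifest.json") -> None:
    """Record what is needed to re-run a training job: seed, hyperparameters,
    a hash of the training data, and environment information."""
    random.seed(seed)
    np.random.seed(seed)

    manifest = {
        "seed": seed,
        "hyperparameters": hyperparams,
        "training_data_sha256": hashlib.sha256(
            Path(data_path).read_bytes()).hexdigest(),
        "python_version": platform.python_version(),
        "numpy_version": np.__version__,
    }
    Path(out_path).write_text(json.dumps(manifest, indent=2))

# Hypothetical usage before training:
# write_run_manifest("train.parquet", {"max_depth": 6, "eta": 0.1}, seed=42)
```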

Compliance & Governance Debt:

Description

Compliance & governance debt refers to using AI solutions without a regulatory or governance framework because of a lack of clarity in regulations or internal governance.

Risks

Compliance and governance debt in AI arises from the rapid advancement of AI technologies, which outpace the development of regulatory measures. This creates significant challenges for financial services institutions (FSIs) as they struggle to align their AI solutions with evolving and often fragmented regulations. The lack of standardization in benchmarking, definitions, and regulatory approaches between different geographies and regulatory bodies further complicates compliance. This disparity increases the risk of deploying AI systems that may not fully adhere to current or future regulations, leading to potential legal and financial repercussions.

Causes

The causes of compliance and governance debt are rooted in the fast-paced evolution of AI technologies, which regulators find difficult to keep up with. The absence of standardized benchmarking and regulatory definitions contributes to inconsistent regulatory requirements across different regions. For instance, while anti-money laundering and fraud prevention models are heavily regulated in the US, Europe does not have equivalent regulations. Additionally, traditional regulatory approaches are often static and do not adequately address the dynamic nature of AI systems. The lack of clarity regarding liability and accountability for AI decisions further complicates governance, making it challenging for FSIs to ensure their AI models meet all current and future compliance standards.

Stakeholder Debt: The Hidden Cost of Limited Participation in AI Development

Description

Stakeholder debt arises when AI solutions are developed with input from only a limited group of stakeholders, leading to a lack of transparency and understanding among other key participants. This can create dependencies and misalignments that compromise the effectiveness of AI systems.

Risks

Stakeholder debt in ML projects arises from the lack of transparency and understanding between data scientists and other stakeholders. When critical information about ML models is not adequately shared with all stakeholders, it creates gaps in understanding and validation of the AI solutions. This often leads to misalignments in the system’s functionality and stakeholder expectations, as only a subset of individuals, typically data scientists, fully comprehends the technical details. The failure to involve stakeholders in various stages of the ML lifecycle can result in a lack of explainability and understanding, increasing the risk of misunderstandings and dissatisfaction with the final product.

Causes

The causes of stakeholder debt are primarily rooted in the highly technical nature of ML, which often limits comprehensive understanding to data scientists and practitioners. This technical complexity makes it challenging to communicate critical information in an accessible manner to all stakeholders. The lack of iterative feedback and involvement of stakeholders throughout the development process contributes to misalignments between system features and user needs. Additionally, without proper tools and frameworks to facilitate stakeholder engagement and communication, integrating diverse inputs and managing conflicting priorities becomes difficult, leading to potential compromises that may not fully satisfy all stakeholders.

Model Risk Management (MRM) Debt:

Description

MRM debt occurs when AI models are deployed with poorly tested risk management frameworks. In FSIs, effective MRM is crucial for safeguarding organizational objectives and ensuring the long-term sustainability of AI models.

Risks

Model Risk Management (MRM) debt in machine learning (ML) arises from several factors that challenge traditional risk management practices. The complexity and non-linearity of ML models, combined with high-dimensional data and opacity, make it difficult to apply existing MRM guidelines effectively. Unlike traditional models, ML models require continuous monitoring for issues such as bias, data drift, and overfitting. The absence of standardized definitions and metrics for key aspects like explainability further complicates risk management. The evolving regulatory landscape for ML models also adds uncertainty, making it challenging for organizations to implement robust MRM practices and resulting in potential gaps in risk mitigation and model reliability.

Causes

The causes of MRM debt in ML stem from the inherent differences between traditional and ML models. Traditional MRM guidelines, which focus on static models with well-defined components, are less applicable to the dynamic and opaque nature of ML models. The complexity of ML models, including their non-linearity and reliance on high-dimensional data, requires new risk management approaches. Additionally, the lack of standardization in metrics for assessing risks such as bias and explainability creates challenges in applying effective MRM practices. The rapid development of ML technologies and the absence of clear regulatory guidelines further exacerbate these challenges, leaving organizations with insufficient tools and frameworks for managing model risk effectively.

Conclusion

In conclusion, AI Technical Debt (TD) in Financial Services Industries (FSIs) poses significant risks, as any system failure can lead to financial losses, regulatory challenges, and a loss of customer trust. While AI offers immense value by automating critical functions and improving client experiences, the complexity of AI systems creates unique challenges that must be addressed. 

Key debts like explainability and model risk management are particularly complex due to the lack of standardization and the evolving nature of AI governance. However, other debts such as feature engineering, feedback, and compliance are manageable with proper design and internal governance frameworks. As AI becomes more integral to FSIs, addressing these debts is crucial to ensure operational stability and reduce risk.

