AI systems must be robust, secure, and safe throughout their life cycles, and potential risks must be continually assessed and managed (OECD, 2019). AI models can be reinforced by careful training and by testing their performance against their intended purpose. In this article, we look at the processes of training and testing AI models, the guidance for model risk management in the US and EU, and the risks that correlation without causation and tail events pose for AI models.
Training AI Models, Validating Them and Testing Their Performance
In order to capture higher-order interactions, models may need to be trained on larger datasets. At the same time, using ever-larger sets of training data risks making models static, which in turn may reduce a model's performance and its ability to learn. To mitigate this risk, modellers split the data into a training set and a test/validation set, using the training set to build the (supervised) model under multiple parameter settings, and the test/validation set to challenge the trained model, assess its accuracy and optimise its parameters.
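As an illustration, the sketch below shows this split-and-tune workflow using scikit-learn; the dataset, the choice of a random forest and the parameter grid are hypothetical stand-ins, not a prescribed setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical dataset standing in for real training data.
X, y = make_classification(n_samples=5_000, n_features=20, random_state=0)

# Hold out a test/validation set to challenge the trained model.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Train the supervised model under several parameter settings and
# keep the setting that performs best on the held-out set.
best_score, best_params = -np.inf, None
for n_estimators in (50, 100, 200):
    for max_depth in (5, 10, None):
        model = RandomForestClassifier(
            n_estimators=n_estimators, max_depth=max_depth, random_state=0
        )
        model.fit(X_train, y_train)
        score = accuracy_score(y_val, model.predict(X_val))
        if score > best_score:
            best_score, best_params = score, (n_estimators, max_depth)

print(f"Best validation accuracy {best_score:.3f} with params {best_params}")
```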
Synthetic datasets, artificially generated to serve as test sets for validation, offer an interesting alternative: they can supply inexhaustible amounts of simulated data and a potentially cheaper way of improving the predictive power and robustness of ML models, especially where real data is scarce and expensive.
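One simple way to generate such a synthetic set, assuming for illustration that the real features are roughly jointly Gaussian, is to sample from a distribution fitted to the scarce real data; more sophisticated generators (GANs, copulas, agent-based simulators) follow the same pattern. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Scarce, expensive real data (hypothetical: 200 rows, 5 features).
real_data = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))

# Fit a simple generative model: the empirical mean and covariance.
mean = real_data.mean(axis=0)
cov = np.cov(real_data, rowvar=False)

# Draw an effectively inexhaustible synthetic sample for testing.
synthetic_data = rng.multivariate_normal(mean, cov, size=100_000)

print(synthetic_data.shape)  # (100000, 5)
```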
Continuous testing of ML models is indispensable for identifying and correcting 'model drift', whether concept drift (the relationship between inputs and outputs changes) or data drift (the distribution of the input data changes). Ongoing monitoring and validation of models throughout their life is fundamental to the risk management of any type of model. Validation activities should be performed on an ongoing basis to track known model limitations and identify any new ones, especially during periods of stressed economic or financial conditions.
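A lightweight way to monitor for data drift is to compare the distribution of live inputs against the training distribution, for instance with a two-sample Kolmogorov-Smirnov test. The sketch below assumes a single numeric feature and uses an illustrative, not prescribed, significance threshold:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)

# Feature values seen at training time vs. in live production data.
training_feature = rng.normal(loc=0.0, scale=1.0, size=10_000)
live_feature = rng.normal(loc=0.4, scale=1.2, size=1_000)  # drifted

statistic, p_value = ks_2samp(training_feature, live_feature)

# A small p-value suggests the live distribution has shifted away
# from the training distribution -- a candidate data drift alert.
if p_value < 0.05:
    print(f"Possible data drift detected (KS={statistic:.3f}, p={p_value:.2e})")
```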
Guidance for Model Risk Management in the US and EU That Applies to AI Models
In the US, Supervision and Regulation Letter SR 11-7, issued by the Federal Reserve in 2011, provides technology-neutral guidance on model risk management that has stood the test of time and is certainly useful in managing risks related to AI-driven models. The letter provides guidance for banking institutions on (i) model development, implementation, and use; (ii) model validation; and (iii) governance, policies, and controls.
More recently, the European Banking Authority (EBA) published Guidelines on loan origination and monitoring, which include rules for the appropriate management of model risk. In addition to ongoing monitoring and review of the code/model used, some regulators have mandated 'kill switches' or other automatic control mechanisms that trigger alerts in high-risk circumstances.
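The regulatory texts do not prescribe an implementation, but a kill switch can be as simple as a guard around the model's output path that disables automated use and raises an alert once risk indicators breach preset limits. A minimal sketch, with hypothetical indicator names and thresholds:

```python
class ModelKillSwitch:
    """Halts automated model use when risk indicators breach limits."""

    def __init__(self, max_drift_score=0.3, max_error_rate=0.1):
        # Illustrative thresholds; in practice set by model governance.
        self.max_drift_score = max_drift_score
        self.max_error_rate = max_error_rate
        self.active = True

    def check(self, drift_score, recent_error_rate):
        if (drift_score > self.max_drift_score
                or recent_error_rate > self.max_error_rate):
            self.active = False
            print("ALERT: model disabled pending human review")
        return self.active


switch = ModelKillSwitch()

# Example: values computed by a hypothetical surveillance layer.
if switch.check(drift_score=0.45, recent_error_rate=0.08):
    print("serving model predictions")
else:
    print("falling back to a manual / rule-based process")
```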
Correlation without Causation and Meaningless Learning
The understanding of cause-and-effect relationships is a key element of human intelligence that is absent from pattern recognition systems. Users of ML models risk interpreting meaningless correlations observed in activity patterns as causal relationships, resulting in questionable model outputs. Moving from correlation to causation is crucial when it comes to understanding the conditions under which a model may fail.
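The pitfall is easy to reproduce: two entirely independent random walks will often exhibit a high sample correlation, which a pattern-matching model could mistake for a real relationship. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(42)

# Two independent random walks: by construction, no causal link.
walk_a = np.cumsum(rng.standard_normal(1_000))
walk_b = np.cumsum(rng.standard_normal(1_000))

# Yet their sample correlation is often far from zero.
correlation = np.corrcoef(walk_a, walk_b)[0, 1]
print(f"Correlation between unrelated series: {correlation:.2f}")
```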
Evidence from some research suggests that models are bound to learn suboptimal policies if they do not take human advice into account, perhaps surprisingly, even when the humans' decisions are less accurate than the models' own.
AI and Tail Risk: The Example of the COVID-19 Crisis
AI models are adaptive in nature: because they evolve over time by learning from new data, they may not be able to perform under idiosyncratic one-off events that have not been experienced before, such as the COVID-19 crisis, and that are therefore not reflected in the data used to train the model. Tail events give rise to discontinuities in the datasets, which in turn create model drift that undermines the models' predictive capacity. Going forward, synthetic datasets generated to train the models could incorporate tail events of the same nature, in addition to data from the COVID-19 period, with a view to retraining and redeploying models rendered redundant by such events.
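As a sketch of what such augmentation might look like, the example below mixes ordinary-regime data with synthetic tail events drawn from a heavy-tailed Student-t distribution; the distributions, scale and mixing proportion are illustrative assumptions, not calibrated choices:

```python
import numpy as np

rng = np.random.default_rng(7)

# Ordinary-regime returns (hypothetical: roughly Gaussian).
normal_regime = rng.normal(loc=0.0, scale=0.01, size=9_000)

# Synthetic tail events: heavy-tailed Student-t draws standing in
# for crisis-like moves absent from the historical record.
tail_events = 0.01 * rng.standard_t(df=2, size=1_000)

# Combined sample used to retrain the model.
augmented_returns = np.concatenate([normal_regime, tail_events])
rng.shuffle(augmented_returns)

print(f"Max |move| ordinary: {np.abs(normal_regime).max():.3f}, "
      f"augmented: {np.abs(augmented_returns).max():.3f}")
```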
The robustness and resilience of AI models must be reinforced by careful training and by testing their performance against their intended purpose. Guidance from the Federal Reserve's SR 11-7 letter and the European Banking Authority's Guidelines on loan origination and monitoring applies to model risk management for AI models. Correlation without causation and tail risk must be taken into account when training and testing AI models. Ongoing testing of models with validation datasets that incorporate extreme scenarios, together with continuous monitoring for model drift, is important to mitigate the risks encountered in times of stress.