Interpretable AI pricing: combining machine-learning lift with actuarial transparency

A pricing model that wins on out-of-sample deviance but cannot be explained to the pricing committee is not a better model. It is a worse one. Interpretability is not a cosmetic feature in insurance pricing — it is part of model fitness, and the research frontier finally agrees.

By Elana Hörstmann, Quantitative Specialist

GLMs remain the backbone of pricing because they are transparent, stable and well understood. The frontier is not “replace the GLM with a black box”. It is to combine the predictive lift of modern models with the explainability, smoothness, monotonicity and governance that pricing decisions actually require.

Why pricing is not a Kaggle competition

A pricing model has to be accurate. It also has to be implementable in a rating engine, consistent with underwriting strategy, defensible to a board, comprehensible to a customer who challenges a quote, and stable enough that rate changes can be justified. It has to coexist with judgement on expenses, profit margins, reinsurance cost, capital cost, demand elasticity and competitive positioning.

A black-box model that improves deviance can still produce unstable relativities, unexplained cliffs, proxy discrimination or operational complexity. Any of those is fatal to deployment, regardless of the back-test score.

What the research is doing about it

Combined actuarial neural networks (CANNs) blend a classical actuarial model with a neural-network correction. The GLM provides the baseline; the network learns residual structure, interactions or non-linear effects the baseline missed. Duval and co-authors apply this to telematics claim counts; Holvoet, Antonio and Henckaerts benchmark GLMs, gradient-boosted trees, neural nets and CANNs on frequency-severity pricing. The appeal is practical: insurers already have actuarial pricing structures, and ML should enhance rather than erase them.
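To make the "GLM baseline plus network correction" idea concrete, here is a minimal NumPy sketch. It is not the implementation used in the papers cited above. The synthetic portfolio, the two rating factors, the network size and the learning rate are all illustrative assumptions. The one design point it tries to show is that the network's output weights start at zero, so the combined model begins exactly at the GLM and can only move away from it where the data support a correction.

```python
import numpy as np


def poisson_deviance(y, mu):
    """Mean Poisson deviance; the y*log(y/mu) term is taken as 0 when y == 0."""
    ratio = np.where(y > 0, y / mu, 1.0)
    return 2.0 * np.mean(y * np.log(ratio) - (y - mu))


def fit_poisson_glm(X, y, offset, n_iter=25):
    """Log-link Poisson GLM with intercept, fitted by Newton-Raphson."""
    A = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(A.shape[1])
    beta[0] = np.log(y.sum() / np.exp(offset).sum())  # start at the portfolio mean
    for _ in range(n_iter):
        mu = np.exp(A @ beta + offset)
        beta = beta + np.linalg.solve(A.T @ (A * mu[:, None]), A.T @ (y - mu))
    return beta


def init_correction(n_features, hidden, rng):
    """One hidden tanh layer. Output weights start at zero, so the
    combined model initially reproduces the GLM exactly."""
    return {
        "W1": rng.normal(0.0, 0.5, size=(n_features, hidden)),
        "b1": rng.normal(0.0, 0.5, size=hidden),
        "w2": np.zeros(hidden),
        "b2": 0.0,
    }


def cann_predict(params, X, glm_log_mu):
    """GLM linear predictor plus the network correction, on the log scale."""
    h = np.tanh(X @ params["W1"] + params["b1"])
    return np.exp(glm_log_mu + h @ params["w2"] + params["b2"])


def train_correction(params, X, y, glm_log_mu, lr=0.2, n_steps=500):
    """Full-batch gradient descent on the mean Poisson negative log-likelihood."""
    n = len(y)
    for _ in range(n_steps):
        h = np.tanh(X @ params["W1"] + params["b1"])
        mu = np.exp(glm_log_mu + h @ params["w2"] + params["b2"])
        g = (mu - y) / n  # gradient with respect to the log-scale correction
        grad_h = np.outer(g, params["w2"]) * (1.0 - h ** 2)
        grads = {
            "W1": X.T @ grad_h,
            "b1": grad_h.sum(axis=0),
            "w2": h.T @ g,
            "b2": g.sum(),
        }
        for name in params:
            params[name] = params[name] - lr * grads[name]
    return params


# Synthetic portfolio (assumption): two rating factors, exposure, and a true
# frequency with an interaction that a main-effects GLM cannot represent.
rng = np.random.default_rng(0)
n = 5000
X = rng.uniform(-1.0, 1.0, size=(n, 2))
exposure = rng.uniform(0.5, 1.0, size=n)
true_log_rate = -2.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + 0.8 * X[:, 0] * X[:, 1]
y = rng.poisson(exposure * np.exp(true_log_rate))

# Step 1: the actuarial baseline.
offset = np.log(exposure)
beta = fit_poisson_glm(X, y, offset)
glm_log_mu = beta[0] + X @ beta[1:] + offset
glm_mu = np.exp(glm_log_mu)

# Step 2: the network learns what the baseline missed.
params = init_correction(2, 8, rng)
start_mu = cann_predict(params, X, glm_log_mu)  # identical to the GLM
params = train_correction(params, X, y, glm_log_mu)
cann_mu = cann_predict(params, X, glm_log_mu)

glm_dev = poisson_deviance(y, glm_mu)
cann_dev = poisson_deviance(y, cann_mu)
print(f"GLM deviance {glm_dev:.4f}, CANN deviance {cann_dev:.4f}")
```

The deviances here are in-sample and only confirm that the correction has moved away from the GLM. Any real comparison belongs on held-out data.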

Neural additive models. The Actuarial Neural Additive Model from Laub, Pho and Wong is the most interesting recent development. It assigns dedicated subnetworks to individual covariates and selected interactions, so flexible relationships are preserved alongside interpretability. The authors address pricing-specific requirements like smoothness and monotonicity head-on. That matters because pricing teams have to answer questions like “Why does the rate increase at this age?” and “Does the relativity make commercial sense?”
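The additive structure and the monotonicity constraint can be illustrated without a neural network at all. The sketch below swaps each subnetwork for a piecewise-linear shape function, which is a deliberate simplification and not the authors' architecture. The covariate names (age, vehicle_value), the knots and the parameter values are hypothetical. Monotonicity is enforced by construction: the values at the knots are cumulative sums of softplus-transformed parameters, so every increment is positive whatever the optimiser does to the raw parameters.

```python
import numpy as np


def softplus(z):
    """Smooth map from any real number to a positive number."""
    return np.logaddexp(0.0, z)


def monotone_shape(x, knots, raw_increments):
    """Non-decreasing piecewise-linear shape function.

    Knot values are cumulative sums of softplus(raw_increments), so each
    increment is positive regardless of the raw parameter values.
    """
    values = np.concatenate([[0.0], np.cumsum(softplus(raw_increments))])
    return np.interp(x, knots, values)


def free_shape(x, knots, values):
    """Unconstrained piecewise-linear shape function."""
    return np.interp(x, knots, values)


def additive_log_rate(age, vehicle_value, p):
    """Log claim frequency: intercept plus one shape function per covariate."""
    return (
        p["intercept"]
        + free_shape(age, p["age_knots"], p["age_values"])
        + monotone_shape(vehicle_value, p["value_knots"], p["value_raw"])
    )


rng = np.random.default_rng(1)
params = {
    "intercept": -2.0,
    "age_knots": np.array([18.0, 25.0, 40.0, 60.0, 80.0]),
    "age_values": np.array([0.6, 0.2, 0.0, 0.1, 0.4]),  # free to go up and down
    "value_knots": np.linspace(0.0, 1.0, 6),
    "value_raw": rng.normal(size=5),  # arbitrary, possibly negative
}

# Each covariate's effect can be plotted and reviewed on its own.
grid = np.linspace(0.0, 1.0, 101)
value_effect = monotone_shape(grid, params["value_knots"], params["value_raw"])

log_rate = additive_log_rate(np.array([22.0, 45.0]), np.array([0.3, 0.9]), params)
```

Because the prediction is a sum of one-dimensional curves, the answer to "why does the rate increase at this age?" is the age curve itself, read directly off the fitted shape function.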

Global surrogate models translate the patterns a neural network discovers into GLM-like structures. They are the bridge from research to deployment: the network finds the signal, and the surrogate makes it implementable in a rating engine and reviewable by a committee.
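One simple way to build such a surrogate, sketched here under stated assumptions, is to band the rating factors and regress the network's log predictions on band indicators. The result is a table of multiplicative relativities plus a fidelity score showing how much of the network's behaviour the table captures. The black_box function is a hypothetical stand-in for a trained network, and the covariates and band edges are illustrative choices.

```python
import numpy as np


def one_hot_bands(x, edges):
    """Indicator columns for the bands defined by `edges`.

    The first band is dropped so that it acts as the base level.
    """
    band = np.digitize(x, edges)
    n_bands = len(edges) + 1
    return np.eye(n_bands)[band][:, 1:]


def fit_surrogate(log_pred, design):
    """Least-squares fit of log predictions on an intercept plus band indicators.

    Returns the coefficients and R-squared on the log scale (the fidelity).
    """
    A = np.column_stack([np.ones(len(log_pred)), design])
    coef, *_ = np.linalg.lstsq(A, log_pred, rcond=None)
    fitted = A @ coef
    ss_res = np.sum((log_pred - fitted) ** 2)
    ss_tot = np.sum((log_pred - log_pred.mean()) ** 2)
    return coef, 1.0 - ss_res / ss_tot


def black_box(age, density):
    """Hypothetical stand-in for a trained network's predicted frequency."""
    young = np.exp(-(age - 18.0) / 10.0)
    return 0.08 * np.exp(0.9 * young + 0.5 * density + 0.2 * density * (age < 25))


rng = np.random.default_rng(2)
age = rng.uniform(18.0, 80.0, size=4000)
density = rng.uniform(0.0, 1.0, size=4000)
log_pred = np.log(black_box(age, density))

design = np.column_stack([
    one_hot_bands(age, [25, 35, 50, 65]),   # five age bands, youngest is base
    one_hot_bands(density, [0.33, 0.66]),   # three density bands, lowest is base
])
coef, fidelity = fit_surrogate(log_pred, design)

base_rate = np.exp(coef[0])
relativities = np.exp(coef[1:])  # multiplicative factors against the base bands
print(f"surrogate fidelity (R^2 on log scale): {fidelity:.3f}")
```

A low fidelity score is itself useful information: it says the network relies on structure, such as an interaction, that the chosen rating table cannot express, and the committee can decide whether to add that structure explicitly or leave the lift on the table.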

Explainability beyond SHAP. A single SHAP chart is not explainability. Pricing requires shape functions, interactions, calibration, monotonic constraints, partial dependence, local explanations, scenario tests and reasonableness checks. It also requires the discipline to distinguish statistical explanation from actuarial justification — a variable can be predictive without being appropriate to use.
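Partial dependence is one of the cheaper diagnostics on that list, and it doubles as a reasonableness check. The sketch below computes a partial dependence curve for any prediction function and flags the largest relative jump between neighbouring grid points, which is a crude test for unexplained cliffs. The predict function is a hypothetical fitted model with a deliberate step in its second input, and the grid is an arbitrary choice.

```python
import numpy as np


def partial_dependence(predict, X, feature, grid):
    """Average prediction when column `feature` is set to each grid value in turn."""
    out = np.empty(len(grid))
    for i, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value
        out[i] = predict(X_mod).mean()
    return out


def largest_step(curve):
    """Largest relative jump between neighbouring grid points."""
    return np.max(np.abs(np.diff(curve)) / curve[:-1])


def predict(X):
    """Hypothetical fitted frequency model: smooth in column 0, a step in column 1."""
    return 0.1 * np.exp(0.4 * X[:, 0]) * np.where(X[:, 1] > 0.5, 1.5, 1.0)


rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, size=(2000, 2))
grid = np.linspace(0.0, 1.0, 21)

pd_smooth = partial_dependence(predict, X, 0, grid)
pd_cliff = partial_dependence(predict, X, 1, grid)

print(f"largest step, column 0: {largest_step(pd_smooth):.3f}")
print(f"largest step, column 1: {largest_step(pd_cliff):.3f}")
```

A large step is a prompt for review, not a verdict. Whether a jump is an artefact or a genuine feature of the risk is an actuarial judgement that the diagnostic cannot make.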

The order to do things in

The first step is rarely a more complex model. It is better pricing infrastructure: curated datasets, exposure definitions, claims definitions, earned-premium reconciliations, peril splits, inflation adjustments, geospatial controls, policy version history and repeatable feature engineering. Without that, ML simply automates poor data faster.

The second step is benchmark modelling. Fit GLMs, gradient-boosted trees, neural networks, CANNs and neural additive models on the same train/test splits and score them on the same metrics: deviance, calibration, lift, stability, interpretability, fairness diagnostics and implementation cost. Pick a winner on the full scorecard, not on test-set accuracy alone.
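The quantitative half of such a scorecard is easy to standardise so that every candidate is scored by the same code. A minimal sketch, assuming claim counts and predicted frequencies on a common test set: the metric definitions below (mean Poisson deviance, total predicted over total actual, top-to-bottom decile lift, worst decile calibration gap) are one reasonable set of choices, not a prescribed standard, and the two "models" are stand-ins built from simulated data.

```python
import numpy as np


def poisson_deviance(y, mu):
    """Mean Poisson deviance; the y*log(y/mu) term is taken as 0 when y == 0."""
    ratio = np.where(y > 0, y / mu, 1.0)
    return 2.0 * np.mean(y * np.log(ratio) - (y - mu))


def decile_table(y, mu, n_bins=10):
    """Actual and predicted mean claim count by ascending predicted-risk decile."""
    order = np.argsort(mu, kind="stable")
    bins = np.array_split(order, n_bins)
    actual = np.array([y[idx].mean() for idx in bins])
    predicted = np.array([mu[idx].mean() for idx in bins])
    return actual, predicted


def scorecard(y, mu):
    """Same metrics for every candidate, computed on the same test set."""
    actual, predicted = decile_table(y, mu)
    return {
        "deviance": poisson_deviance(y, mu),
        "calibration": mu.sum() / y.sum(),  # 1.0 means totals agree
        "lift": actual[-1] / actual[0],     # top decile vs bottom decile
        "max_decile_gap": np.max(np.abs(actual / predicted - 1.0)),
    }


# Simulated test set (assumption): a known true frequency generates the claims.
rng = np.random.default_rng(4)
n = 20000
true_mu = 0.1 * np.exp(0.6 * rng.normal(size=n))
y = rng.poisson(true_mu)

# Two stand-in candidates: a flat portfolio-average rate and the true rate.
candidates = {
    "flat": np.full(n, y.mean()),
    "segmented": true_mu,
}
results = {name: scorecard(y, mu) for name, mu in candidates.items()}
for name, row in results.items():
    print(name, {k: round(float(v), 4) for k, v in row.items()})
```

The flat model is perfectly calibrated in total and still useless for segmentation, which is the reason no single number on the scorecard should decide the outcome.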

The third step is governance: documenting the variable-selection rationale, excluded variables, monotonicity constraints, fairness considerations, sensitivity tests, validation results and implementation mapping. Where AI tools are used to draft any part of this, we follow the controls on our How we use AI page.

The South African overlay

The South African market adds competitive pressure, significant socio-economic diversity, uneven data quality, weather and catastrophe variation, affordability concerns and a tight conduct lens. Variables such as geography, credit-like proxies, device behaviour, occupation, distribution channel and payment patterns may raise fairness concerns depending on how they are used. The FSCA / Prudential Authority’s 2025 AI report flags explainability, customer disclosure, model risk, conduct outcomes and the ability to challenge AI-enabled decisions. Pricing models therefore need technical validation and governance evidence — not only predictive accuracy.

The unglamorous checklist

A pricing programme that earns its place includes: a clear model-use policy; a data dictionary and feature governance process; a baseline GLM or GAM for comparison; ML challenger models; interpretability diagnostics; calibration and stability tests; rate-impact analysis by portfolio segment; fairness and proxy-discrimination checks; implementation mapping into the rating engine; and sign-off by actuarial, underwriting, legal, compliance and product. Less glamorous than “deploying AI”. Also the difference between pricing that lasts and pricing that gets pulled.
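Of the items on that list, rate-impact analysis by segment is the most mechanical, so a small sketch may help. It compares current and proposed premiums per segment and reports the premium-weighted change together with the share of policies whose increase exceeds a cap. The segment labels, the premiums and the 10% cap are made-up illustrations, not recommended thresholds.

```python
import numpy as np


def rate_impact_by_segment(segment, current, proposed, cap=0.10):
    """Premium-weighted rate change and share of policies above `cap`, per segment."""
    change = proposed / current - 1.0
    rows = {}
    for seg in np.unique(segment):
        m = segment == seg
        rows[str(seg)] = {
            "policies": int(m.sum()),
            "avg_change": float(proposed[m].sum() / current[m].sum() - 1.0),
            "share_above_cap": float(np.mean(change[m] > cap)),
        }
    return rows


# Illustrative data only.
segment = np.array(["A", "A", "B", "B"])
current = np.array([100.0, 200.0, 100.0, 100.0])
proposed = np.array([105.0, 200.0, 130.0, 90.0])

rows = rate_impact_by_segment(segment, current, proposed)
for seg, row in rows.items():
    print(seg, row)
```

In this toy data segment B shows a moderate average change, yet half its policies breach the cap. Averages alone hide that kind of dispersion, which is why the share above the cap is reported next to them.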

If you are benchmarking modern pricing approaches against an existing GLM estate, our Finance Modernisation practice covers the data, modelling and governance work in one engagement.
