Explainable AI for Credit Scoring with SHAP-Calibrated Ensembles: A Multi-Market Evaluation on Public Lending Data

Rapid digitisation has reshaped consumer lending, with machine learning systems now central to underwriting decisions. This shift has improved predictive accuracy while raising concerns about opacity, fairness, and regulatory compliance. We develop an explainability-first framework for credit scoring that integrates calibrated gradient-boosting models with SHAP and LIME explanations, cost-aware threshold selection, and multi-criteria fairness monitoring. We evaluate the framework on three public lending datasets representing different data-richness environments: Home Credit Default Risk (N=307,511, default rate 8.07%), Default of Credit Card Clients (N=30,000, default rate 22.12%), and LendingClub (N=887,379, default rate 5.63%). XGBoost with SHAP achieves AUCs of 0.892±0.009 to 0.923±0.008 across datasets while maintaining explanation stability (Kendall τ=0.94±0.03) and good calibration (Brier scores of 0.119±0.003 to 0.154±0.004). Fairness-constrained thresholding reduces demographic-parity gaps by 59-67% (95% CI: 52-74%) at cost increases of 3.2±0.8% to 5.8±1.3%. We release a complete reproducibility artefact, including the code repository, model cards, adverse-action templates, and governance frameworks. Code and data-processing scripts are available at [repository URL].
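The fairness-constrained, cost-aware thresholding mentioned above can be illustrated with a minimal sketch. This is not the paper's implementation: the grid search, cost weights (`c_fn`, `c_fp`), and the per-group threshold formulation are illustrative assumptions; the idea is simply to pick decision thresholds that minimise expected misclassification cost subject to a bound on the demographic-parity (approval-rate) gap between groups.

```python
# Illustrative sketch (not the paper's code): choose per-group decision
# thresholds that minimise total misclassification cost subject to a
# demographic-parity constraint on approval rates. Scores are predicted
# default probabilities; applicants with score below the threshold are approved.
from itertools import product


def approval_rate(scores, thr):
    """Fraction of applicants approved (score below threshold)."""
    return sum(s < thr for s in scores) / len(scores)


def expected_cost(scores, labels, thr, c_fn=5.0, c_fp=1.0):
    """Total cost: c_fn per approved defaulter, c_fp per rejected good borrower.

    The cost weights are hypothetical placeholders for a lender's loss model.
    """
    cost = 0.0
    for s, y in zip(scores, labels):
        approved = s < thr
        if approved and y == 1:       # false negative: approved a defaulter
            cost += c_fn
        elif not approved and y == 0:  # false positive: rejected a good borrower
            cost += c_fp
    return cost


def fair_thresholds(groups, eps=0.05, grid=None):
    """Grid-search per-group thresholds with approval-rate gap at most eps.

    groups: {group_name: (scores, labels)}.
    Returns (total_cost, {group_name: threshold}) or None if infeasible.
    """
    grid = grid or [i / 100 for i in range(1, 100)]
    names = list(groups)
    best = None
    for combo in product(grid, repeat=len(names)):
        rates = [approval_rate(groups[n][0], t) for n, t in zip(names, combo)]
        if max(rates) - min(rates) > eps:  # demographic-parity constraint
            continue
        cost = sum(expected_cost(*groups[n], t) for n, t in zip(names, combo))
        if best is None or cost < best[0]:
            best = (cost, dict(zip(names, combo)))
    return best
```

In practice the exhaustive grid search would be replaced by a more scalable procedure (the product grows exponentially in the number of groups), but the objective-plus-constraint structure is the same as the trade-off the abstract reports: a bounded parity gap purchased at a small increase in expected cost.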