AI in Banking Software: A Practitioner's Guide

Boitumelo Mosia
March 15, 2024

For most of the last decade, AI in banking meant proofs of concept buried in innovation labs. That has changed. Generative AI is now embedded in customer service, underwriting, compliance, and developer tooling inside production banking platforms, and boards are asking engineering leaders at mid-market banks and FinTechs a new set of questions: what is our AI roadmap, how are we governing it, and how quickly can we ship?

This is a practitioner's view of where AI is delivering measurable value in banking software today, what it takes to govern it responsibly, and where programmes are most likely to fail.

Where AI creates real value in banking software today

The AI narrative in banking is wide, but the value concentrates in a short list of use cases that meet three tests: the data exists, the risk is bounded, and the outcome is measurable. The use cases below meet all three.

Customer service automation that actually deflects cases

First-generation banking chatbots failed because they were intent-matched and brittle. Modern retrieval-augmented generation (RAG) assistants, grounded in product documentation, policy libraries, and the customer's own transaction history, are different: in our client work they resolve 30 to 60 percent of tier-one cases without agent escalation. The engineering pattern is consistent: a private vector index of internal content, a tool layer that exposes a small set of safe customer actions (balance lookup, card freeze, dispute filing), and a model prompt that refuses anything outside scope. Guardrails are not optional. Logging and human-in-the-loop review for escalations are required to satisfy internal audit. For a deeper look at how these assistants are architected, see our guide on AI agents in banking.
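The tool layer described above can be sketched as an allow-list dispatcher. This is a minimal illustration, not a real banking API: the tool names and stub handlers are hypothetical, and a production version would sit behind authentication and the bank's core systems.

```python
# Hypothetical sketch: a tool layer exposing only an allow-list of safe
# customer actions. Tool names and handlers are illustrative stubs.
from typing import Callable, Dict

def balance_lookup(customer_id: str) -> str:
    return f"balance for {customer_id}"      # stub: would call core banking

def card_freeze(customer_id: str) -> str:
    return f"card frozen for {customer_id}"  # stub: would call the card platform

# The allow-list IS the security boundary: the model can only invoke tools
# registered here; anything else is refused rather than executed.
SAFE_TOOLS: Dict[str, Callable[[str], str]] = {
    "balance_lookup": balance_lookup,
    "card_freeze": card_freeze,
}

def dispatch(tool_name: str, customer_id: str) -> str:
    handler = SAFE_TOOLS.get(tool_name)
    if handler is None:
        # Out-of-scope request: refuse and leave a trail for human review.
        return "REFUSED: tool not in approved scope"
    return handler(customer_id)
```

The point of the pattern is that scope is enforced in code, not in the prompt alone: a jailbroken prompt still cannot reach a tool that was never registered.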

Fraud detection and anomaly scoring

Classical machine learning already dominates real-time fraud detection. The shift underway is in how banks operationalise those models: feature stores that serve consistent signals to both training and inference, streaming pipelines that score transactions in under 100 ms, and explainability layers that let investigators see why a transaction was flagged. Generative AI plays a supporting role here, summarising case notes and drafting disposition letters rather than replacing the scoring engine itself.
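The explainability layer mentioned above can be illustrated with a deliberately simple sketch: score each transaction feature against baseline statistics and return the per-feature contributions so an investigator can see why a transaction was flagged. The features, baselines, and threshold here are hypothetical; real systems use far richer models.

```python
# Illustrative sketch of an explainable anomaly score. Each feature's
# deviation from a baseline is reported alongside the flag decision.
# Feature names, baseline stats, and the threshold are hypothetical.
from typing import Dict, List, Tuple

BASELINE = {"amount": (50.0, 20.0), "hour": (14.0, 4.0)}  # feature -> (mean, std)

def score_transaction(tx: Dict[str, float], threshold: float = 3.0
                      ) -> Tuple[bool, List[Tuple[str, float]]]:
    contributions = []
    for feature, (mean, std) in BASELINE.items():
        z = abs(tx[feature] - mean) / std
        contributions.append((feature, round(z, 2)))
    # Biggest signal first, so the investigator sees the driver immediately.
    contributions.sort(key=lambda c: c[1], reverse=True)
    flagged = contributions[0][1] > threshold
    return flagged, contributions
```

Whatever the underlying model, the contract matters: the scoring engine returns not just a decision but the ranked signals behind it.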

Regulatory compliance and reporting

Compliance teams are early and enthusiastic adopters. Large language models are strong at reading long regulatory texts, extracting obligations, mapping those obligations to internal controls, and drafting the narrative sections of regulatory filings. For US banks navigating CRA, BSA, and the OCC's heightened standards, and for EU banks under DORA (in force since January 2025), AI-assisted evidence collection and gap analysis have become a differentiating capability for compliance functions that were previously the slowest part of the business.

Personalised product experiences

Personalisation in banking has moved past next-best-offer banners. The current generation uses transaction embeddings and behavioural signals to surface savings nudges, renegotiate fixed expenses, and flag cash flow problems before they hit overdraft. The engineering challenge is less about the model and more about the data platform: reliable event streams, consented feature sets, and a personalisation layer that respects regulatory constraints around profiling. Done well, this is the shift from cross-sell campaigns to proactive financial health.
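The "flag cash flow problems before they hit overdraft" idea reduces to a projection over known recurring outflows. This is a toy sketch with hypothetical amounts and schedule, standing in for a real forecasting model fed by consented event streams.

```python
# Hypothetical sketch: project upcoming balance from known recurring
# outflows and surface the first day a negative balance is expected.
from typing import List, Optional, Tuple

def flag_overdraft_risk(balance: float,
                        upcoming: List[Tuple[int, float]]) -> Optional[int]:
    """Return the day offset of the first projected overdraft, else None.

    `upcoming` is a list of (day_offset, outflow_amount) pairs.
    """
    for day, outflow in sorted(upcoming):
        balance -= outflow
        if balance < 0:
            return day   # nudge the customer before this day arrives
    return None
```

A real implementation would learn the recurring outflows from transaction history rather than take them as input, but the product behaviour is the same: warn before the overdraft, not after.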

Developer productivity and code generation

This is the quietest but most measurable win. In our teams' experience, banking engineering teams adopting AI coding assistants report 20 to 40 percent throughput gains on routine work: test writing, documentation, data migration scripts, and legacy COBOL to Java translation. The catch is that secure-by-default configuration matters. Models must never send proprietary code to public endpoints, and IP indemnity clauses in enterprise licences are load-bearing. For the QA and code quality side of this pattern, see AI agents for QA and code quality in banking.

The architecture patterns that separate production AI from demos

A working AI feature in a bank looks very different from a notebook demo. Four architectural decisions separate systems that survive audit from those that get quietly switched off. These sit on top of the fundamentals covered in our guide to modern banking architecture.

The first is deployment topology. Regulated banks rarely send customer data to a shared model endpoint. The dominant patterns are dedicated model instances inside a hyperscaler's private tenancy (Azure OpenAI, AWS Bedrock with customer-managed keys, GCP Vertex AI with VPC-SC), or self-hosted open models running on dedicated GPUs inside the bank's own VPC. The first is faster to implement. The second is necessary when the data classification prohibits cloud egress entirely.
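In practice the two topologies often coexist behind a model gateway that routes by data classification. A minimal routing sketch, with hypothetical endpoint URLs and classification labels:

```python
# Hypothetical sketch of a gateway routing rule: data barred from cloud
# egress goes to the self-hosted model inside the bank's VPC; everything
# else goes to the dedicated instance in the private cloud tenancy.
# Endpoint URLs and classification labels are illustrative.
ENDPOINTS = {
    "private_tenancy": "https://bank-tenant.example-cloud.internal/llm",
    "self_hosted": "http://llm-gateway.bank-vpc.internal/v1",
}

NO_EGRESS = {"restricted", "secret"}  # classifications that must stay in the VPC

def route(data_classification: str) -> str:
    if data_classification in NO_EGRESS:
        return ENDPOINTS["self_hosted"]
    return ENDPOINTS["private_tenancy"]
```

Centralising this decision in one gateway, rather than in each feature team's code, is what makes the topology auditable.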

The second is evaluation. Offline benchmarks aren't enough. Production AI features need continuous evaluation against a curated set of real cases, with regression tests that fail the build when model output drifts. This is where most banking AI projects stall: the team that built the feature lacks the evaluation muscle to keep it safe once it is live.
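A regression gate of the kind described can be sketched in a few lines: score the model against a curated case set and fail the build when the pass rate drops below the pinned baseline. The scoring function, cases, and threshold here are stand-ins for a real evaluation harness.

```python
# Illustrative sketch of a build-failing evaluation gate. The baseline is
# pinned when the feature ships; a drop below it means output has drifted.
# Case format and baseline value are hypothetical.
from typing import Callable, List, Tuple

BASELINE_PASS_RATE = 0.90  # pinned at last release

def eval_gate(model: Callable[[str], str],
              cases: List[Tuple[str, str]],
              baseline: float = BASELINE_PASS_RATE) -> bool:
    passed = sum(1 for prompt, expected in cases if model(prompt) == expected)
    rate = passed / len(cases)
    # CI treats False as a failed build: do not ship the drifted model.
    return rate >= baseline
```

Real harnesses use graded or model-assisted scoring rather than exact string match, but the contract is the same: the evaluation runs in CI and can block a release.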

The third is observability. Prompt, response, retrieved documents, tool calls, and latency all need to be logged at the span level. Audit will ask for these records. Incident response cannot happen without them. Standard LLM observability tools (Langfuse, Arize, Traceloop, vendor-native tracing) are now table stakes.
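The span record audit will ask for can be sketched as a simple structure capturing the fields listed above. The field names here follow no particular vendor's schema; they are illustrative.

```python
# Hypothetical sketch of a span record: prompt, response, retrieved
# documents, tool calls, and latency captured per request, serialised
# for the observability sink. Field names are illustrative.
import json
import time
from dataclasses import asdict, dataclass, field
from typing import List

@dataclass
class LLMSpan:
    prompt: str
    response: str = ""
    retrieved_docs: List[str] = field(default_factory=list)
    tool_calls: List[str] = field(default_factory=list)
    started_at: float = field(default_factory=time.time)
    latency_ms: float = 0.0

    def close(self, response: str) -> str:
        self.response = response
        self.latency_ms = (time.time() - self.started_at) * 1000
        return json.dumps(asdict(self))  # ship to the observability backend
```

The tools named in the article emit richer traces than this, but the discipline is identical: if a field is not in the span, it does not exist for incident response.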

The fourth is governance. Model risk management frameworks, already mandatory for credit models under SR 11-7 in the US, are being extended to generative AI. A production AI feature needs a model card, an intended-use statement, a data lineage document, and a named model owner. Treat this as a day-one engineering requirement, not a quarter-four compliance scramble.
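Treating governance as a day-one engineering requirement can be as literal as making the artefacts a structured record that deployment validates. A sketch, with illustrative field names that are not a regulatory schema:

```python
# Illustrative sketch: the governance artefacts named above as a record
# that fails fast when a field is missing, instead of surfacing in audit.
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelCard:
    model_name: str
    intended_use: str
    data_lineage_doc: str
    model_owner: str       # a named human, not a team alias

    def validate(self) -> bool:
        # Every governance field must be populated before deployment.
        return all(getattr(self, f) for f in
                   ("model_name", "intended_use", "data_lineage_doc", "model_owner"))
```

Wiring `validate()` into the deployment pipeline turns the quarter-four compliance scramble into a failing check on day one.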

The regulatory environment that shapes what you can ship

Banking AI is not a greenfield domain. Regulators have moved quickly and the picture as of 2026 is consistent: transparency, explainability, and human oversight are mandatory. The specific obligations depend on where you operate.

In the European Union, the AI Act entered force in August 2024 with phased obligations. Systems used for credit scoring and insurance pricing fall into the high-risk category. Banks building or deploying such systems need documented risk management processes, data governance, technical documentation, and post-market monitoring. DORA, in force since January 2025, adds operational resilience requirements that extend to AI vendors and model hosting arrangements. For the DevOps implications, see DORA compliance for banking DevOps.

In the United States, federal regulators have not yet converged on a single AI rule, but the OCC, FDIC, and Federal Reserve have all issued guidance signalling that existing model risk management expectations (SR 11-7, OCC 2011-12) apply to generative AI. State-level action, particularly in New York and California, is tightening disclosure obligations on AI-driven decisioning.

In the UK, the FCA's approach remains principles-based, with the Consumer Duty being the operative lens for AI in customer-facing products. Firms are expected to demonstrate that AI-driven outcomes are good for customers and that vulnerable customers are not disadvantaged.

None of this should slow down the work. It does mean that compliance engagement cannot be a post-launch afterthought.

Build, buy, or partner: a practitioner's decision framework

Banks face the same three options they always have, but the trade-offs look different for AI workloads.

Buying off-the-shelf AI features inside an existing core banking, loan origination, or fraud platform is the fastest path. It is also the narrowest: you get what the vendor ships, on their roadmap, with limited ability to differentiate.

Building in-house is the right choice when the use case touches your proprietary data models, when differentiation matters commercially, or when the regulatory posture demands that models and data never leave your control. The cost is talent: senior ML engineers who understand both the model layer and the banking domain command top-of-market salaries and 12-month-plus hiring timelines.

Partnering with a software development firm that has shipped regulated AI is the middle path mid-market banks and FinTechs often take. It is faster than building from zero, more flexible than buying, and lets internal teams focus on the data and domain knowledge that only they have. The selection criteria matter: look for partners with production AI experience in financial services, a track record of passing regulatory audits, and a clear stance on IP ownership and data residency.

Common pitfalls we see in banking AI programs

Five patterns predictably kill banking AI programs.

  1. Starting with the model instead of the data. If event data is inconsistent, customer records are duplicated, and product metadata is maintained in spreadsheets, no amount of model sophistication will save the feature. Data platform investment is the unglamorous prerequisite.
  2. Treating prompts as configuration. Prompts are code. They need version control, review, testing, and staged rollout. A prompt change that widens a tool's scope is functionally equivalent to a production deployment.
  3. Skipping the red team exercise. Banks that ship customer-facing AI without adversarial testing learn about jailbreaks, prompt injection, and data exfiltration attempts from customers and regulators rather than from their own security team. Build the capability before you need it.
  4. Underinvesting in evaluation. The team that can ship an LLM feature in two weeks can rarely keep it safe for two years. Our teams budget continuous evaluation as roughly 30 percent of the total cost of ownership for any production AI feature.
  5. Conflating pilots with platforms. A successful pilot proves a use case. It does not prove a deployment pattern that can support the next ten use cases. Invest in a shared AI platform (model gateway, evaluation harness, observability, secrets handling) before the portfolio of AI features grows.
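Point 2 above, prompts are code, can be made concrete with a small sketch: a version-pinned prompt plus a test asserting its tool scope has not silently widened. The prompt text, version, and scope check are hypothetical.

```python
# Hypothetical sketch: a prompt under version control with a regression
# test guarding its tool scope. Names and versioning scheme illustrative.
PROMPT_VERSION = "2.3.0"

SYSTEM_PROMPT = """You are a banking support assistant.
You may only use these tools: balance_lookup, card_freeze.
Refuse any request outside this scope."""

APPROVED_TOOLS = {"balance_lookup", "card_freeze"}

def tools_in_prompt(prompt: str) -> set:
    # Crude scope check: which approved tools does the prompt still name?
    return {t for t in APPROVED_TOOLS if t in prompt}

def scope_unchanged(prompt: str) -> bool:
    # Fails review if the tool list or the refusal clause was edited away.
    return tools_in_prompt(prompt) == APPROVED_TOOLS and "Refuse" in prompt
```

A prompt edit that breaks `scope_unchanged` then goes through the same review and staged rollout as any other production deployment.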

A 90-day plan for engineering leaders starting now

For a CTO or VP of Engineering who has been asked to produce a credible AI plan in the next quarter, the following sequence is practical and low-risk.

Weeks 1 to 3: establish a governance baseline. Confirm which regulatory regimes apply, appoint a named AI product owner, and agree on a short list of use cases that meet the three-test bar (data exists, risk bounded, outcome measurable). Pick two to prioritise.

Weeks 4 to 8: stand up the platform. Deploy a private model endpoint inside your existing cloud tenancy, integrate observability, and agree on the evaluation harness. Ship one internal-only use case (usually a developer productivity tool or a compliance drafting assistant) to exercise the platform end to end.

Weeks 9 to 12: ship the first customer-facing feature as a limited cohort release. Instrument everything. Run a red team exercise before the cohort expands. Establish the post-launch monitoring cadence.

By the end of the quarter, the conversation with the board has shifted from "what is our AI strategy" to "here is the platform, here is the first feature in production, and here is our backlog."

How Scrums.com helps banks ship AI responsibly

Scrums.com builds production-grade AI features for regulated financial services clients as part of our banking software development services. Our engineering teams bring both the machine learning depth and the regulatory experience needed to take AI from pilot to production without the governance gaps that derail most programs. We work alongside internal teams on architecture, evaluation, observability, and the end-to-end delivery of customer-facing and internal AI tooling inside core banking, lending, payments, and compliance platforms.

If you are building an AI roadmap for your banking or FinTech platform and want a partner who has shipped this before, start a project with us.

FAQ

What is the best first AI use case for a mid-market bank?

A low-risk, internal-only use case such as a compliance drafting assistant or a developer productivity tool. It exercises the platform end to end without customer exposure and gives the team a safe environment to build evaluation and observability capabilities before touching production customer traffic.

How do US and EU AI regulations affect banking software engineering teams?

The EU AI Act classifies credit scoring and insurance pricing as high-risk, requiring documented risk management, data governance, technical documentation, and post-market monitoring. US banking regulators have extended existing model risk management expectations (SR 11-7) to generative AI. In both regimes, explainability, human oversight, and audit-ready logging are non-negotiable.

Should a bank build AI features in-house or use vendor platforms?

It depends on the use case. Vendor platforms are faster for commoditised features like fraud scoring or chatbot frameworks. In-house builds make sense when the use case touches proprietary data models or when differentiation matters commercially. Mid-market banks often partner with software firms that have shipped regulated AI before to accelerate build timelines without compromising control.

What architecture patterns are typical for production AI in banking?

Dedicated model endpoints inside a private cloud tenancy (Azure OpenAI, AWS Bedrock with customer-managed keys, GCP Vertex AI), retrieval augmented generation for grounded responses, a tool layer for safe actions, continuous evaluation against curated test sets, and span-level observability for audit. Model risk governance is a day-one engineering requirement.

How much should a bank budget for ongoing AI model evaluation?

In our experience, roughly 30 percent of the total cost of ownership for any production AI feature. Evaluation, red teaming, and drift monitoring are not optional. Teams that underinvest here ship features that cannot survive audit or adversarial testing once they are live.
