DORA Metrics Guide for Engineering Leaders

Scrums.com Editorial Team
September 23, 2025
13 mins

Most engineering teams are measuring the wrong things. Story points, commit counts, sprint velocity: these tell you how busy teams are, not how well the system delivering software is actually performing.

DORA metrics fix that. They are the only set of engineering metrics with a validated, research-backed link to organisational performance. This guide explains what each one measures, what good looks like, how to improve, and where DORA falls short.

What Are DORA Metrics?

DORA metrics are four software delivery performance indicators developed by the DevOps Research and Assessment (DORA) programme. They measure the speed and stability of software delivery across four dimensions:

  • Deployment Frequency: how often code reaches production
  • Lead Time for Changes: time from code commit to production deployment
  • Change Failure Rate: the percentage of deployments that cause degraded service
  • Mean Time to Recovery (MTTR): how quickly teams restore service after an incident

The first two metrics measure speed; the last two measure stability. Used together, they give a balanced picture of delivery health that neither set alone can provide.

The Research Behind DORA Metrics

DORA metrics emerged from the largest longitudinal study of software delivery performance ever conducted. The research was led by Dr. Nicole Forsgren and published in the book Accelerate (2018), co-authored with Jez Humble and Gene Kim. It analysed data from over 32,000 technical professionals across multiple years and industries.

The core finding: four specific metrics reliably predicted both software delivery performance and organisational performance. Teams in the elite tier were significantly more likely to meet or exceed their commercial goals compared to low performers.

The DORA State of DevOps Report, published annually, continues to track performance trends across thousands of engineering organisations globally. It is the most authoritative ongoing source of delivery performance benchmarks available.

The Four DORA Metrics Explained

Deployment Frequency

Deployment frequency measures how often an organisation successfully deploys to production. It is the most direct indicator of delivery cadence.

High deployment frequency signals small batch sizes, reliable automation, and low risk per release. When teams are confident each deployment is safe, they deploy more often. When deployments are risky or painful, teams batch work to reduce the number of releases. This ironically increases risk per release, since each deployment contains more changes and is harder to roll back cleanly.

How to measure it: Count successful production deployments per time period. Exclude deployments to staging or test environments. The most reliable source is your CI/CD platform.
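As a minimal sketch of this calculation, assuming deployment records have already been exported from your CI/CD platform (the record shape here is hypothetical):

```python
from datetime import date

# Hypothetical deployment records exported from a CI/CD platform:
# (deploy_date, environment, succeeded).
deployments = [
    (date(2025, 9, 1), "production", True),
    (date(2025, 9, 1), "staging", True),      # excluded: not production
    (date(2025, 9, 2), "production", True),
    (date(2025, 9, 3), "production", False),  # excluded: failed deployment
]

def deployment_frequency(records, days):
    """Successful production deployments per day over the window."""
    successes = sum(1 for _, env, ok in records
                    if env == "production" and ok)
    return successes / days

print(round(deployment_frequency(deployments, days=7), 2))  # ~0.29 deploys/day
```

Filtering out staging and failed deployments up front keeps the metric aligned with the DORA definition: successful production deployments only.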

How to improve it: Break large releases into smaller, independent deployments. Automate the deployment pipeline end to end. Use feature flags to decouple deployment from feature release. Move toward trunk-based development to eliminate long-lived feature branches.

The mistake most teams make: Assuming weekly deployments represent a strong position. Against the DORA benchmarks, weekly sits at the boundary between the high and medium tiers. Elite teams deploy multiple times per day.

Lead Time for Changes

Lead time for changes measures the time from a code commit to that commit running in production. It captures the end-to-end latency of your delivery pipeline.

Long lead times are usually caused by one of three things: slow code review (work queuing for a reviewer), slow or manual testing (the pipeline takes hours to validate a change), or manual approval gates (changes waiting for sign-off before deployment).

How to measure it: Record the timestamp of each code commit and the timestamp of its production deployment. The difference is lead time. GitHub, GitLab, and most CI/CD platforms provide this data automatically.
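A sketch of the calculation, assuming (commit, deploy) timestamp pairs have been pulled from your VCS and CI/CD APIs (the data here is illustrative). The median is often preferred over the mean because a single long-lived change skews the average:

```python
from datetime import datetime
from statistics import median

# Hypothetical (commit_time, deploy_time) pairs from VCS and CI/CD logs.
changes = [
    (datetime(2025, 9, 1, 9, 0), datetime(2025, 9, 1, 11, 0)),   # 2 h
    (datetime(2025, 9, 1, 10, 0), datetime(2025, 9, 2, 10, 0)),  # 24 h
    (datetime(2025, 9, 2, 8, 0), datetime(2025, 9, 2, 9, 30)),   # 1.5 h
]

def median_lead_time_hours(pairs):
    """Median commit-to-production lead time in hours."""
    hours = [(deploy - commit).total_seconds() / 3600
             for commit, deploy in pairs]
    return median(hours)

print(median_lead_time_hours(changes))  # 2.0
```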

How to improve it: Reduce review queue sizes by limiting work in progress. Parallelise test execution. Automate approvals where the risk is genuinely low. Remove unnecessary staging environment steps that add wait time without adding quality signal.

Common confusion: Lead time for changes measures from code commit, not from when a story was started. That broader measure is cycle time. It's useful, but it's a different metric covering a different part of the process.

Change Failure Rate

Change failure rate is the percentage of deployments that result in degraded service, a hotfix, or a rollback. It measures the quality and risk profile of your deployment process.

A low change failure rate means your team is deploying with confidence, backed by good testing coverage and review practices. A high rate means deployments are frequently causing problems, which erodes confidence in the process, reduces deployment frequency, and increases MTTR when emergency fixes are needed.

How to measure it: (Number of deployments requiring a hotfix, rollback, or causing a service degradation / total deployments) x 100. Track this in your incident management tool and cross-reference against deployment logs.
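The formula above is straightforward to encode; a minimal sketch with illustrative numbers:

```python
def change_failure_rate(failed, total):
    """Percentage of deployments that needed a hotfix, rollback,
    or caused a service degradation."""
    if total == 0:
        raise ValueError("no deployments in the window")
    return failed / total * 100

# Illustrative: 3 of 40 deployments this month needed remediation.
print(change_failure_rate(3, 40))  # 7.5
```

A 7.5% rate would land in the high tier against the published benchmarks; what matters most is applying the "requiring remediation" definition consistently from month to month.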

How to improve it: Increase automated test coverage. Use canary deployments or blue/green deployments to limit blast radius. Implement feature flags so failed features can be disabled without a rollback. Run blameless postmortems to identify systemic causes rather than individual errors.

What counts as a failure: Most organisations count only full outages. The DORA research includes any deployment requiring remediation, including partial degradations. Widening the definition gives you more signal and a more honest picture of deployment risk.

Mean Time to Recovery (MTTR)

MTTR measures how quickly a team restores service after a production incident. It is a measure of incident response capability, observability maturity, and on-call practice quality.

MTTR is the stability metric that matters most in production. Deployment frequency tells you how often you ship; MTTR tells you what happens when something goes wrong. Elite teams recover in under an hour. Low performers take days.

How to measure it: Record when an incident is detected and when service is fully restored. Use your incident management platform (PagerDuty, Opsgenie, or similar) to automate this tracking and preserve accurate timestamps.
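A sketch of the calculation, assuming detection and restoration timestamps exported from an incident management platform (the incidents here are invented):

```python
from datetime import datetime

# Hypothetical incidents: (detected_at, restored_at) timestamps.
incidents = [
    (datetime(2025, 9, 1, 14, 0), datetime(2025, 9, 1, 14, 45)),  # 45 min
    (datetime(2025, 9, 3, 2, 10), datetime(2025, 9, 3, 3, 40)),   # 90 min
]

def mttr_minutes(records):
    """Mean time from incident detection to confirmed restoration."""
    durations = [(restored - detected).total_seconds() / 60
                 for detected, restored in records]
    return sum(durations) / len(durations)

print(mttr_minutes(incidents))  # 67.5
```

Note that the end timestamp is restoration, not fix deployment, which avoids the measurement trap described below.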

How to improve it: Invest in observability so failures are detected faster. Write runbooks for common incident types so responders don't have to improvise under pressure. Practise incident response through chaos engineering or game days. Define clear escalation paths so the right people are paged immediately.

The measurement trap: Many teams record MTTR from when the fix is deployed, not from when service is confirmed restored. This understates actual recovery time and produces metrics that look better than the customer experience actually was.

DORA Performance Benchmarks

The DORA research classifies engineering organisations into four performance tiers. The table below shows the benchmark ranges for each tier across all four metrics.

Metric                 | Elite              | High                          | Medium                          | Low
Deployment Frequency   | Multiple times/day | Once per day to once per week | Once per week to once per month | Once per month or less
Lead Time for Changes  | Less than 1 hour   | 1 day to 1 week               | 1 week to 1 month               | 1 to 6 months
Change Failure Rate    | 0-5%               | 5-10%                         | 10-15%                          | 46-60%
Mean Time to Recovery  | Less than 1 hour   | Less than 1 day               | Less than 1 day                 | 1 week to 1 month

Source: DORA State of DevOps Report

Two things stand out in these benchmarks. First, the gap between elite and low performers is not incremental: elite teams deploy hundreds of times more frequently than low performers. Second, elite teams achieve both speed and stability at the same time. The assumption that you must trade one for the other is not supported by the data. The research consistently shows the opposite.

DORA Metrics vs. SPACE Metrics: What Is the Difference?

DORA metrics measure delivery performance. The SPACE framework, published in 2021 by Dr. Nicole Forsgren and colleagues, measures developer productivity more broadly across five dimensions:

  • Satisfaction and Wellbeing: developer happiness, NPS, and sense of purpose
  • Performance: the outcomes and impact of work, not just its volume
  • Activity: meaningful output signals (code reviews, documentation, deployments) rather than raw commit counts
  • Communication and Collaboration: the quality of team interaction and knowledge sharing
  • Efficiency and Flow: flow state, interruptions, and wait times in the delivery process

DORA is a subset of what SPACE covers, focused on the delivery mechanics most directly connected to business outcomes. For most organisations, DORA is the right starting point: it is fully automatable, it has published benchmarks, and it focuses on measurable delivery performance rather than harder-to-quantify satisfaction signals.

SPACE adds value once DORA baselines are established. If your DORA metrics are healthy but your team is burning out or delivering work that does not land well, SPACE surfaces what DORA cannot see. The two frameworks work together rather than competing.

For a broader look at how both frameworks fit into engineering performance measurement, see our guide to engineering operations for engineering leaders. For a direct comparison of both frameworks, see SPACE vs DORA: which framework to use.

DORA Metrics in FinTech and Regulated Industries

DORA metrics apply to regulated industries, but the benchmarks need context. Before comparing a bank or FinTech directly against the published tiers, it is worth understanding where the gaps come from.

Deployment frequency in regulated environments is typically lower than the general DORA population because of change advisory board (CAB) approvals and release window constraints. This creates artificially low deployment frequency that does not reflect actual team capability. The goal is not to eliminate governance, but to automate it where the risk is genuinely low and preserve manual gates where it is not.

Change failure rate carries additional weight in regulated industries. A failed deployment in a payment system is not only a reliability incident; depending on the nature of the failure, it may trigger regulatory reporting obligations under frameworks like FCA operational resilience rules or PCI-DSS incident requirements.

MTTR is directly tied to operational resilience regulation. The FCA's PS21/3 policy statement and the EU's Digital Operational Resilience Act (DORA regulation, distinct from the DORA research framework) both mandate specific recovery time objectives for critical financial services. For engineering teams in scope, MTTR is not just a performance metric: it has a compliance dimension that connects directly to regulatory obligations.

For engineering teams in financial services, improving DORA metrics and meeting operational resilience requirements are largely the same work approached from different angles. A detailed breakdown of how this plays out in practice is in our guide to DORA metrics for FinTech teams.

How to Implement DORA Metrics in Your Organisation

The most common implementation mistake is trying to improve metrics before establishing baselines. Start by measuring, then set targets based on what you find.

Step 1: Choose your data sources. Deployment frequency and lead time can typically be pulled from your version control system (GitHub, GitLab) and CI/CD platform. Change failure rate requires linking deployment data to your incident management tool. MTTR requires an incident tracking system with accurate detection and resolution timestamps.

Step 2: Run the baseline for 30 to 90 days. Do not act on the first week of data. You need enough history to understand your normal range and filter out outliers. Seasonal variation (holiday code freezes, sprint end batching) will distort short windows and lead to misleading conclusions.

Step 3: Identify the constraint. Look at which metric puts your organisation furthest from the next tier up. That is your starting point. Pushing deployment frequency higher without addressing a high change failure rate will just cause more frequent incidents.

Step 4: Set team-level targets, not only organisational averages. Aggregate averages hide important variation. A team deploying once a month while others deploy daily has a specific, diagnosable problem. The average makes it invisible.
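Step 4's point about averages hiding variance can be illustrated in a few lines (the team names and deployment counts are invented):

```python
# Hypothetical per-team production deployment counts over one month.
team_deploys = {"payments": 22, "platform": 19, "mobile": 1}

average = sum(team_deploys.values()) / len(team_deploys)
print(average)  # 14.0 -- the aggregate looks healthy

# Team-level data surfaces the real constraint.
laggards = {team: n for team, n in team_deploys.items() if n < average / 2}
print(laggards)  # {'mobile': 1}
```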

Step 5: Review on a cadence. Weekly at the team level to drive improvement conversations. Monthly at the leadership level to track trends and connect metrics to business outcomes.

The Scrums.com platform automates the data collection side of this: connecting to GitHub, Jira, and CI/CD pipelines to calculate DORA metrics automatically, benchmark against 400+ organisations, and surface the team-level variance that aggregate reporting hides.

Common DORA Metrics Mistakes

Using DORA for performance management

When teams know their DORA metrics feed into performance reviews, they optimise for the numbers rather than the underlying health. Deployment frequency climbs as teams push trivial commits. Change failure rate drops as teams stop counting partial degradations. The metrics become noise.

DORA metrics are a diagnostic tool. Keep them separate from individual or team performance evaluations, or you will corrupt the data you need to manage the system.

Measuring only at the organisational level

Aggregate DORA scores look fine when high-performing teams mask low-performing ones. The signal worth acting on is in the variance: which teams are struggling and why. Always review team-level data alongside the aggregate.

Treating all four metrics as equal priority

They are not equally important for every organisation at every stage. A team with elite deployment frequency and a 40% change failure rate should not be working on deployment frequency. Find the weakest link first.

Setting targets before baselines

Setting a target of elite deployment frequency before you know your current state, or why you are where you are, sets teams up for gaming rather than genuine improvement. Baseline first. Set targets that reflect what is actually achievable in your context.

Ignoring the connections between metrics

DORA metrics interact with each other. Pushing deployment frequency higher without addressing change failure rate typically produces more frequent incidents. Improving lead time without improving MTTR means new features ship faster while failures still take just as long to resolve. The four metrics are a system; optimise them as one.

How Scrums.com Tracks DORA Metrics

Tracking DORA metrics manually is possible at small scale. At any meaningful team size, the data collection overhead becomes a tax on engineering time that few organisations can sustain with accuracy.

Scrums.com connects to your existing toolchain (GitHub, GitLab, Jira, CI/CD pipelines, and 50+ other integrations) and calculates all four DORA metrics automatically. Engineering leaders get:

  • Real-time DORA dashboards across teams, projects, and time periods
  • Benchmark comparisons against 400+ organisations in the Scrums.com network
  • Team-level variance tracking to surface where the improvement work actually is
  • Trend analysis to separate genuine improvement from statistical noise
  • Compliance-ready reporting for regulated industry requirements

See how the platform works

DORA Metrics FAQ

What are DORA metrics?

DORA metrics are four software delivery performance indicators: deployment frequency, lead time for changes, change failure rate, and mean time to recovery. Developed by the DevOps Research and Assessment programme and validated through research across tens of thousands of engineering professionals, they are the most widely adopted framework for measuring engineering delivery performance.

What is a good DORA metrics score?

Elite performance means deploying multiple times per day, lead time under one hour, change failure rate below 5%, and MTTR under one hour. Most organisations sit between medium and high tier. The goal is not to reach elite across all four metrics overnight; it is to move up one tier at a time, starting with your weakest metric first.

How do you calculate DORA metrics?

Deployment frequency: count successful production deployments per time period. Lead time: measure the time from code commit to production deployment. Change failure rate: divide failed or degraded deployments by total deployments. MTTR: measure from incident detection to full service restoration. Most organisations automate this through engineering intelligence platforms rather than calculating manually.

What is the difference between DORA metrics and SPACE metrics?

DORA metrics measure software delivery performance across four specific, automatable dimensions. The SPACE framework is broader, covering satisfaction and wellbeing, performance, activity, communication and collaboration, and efficiency. DORA has published benchmarks and is fully automatable; SPACE requires more qualitative input. Most organisations start with DORA and layer in SPACE dimensions as their measurement practice matures.

How long does it take to improve DORA metrics?

Teams moving from low to medium performance typically see measurable improvement within 90 days of focused effort on the right constraint. Moving from medium to high, or high to elite, takes longer because the improvements required shift from technical (pipeline automation, test coverage) to organisational (culture, process, governance). Realistic timelines depend heavily on how much change management capacity sits alongside the technical improvement work.

Do DORA metrics apply to FinTech and regulated industries?

Yes, but the published benchmarks need context. Regulated industries typically show lower deployment frequency due to change governance requirements, which makes direct tier comparisons misleading. Tracking improvement over time within the regulatory context is more useful than benchmarking against the general population. Change failure rate and MTTR are particularly relevant for financial services firms with operational resilience obligations. See our guide to DORA metrics for FinTech teams for the full breakdown.
