
Most engineering teams that try to improve velocity start in the wrong place. They shorten sprints, add ceremonies, or apply pressure on cycle time without first knowing what is actually slowing them down. Three months later, the numbers have not moved.
The 90-day framework in this guide is structured differently. It sequences three phases: establish an accurate baseline, identify and eliminate the primary constraint, and compound the improvement once the constraint is addressed. Each phase has specific actions, specific metrics, and specific failure modes to avoid.
Engineering velocity is the rate at which an engineering team moves work from concept to production. It is measured through a combination of lead time for changes, deployment frequency, and throughput, with DORA metrics providing the most comparable external benchmarks. Improving it is not about working faster. It is about removing the friction that slows work down between stages.
For how to measure these metrics before you start improving them, see How to Measure Engineering Velocity Without Demoralizing Your Team. For how velocity fits into engineering operations more broadly, see the Engineering Operations Guide.
Why 90 Days
The 90-day window is not arbitrary. The DORA State of DevOps Report consistently finds that delivery improvements take four to eight weeks to show up in metrics after the underlying process change is made. Interventions that look ineffective at week four often show clear impact at week eight. Teams that abandon changes before they have time to register in the data spend months switching between approaches without accumulating any gains.
Ninety days gives each phase enough time to produce measurable signal before the next phase begins. It also matches the organizational rhythm most teams operate in: a quarter is long enough to run a real experiment and short enough to maintain focus and executive attention.
The Accelerate research by Forsgren, Humble, and Kim identified four key metrics that predict software delivery performance: deployment frequency, lead time for changes, change failure rate, and time to restore service. The 90-day structure maps directly to these: Phase 1 establishes your baseline on all four, Phase 2 targets the weakest, and Phase 3 sustains the improvement and moves to the next constraint.
Phase 1 (Days 1 to 30): Establish a Real Baseline
Most teams think they know their velocity. Most are wrong. Sprint velocity in story points is not engineering velocity. It measures output within a sprint, not the time from idea to production or the rate at which value reaches users. Before any improvement program can work, you need accurate numbers on what is actually happening.
What to measure in Phase 1:
- Deployment frequency: How often does code reach production? Not how often it is merged, but how often it is deployed. If you deploy once a week, that is your baseline. If you deploy on demand, take a 30-day average.
- Lead time for changes: The time from the first commit on a piece of work to that work reaching production. This is different from cycle time (PR open to merge) and different from sprint completion rate.
- Change failure rate: What percentage of deployments require a hotfix, rollback, or follow-up patch within 24 hours?
- Time to restore service: When a deployment causes an incident, how long does it take to return to normal operation?
- PR cycle time: Average time from PR open to merge, broken down by time to first review and time in review. This is the diagnostic metric inside lead time.
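As a rough sketch of how these baseline numbers can be computed from raw deployment records (this is illustrative, not part of the framework; the field names are hypothetical):

```python
from datetime import datetime, timedelta

def dora_baseline(deploys, window_days=30):
    """Compute baseline DORA metrics from deployment records.

    Each record is a dict with hypothetical fields:
      deployed_at      - datetime of the production deploy
      first_commit_at  - datetime of the first commit in the change
      failed           - True if a hotfix/rollback followed within 24h
      restored_at      - datetime service was restored, if an incident occurred
    """
    n = len(deploys)
    # Deployment frequency: deploys per day over the window.
    freq = n / window_days
    # Lead time for changes: first commit to production, averaged.
    lead = sum((d["deployed_at"] - d["first_commit_at"] for d in deploys),
               timedelta()) / n
    # Change failure rate: share of deploys needing a fix within 24 hours.
    cfr = sum(d["failed"] for d in deploys) / n
    # Time to restore: averaged over deploys that caused incidents.
    incidents = [d for d in deploys if d.get("restored_at")]
    mttr = (sum((d["restored_at"] - d["deployed_at"] for d in incidents),
                timedelta()) / len(incidents)) if incidents else None
    return {"deploy_freq_per_day": freq, "lead_time": lead,
            "change_failure_rate": cfr, "time_to_restore": mttr}
```

In practice these fields come from your version control, pipeline, and incident systems; the point of the sketch is that each metric is a simple aggregate once the timestamps are joined in one place.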
The 2024 DORA State of DevOps Report benchmarks high-performing teams at deployment frequencies measured in hours or days, lead times under one day, change failure rates below 5%, and time to restore under one hour. Low performers deploy monthly or less frequently, with lead times exceeding one week and change failure rates above 15%. Your Phase 1 goal is not to hit elite numbers. It is to know where you actually are on these benchmarks.
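The benchmark bands above can be turned into a simple lookup for locating yourself. The thresholds mirror the figures quoted in this section; labeling everything between the high- and low-performer bands as "mid" is this sketch's own simplification, not a DORA term:

```python
def performance_band(lead_time_days, change_failure_rate, restore_hours):
    """Rough tiering against the DORA bands quoted above.
    Uses lead time (days), change failure rate (fraction), and
    time to restore (hours); deployment frequency is omitted here
    because it is categorical rather than a single number."""
    if lead_time_days < 1 and change_failure_rate < 0.05 and restore_hours < 1:
        return "high"
    if lead_time_days > 7 or change_failure_rate > 0.15:
        return "low"
    return "mid"  # between the quoted bands; this label is an assumption
```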
Phase 1 actions:
- Connect your version control, CI/CD pipeline, and incident management system to a single measurement surface. Manual data collection from multiple systems produces inconsistent baselines.
- Pull 90 days of historical data where available. A 30-day snapshot is too short to separate signal from noise on metrics like change failure rate.
- Document the baseline numbers and share them with the team. Teams that see their own data improve faster than teams that do not. The act of measurement creates accountability.
- Identify which DORA metric is furthest from the high-performer benchmark. That is your Phase 2 target.
Phase 1 failure mode: spending the phase debating metric definitions rather than collecting data. Pick reasonable definitions, stick with them for 90 days, and refine later. Consistency matters more than perfection for a baseline.
Phase 2 (Days 31 to 60): Identify and Eliminate the Primary Constraint
Phase 2 is where most improvement programs go wrong. Teams identify three or four problems and try to fix all of them at once. The result is partial fixes across multiple constraints with no measurable improvement on any one of them.
The right approach comes from the Theory of Constraints: find the single bottleneck with the largest impact on velocity, fix it completely, then move to the next one. Improving a non-bottleneck process does not improve throughput. It only produces local efficiency that does not propagate to overall velocity.
The four common constraints, and what they look like in the data:
PR review bottleneck. Deployment frequency is low or lead time is long, but commit-to-PR-open time is short. Time to first review is above 24 hours on average. The work is done but sitting in a queue waiting for review capacity. Fix: define review SLAs, rotate review assignments actively, and reduce PR size so reviews take less time per PR.
Test and CI pipeline bottleneck. PRs are reviewed quickly but time from merge to deploy is long. Test suite runtime above 20 minutes is the typical threshold where developers stop waiting for results and context-switch. Fix: parallelize test execution, identify and quarantine flaky tests, and move slow integration tests out of the critical path.
Deployment process bottleneck. Code is ready but deployment requires manual steps, approval chains, or release coordination that introduces waiting time. This is common in teams with environment parity issues or compliance-driven change management. Fix: automate what is automatable, streamline approvals to the minimum required, and move toward trunk-based development where the deployment surface is smaller per release.
Scope and planning bottleneck. PRs are reasonably sized, review turnaround is acceptable, CI is fast, but lead time is still long because stories are large and take multiple sprints to complete. Work-in-progress is high. Fix: break stories to fit within a single sprint, enforce WIP limits, and use cycle time data to identify stories that are consistently larger than estimated.
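Flaky-test quarantine, mentioned under the test and CI pipeline bottleneck above, usually starts with detection. A minimal sketch, assuming you have a history of per-test CI results as (test_name, commit_sha, passed) records (a hypothetical schema):

```python
from collections import defaultdict

def find_flaky_tests(runs):
    """Flag tests whose outcome varied on the same commit.

    A test is considered flaky if it both passed and failed on an
    identical commit, i.e. its result changed with no code change.
    `runs` is an iterable of (test_name, commit_sha, passed) tuples.
    """
    outcomes = defaultdict(set)
    for name, sha, passed in runs:
        outcomes[(name, sha)].add(passed)
    # A set containing both True and False marks a flaky test.
    return sorted({name for (name, _), seen in outcomes.items()
                   if len(seen) == 2})
```

Detected tests can then be quarantined out of the merge-blocking path while they are fixed, which is one way to keep suite runtime and reliability inside the threshold where developers still wait for results.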
Phase 2 actions:
- Use your Phase 1 data to identify which constraint category applies. If you are unsure, lead time broken down by stage (commit to PR, PR to merge, merge to deploy) usually identifies the bottleneck stage directly.
- Run one targeted intervention per week. Two-week cycles are too slow for constraint elimination; one-week cycles force focus and produce faster feedback.
- Track the specific metric for the constraint you are targeting. Do not change multiple things at once. You need to know which intervention produced the result.
- If the metric does not move after two weeks of a consistent intervention, question the diagnosis before adding more interventions.
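The stage breakdown described in the first action above can be sketched as a small diagnostic: split lead time into its three stages and return the slowest one. Field names are hypothetical placeholders for timestamps from your own tooling:

```python
from datetime import datetime, timedelta

def bottleneck_stage(changes):
    """Break lead time into stages and return the slowest one.

    Each change is a dict with hypothetical timestamp fields:
      first_commit_at, pr_opened_at, merged_at, deployed_at
    Returns (slowest_stage_name, avg_seconds_per_stage).
    """
    stages = {
        "commit_to_pr": ("first_commit_at", "pr_opened_at"),
        "pr_to_merge": ("pr_opened_at", "merged_at"),
        "merge_to_deploy": ("merged_at", "deployed_at"),
    }
    totals = {
        name: sum((c[end] - c[start]).total_seconds()
                  for c in changes) / len(changes)
        for name, (start, end) in stages.items()
    }
    # The stage with the largest average duration is the candidate constraint.
    return max(totals, key=totals.get), totals
```

If "pr_to_merge" dominates, look at the PR review constraint; if "merge_to_deploy" dominates, look at the pipeline or deployment process constraints described above.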
For how AI-assisted code review affects PR cycle time specifically, see Does AI Code Review Work? Data from 400+ Teams.
Phase 2 failure mode: Fixing the symptom rather than the constraint. If PR review time is slow because PRs are too large, adding more reviewers produces marginal improvement. Reducing PR size resolves the underlying cause. Always ask why the metric looks the way it does before designing the intervention.
Phase 3 (Days 61 to 90): Compound and Sustain
Phase 3 has two objectives: prevent Phase 2 improvements from eroding, and begin working on the next constraint.
Delivery improvements do not sustain themselves. Teams that improve PR cycle time in Phase 2 often see it drift back toward baseline within two to three months if they do not establish norms and monitoring that hold the new standard. The same process that produced the bottleneck (growing PR size, declining review responsiveness, accumulating pipeline debt) will recreate it without active maintenance.
Phase 3 actions:
- Establish alert thresholds on the metrics you improved. If PR cycle time exceeds your new target for three consecutive days, that is an early signal of regression worth addressing before it compounds.
- Make the improved metric visible to the team on a weekly cadence. Teams that see their own metrics maintain improvement at roughly twice the rate of teams that do not, based on data from Scrums.com's 90-day onboarding cohorts.
- Return to the Phase 1 baseline and identify the next highest-impact constraint. Repeat the Phase 2 cycle for that constraint.
- Document what worked. The interventions that produced results in your specific codebase, team structure, and delivery context are more valuable than generic playbook advice, because they are calibrated to your actual system.
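The first action above, alerting after three consecutive days over target, reduces to a streak check. A minimal sketch, assuming you have one averaged cycle-time reading per day (ordered oldest-first):

```python
def regression_alert(daily_cycle_times, target_hours, streak=3):
    """Return True once the metric has exceeded its target for
    `streak` consecutive days: the early-regression signal worth
    addressing before it compounds.

    `daily_cycle_times` is a list of daily averages in hours,
    oldest first; `target_hours` is the Phase 2 target you set.
    """
    run = 0
    for value in daily_cycle_times:
        run = run + 1 if value > target_hours else 0
        if run >= streak:
            return True
    return False
```

Wiring this to a weekly team-visible dashboard covers the first two Phase 3 actions with the same data.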
For how AI sprint forecasting supports the planning accuracy improvements that Phase 3 depends on, see AI Sprint Forecasting: How It Works and What to Expect.
Phase 3 failure mode: Declaring victory and moving on. The most common reason velocity improvements do not compound over time is that teams treat improvement as a project with an end date rather than an ongoing operational discipline. Phase 3 is the transition from a time-boxed program to a continuous practice.
Metrics to Track at Each Phase
- Phase 1 (Days 1 to 30): all four DORA metrics, plus PR cycle time broken down into time to first review and time in review.
- Phase 2 (Days 31 to 60): the single metric tied to the constraint you are targeting, tracked per weekly intervention.
- Phase 3 (Days 61 to 90): alert thresholds on the metrics you improved, plus the baseline for the next constraint.
What the Data Shows
Teams that run a structured 90-day improvement program consistently outperform teams that make ad-hoc process changes. Data from Scrums.com's 90-day onboarding cohorts shows an average PR cycle time reduction of 28% by Day 90, with deployment frequency increasing for 67% of teams in the same window. The teams that see no improvement share a common pattern: they skip Phase 1, starting interventions before establishing a baseline. Without a baseline, there is no way to determine whether the constraint identified is real or whether the intervention is working.
The Accelerate research found that high-performing teams were not simply practicing more disciplines than low performers. They were practicing a smaller set of disciplines with higher consistency. The 90-day structure enforces that consistency: one constraint at a time, measured rigorously, sustained before moving on.
Common Mistakes
Measuring output instead of flow. Story points per sprint are a team productivity metric, not a velocity metric. A team can hit its story point target every sprint and still have three-week lead times. Track lead time and deployment frequency, not output volume.
Improving metrics instead of improving delivery. Teams that optimize for metric appearance rather than underlying performance produce numbers that look better without actually shipping faster. Change failure rate can be made to look better by deploying less frequently. Deployment frequency can be gamed by splitting deployments without reducing lead time. The goal is faster, more reliable delivery of real value, not favorable numbers on a dashboard.
Treating the constraint as a people problem. Most velocity constraints are process or tooling problems that look like people problems. A slow review queue is usually a process design problem (PRs too large, no review SLA, misaligned reviewer assignment) rather than evidence that developers are unresponsive. Diagnosing the data first prevents misattributing structural problems to individual performance.
Skipping the baseline. The most common reason 90-day programs fail is that teams start Phase 2 immediately. Without a baseline, Phase 2 interventions cannot be evaluated, and Phase 3 has no standard to sustain. The baseline investment, typically one to two weeks to get accurate data, pays back in every subsequent phase.
Frequently Asked Questions
What is engineering velocity and how is it measured?
Engineering velocity is the rate at which an engineering team moves work from concept to production. It is measured using DORA metrics: deployment frequency (how often code reaches production), lead time for changes (time from first commit to production), change failure rate (percentage of deployments requiring a fix), and time to restore service (how long incidents take to resolve). Story points per sprint measure output within a sprint, not velocity in the DORA sense.
How long does it take to see measurable velocity improvement?
Most process changes take four to eight weeks to show up in delivery metrics. Interventions targeting PR review bottlenecks tend to produce results in two to three weeks. Pipeline and deployment process improvements take four to six weeks. The 90-day window is structured to allow each phase enough time to produce measurable signal before the next intervention begins.
What is the most common engineering velocity bottleneck?
PR review time is the most commonly identified bottleneck in teams using DORA metric tracking. The typical pattern: commit-to-PR time is short (the work is being done quickly) but time to first review exceeds 24 hours, extending lead time significantly. The fix is usually a combination of PR size reduction (shorter reviews get prioritized faster) and explicit review SLAs rather than implicit availability expectations.
How do you improve velocity without burning out the team?
Velocity improvement programs that increase output demands rather than reduce friction consistently produce short-term gains followed by burnout and regression. The constraint-based approach works differently: it identifies and removes friction from the delivery process, which reduces effort per unit of output rather than increasing effort. When PR review is slow because PRs are too large, the fix (smaller PRs) reduces effort for reviewers and authors simultaneously. The Accelerate research found that high-performing teams did not work harder than low performers; they worked with fewer impediments.
What should an engineering team measure in the first 30 days of a velocity improvement program?
The four DORA metrics (deployment frequency, lead time for changes, change failure rate, time to restore service) plus PR cycle time broken down into time to first review and time in review. Ninety days of historical data provides a more reliable baseline than 30 days. The goal of Phase 1 is not improvement; it is an accurate picture of current performance against external benchmarks.
If you want to run this 90-day program with automated metric tracking, Scrums.com connects to your GitHub, Jira, and CI/CD pipeline and tracks all four DORA metrics plus PR cycle time in one place, with benchmarks from 400+ engineering teams to contextualize your baseline. To discuss your team's setup, start a conversation with our team.