DORA Metrics for DevOps: What Engineering Leaders Need to Know in 2026

Software delivery is harder to reason about than it used to be. Teams ship more code, use more tools, and deploy more frequently, but the question that keeps surfacing in leadership meetings is the same one it was five years ago: are we actually getting better?

DORA metrics exist to answer that question with data instead of intuition.

Developed by the DevOps Research and Assessment team (now part of Google Cloud), the DORA framework provides a shared, evidence-based language for measuring software delivery performance. It has become one of the most widely adopted measurement systems in software engineering, used by thousands of organizations to understand how work moves from commit to production and how stable it remains once it gets there.

But if your understanding of DORA stopped at the original four metrics and the elite/high/medium/low tiers, it is time to update. The framework has evolved significantly. It now includes a fifth metric (Rework Rate), a quasi-metric (Reliability), and a complete replacement of the old performance tiers with seven team archetypes.

This guide covers what DORA metrics are, how they changed in 2024 and 2025, how AI is reshaping their interpretation, and how engineering leaders can use all five metrics to drive real improvement in 2026.

What Are DORA Metrics?

DORA metrics are measurements of software delivery performance that help teams ship more efficiently and reliably. The best known are deployment frequency, lead time for changes, change failure rate, and mean time to recover (MTTR). They are designed to measure team-level system capabilities, not individual performance, which keeps the focus on collective improvement and avoids counterproductive behaviors.

The original framework included four metrics. As of 2025, it includes five formal metrics and one quasi-metric. The five formal metrics are grouped into two categories: throughput and stability.

Throughput Metrics

  • Deployment Frequency: How often a team deploys finished code changes to a given environment (typically production), usually expressed as the average number of deployments per day. It is an indicator of a team's overall efficiency and responsiveness.
  • Lead Time for Changes: The average time it takes a change to go from commit to deployment. It indicates the team's capacity and its ability to respond to changes in its environment.
  • Failed Deployment Recovery Time (formerly Mean Time to Recovery): Also known as time to restore service or mean time to recovery (MTTR), this is the average time taken to recover from a failed deployment in production, reflecting how quickly an organization can resolve issues and restore service. Recent DORA reports reclassified this metric from stability to throughput. The reasoning: fast recovery after a failed deployment supports delivery flow, helping teams deploy again sooner. This reframing shifts the interpretation from “fixing failures” to maintaining operational momentum.
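
As a rough illustration of how the three throughput metrics above could be derived from deployment records, here is a minimal Python sketch. The `Deployment` record, its field names, and the sample data are hypothetical and not part of any DORA tooling. It uses a simple mean to match the definitions above; some teams prefer a median to dampen outliers.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import List, Optional


@dataclass
class Deployment:
    # Hypothetical record shape; adapt the fields to your own pipeline data.
    commit_at: datetime                     # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did this deployment fail in production?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def _hours(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 3600


def deployment_frequency(deploys: List[Deployment], window_days: int) -> float:
    """Average number of deployments per day over the window."""
    return len(deploys) / window_days


def mean_lead_time_hours(deploys: List[Deployment]) -> float:
    """Average commit-to-deployment time in hours."""
    return mean(_hours(d.commit_at, d.deployed_at) for d in deploys)


def mean_recovery_hours(deploys: List[Deployment]) -> Optional[float]:
    """Average time to restore service after a failed deployment, in hours."""
    times = [
        _hours(d.deployed_at, d.restored_at)
        for d in deploys
        if d.failed and d.restored_at is not None
    ]
    return mean(times) if times else None


# Made-up sample data covering a two-day window.
sample = [
    Deployment(datetime(2026, 1, 5, 9), datetime(2026, 1, 5, 13)),
    Deployment(datetime(2026, 1, 5, 10), datetime(2026, 1, 5, 16),
               failed=True, restored_at=datetime(2026, 1, 5, 17, 30)),
    Deployment(datetime(2026, 1, 6, 8), datetime(2026, 1, 6, 10)),
    Deployment(datetime(2026, 1, 6, 9), datetime(2026, 1, 6, 17),
               failed=True, restored_at=datetime(2026, 1, 6, 17, 30)),
]

print(deployment_frequency(sample, window_days=2))  # 2.0 deployments per day
print(mean_lead_time_hours(sample))                 # 5.0 hours
print(mean_recovery_hours(sample))                  # 1.0 hours
```

What counts as a deployment, a commit timestamp, or a restored service will differ between teams, so treat the field definitions as something to agree on before comparing numbers.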

Stability Metrics

  • Change Failure Rate: The change failure rate is the percentage of deployments that cause a failure in production, indicating the quality of the deployment process. High change failure rates typically point to gaps in testing, review, or deployment practices.
  • Rework Rate: Introduced in the 2024 Accelerate State of DevOps report, rework rate measures the proportion of deployments that were unplanned and made to fix user-visible issues after a production incident. A high rework rate means the team is spending significant capacity on reactive fixes rather than planned delivery. Think of it as a companion signal to Change Failure Rate: one tells you how often things break, the other tells you how much effort goes into cleaning up after they do. Because rework starts with a production failure, teams that want to reduce it should also benchmark their response and recovery processes to find where they can restore service faster.
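
Both stability metrics are simple proportions over the same set of deployments. The sketch below shows one way to compute them, assuming each deployment has already been labeled. The `DeployRecord` type, its two flags, and the sample numbers are hypothetical; how a team decides that a deployment "caused a failure" or was an "unplanned fix" is a judgment call this sketch does not settle.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DeployRecord:
    # Hypothetical labels; in practice they come from incident or rollback data.
    caused_failure: bool   # this deployment caused a failure in production
    unplanned_fix: bool    # this deployment was an unplanned fix for a user-visible issue


def change_failure_rate(records: List[DeployRecord]) -> float:
    """Share of deployments that caused a failure in production."""
    if not records:
        return 0.0
    return sum(r.caused_failure for r in records) / len(records)


def rework_rate(records: List[DeployRecord]) -> float:
    """Share of deployments that were unplanned fixes."""
    if not records:
        return 0.0
    return sum(r.unplanned_fix for r in records) / len(records)


# Made-up sample: 20 deployments, 3 caused failures, 4 were unplanned fixes.
records = (
    [DeployRecord(caused_failure=True, unplanned_fix=False)] * 3
    + [DeployRecord(caused_failure=False, unplanned_fix=True)] * 4
    + [DeployRecord(caused_failure=False, unplanned_fix=False)] * 13
)

print(f"change failure rate: {change_failure_rate(records):.0%}")  # 15%
print(f"rework rate: {rework_rate(records):.0%}")                  # 20%
```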

Quasi-Metric: Reliability

Reliability sits alongside the five formal metrics as a broader indicator of operational performance. It captures the overall dependability of the system from an end-user perspective, separate from the deployment-specific signals that Change Failure Rate and Rework Rate track.

Why DORA Metrics Matter for DevOps Teams

Before DevOps practices became mainstream, most organizations assumed throughput and stability were inherently opposed. Ship faster, expect more breakage. The DORA research proved this wrong. When you align development and operations around the same goals, the things teams do to improve throughput also increase stability. Automating deployments to increase speed, for example, also makes deployments more reliable and repeatable.

This is the core insight: speed and stability are not a trade-off. They reinforce each other when the right practices are in place.

Common Language Across Teams

For engineering leaders, DORA metrics serve as a common language across teams. In large organizations where multiple teams work on different products with different release cadences, DORA provides a consistent vocabulary for discussing delivery performance. A VP of Engineering can compare lead time patterns across ten teams without relying on each team’s self-reported narrative.

Focused Improvement

Focused improvement is another key benefit. Retrospectives generate ideas, but those ideas do not always translate into measurable outcomes. DORA metrics close that loop: they provide a baseline for setting goals, measuring progress, and identifying the areas that most need improvement. The statistical model of practices and capabilities produced by the research also offers concrete experiments teams can run, turning the data into action and supporting continuous improvement.

Alignment with Business Outcomes

Alignment with business outcomes is a third function. DORA research across multiple years has consistently shown that teams performing well on these metrics are more likely to meet organizational performance targets, and research has found elite performers are twice as likely to meet those targets. The metrics connect engineering work to business goals without requiring individual-level tracking. DORA metrics reflect team-wide system capabilities, not individual performance reviews.

What Changed in 2024–2025: The End of Performance Tiers

For years, DORA benchmarked teams into four performance tiers: elite, high, medium, and low. These tiers became the most widely cited element of the framework. Teams chased "elite" status. Dashboards color-coded against them. But the tiers were never meant to be used that way.

As RedMonk's analysis pointed out, DORA used cluster analysis, not fixed performance thresholds. Teams were not ranked against static definitions of what it meant to be "elite." The clusters were determined each year based on what metrics loaded together. Many organizations took a descriptive model and treated it as a prescriptive scorecard, but DORA was always intended to be a pattern detector, not a ruler.

In the 2025 report, DORA retired the four-tier classification entirely.

Seven Team Archetypes Replace Linear Tiers

In place of the performance tiers, the 2025 report introduced seven team archetypes based on throughput, stability, and team well-being. The shift is significant: knowing your team type draws a map of what capabilities to improve, something the old performance tiers never did well.

The seven archetypes are:

  • Foundational Challenges — teams in survival mode with significant process gaps
  • Legacy Bottleneck — teams constantly reacting to unstable systems
  • Constrained by Process — teams consumed by inefficient workflows
  • High Impact, Low Cadence — teams that deliver value but ship infrequently
  • Stable and Methodical — teams with reliable delivery but moderate throughput
  • Pragmatic Performers — teams balancing speed and stability effectively
  • Harmonious High Achievers — teams excelling across delivery, stability, and well-being

Two teams can show similar DORA scores for completely different reasons. One may be genuinely healthy and well-balanced. Another may be performing under unsustainable pressure. The archetype model captures that distinction, where a single metric dashboard cannot.

The model is built on eight drivers of holistic performance, including burnout, friction, and time spent on valuable work. These dimensions form the foundation for the archetypes, ranging from Legacy Bottleneck (about 11% of respondents) to Harmonious High Achievers (roughly 20%).

How AI Is Reshaping DORA Metrics

The 2025 report was renamed from "State of DevOps" to "State of AI-Assisted Software Development." That is not a cosmetic change. It is a change of scope for the entire report.

According to the Google Cloud announcement, 90% of survey respondents report using AI at work. More than 80% believe it has increased their productivity. But the relationship between AI adoption and DORA metrics is more complicated than the adoption numbers suggest.

AI Amplifies What Already Exists

The central finding of the 2025 report is direct: AI does not fix a team; it amplifies what is already there. Teams that already have strong delivery fundamentals use AI to become even better and more efficient. Struggling teams find that AI only highlights and intensifies their existing problems.

This has practical implications for how engineering leaders interpret their DORA numbers:

  • Deployment frequency may rise without meaningful delivery improvement. If AI tools help developers merge more PRs, deployment frequency goes up on the dashboard. But if those PRs carry higher defect rates or require more rework, the team is deploying more without delivering more value.
  • Lead time can compress while review quality erodes. As Thoughtworks' analysis of the report noted, a developer might produce ten times the volume of code, but if that code introduces subtle bugs, security flaws, or architectural debt, the net effect can be negative. Teams then spend more time on rework, debugging, and managing fragile systems.
  • Traditional productivity measures break down. Lines of code, story points, and even PR counts become misleading in an AI-augmented environment. The Thoughtworks perspective identifies a new category of waste, "AI engineering waste," that emerges when organizations adopt AI tools without adjusting their measurement systems.
  • Stability metrics carry more diagnostic weight. Change failure rate and rework rate become the critical check on whether throughput gains are real or illusory. When throughput rises but stability does not keep pace, these metrics surface the gap before it compounds into delayed quarters and burned-out teams.

The AI Capabilities Model

Alongside the team archetypes, the 2025 report introduced the DORA AI Capabilities Model, identifying seven foundational practices that determine whether AI benefits scale beyond individuals to organizational performance:

  • Clear and communicated AI stance
  • Healthy data ecosystems
  • AI-accessible internal data
  • Strong version control practices
  • Working in small batches
  • User-centric focus
  • Quality internal platforms

The model reinforces a theme that runs through the entire report: the greatest return comes not from the AI tools themselves, but from a strategic focus on the quality of internal platforms, the clarity of workflows, and the alignment of teams.

Platform Engineering as the Foundation

The 2025 DORA report found that 90% of organizations have adopted at least one platform, and there is a direct correlation between a high-quality internal platform and an organization's ability to unlock the value of AI. Platform engineering is not optional infrastructure. It is the foundation that determines whether AI tools accelerate delivery or accelerate dysfunction.

Using DORA Metrics Effectively: Practical Patterns

The metrics are only useful if they drive action. Here are the patterns that consistently show up when teams use DORA data well.

Throughput and Stability Together

Never optimize one category at the expense of the other. A team with high deployment frequency but a climbing change failure rate is not improving. They are accelerating into instability. Read the two categories together, and treat reducing batch size as a practical way to improve both throughput and stability over time.

The most useful diagnostic pairs:

  • Deployment Frequency + Change Failure Rate: High deployment frequency with a low change failure rate signals mature CI/CD practices and effective testing. High deployment frequency with a rising change failure rate signals that speed is outpacing quality controls.
  • Lead Time for Changes + Rework Rate: Short lead times with low rework mean the pipeline is efficient and the code shipping through it is sound. Short lead times with rising rework suggest corners are being cut during review or testing. Reviewing this pair regularly catches that drift early.
  • Failed Deployment Recovery Time + Change Failure Rate: A high change failure rate with fast recovery indicates a team that is good at detecting and resolving issues, but not yet good at preventing them. A high change failure rate with slow recovery is a team under significant operational stress.
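
The three pairings above can be read mechanically by comparing two reporting periods. The following is a minimal sketch of that idea, not a DORA-defined procedure. The `Snapshot` type, the `diagnose` function, the sample values, and especially the two thresholds (`cfr_high` and `slow_recovery_hours`) are assumptions chosen for illustration; DORA does not publish fixed cutoffs for these.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snapshot:
    # Hypothetical per-period summary of the five metrics.
    deploys_per_day: float
    lead_time_hours: float
    recovery_hours: float
    change_failure_rate: float
    rework_rate: float


def diagnose(prev: Snapshot, curr: Snapshot,
             cfr_high: float = 0.15,
             slow_recovery_hours: float = 24.0) -> List[str]:
    """Return plain-language findings for the three diagnostic pairs.

    The thresholds are illustrative assumptions, not DORA benchmarks.
    """
    findings = []

    # Pair 1: deployment frequency + change failure rate
    if (curr.deploys_per_day > prev.deploys_per_day
            and curr.change_failure_rate > prev.change_failure_rate):
        findings.append("speed may be outpacing quality controls")

    # Pair 2: lead time for changes + rework rate
    if (curr.lead_time_hours < prev.lead_time_hours
            and curr.rework_rate > prev.rework_rate):
        findings.append("possible shortcuts in review or testing")

    # Pair 3: failed deployment recovery time + change failure rate
    if curr.change_failure_rate >= cfr_high:
        if curr.recovery_hours >= slow_recovery_hours:
            findings.append("operational stress: frequent failures, slow recovery")
        else:
            findings.append("good at recovery, weaker at prevention")

    return findings


# Made-up example: throughput improved, stability slipped.
last_quarter = Snapshot(2.0, 30.0, 2.0, 0.08, 0.05)
this_quarter = Snapshot(3.5, 18.0, 1.5, 0.16, 0.09)

for finding in diagnose(last_quarter, this_quarter):
    print("-", finding)
```

A rule-based reading like this is only a prompt for a conversation. The findings say where to look, not what the cause is.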

PR Size as a Leading Indicator

PR size is not a DORA metric, but it is one of the strongest predictors of DORA outcomes. Large PRs take longer to review, are harder to reason about, and carry a higher probability of introducing defects. Teams that enforce PR size discipline (keeping changes under 200 lines where practical) consistently show better lead times and lower change failure rates.

In an AI-augmented environment, this matters more than before. AI tools can generate large volumes of code quickly, which often translates to larger PRs. Without explicit guardrails, the ease of generation works against the discipline of small, reviewable changes.
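
One possible guardrail is a check that counts changed lines in a PR and flags anything over the limit. The sketch below parses the tab-separated output of `git diff --numstat` (added lines, deleted lines, path, with `-` in the count columns for binary files). The function names, the sample input, and the choice to count additions plus deletions are assumptions; the 200-line default simply mirrors the figure mentioned above.

```python
def changed_lines(numstat: str) -> int:
    """Sum added and deleted lines from `git diff --numstat` output."""
    total = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        added, deleted = parts[0], parts[1]
        if added == "-" or deleted == "-":
            continue  # binary files report "-" instead of line counts
        total += int(added) + int(deleted)
    return total


def is_oversized(numstat: str, limit: int = 200) -> bool:
    """True when the change exceeds the line limit."""
    return changed_lines(numstat) > limit


# Made-up numstat output for a PR touching three files.
sample_numstat = (
    "120\t30\tsrc/app.py\n"
    "-\t-\tdocs/logo.png\n"
    "45\t10\ttests/test_app.py\n"
)

print(changed_lines(sample_numstat))  # 205
print(is_oversized(sample_numstat))   # True
```

Whether to exclude generated files, lockfiles, or test fixtures from the count is a team decision that this sketch leaves out.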

Code Review as a Quality Gate

The relationship between review practices and stability metrics is well-documented in DORA research. Teams with thorough review processes show lower change failure rates and lower rework rates, which helps them ship quality software. Teams where PRs merge with minimal or no review show the opposite pattern. Thorough review does not have to slow delivery: shorter lead times usually come from small batches, automated testing, automated code review checks, and streamlined processes rather than from skipping review.

With AI-generated code now comprising a growing share of merged PRs, the review step has shifted from “checking a colleague’s work” to “verifying machine-generated output against system context.” This is a different cognitive task. It requires reviewers to evaluate not just correctness, but architectural fit, security implications, and long-term maintainability.

Sprint Health and Delivery Predictability

DORA metrics focus on the deployment pipeline, but they can be combined with sprint-level signals to create a fuller picture of delivery health:

  • WIP limits and cycle time consistency: Teams with stable work-in-progress limits and consistent cycle times across sprints are more predictable in their delivery. High WIP with volatile cycle times signals context-switching, unclear priorities, or scope creep.
  • Carryover rate: The proportion of planned work that rolls from one sprint to the next. Persistent carryover indicates systematic overcommitment or recurring blockers.
  • Deployment frequency per sprint: Connecting DORA's deployment frequency to sprint boundaries helps teams see whether delivery is flowing continuously or clustering at sprint ends (a sign of batch-and-rush patterns).
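
Two of these sprint-level signals are easy to compute once sprint boundaries are known. The sketch below is one way to do it, under stated assumptions: the function names and sample dates are hypothetical, and treating the last two days of a sprint as the "tail" is an arbitrary choice to adjust to your own cadence.

```python
from datetime import date, timedelta
from typing import List


def carryover_rate(planned_items: int, carried_over: int) -> float:
    """Proportion of planned work that rolled into the next sprint."""
    return carried_over / planned_items if planned_items else 0.0


def end_of_sprint_share(deploy_dates: List[date], sprint_start: date,
                        sprint_end: date, tail_days: int = 2) -> float:
    """Share of a sprint's deployments that landed in its last `tail_days` days.

    A value near 1.0 suggests batch-and-rush delivery; a lower value
    suggests deployments are spread more evenly through the sprint.
    """
    in_sprint = [d for d in deploy_dates if sprint_start <= d <= sprint_end]
    if not in_sprint:
        return 0.0
    tail_start = sprint_end - timedelta(days=tail_days - 1)
    late = [d for d in in_sprint if d >= tail_start]
    return len(late) / len(in_sprint)


# Made-up two-week sprint with most deployments at the end.
start, end = date(2026, 3, 2), date(2026, 3, 13)
deploys = [
    date(2026, 3, 4),
    date(2026, 3, 12), date(2026, 3, 12),
    date(2026, 3, 13), date(2026, 3, 13),
]

print(carryover_rate(planned_items=20, carried_over=5))  # 0.25
print(end_of_sprint_share(deploys, start, end))          # 0.8
```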

Integrating DORA with Other Frameworks

DORA metrics provide a strong foundation, but they were designed to measure delivery performance, not the full engineering experience. Several complementary frameworks have emerged.

SPACE Framework

The SPACE framework, co-authored by Nicole Forsgren (a founder of the DORA research program), measures developer productivity across five dimensions: Satisfaction and well-being, Performance, Activity, Communication and collaboration, and Efficiency and flow. It intentionally includes qualitative measures like satisfaction and well-being alongside quantitative delivery signals.

SPACE is most useful when teams have already established DORA baselines and want to understand why their numbers look the way they do. Measuring along the SPACE dimensions makes it easier to connect delivery signals with human factors. A team with strong DORA metrics but declining satisfaction scores is at risk of attrition, something DORA alone would not surface.

DX Core 4

The DX Core 4 framework focuses on four dimensions: Speed, Effectiveness, Quality, and Impact. It layers business outcome alignment on top of delivery metrics, helping organizations avoid optimizing one dimension at the expense of the others, much as SPACE balances well-being with performance.

When to Use What

DORA is the right starting point for any team that does not yet have a structured measurement practice. It is simple, well-researched, and directly actionable. As teams mature, layering SPACE or DX Core 4 on top provides richer diagnostic capability without replacing the DORA baseline.

The 2025 DORA report itself makes this point: simple software delivery metrics alone are not sufficient. But they remain the strongest common baseline for delivery performance.

Common Mistakes When Implementing DORA Metrics

Using DORA to Evaluate Individuals

DORA metrics reflect team delivery capability. Using them to score or rank individual developers creates perverse incentives, encourages gaming, erodes trust, and distracts from learning and improvement. A developer who takes time on a thorough code review is improving the team’s change failure rate, even though their own PR throughput appears lower.

Treating the Old Performance Tiers as Current

Any dashboard or report that still references elite/high/medium/low tiers is working from outdated definitions. The 2025 report replaced these with seven archetypes for good reason: linear tiers oversimplify the relationship between delivery, stability, and team health.

Chasing Metrics Instead of Understanding Them

A team that inflates deployment frequency by splitting one release into several token deploys, without changing how work actually flows, has not improved. They have gamed the metric. DORA metrics are diagnostic tools, not targets. The goal is to understand what is happening in the delivery pipeline, identify bottlenecks and their causes, then act on that understanding. Deployment frequency is especially easy to distort because teams can configure what counts as a deploy in each environment.

Ignoring Stability When Throughput Improves

This is the most common mistake in 2026. AI tools make it easy to ship more code. But if change failure rate and rework rate are rising alongside deployment frequency, the team is accumulating instability debt. The bill comes due in production incidents, developer fatigue, and eroded delivery confidence.

Measuring AI Adoption Instead of AI Impact

The 2025 DORA report is explicit about this: the critical question is no longer “are people using AI?” (90% already are) but “is AI helping us achieve better outcomes for individuals, teams, products, and the organization?” Tracking license counts and adoption rates without connecting them to delivery and quality metrics misses the point. The same applies to DORA metrics themselves: embed them into everyday DevOps workflows rather than treating them as a reporting exercise.

Getting Started with DORA Metrics

For teams that do not yet track DORA metrics, the path to implementation is straightforward:

  1. Start with what you already have. Most teams using GitHub, GitLab, or Bitbucket alongside a CI/CD pipeline already generate the raw data DORA metrics require. Deployment frequency comes from your deployment pipeline. Lead time comes from commit-to-deploy timestamps. Change failure rate comes from incident or rollback data. To collect data reliably, map the systems in your stack that touch the delivery process, then integrate their logs and automate telemetry gathering to improve tracking accuracy. The data exists. The question is whether anyone is looking at it systematically.
  2. Baseline before you optimize. Measure your current state for at least four to six weeks before setting improvement targets. Without a baseline, you cannot distinguish signal from noise, and you will not know whether a change actually moved the metric.
  3. Focus on one constraint at a time. If lead time is your biggest bottleneck, focus there. If change failure rate is high, focus on testing and review practices first. Trying to improve all five metrics simultaneously dilutes attention and slows progress.
  4. Connect metrics to practices. The DORA research does not just measure outcomes. It identifies the practices that drive them: trunk-based development, continuous integration, deployment automation, small batch sizes, loosely coupled architectures, and team autonomy. Use the metrics to identify the gap, then use the DORA capabilities model to find the practice that addresses it.
  5. Review regularly, but not obsessively. Review DORA metrics on a fixed monthly or bi-weekly cadence rather than checking them ad hoc. The goal is trend detection, not real-time reaction.
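
To make the "baseline before you optimize" step concrete, here is a small sketch of one way to separate signal from noise: record a weekly value for a metric during the baseline period, then flag later values only when they fall outside the baseline's normal spread. The two-standard-deviation band is a common rule of thumb and an assumption here, not something DORA prescribes. The function names and sample numbers are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Tuple


def baseline(values: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of the baseline period."""
    return mean(values), stdev(values)


def is_signal(new_value: float, baseline_values: List[float],
              k: float = 2.0) -> bool:
    """True when new_value lies more than k standard deviations from the baseline mean.

    k=2.0 is an illustrative rule of thumb, not a DORA recommendation.
    """
    center, spread = baseline(baseline_values)
    return abs(new_value - center) > k * spread


# Made-up weekly lead times (hours) from a six-week baseline.
weekly_lead_time = [30.0, 34.0, 28.0, 32.0, 31.0, 31.0]

print(baseline(weekly_lead_time))         # (31.0, 2.0)
print(is_signal(33.0, weekly_lead_time))  # False: within normal variation
print(is_signal(24.0, weekly_lead_time))  # True: likely a real change
```

Six data points is a thin basis for any statistical claim, so a flag like this is best treated as a reason to look closer rather than as proof that a change worked.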

DORA Metrics in 2026: Where the Framework Stands

The DORA framework has come a long way from four metrics and four tiers. It now encompasses five delivery metrics, a reliability quasi-metric, seven team archetypes, and an AI capabilities model. The 2025 report drew on nearly 5,000 technology professionals and over 100 hours of qualitative interviews.

The core message has not changed: teams that invest in strong engineering foundations, clear workflows, and healthy cultures deliver better software. What has changed is the context. AI has made it easier to generate code and harder to verify whether that code is actually improving outcomes. The bottleneck has shifted from production to verification.

DORA metrics remain the best shared baseline for understanding delivery performance. They are not sufficient on their own, and the DORA team says as much. But any measurement strategy that does not include them is missing the foundation.

The question for engineering leaders is no longer whether to track DORA metrics. It is whether you are reading them with the nuance that 2026 demands.