Software delivery is harder to reason about than it used to be. Teams ship more code, use more tools, and deploy more frequently, but the question that keeps surfacing in leadership meetings is the same one it was five years ago: are we actually getting better?
DORA metrics exist to answer that question with data instead of intuition.
Developed by the DevOps Research and Assessment team (now part of Google Cloud), the DORA framework provides a shared, evidence-based language for measuring software delivery performance. It has become one of the most widely adopted measurement systems in software engineering. Thousands of organizations use it to understand how work moves from commit to production and how stable it remains once it gets there.
But if your understanding of DORA stopped at the original four metrics and the elite/high/medium/low tiers, it is time to update. The framework has evolved significantly. It now includes a fifth metric (Rework Rate), a quasi-metric (Reliability), and a complete replacement of the old performance tiers with seven team archetypes.
This guide covers what DORA metrics are, how they changed in 2024 and 2025, how AI is reshaping their interpretation, and how engineering leaders can use them to drive real improvement in 2026.
DORA metrics are team-level measurements of software delivery performance. The original framework tracked four core metrics: deployment frequency, lead time for changes, change failure rate, and mean time to recover (MTTR), which DORA has since renamed failed deployment recovery time. The metrics are designed to measure system capabilities, not individual performance. That design keeps the focus on collective improvement and away from counterproductive behaviors.
As of 2025, the framework includes five formal metrics and one quasi-metric. The five formal metrics are grouped into two categories: throughput and instability.
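To make the definitions concrete, here is a minimal Python sketch that computes the five delivery metrics from a list of deployment records. The `Deployment` record shape, its field names, and the use of medians are assumptions for illustration. Real pipelines pull these fields from CI/CD, version control, and incident tooling, and DORA does not prescribe this exact calculation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Deployment:
    # Hypothetical record shape; map these fields from your own tooling.
    deployed_at: datetime
    commit_times: list[datetime] = field(default_factory=list)  # commit time of each shipped change
    caused_failure: bool = False         # deployment degraded service and needed remediation
    restored_at: datetime | None = None  # when service recovered after a failed deployment
    unplanned_fix: bool = False          # unplanned deployment made to remediate a production issue


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def delivery_metrics(deploys: list[Deployment], period_days: int) -> dict[str, float]:
    if not deploys or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    # Lead time: from each commit to the deployment that shipped it.
    lead_times = [hours(d.deployed_at - c) for d in deploys for c in d.commit_times]
    failed = [d for d in deploys if d.caused_failure]
    # Recovery time: from a failed deployment to restored service.
    recoveries = [hours(d.restored_at - d.deployed_at)
                  for d in failed if d.restored_at is not None]
    return {
        "deployment_frequency_per_day": len(deploys) / period_days,
        "median_lead_time_hours": median(lead_times) if lead_times else 0.0,
        "median_recovery_hours": median(recoveries) if recoveries else 0.0,
        "change_failure_rate": len(failed) / len(deploys),
        "rework_rate": sum(d.unplanned_fix for d in deploys) / len(deploys),
    }


# Toy data: four deployments over one week.
t0 = datetime(2025, 1, 6, 9, 0)
day, hour = timedelta(days=1), timedelta(hours=1)
deploys = [
    Deployment(t0, [t0 - 4 * hour]),
    Deployment(t0 + day, [t0 + day - 2 * hour], caused_failure=True, restored_at=t0 + day + hour),
    Deployment(t0 + day + 2 * hour, [t0 + day + hour], unplanned_fix=True),
    Deployment(t0 + 3 * day, [t0 + 3 * day - 6 * hour]),
]
metrics = delivery_metrics(deploys, period_days=7)
print(metrics)
```

Medians are used for the time-based metrics because a single very slow change or outage would otherwise dominate a mean.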
Reliability sits alongside the five formal metrics as a broader indicator of operational performance. It captures the overall dependability of the system from an end-user perspective, separate from the deployment-specific signals that Change Failure Rate and Rework Rate track.
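DORA does not prescribe a single formula for reliability. One common way to approximate it is to compare the share of successful user requests against a service level objective (SLO) and track how much of the error budget has been spent. That approach is an assumption here, not something taken from the framework. The function name and the 99.9% target in the example are also illustrative.

```python
def reliability_snapshot(good_requests: int, total_requests: int, slo_target: float) -> dict:
    """Approximate user-facing reliability against an SLO (an assumed proxy, not a DORA formula)."""
    if total_requests <= 0 or not 0.0 < slo_target < 1.0:
        raise ValueError("need traffic and an SLO target strictly between 0 and 1")
    success_ratio = good_requests / total_requests
    # The error budget is the number of failed requests the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    actual_failures = total_requests - good_requests
    return {
        "success_ratio": success_ratio,
        "slo_met": success_ratio >= slo_target,
        "error_budget_used": actual_failures / allowed_failures,
    }


snapshot = reliability_snapshot(good_requests=999_500, total_requests=1_000_000, slo_target=0.999)
print(snapshot)
```

An `error_budget_used` value above 1.0 means the service failed more often in that window than its SLO allows.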
Before DevOps practices became mainstream, most organizations assumed throughput and stability were inherently opposed: ship faster, expect more breakage. DORA's research showed otherwise. When development and operations align around the same goals, the practices that improve throughput also improve stability. Automating deployments to increase speed, for example, also makes deployments more reliable and repeatable.
This is the core insight: speed and stability are not a trade-off. They reinforce each other when the right practices are in place.
For engineering leaders, DORA metrics serve as a common language across teams. In large organizations where multiple teams work on different products with different release cadences, DORA provides a consistent vocabulary for discussing delivery performance. A VP of Engineering can compare lead time and deployment frequency patterns across ten teams without relying on each team's self-reported narrative.
Focused improvement is another key benefit. Retrospectives generate ideas, but those ideas do not always translate into measurable outcomes. DORA metrics close that loop by providing a baseline for setting goals, measuring progress, and identifying areas that need improvement. The research also produced a statistical model linking practices and capabilities to outcomes. That model gives teams concrete experiments to run in pursuit of continuous improvement.
Alignment with business outcomes is a third function. DORA research across multiple years has consistently shown that teams performing well on these metrics are more likely to meet organizational performance targets. Elite performers were found to be twice as likely to meet them. The metrics connect engineering work to business goals without requiring individual-level tracking, because they reflect team-wide system capabilities rather than serving as individual performance reviews.
For years, DORA benchmarked teams into four performance tiers: elite, high, medium, and low. These tiers became the most widely cited element of the framework. Teams chased "elite" status. Dashboards color-coded against them. But the tiers were never meant to be used that way.
As RedMonk's analysis pointed out, DORA used cluster analysis, not fixed performance thresholds. Teams were not ranked against static definitions of what it meant to be "elite." Instead, the clusters were derived anew each year from how respondents' metrics grouped together in that year's data. Many organizations took a descriptive model and treated it as a prescriptive scorecard, but DORA was always intended to be a pattern detector, not a ruler.
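The difference between cluster analysis and fixed thresholds is easier to see in code. The sketch below runs a tiny one-dimensional k-means over two invented samples of weekly deployment counts. It is a stand-in for illustration only, because DORA's clustering used several metrics at once and its own statistical method. The point is that cluster boundaries come from the data, so the same team can land in a different cluster when the population shifts.

```python
def nearest(value: float, centroids: list[float]) -> int:
    """Index of the centroid closest to value."""
    return min(range(len(centroids)), key=lambda j: abs(value - centroids[j]))


def kmeans_1d(values: list[float], k: int, iterations: int = 100) -> list[float]:
    """Return sorted cluster centroids for one-dimensional data (plain Lloyd's algorithm)."""
    if k < 2 or len(values) < k:
        raise ValueError("need k >= 2 and at least k values")
    data = sorted(values)
    # Seed centroids at evenly spaced positions in the sorted data.
    centroids = [float(data[i * (len(data) - 1) // (k - 1)]) for i in range(k)]
    for _ in range(iterations):
        groups: list[list[float]] = [[] for _ in range(k)]
        for v in data:
            groups[nearest(v, centroids)].append(v)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        updated = [sum(g) / len(g) if g else centroids[j] for j, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


# Invented weekly deployment counts for two survey years.
year_a = [1, 2, 2, 3, 10, 11, 12, 40, 45, 50]
year_b = [2, 3, 4, 5, 20, 22, 25, 80, 90, 100]
centroids_a = kmeans_1d(year_a, k=3)
centroids_b = kmeans_1d(year_b, k=3)
# A team deploying 40 times a week sits in the top cluster in year A
# but only in the middle cluster in year B.
print(nearest(40, centroids_a), nearest(40, centroids_b))  # → 2 1
```

No fixed "elite" cutoff appears anywhere in the code. The groups, and therefore any labels attached to them, depend entirely on who was in that year's sample.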
In the 2025 report, DORA retired the four-tier classification entirely.
In place of the performance tiers, the 2025 report introduced seven team archetypes based on throughput, stability, and team well-being. The shift is significant: knowing your team's archetype points to which capabilities to improve, something the old performance tiers never did well.
The seven archetypes are:
Two teams can show similar DORA scores for completely different reasons. One may be genuinely healthy and well-balanced. Another may be performing under unsustainable pressure. The archetype model captures that distinction, which a single metrics dashboard cannot.
The model is built on eight drivers of holistic performance, including burnout, friction, and time spent on valuable work. These dimensions form the foundation for the archetypes, ranging from Legacy Bottleneck (about 11% of respondents) to Harmonious High Achievers (roughly 20%).
The 2025 report was renamed from "State of DevOps" to "State of AI-Assisted Software Development." That is not a cosmetic change. It is a change of scope for the entire report.
According to the Google Cloud announcement, 90% of survey respondents report using AI at work. More than 80% believe it has increased their productivity. But the relationship between AI adoption and DORA metrics is more complicated than the adoption numbers suggest.
The central finding of the 2025 report is direct: AI does not fix a team; it amplifies what is already there. Teams with strong delivery foundations use AI to become faster and more efficient. Struggling teams find that AI highlights and intensifies their existing problems.
This has practical implications for how engineering leaders interpret their DORA numbers:
Alongside the team archetypes, the 2025 report introduced the DORA AI Capabilities Model, identifying seven foundational practices that determine whether AI benefits scale beyond individuals to organizational performance:
The model reinforces a theme that runs through the entire report: the greatest return comes not from the AI tools themselves, but from a strategic focus on the quality of internal platforms, the clarity of workflows, and the alignment of teams.
The 2025 DORA report found that 90% of organizations have adopted at least one internal platform, and there is a direct correlation between a high-quality internal platform and an organization's ability to unlock the value of AI. Platform engineering is not optional infrastructure. It is the foundation that determines whether AI tools accelerate delivery or accelerate dysfunction.
The metrics are only useful if they drive action. Here are the patterns that consistently show up when teams use DORA data well.
Never optimize one category at the expense of the other. A team with high deployment frequency but a climbing change failure rate is not improving; it is accelerating into instability. Reducing batch size is one of the most practical ways to improve throughput and stability together.
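The rule about high deployment frequency and a climbing change failure rate can be turned into a simple automated check. The check compares two reporting periods and flags when throughput rises while stability falls. The function, its labels, and the two-percentage-point tolerance are hypothetical choices for illustration, so pick thresholds that match the noise in your own data.

```python
def classify_trend(previous: dict[str, float], current: dict[str, float],
                   cfr_tolerance: float = 0.02) -> str:
    """Compare two periods' deployment frequency and change failure rate."""
    faster = current["deployment_frequency_per_day"] > previous["deployment_frequency_per_day"]
    # Ignore changes in change failure rate smaller than the tolerance.
    less_stable = current["change_failure_rate"] > previous["change_failure_rate"] + cfr_tolerance
    if faster and less_stable:
        return "accelerating into instability"
    if faster:
        return "improving throughput without losing stability"
    if less_stable:
        return "losing stability"
    return "no throughput gain"


last_quarter = {"deployment_frequency_per_day": 1.0, "change_failure_rate": 0.10}
this_quarter = {"deployment_frequency_per_day": 2.0, "change_failure_rate": 0.18}
print(classify_trend(last_quarter, this_quarter))  # → accelerating into instability
```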
The most useful diagnostic pairs:
PR size is not a DORA metric, but it is one of the strongest predictors of DORA outcomes. Large PRs take longer to review, are harder to reason about, and carry a higher probability of introducing defects. Teams that enforce PR size discipline (keeping changes under 200 lines where practical) consistently show better lead times and lower change failure rates.
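A size guardrail is easy to automate in CI. This sketch parses the output of `git diff --numstat`, which prints one `added<TAB>deleted<TAB>path` line per file and uses `-` counts for binary files. It then checks the total against the 200-line guideline. Three choices here are assumptions: counting added plus deleted lines, skipping binary files, and the function names themselves. Teams often also exclude generated files or lockfiles.

```python
def changed_lines(numstat: str) -> int:
    """Sum added + deleted lines from `git diff --numstat` output, skipping binary files."""
    total = 0
    for line in numstat.splitlines():
        if not line.strip():
            continue
        added, deleted, _path = line.split("\t", 2)
        if added == "-" or deleted == "-":  # git reports "-" counts for binary files
            continue
        total += int(added) + int(deleted)
    return total


def within_size_limit(numstat: str, limit: int = 200) -> bool:
    return changed_lines(numstat) <= limit


# In CI, the input could come from: git diff --numstat origin/main...HEAD
sample = "120\t30\tsrc/app.py\n40\t25\ttests/test_app.py\n-\t-\tassets/logo.png\n"
print(changed_lines(sample), within_size_limit(sample))  # → 215 False
```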
In an AI-augmented environment, this matters more than before. AI tools can generate large volumes of code quickly, which often translates to larger PRs. Without explicit guardrails, the ease of generation works against the discipline of small, reviewable changes.
DORA research links review practices to stability outcomes. Teams with thorough review processes show lower change failure rates and lower rework rates. Teams where PRs merge with minimal or no review show the opposite pattern. Review practices also shape throughput: better lead time for changes usually comes from small-batch deployments, automated testing, automated code review checks, and streamlined approval processes.
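One rough way to check whether this pattern holds in your own repositories is to split merged PRs by whether they received an approval. You can then compare how often each group is later linked to a production failure. The `MergedPR` record and the way PRs are linked to failures are both hypothetical. The result shows a correlation in your data, not causation.

```python
from dataclasses import dataclass


@dataclass
class MergedPR:
    number: int
    approvals: int           # approving reviews before merge
    linked_to_failure: bool  # later tied to a failed deployment or incident


def failure_rate(prs: list[MergedPR]) -> float:
    return sum(p.linked_to_failure for p in prs) / len(prs) if prs else 0.0


def review_breakdown(prs: list[MergedPR]) -> dict[str, float]:
    if not prs:
        raise ValueError("no merged PRs to analyze")
    reviewed = [p for p in prs if p.approvals > 0]
    unreviewed = [p for p in prs if p.approvals == 0]
    return {
        "unreviewed_share": len(unreviewed) / len(prs),
        "failure_rate_reviewed": failure_rate(reviewed),
        "failure_rate_unreviewed": failure_rate(unreviewed),
    }


# Toy data for illustration.
prs = [
    MergedPR(101, 2, False),
    MergedPR(102, 1, False),
    MergedPR(103, 0, True),
    MergedPR(104, 0, False),
    MergedPR(105, 1, False),
]
print(review_breakdown(prs))
```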
With AI-generated code now comprising a growing share of merged PRs, the review step has shifted from “checking a colleague’s work” to “verifying machine-generated output against system context.” This is a different cognitive task. It requires reviewers to evaluate not just correctness, but architectural fit, security implications, and long-term maintainability.
DORA metrics focus on the deployment pipeline, but they can be combined with sprint-level signals to create a fuller picture of delivery health:
DORA metrics provide a strong foundation, but they were designed to measure delivery performance, not the full engineering experience. Several complementary frameworks have emerged.
The SPACE framework, co-authored by Nicole Forsgren (a founder of the DORA research program), measures developer productivity across five dimensions: Satisfaction, Performance, Activity, Communication, and Efficiency. It intentionally includes qualitative measures like satisfaction and well-being alongside quantitative delivery signals.
SPACE is most useful when teams have already established DORA baselines and want to understand why their numbers look the way they do, connecting delivery signals with human factors. A team with strong DORA metrics but declining satisfaction scores is at risk of attrition, something DORA alone would not surface.
The DX Core 4 framework focuses on four dimensions: Speed, Effectiveness, Quality, and Impact. It layers business outcome alignment on top of delivery metrics, helping organizations avoid optimizing one dimension at the expense of the others.
DORA is the right starting point for any team that does not yet have a structured measurement practice. It is simple, well-researched, and directly actionable. As teams mature, layering SPACE or DX Core 4 on top provides richer diagnostic capability without replacing the DORA baseline.
The 2025 DORA report itself makes this point: simple software delivery metrics alone are not sufficient. But they remain the strongest common baseline for delivery performance.
For teams that do not yet track DORA metrics, the path to implementation is straightforward:
The DORA framework has come a long way from four metrics and four tiers. It now encompasses five delivery metrics, a reliability quasi-metric, seven team archetypes, and an AI capabilities model. The 2025 report drew on nearly 5,000 technology professionals and over 100 hours of qualitative interviews.
The core message has not changed: teams that invest in strong engineering foundations, clear workflows, and healthy cultures deliver better software. What has changed is the context. AI has made it easier to generate code and harder to verify whether that code is actually improving outcomes. The bottleneck has shifted from production to verification.
DORA metrics remain the best shared baseline for understanding delivery performance. They are not sufficient on their own, and the DORA team says as much. But any measurement strategy that does not include them is missing the foundation.
The question for engineering leaders is no longer whether to track DORA metrics. It is whether you are reading them with the nuance that 2026 demands.