Agents can now run the full SDLC in Cosmos. What do engineers do?

Last month I opened a roughly 1,000-line PR. The colleague I asked to review it looked at it and said, “Okay, give me 45 minutes. I’ll do it now.”

A year ago, that same engineer would have said, “This is six or seven hours. I’ll get to it next Tuesday.”

What meaningfully changed in that time? Not just the models, though they did improve. It was Cosmos, our unified agent platform: the system around agentic engineering that scopes, hands off, reviews, and routes work before a human steps in.

The ceiling on coding agents

A year ago I already had AI in the IDE and CLI. I could generate code quickly, but my end-to-end output, measured in shipped features, improved by only about 40%.

That sounds disappointing until you do the math. Coding is maybe 30% of the job. The rest is review, testing, tickets, design, incident response, and planning. Speed up only typing and throughput barely moves. That’s just Amdahl’s law.
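To make the Amdahl's law point concrete, here is a small Python sketch. It takes the rough estimate above that coding is about 30% of the job and computes the overall speedup when only that share gets faster. Even infinitely fast coding caps the end-to-end gain near 1/0.7 ≈ 1.43x, which is roughly the 40% I saw.

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when only `accelerated_fraction` of the work runs `factor` times faster."""
    unchanged = 1.0 - accelerated_fraction  # the share of the job that is not sped up at all
    return 1.0 / (unchanged + accelerated_fraction / factor)


CODING_SHARE = 0.3  # assumption from the text: coding is ~30% of the job

for factor in (2, 5, 10, float("inf")):
    label = "infinitely" if factor == float("inf") else f"{factor}x"
    print(f"coding {label} faster -> overall {amdahl_speedup(CODING_SHARE, factor):.2f}x")
```

This prints overall speedups of 1.18x, 1.32x, 1.37x, and 1.43x. The remaining 70% of the job dominates no matter how fast the coding gets.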

The missing piece was not another coding agent. It was a system where agents could share context, remember patterns, and hand work across stages.

A day in the life	Before Cosmos	With Cosmos
Large PR review	Wait in queue for a full human read	Agents check correctness, handle routine fixes, and prepare a focused review
Low-risk changes	Sit behind bigger reviews	Risk analysis routes obvious cases without extra queue work
Incident triage	Chase logs, metrics, deploys, owners, and recent PRs	Incident Investigator posts an RCA and recommended action in Slack
On-call week	Feature work mostly stops	The engineer reviews decisions and keeps building
Human role	Do every mechanical step	Set direction, check judgment, tune thresholds, and own the final call

Code review and incident response are the examples here, but the same “system around the agent” idea applies to nearly every software engineering task.

Code review stopped being a queue

Review was the first place this became obvious. When code generation sped up, PRs piled up. The bottleneck was confidence. A human still had to reason through every line before the team could feel safe merging.

The solution was not “add a Deep Review step.” It was a workflow that extracts the mechanical work around code review and automates it with agents, leaving only judgment calls for humans. In practice, that system is Deep Code Review + Pair Reviewer: agents verify correctness and do routine fix-up first, then a human reviewer focuses on architecture, risk, and intent.

A couple of weeks ago I merged a PR that collapsed two near-identical incident-response expert templates into one. About a thousand lines of change, easy to get subtly wrong. The review workflow started before any human read it.

Deep Code Review found the real bug: an old prompt file I removed was still referenced by deployed bundles. That would have broken live sessions.

Screenshot of Deep Code Review flagging a runtime-breaking issue in a pull request (a removed prompt file is still referenced in deployed bundles), with recommended fix steps.

Deep Code Review caught a runtime break; PR Author fixed it before human review.

Then the PR Author fixed it in the same thread, pushed a commit, explained the change, and left me one decision: do I agree? I did. That took seconds, not a context switch.

The human review changed too. My reviewer did not spend seven hours reading deleted lines. They opened a Pair Reviewer session scoped to architecture and design and used it to answer the real question: was collapsing these two experts into one shared template the right design?

Screenshot of the Pair Reviewer interface turning a large PR diff into a focused architecture/design conversation with a human reviewer.

The Pair Reviewer turned a giant diff into a focused design conversation instead of a reading assignment.

The goal is not “no human review.” It is better human review. Correctness checks, low-risk routing, and routine fix-up happen first. The human spends time on design, risk, and system understanding.

We’ve published the full code review numbers. Reviewer time on a large PR went from six or seven hours to 45 minutes, weekly output tripled, and bug-introducing commits per output unit went down. Auto-approval works only because the risk analyzer is conservative. A false low-risk call costs more than one unnecessary review.
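That cost asymmetry is what sets an auto-approval threshold. The sketch below is hypothetical: the risk analyzer's internals are not described here, and the names, the probability estimate `p_risky`, and the 50:1 cost ratio are assumptions for illustration. It auto-approves only when the expected cost of skipping review is lower than the cost of one review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewCosts:
    """Hypothetical relative costs; only their ratio matters."""
    human_review: float  # cost of one review that turns out to be unnecessary
    missed_bug: float    # cost of auto-approving a change that was actually risky


def auto_approve_threshold(costs: ReviewCosts) -> float:
    # Skipping review pays off only when p_risky * missed_bug < human_review,
    # i.e. when p_risky < human_review / missed_bug.
    return costs.human_review / costs.missed_bug


def route(p_risky: float, costs: ReviewCosts) -> str:
    """Route a PR given an estimated probability that it is actually risky."""
    if p_risky < auto_approve_threshold(costs):
        return "auto-approve"
    return "human review"


costs = ReviewCosts(human_review=1.0, missed_bug=50.0)  # assumed 50:1 cost ratio
print(auto_approve_threshold(costs))  # 0.02
print(route(0.01, costs))             # auto-approve
print(route(0.05, costs))             # human review
```

The more a missed bug costs relative to a review, the lower the threshold drops, which is what "conservative" means in this framing.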

On-call stopped being manual context reconstruction

The same pattern showed up on call. Once coding and review moved faster, operational load became the next bottleneck. A faster team creates more alerts, more deploys, and more questions for whoever has the pager.

A year ago, investigating one alert took me about 30 minutes. I moved between Slack, PagerDuty, logs, metrics, recent deploys, recent PRs, owners, and prior incidents. The work was not always hard, but it always demanded attention. On call, feature work mostly stopped.

Now the Incident Investigator does the first pass inside Slack. It gathers the evidence, posts an RCA with a recommended path, and routes the incident toward monitor, code fix, rollback, or escalation. If code needs to change, it hands the fix to PR Author and sends the result into review.
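The four routes can be pictured as a small decision function over the investigator's findings. This is a hypothetical sketch, not the Incident Investigator's actual logic: the `Rca` fields, the 0.7 confidence cutoff, and the rule order are assumptions chosen to show the shape of the decision. Whatever path it proposes still goes to the on-call engineer for approval.

```python
from dataclasses import dataclass
from enum import Enum


class Remediation(Enum):
    MONITOR = "monitor"
    CODE_FIX = "code fix"
    ROLLBACK = "rollback"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class Rca:
    """Hypothetical summary of what the investigator found."""
    confidence: float              # 0..1, how sure it is about the root cause
    customer_impact: bool          # is the incident hurting users right now?
    caused_by_recent_deploy: bool  # does the evidence point at a recent deploy?
    fix_is_small: bool             # e.g. a threshold tweak or a one-line change


def propose_path(rca: Rca, min_confidence: float = 0.7) -> Remediation:
    """Propose a remediation path; a human approves or changes it."""
    if rca.confidence < min_confidence:
        return Remediation.ESCALATE   # not sure enough: page a human owner
    if rca.customer_impact and rca.caused_by_recent_deploy:
        return Remediation.ROLLBACK   # stop user impact before anything else
    if rca.fix_is_small:
        return Remediation.CODE_FIX   # hand to PR Author, then into code review
    if not rca.customer_impact:
        return Remediation.MONITOR    # no impact and no obvious fix: keep watching
    return Remediation.ESCALATE       # impact without a clear fix needs a person


print(propose_path(Rca(0.9, False, False, True)).value)  # code fix
```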

Screenshot of the Incident Investigator workflow in Slack, summarizing evidence and proposing an RCA and remediation path to speed incident triage.

Time to first RCA fell from about 30 minutes to 6 minutes; time to full thread resolution dropped by about a third.

The human role did not disappear. I review the RCA, ask follow-up questions, and approve or change the remediation path. The agent gathers evidence and proposes action. I still make the production call.

The cleanest example was a noisy alert last week, one of those monitors that fires too aggressively. My contribution was one sentence: tune it to one hour and put up a PR. The agent kicked off the change. The code review agents handled review. I went back to feature work.

The incident management numbers tell the same story: agents now handle 81.3% of incidents, and on-call engineers merged 44% more PRs per week.

The platform is the point

The important part is that code review and incident response are not separate tools. They run on the same platform, share context, carry forward corrections, and hand work across stages. Review can move from risk analysis to fix-up to design review. On call, an alert can move from RCA to remediation to PR to review. The engineer is not gluing steps together by hand.
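One way to picture "hand work across stages" is a pipeline in which every stage reads and extends the same work item, and the run stops only when a stage asks for a human. This is a minimal sketch under assumed names (`WorkItem`, `make_stage`, `run_pipeline`); the stages are placeholders where real agents would run, and it does not describe how Cosmos is built internally.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class WorkItem:
    """Context that travels with the work from stage to stage (hypothetical shape)."""
    title: str
    history: list[str] = field(default_factory=list)
    needs_human: bool = False


Stage = Callable[[WorkItem], WorkItem]


def make_stage(name: str, needs_judgment: bool = False) -> Stage:
    """Build a placeholder stage; a real stage would invoke an agent here."""
    def stage(item: WorkItem) -> WorkItem:
        item.history.append(name)    # later stages can see what earlier ones did
        if needs_judgment:
            item.needs_human = True  # flag a decision only a human should make
        return item
    return stage


def run_pipeline(item: WorkItem, stages: list[Stage]) -> WorkItem:
    for stage in stages:
        item = stage(item)
        if item.needs_human:
            break  # hand off to a person instead of continuing automatically
    return item


on_call = [make_stage("rca"), make_stage("remediation"), make_stage("pr"), make_stage("review")]
done = run_pipeline(WorkItem("noisy alert"), on_call)
print(done.history)  # ['rca', 'remediation', 'pr', 'review']
```

Because the context object is shared, nobody has to copy findings from the RCA into the PR description by hand; the engineer only appears when a stage raises `needs_human`.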

Cosmos also comes with out-of-the-box workflows like code review and incident response (and many others). Agents drive the mechanical work and pull humans in only for judgment.

That means teams can adopt these battle-tested workflows — with the right model choices, UX, and guardrails — in a day, instead of spending months designing the system themselves.

You can see an overview of that workflow in the demo below.

This is where my job changed. Agents took over mechanical work: reading diffs, chasing logs, reconstructing context, writing routine fixes, and summarizing evidence. What remains is judgment: architecture, risk thresholds, whether an RCA is good enough, when auto-approval should expand, and how agents should improve when they miss something.

That is not smaller work. It is more consequential because more code and incidents move through the system before a human steps in.

You can automate the full SDLC today

If you’re running a fairly standard modern dev stack (common issue trackers, Git workflows, CI/CD, observability, and on-call tooling), you can automate almost the entire SDLC today.

Are there rough edges and polish work to be done? Yes.

Will teams running fully bespoke processes, custom infra, or unusual constraints have a harder time? Yes.

The agentic engineering era is here. This is the first time the path has felt practical. It has never been more possible, or more fun, to build this way.