Developing AI with Agile: Redefining “Done” and “User” Before the Model Redefines You

When we started applying Agile to AI work, I assumed our existing playbook (thin slices, short sprints, frequent demos) would carry over. It didn’t. We shipped increments, yes. But the increments often looked productive while the model learned the wrong lessons. In my world (Salesforce ecosystems in healthcare and banking), that gap showed up as models that “worked” in sprint reviews yet created downstream friction, compliance churn, or quietly optimized for the wrong behavior.

Over several programs, though, I learned that Agile does fit AI if you redefine two fundamentals up front:

  1. “Done” is not a working model. It’s a model + evidence that we’re learning the right thing.
  2. And “user” isn’t just an end user. It includes the data, the reviewers, and the controls that will live with your model after you ship.

Below, I’ll share the patterns that made the difference, the moments that forced me to change my mind, and a practical backbone you can take into your next AI iteration.

1. Redefine “done”: From output to outcome evidence

The moment my thinking changed came during a sprint demo where our AI assistance reduced Salesforce Opportunity creation time dramatically in a banking context. On paper, this was a win. But a week later, compliance flagged subtle edge cases the model had learned from historical shortcuts in the CRM. We had optimized speed at the cost of governance. Sprint “done” wasn’t business “done.”

What finally worked: we added Outcome Evidence to our Definition of Done (DoD):

  • Metric + guardrail pair for every story (e.g., time saved paired with policy adherence rate).
  • Champion–Challenger check in the acceptance criteria: the new model must beat a baseline and preserve guardrails in a holdout set.
  • Counterfactual example review: before closing the story, we ask, “Where would this model make a confident but wrong recommendation, and how will we detect it?”
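To make the metric + guardrail pair and the Champion–Challenger check concrete, here is a minimal sketch of how such an acceptance gate could be expressed. The names (`HoldoutResult`, `meets_outcome_evidence`), the two example metrics, and the 0.98 adherence floor are illustrative assumptions, not the tooling or thresholds used on the programs described here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class HoldoutResult:
    """Metrics for one model, measured on the same holdout set (hypothetical)."""
    minutes_saved_per_record: float  # outcome metric
    policy_adherence_rate: float     # guardrail metric, 0.0 to 1.0


def meets_outcome_evidence(
    champion: HoldoutResult,
    challenger: HoldoutResult,
    min_adherence: float = 0.98,     # assumed guardrail floor
) -> tuple[bool, list[str]]:
    """Return (passed, reasons). The challenger is the new model."""
    reasons = []
    if challenger.minutes_saved_per_record <= champion.minutes_saved_per_record:
        reasons.append("does not beat the baseline on the outcome metric")
    if challenger.policy_adherence_rate < min_adherence:
        reasons.append("falls below the guardrail floor")
    if challenger.policy_adherence_rate < champion.policy_adherence_rate:
        reasons.append("regresses the guardrail relative to the baseline")
    return (not reasons, reasons)


# A faster model that erodes policy adherence should not count as "done".
baseline = HoldoutResult(minutes_saved_per_record=1.0, policy_adherence_rate=0.99)
fast_but_risky = HoldoutResult(minutes_saved_per_record=4.0, policy_adherence_rate=0.95)
passed, reasons = meets_outcome_evidence(baseline, fast_but_risky)
```

Returning the reasons alongside the verdict gives the sprint review something to discuss rather than a bare pass or fail.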

This didn’t slow us down; it prevented expensive rework. It also changed our sprint reviews: instead of demoing “look what the model can do,” we demoed “look what we learned and how we know we didn’t learn the wrong thing.”

2. Expand “user”: Include data, reviewers, and controls

In healthcare integrations (Epic/Genesys ↔ Salesforce Health Cloud), we learned that the people who review model output—care coordinators, operations leads, security, compliance—are as critical as end users. If they lack visibility or a lightweight override path, your team will quietly harden manual workarounds outside the system. That’s how “shadow process” is born.

Our rule of thumb now:

  • Treat data stewards as primary users. Give them promptable checks, lineage notes, and drift signals where they live (often the CRM or SPM dashboards).
  • Treat reviewers as first‑class citizens. If a human‑in‑the‑loop can’t give feedback inside the workflow, your model quality will decay without anyone noticing.
  • Treat controls (security/compliance) as product features, not gate meetings. We ship small compliance surfaces early—policy hints, audit events, sampling views—so reviewers can see and shape the model’s boundaries.

When we took this view, adoption improved, and the model learned faster because feedback lived where the work was, not in a separate spreadsheet.

3. Thin vertical slices for AI are different: Slice by decision, not by screen

On a Salesforce program, our first “thin slice” was UI‑centric: autocomplete fields, smart defaults, then later a recommendation panel. Velocity looked great; learning did not. The model wasn’t exposed to the decision pressure we actually cared about (e.g., which lead deserves attention now; which case needs escalation).

We changed the slice to “one decision, end‑to‑end”:

  • Define the decision (e.g., “Which opportunity deserves a same‑day touch?”).
  • Ship a tiny model (or even a rules‑based challenger), a visible rationale, a one‑click override, and a feedback capture.
  • Measure decision quality against a baseline, not just click speed.
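As one way to measure decision quality against a baseline, the sketch below compares precision and recall for a model and a rules-based baseline on the same decision. It assumes each record has a ground-truth label (for example, whether a reviewer confirmed that the opportunity really did deserve a same-day touch); the function name and the sample data are hypothetical.

```python
from __future__ import annotations


def precision_recall(predicted: list[bool], actual: list[bool]) -> tuple[float, float]:
    """Precision and recall for a yes/no decision; 0.0 when undefined."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # flagged and correct
    fp = sum(1 for p, a in pairs if p and not a)    # flagged but wrong
    fn = sum(1 for p, a in pairs if a and not p)    # missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Illustrative data only: did each opportunity deserve a same-day touch?
actual = [True, True, False, False, True, False]
rules_baseline = [True, False, True, False, False, False]
model = [True, True, False, False, False, True]

baseline_pr = precision_recall(rules_baseline, actual)
model_pr = precision_recall(model, actual)
```

Reporting both numbers side by side for the baseline and the model keeps the precision vs. recall trade-off visible, which click speed alone would hide.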

That slice exposed the real trade‑offs (precision vs. recall, throughput vs. fairness) far earlier and gave us the right conversations in sprint reviews.

4. Backlog hygiene: Separate model work, data work, and control work, but demo them together

In classic Agile, “as a user I want…” stories can blur model training, data remediation, and governance hardening into one mega‑ticket. We split the work into three tracks, each with its own cadence, but we demo them together:

  • Model work: features, loss curves, challenger results, drift signals.
  • Data work: pipeline health, lineage notes, labeled edge cases added this sprint.
  • Control work: audit events shipped, sampling policy, access scopes.

One sprint review equals one narrative: what we changed in the model, what data moved because of it, and how controls evolved. Executives track value; auditors see traceability; teams see cause and effect.

5. Make feedback frictionless (and visible) inside the tooling people already use

In one hospital rollout, we had brilliant feedback buried in SharePoint and email threads. We moved it into the systems where work happened: Salesforce comments, case objects, and lightweight review forms. We also surfaced “Top 5 feedback themes” on an SPM dashboard that leadership already reviewed weekly.

That closed the loop. Instead of asking clinicians and support teams to go somewhere else to be heard, we met them in the flow of work. Model quality improved because the feedback finally reached the people who could act on it.
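A “Top 5 feedback themes” summary can be very small once feedback is captured in one place. The sketch below assumes each feedback record carries a `theme` tag assigned by the reviewer; that field name, the record shape, and the sample themes are assumptions for illustration, not the Salesforce data model used in the rollout.

```python
from __future__ import annotations

from collections import Counter


def top_feedback_themes(feedback: list[dict], n: int = 5) -> list[tuple[str, int]]:
    """Return the n most frequent themes as (theme, count), most common first."""
    counts = Counter(item["theme"] for item in feedback if item.get("theme"))
    return counts.most_common(n)


# Illustrative records only; untagged feedback is skipped, not guessed at.
feedback = [
    {"case_id": "C-1", "theme": "wrong escalation"},
    {"case_id": "C-2", "theme": "wrong escalation"},
    {"case_id": "C-3", "theme": "missing rationale"},
    {"case_id": "C-4", "theme": "wrong escalation"},
    {"case_id": "C-5", "theme": "missing rationale"},
    {"case_id": "C-6", "theme": "stale data"},
    {"case_id": "C-7", "theme": None},
]
themes = top_feedback_themes(feedback)
```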

6. When to say “no” to a model

A counterintuitive lesson: there are moments where the most Agile move is not to build a model. If the data‑generating process is unstable (e.g., new workflow, new form fields, evolving taxonomy), first stabilize the process with automation or explicit rules. Then invite an AI challenger. Otherwise, you’ll teach your model yesterday’s chaos and call it learning.

We now use a simple guardrail: if we cannot articulate a stationary slice of the process for 4–6 weeks, we don’t train; we instrument. That instrumentation later becomes gold for training.
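The guardrail itself is a judgment call, but instrumentation can inform it. As a rough sketch, the code below uses the population stability index (PSI) to compare the weekly distribution of one categorical field and allows training only if every consecutive pair of recent weeks stays under a threshold. PSI is a technique chosen here for illustration, and the 0.1 threshold is a common rule of thumb; neither is stated in the guardrail above, and the function names are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter


def psi(expected: list[str], actual: list[str], eps: float = 1e-6) -> float:
    """Population stability index between two non-empty categorical samples."""
    e_counts, a_counts = Counter(expected), Counter(actual)
    total = 0.0
    for category in set(expected) | set(actual):
        # eps avoids log(0) when a category is missing from one sample.
        e = max(e_counts[category] / len(expected), eps)
        a = max(a_counts[category] / len(actual), eps)
        total += (a - e) * math.log(a / e)
    return total


def should_train(
    weekly_samples: list[list[str]],
    min_weeks: int = 4,      # lower end of the 4–6 week window
    max_psi: float = 0.1,    # assumed stability threshold
) -> bool:
    """True only if each consecutive pair of the last min_weeks weeks is stable."""
    if len(weekly_samples) < min_weeks:
        return False
    recent = weekly_samples[-min_weeks:]
    if any(not week for week in recent):
        return False  # no data for a week: keep instrumenting
    return all(psi(prev, cur) <= max_psi for prev, cur in zip(recent, recent[1:]))


# Illustrative data: case types per week; a new taxonomy appears in the last week.
steady_week = ["billing"] * 50 + ["access"] * 50
shifted_week = ["billing"] * 10 + ["referral"] * 90
stable = should_train([steady_week] * 4)
unstable = should_train([steady_week] * 3 + [shifted_week])
```

A check like this only covers one field at a time, so it supports the conversation about stability rather than replacing it.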

A lightweight backbone you can copy

  • DoD upgrade: outcome metric + guardrail + challenger check + counterfactual example.
  • User map: end user, data steward, reviewer, control owner—each with an in‑tool feedback path.
  • Slice by decision: one real decision, end‑to‑end, with rationale and override.
  • Three‑track backlog: model / data / control, demoed as one story of change.
  • In‑flow feedback: capture and summarize where people already work.
  • Instrument before you model: only learn from processes stable enough to teach.

What this work has taught me

Working on AI inside fast‑moving, highly regulated environments hasn’t made me more certain about the “right way” to do things. If anything, it’s humbled me. I’ve had late‑night standups where the model drifted without warning, sprint reviews where a small insight changed the entire roadmap, and moments where I caught myself asking, “Are we even solving the right problem?”

What I’ve learned, often the hard way, is that the teams that thrive treat learning as the real product. Controls become features, and decisions become tiny, testable slices of truth. When we redefine “done” and “user” through that lens, the model gets better one sprint at a time, and the organization starts trusting it enough to scale.

So, if you’ve ever paused mid‑project and questioned, “Is this still working?” while building AI in a multi‑system, highly regulated world, you’re not alone. That question is usually where the real conversation begins. I’d love to hear how you’re navigating it, too.

This is a community blog post. The opinions contained within belong solely to the author or authors, and may not represent the opinion or policy of Agile Alliance.

Pulkit Singhal

Pulkit Singhal is a multi‑certified Senior Salesforce Business Analyst known for delivering enterprise‑level impact across financial services, healthcare, and operational environments. He has helped organizations improve their business efficiency, led transformative Salesforce initiatives, strengthened cross‑functional alignment, and significantly improved operational efficiency across key business units. Blending Lean Six Sigma discipline with deep platform expertise, Pulkit excels at turning ambiguity into scalable, user‑centered solutions, earning recognition as a trusted strategic partner, an…
