Using Spec-Driven Development with FHIR to Build AI Applications

Make me something good

Sep 25, 2025

Healthcare AI applications are often built in high-stakes environments where precision, transparency, and interoperability matter as much as speed. But many of today’s AI workflows fall victim to quick-and-dirty “vibey coding”: code that looks plausible but drifts from the original intent because it was generated from vague prompts.

Specification-Driven Development (SDD) offers a disciplined alternative. By starting with a specification, your source of truth for intent, you can let agents, frameworks, and toolkits safely accelerate implementation. With the rise of GitHub’s Spec Kit, this approach is now accessible to AI engineers, data teams, and FHIR developers alike.

In this post, I’ll share how we’re applying SDD in healthcare AI, highlight an example using my Plumly app, and extend the conversation to Google’s ADK and A2A for agent-to-agent workflows.

What is Specification-Driven Development?

SDD builds on principles from Extreme Programming (XP) and Contract-First Design. Instead of diving straight into implementation, you:

Specify – Capture intent as a feature-level spec.
Plan – Translate it into a contract (OpenAPI, FHIR profile, etc.).
Task – Break down the work across front-end, back-end, and test phases.
Implement – Let agents or developers execute against the spec, with guardrails.

With Spec Kit, this workflow feels like pair programming with a process coach. You evolve an application step by step while staying aligned with your original design. For FHIR, this means your API contracts and data profiles aren’t bolted on later; they’re baked into the process.

Importantly, the spec is not a static doc; it’s a living, executable artifact. As the product evolves, so does the spec. Agents can re-ingest updates and regenerate downstream plans and tasks, making iteration safe and predictable.
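To make the "living artifact" idea concrete, here is a minimal Python sketch of a versioned spec whose task list is regenerated whenever the spec changes. This is not how Spec Kit is implemented; the Spec class, the PHASES tuple, and the derive_tasks and amend helpers are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, replace

# Hypothetical phases, mirroring the front-end / back-end / test split above.
PHASES = ("backend", "frontend", "test")


@dataclass(frozen=True)
class Spec:
    """Hypothetical feature-level spec: a version plus intent statements."""
    version: int
    requirements: tuple[str, ...]


def derive_tasks(spec: Spec) -> list[str]:
    """Deterministically expand each requirement into one task per phase."""
    return [
        f"v{spec.version}:{phase}:{req}"
        for req in spec.requirements
        for phase in PHASES
    ]


def amend(spec: Spec, new_requirement: str) -> Spec:
    """Return a new spec version; the old version stays intact for comparison."""
    return replace(
        spec,
        version=spec.version + 1,
        requirements=spec.requirements + (new_requirement,),
    )


spec_v1 = Spec(version=1, requirements=("patient summary",))
spec_v2 = amend(spec_v1, "medication list")

tasks_v1 = derive_tasks(spec_v1)  # three tasks, one per phase
tasks_v2 = derive_tasks(spec_v2)  # regenerated from the updated spec
```

Because tasks are derived from the spec rather than edited by hand, the plan cannot silently diverge from the intent it came from.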

Why Prompts Alone Fail

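Prompt-only workflows fail because a single prompt mixes intent with implementation, and the model fills every unstated gap with its own guess. SDD reduces those hidden assumptions. By separating what (intent, captured in the spec) from how (implementation details), you remove much of the ambiguity that derails prompt-only approaches.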

Applying SDD to FHIR + AI

FHIR projects often stumble on two recurring issues:

Contract drift: The JSON or XML you exchange looks different from what was intended.
Inconsistent assumptions: Data models don’t line up across teams, which breaks downstream analytics and AI.

With Spec Kit, you can mitigate both:

Use /specify to capture healthcare scenarios (e.g., “a patient summary includes demographics, encounters, observations, medications”).
Generate an OpenAPI contract (or FHIR StructureDefinition) during /plan.
Apply tools like Specmatic MCP for contract tests and resiliency checks.
Let agents build backend logic against the FHIR spec, while frontend or AI summarization components work against mocked endpoints until integration.
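As a rough illustration of catching contract drift, the sketch below checks a patient summary payload against the four sections named in the /specify example and provides a mocked endpoint a frontend could build against. It is plain Python rather than Specmatic or a real FHIR validator, and REQUIRED_SECTIONS, contract_violations, and mock_summary_endpoint are hypothetical names.

```python
# Hypothetical contract: section name -> expected JSON container type.
REQUIRED_SECTIONS = {
    "demographics": dict,
    "encounters": list,
    "observations": list,
    "medications": list,
}


def contract_violations(payload: dict) -> list[str]:
    """Return human-readable differences between a payload and the contract."""
    problems = []
    for name, expected_type in REQUIRED_SECTIONS.items():
        if name not in payload:
            problems.append(f"missing section: {name}")
        elif not isinstance(payload[name], expected_type):
            actual = type(payload[name]).__name__
            problems.append(f"{name}: expected {expected_type.__name__}, got {actual}")
    for name in payload:
        if name not in REQUIRED_SECTIONS:
            problems.append(f"unexpected section: {name}")
    return problems


def mock_summary_endpoint(patient_id: str) -> dict:
    """Mocked response that satisfies the contract until the backend exists."""
    return {
        "demographics": {"id": patient_id},
        "encounters": [],
        "observations": [],
        "medications": [],
    }


# A payload that has drifted: one section renamed, one dropped.
drifted = {"demographics": {}, "encounters": [], "meds": []}
drift_report = contract_violations(drifted)
```

Running the same check against both the mock and the real backend in CI is one way to keep the two from diverging before integration.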

Crucially, you can also embed constraints early: security, patient consent, audit logging, and latency targets go directly into the specification. Instead of being afterthoughts, these become non-negotiable rules enforced by agents and tests.
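One way to picture constraints as enforced rules is a thin guard around the summarization call. The sketch below is an assumption-heavy illustration: the CONSTRAINTS values, the consent flag on the patient record, and the guarded_summarize wrapper are all hypothetical, and a real system would read these rules from the spec and use proper consent and audit services.

```python
import time

# Hypothetical constraint values that would come from the specification.
CONSTRAINTS = {"require_consent": True, "audit_log": True, "max_latency_s": 2.0}

# In-memory stand-in for an audit log.
audit_trail: list[dict] = []


class ConstraintViolation(Exception):
    """Raised when a spec-level constraint is not met."""


def guarded_summarize(patient: dict, summarize) -> str:
    """Run `summarize(patient)` only if the spec's constraints are satisfied."""
    if CONSTRAINTS["require_consent"] and not patient.get("consent", False):
        raise ConstraintViolation("patient consent not recorded")

    start = time.monotonic()
    result = summarize(patient)
    elapsed = time.monotonic() - start

    if CONSTRAINTS["audit_log"]:
        audit_trail.append({"patient_id": patient["id"], "elapsed_s": elapsed})
    if elapsed > CONSTRAINTS["max_latency_s"]:
        raise ConstraintViolation(f"latency {elapsed:.2f}s exceeds target")
    return result


# Usage with a placeholder summarizer in place of a real LLM call.
summary = guarded_summarize(
    {"id": "example-1", "consent": True},
    lambda patient: f"placeholder summary for {patient['id']}",
)
```

A test suite can then assert that a request without consent raises ConstraintViolation, which turns the constraint into something the build checks rather than something reviewers remember.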

Example: Plumly

Plumly (repo: FHIR-IQ/Plumly) is an experimental app that uses AI to generate structured summaries from patient data. Here’s how SDD helps:

/specify – Define the scope: “Generate both narrative and structured summaries (FHIR Composition, List, or DocumentReference).”
/plan – Create OpenAPI endpoints for /summarize/patient and /summarize/provider.
/tasks – Split the build: backend LLM integration, frontend UI for configuration, and integration tests for FHIR validity.
/implement – Use agents (Claude Code, Gemini CLI, etc.) to execute.

Because the spec is executable and versioned, adding new capabilities (e.g., handling genomic Observations or social risk factors) is not guesswork. You update the spec, regenerate tasks, and iterate.
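For the "integration tests for FHIR validity" task, a first check might look like the sketch below. It builds a minimal Composition as a plain dict and verifies only the elements FHIR R4 marks as required on Composition (status, type, date, author, title) plus the allowed status codes. This is not Plumly's actual code: build_composition and composition_problems are hypothetical helpers, and a real pipeline would run a full FHIR validator against the relevant profile.

```python
from datetime import datetime, timezone
from html import escape

# Elements with minimum cardinality 1 on the FHIR R4 Composition resource.
REQUIRED_ELEMENTS = ("status", "type", "date", "author", "title")
ALLOWED_STATUS = {"preliminary", "final", "amended", "entered-in-error"}


def build_composition(patient_id: str, narrative: str) -> dict:
    """Wrap an AI-generated narrative in a minimal Composition (illustrative)."""
    return {
        "resourceType": "Composition",
        "status": "preliminary",
        "type": {"text": "Patient summary"},
        "subject": {"reference": f"Patient/{patient_id}"},
        "date": datetime.now(timezone.utc).isoformat(),
        "author": [{"display": "AI summarizer (hypothetical)"}],
        "title": "AI-generated patient summary",
        "section": [{
            "title": "Summary",
            "text": {
                "status": "generated",
                # Escape model output before embedding it in XHTML.
                "div": f'<div xmlns="http://www.w3.org/1999/xhtml">{escape(narrative)}</div>',
            },
        }],
    }


def composition_problems(resource: dict) -> list[str]:
    """Check a few structural rules; far from complete FHIR validation."""
    problems = []
    if resource.get("resourceType") != "Composition":
        problems.append("resourceType must be Composition")
    for element in REQUIRED_ELEMENTS:
        if not resource.get(element):
            problems.append(f"missing required element: {element}")
    status = resource.get("status")
    if status and status not in ALLOWED_STATUS:
        problems.append(f"invalid status: {status}")
    return problems


composition = build_composition("example-1", "Stable on current medications.")
problems = composition_problems(composition)
```

When the spec grows to cover a new output type, the same pattern extends by adding a builder and a matching check, so the test suite tracks the spec.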

Extending with Google ADK and A2A

FHIR + AI isn’t just about isolated endpoints; it’s increasingly about agent-to-agent (A2A) interoperability. Google’s Agent Development Kit (ADK) and the A2A project enable conversational workflows between AI agents.

Where Spec Kit brings design discipline, ADK + A2A bring runtime coordination.

Imagine:

A FHIR Summarization Agent (built with SDD + Spec Kit) provides clean, spec-aligned outputs.
A Patient Scheduling Agent (built on ADK) negotiates with a provider’s availability service.
A Care Gap Agent consumes FHIR data and requests summaries from Plumly to close HEDIS gaps.

Because each agent is built against specs and contracts, they can talk reliably via A2A. The two approaches complement each other: SDD helps you design correctly, while A2A lets agents execute collaboratively.
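The sketch below shows the contract idea between two of these agents using plain in-process Python. It deliberately does not use the ADK or A2A APIs; the Message class, the CONTRACT table, and both agent classes are hypothetical stand-ins meant only to show how a shared contract lets each side reject malformed requests and replies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str
    intent: str
    body: dict


# Hypothetical shared contract: intent -> exact set of required body keys.
CONTRACT = {
    "summarize_patient": {"patient_id"},
    "summary_result": {"patient_id", "summary"},
}


def validate(msg: Message) -> Message:
    """Reject any message whose intent or body shape is outside the contract."""
    expected = CONTRACT.get(msg.intent)
    if expected is None:
        raise ValueError(f"unknown intent: {msg.intent}")
    if set(msg.body) != expected:
        raise ValueError(f"{msg.intent}: expected keys {sorted(expected)}")
    return msg


class SummarizationAgent:
    name = "fhir-summarizer"

    def handle(self, msg: Message) -> Message:
        validate(msg)
        patient_id = msg.body["patient_id"]
        reply = Message(
            self.name,
            "summary_result",
            {"patient_id": patient_id, "summary": f"placeholder summary for {patient_id}"},
        )
        return validate(reply)  # outgoing messages obey the contract too


class CareGapAgent:
    name = "care-gap"

    def __init__(self, summarizer: SummarizationAgent) -> None:
        self.summarizer = summarizer

    def review(self, patient_id: str) -> str:
        request = validate(Message(self.name, "summarize_patient", {"patient_id": patient_id}))
        return self.summarizer.handle(request).body["summary"]


care_gap = CareGapAgent(SummarizationAgent())
result = care_gap.review("example-1")
```

In a real deployment the transport and discovery would be handled by the agent framework, while the contract table would be generated from the same spec both teams build against.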

Why This Matters

Healthcare AI will only scale if we:

Design with intent – Start with specifications, not vibes.
Bake in interoperability – FHIR contracts, OpenAPI specs, and structured exchanges.
Enable agent ecosystems – Using ADK and A2A to extend from single apps to networks.
Support all contexts – Greenfield apps, new feature builds, and legacy modernization are all use cases where SDD shines.

The result is faster iteration without drift, and innovation without breaking trust.

Scaling Challenges

It’s worth noting that as systems grow:

Spec management becomes complex – You’ll need modularization, diffing, and merge strategies.
Context engineering is non-trivial – Coordinating multiple specs for large systems requires discipline.
Culture shifts matter – Teams must buy into “spec first” to avoid slipping back into shortcut coding.
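For the diffing part of spec management, a simple starting point is a structural diff between two spec versions. The sketch below assumes a spec can be flattened into a dict of requirement IDs to text; that representation, the example IDs, and the diff_specs helper are hypothetical.

```python
def diff_specs(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify requirement IDs as added, removed, or changed between versions."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


# Hypothetical requirement IDs and wording.
spec_old = {
    "REQ-1": "Summaries include demographics and encounters.",
    "REQ-2": "Every request is audit logged.",
    "REQ-3": "Output is a FHIR Composition.",
}
spec_new = {
    "REQ-1": "Summaries include demographics, encounters, and medications.",
    "REQ-2": "Every request is audit logged.",
    "REQ-4": "Summaries may include social risk factors.",
}

changes = diff_specs(spec_old, spec_new)
```

Only the tasks tied to added, removed, or changed requirements need regenerating, which keeps iteration cost proportional to the size of the change rather than the size of the spec.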