The Shift from Prompting to Planning: How Agentic Workflows are Redefining Software Architecture in the LLM Era
Practical guide for engineers: move from prompting to planning with agentic workflows, architecture patterns, and a hands-on example.
Modern large language models (LLMs) have made one thing trivial: getting text back from a prompt. The real challenge engineers now face is building systems that convert high-level human intent into reliable, observable, and maintainable behavior. That transition is less about crafting better prompts and more about designing agentic workflows — systems that plan, delegate, and execute across tools and services.
This post is a compact, practical guide for engineers who must design systems in the LLM era. You’ll get definitions, architectural patterns, pitfalls, and a code example showing the planner-executor pattern. The goal: stop treating LLMs as glorified text generators and start treating them as components in deterministic workflows.
Why the shift matters
Prompting optimizes for single-turn text outputs. It produces useful language but offers limited structure for multi-step tasks, error handling, or integration with external tools. Agentic workflows, by contrast, introduce a planning layer that:
- Breaks intent into discrete steps.
- Chooses and sequences tools or APIs.
- Manages state, retries, and error handling.
- Observes execution and adjusts plans dynamically.
The outcome is not just better semantics from the model; it is software you can test, monitor, and operate.
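For a sense of what that planning layer produces, here is an illustrative structured plan for a "summarize repo and open PR" intent. The schema (id, tool, args) is an assumption for this post, not a standard format:
# Illustrative only: the step fields below are assumptions, not a standard.
plan = {
    "intent": "summarize repo and open PR",
    "steps": [
        {"id": "s1", "tool": "repo_reader", "args": {"path": "."}},
        {"id": "s2", "tool": "summarizer", "args": {"input_from": "s1"}},
        {"id": "s3", "tool": "github_pr", "args": {"title": "Add repo summary", "body_from": "s2"}},
    ],
}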
Not an either/or choice
This is not a repudiation of prompt engineering. Prompts still matter — they define the planner and the reasoning style. The difference is that prompts feed a planning engine rather than trying to encode all orchestration within a single prompt response.
Core concepts: planner, executor, tools, and memory
Designing agentic systems means decomposing responsibilities.
- Planner: Receives user intent and environment context. Outputs a structured plan: an ordered list of steps, each mapped to a capability or tool.
- Executor: Consumes the plan and executes steps reliably, handling retries, backoffs, and rollbacks when needed.
- Tools: The concrete capabilities (databases, search, code runners, web APIs, system shells). Treated as idempotent or wrapped to be so.
- Memory / State: Persistent store for context, intermediate results, and audit logs.
Separating these responsibilities reduces coupling and makes the system testable: you can unit-test planners (logic and reasoning) separately from executors (integration and resilience).
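A minimal sketch of these contracts as Python dataclasses follows. The exact fields (id, tool, args, retry_policy, schema_version) are assumptions for illustration; the example later in this post consumes plans of this shape:
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Step:
    id: str                          # unique within a plan; doubles as an idempotency key
    tool: str                        # name of a registered capability
    args: dict[str, Any] = field(default_factory=dict)
    retry_policy: str = "fail-fast"  # e.g. "fail-fast" or "backoff"

@dataclass
class Plan:
    intent: str
    steps: list[Step]
    schema_version: str = "1"        # lets executors reject unknown plan shapes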
Architectural patterns
Below are pragmatic patterns that work at different scales.
Planner-as-service (centralized reasoning)
Run a dedicated planning service that receives intents, consults memory, and returns plans. Pros: single place to evolve reasoning logic; easier to enforce policies. Cons: can become a bottleneck and single point of failure.
Distributed planners (near tools)
Attach light planners to tool clusters where latency or privacy matters (e.g., on-prem databases). Pros: lower latency, better data locality. Cons: increased coordination complexity.
Hybrid orchestration
Use a central planner for global intent and local planners for domain-specific refinement. This is the common pattern in scaled systems: a global plan delegates subplans to domain agents.
Event-driven execution
Treat plan steps as events/messages. Executors pick up tasks from queues, run tools, and emit results. This fits well with autoscaling and observable architectures.
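A minimal in-process sketch of this pattern, using Python's queue module as a stand-in for a real broker such as Kafka or SQS and the Step dataclass from the earlier sketch; the event tuple shapes are assumptions:
import queue
import threading

task_queue = queue.Queue()    # step events: (run_id, step)
result_queue = queue.Queue()  # result events: (run_id, step_id, status, payload)

def worker(tools):
    # An executor worker consumes one step event at a time, runs the tool,
    # and emits a result event rather than returning a value in-process.
    while True:
        run_id, step = task_queue.get()
        try:
            res = tools[step.tool].invoke(step.args)
            result_queue.put((run_id, step.id, "ok", res))
        except Exception as e:
            result_queue.put((run_id, step.id, "error", str(e)))
        finally:
            task_queue.task_done()

class EchoTool:  # trivial stand-in tool for the sketch
    def invoke(self, args):
        return args

tools = {"repo_reader": EchoTool()}
threading.Thread(target=worker, args=(tools,), daemon=True).start()
task_queue.put(("run-1", Step(id="s1", tool="repo_reader", args={"path": "."})))
print(result_queue.get())  # ("run-1", "s1", "ok", {"path": "."})
With a real broker, workers become independent consumers and autoscale with queue depth.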
Design concerns: determinism, observability, safety
Agentic workflows surface non-determinism from LLMs. Address it deliberately:
- Validate plans before execution. Make a lightweight static analyzer that checks permissions, tool usage, and safety constraints (a sketch follows below).
- Record plan provenance. Keep the prompt, model version, plan text, and planner decisions for replay and debugging.
- Make steps idempotent where possible. Use deduplication keys and transactional wrappers.
Observability: each step should emit structured logs, metrics (latency, success rate), and traces. Instrument the planner to expose confidence scores and alternatives so ops can surface ambiguous plans.
Safety: enforce capability caps and sandbox tools. The planner should be capability-aware and constrained by a policy layer.
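A lightweight validator in this spirit might check every step against a tool catalog and the caller's granted capabilities before anything runs. The catalog shape and capability tags below are assumptions, not a standard:
# Illustrative catalog: tool name -> capabilities that tool exercises.
TOOL_CAPABILITIES = {
    "repo_reader": {"read"},
    "summarizer": {"read"},
    "github_pr": {"read", "write"},
}

def validate_plan(plan, granted):
    """Static pre-execution checks: unknown tools or capability violations
    fail the whole plan rather than surfacing mid-run."""
    errors = []
    for step in plan.steps:
        caps = TOOL_CAPABILITIES.get(step.tool)
        if caps is None:
            errors.append(f"step {step.id}: unknown tool {step.tool!r}")
        elif not caps <= granted:
            errors.append(f"step {step.id}: requires {caps - granted}")
    return errors  # empty list means the plan may proceed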
Example: planner-executor sketch
This example demonstrates the canonical flow: intent -> planner -> plan -> executor -> tools. The code is intentionally minimal to highlight the mechanics rather than the LLM integration details.
class Planner:
    def __init__(self, llm):
        self.llm = llm

    def create_plan(self, intent, context):
        prompt = f"Plan steps to achieve: {intent}\nContext: {context}"
        # llm.generate returns a structured plan (string or JSON)
        plan_text = self.llm.generate(prompt)
        # parse_plan is a domain-specific parser that turns text into steps
        plan = parse_plan(plan_text)
        return plan

class Executor:
    def __init__(self, tools, store):
        self.tools = tools
        self.store = store

    def run_plan(self, plan, run_id):
        results = []
        for step in plan.steps:
            tool = self.tools.get(step.tool)
            try:
                res = tool.invoke(step.args)
                results.append((step, res))
                self.store.append(run_id, step, res)
            except Exception as e:
                # retry policy or fail-fast, depending on step
                handle_failure(step, e)
                raise
        return results

# usage
planner = Planner(llm)
plan = planner.create_plan("summarize repo and open PR", repo_context)
executor = Executor(available_tools, run_store)
executor.run_plan(plan, run_id="run-123")
Notes:
- Keep the planner lightweight and deterministic where possible: use structured outputs (JSON or YAML) and a strict parser (a parser sketch follows these notes).
- Executors must treat steps as first-class objects with metadata: id, tool name, args, retry policy, idempotency key.
- Use an append-only run_store to support replay and audit.
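To make the first note concrete, parse_plan from the example could be a strict JSON parser that raises on anything off-schema instead of guessing. A sketch, assuming the Plan and Step dataclasses from earlier and JSON output from the model:
import json

def parse_plan(plan_text):
    """Strictly parse LLM output into a Plan; malformed output raises here
    instead of reaching the executor."""
    data = json.loads(plan_text)  # raises ValueError on non-JSON output
    steps = [
        Step(id=raw["id"], tool=raw["tool"], args=raw.get("args", {}))
        for raw in data["steps"]
    ]
    if not steps:
        raise ValueError("plan has no steps")
    return Plan(intent=data["intent"], steps=steps)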
Testing and evolution
Design your planner and executor to be testable independently:
- Unit-test the planner with fixed LLM responses (mock the LLM; see the sketch after this list). Confirm plan structure, permissions checks, and fallback behavior.
- Integration-test executors against tool mocks that simulate failures and latencies.
- Use canary experiments: route a small percentage of live intents through new planner logic and compare outcomes.
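A minimal pytest-style sketch of that first point, assuming the Planner class from the example above and the JSON-based parse_plan sketch:
class FakeLLM:
    """Deterministic stand-in for the real model."""
    def __init__(self, canned):
        self.canned = canned

    def generate(self, prompt):
        return self.canned

def test_planner_produces_expected_steps():
    canned = '{"intent": "demo", "steps": [{"id": "s1", "tool": "repo_reader", "args": {}}]}'
    plan = Planner(FakeLLM(canned)).create_plan("demo", context={})
    assert [s.tool for s in plan.steps] == ["repo_reader"]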
Version the planner outputs and store plan schema versions to avoid broken executors.
Operational playbook
- Monitor: success rate, average plan length, step failure distribution, time-to-first-execution.
- Alert: high frequency of ambiguous plans (planner returns multiple high-confidence alternatives) or unauthorized tool usage attempts.
- Rollback: planners should expose a “safe mode” that returns conservative plans (only read-only operations) in failure scenarios.
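One way to implement that safe mode, assuming tools are tagged read-only in the catalog (the tagging below is illustrative):
READ_ONLY_TOOLS = {"repo_reader", "summarizer"}  # illustrative tags

def to_safe_mode(plan):
    """Degrade a plan to read-only steps so a misbehaving planner
    cannot trigger side effects."""
    safe_steps = [s for s in plan.steps if s.tool in READ_ONLY_TOOLS]
    return Plan(intent=plan.intent, steps=safe_steps)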
Common anti-patterns
- Embedding all orchestration in a single prompt. This makes debugging impossible.
- Treating the LLM response as authoritative. Always validate and sanitize.
- Omitting idempotency and retries on external calls.
When to use agentic workflows
- Multi-step tasks that involve external side effects (deployments, financial operations, data modifications).
- Workflows that require dynamic tool selection (choose search vs. DB vs. code-runner based on intent).
- Systems where auditability and observability matter.
If your app is purely content generation with no side effects, classic prompting may still be the simplest pattern.
Checklist: shipping agentic workflows
- Define planner contract: structured schema for plans and step metadata.
- Implement a planner service with deterministic parsing of LLM outputs.
- Build an executor that enforces idempotency, retries, and transactional semantics.
- Catalog tools with capability metadata and permission controls.
- Implement persistent run_store for audit, replay, and debugging.
- Add observability: structured logs, metrics, and trace correlation between planner and executor.
- Run staged experiments and version planner schemas.
Summary
The era of LLMs shifts the engineering problem from crafting better single-shot prompts to building reliable workflows that can plan, delegate, and execute. Agentic architectures introduce a planner that reasons about intent and an executor that runs steps against tools. This separation gives you testability, observability, and safety — properties essential to production systems.
Treat LLMs as reasoning components, not oracles. Make their outputs structured, auditable, and executable. When you do, software architecture becomes about building robust chains of responsibility that turn human intent into safe, observable actions.