Agentic AI without the hype: a practical taxonomy for non-technical leaders

If you’ve sat in a vendor pitch in the last six months, you’ve heard the word “agentic” thrown around like it means something specific. It mostly doesn’t.

That’s a problem, because the architectural choices behind these systems vary wildly — and so do the costs, the risks, and the realistic outcomes. Buying the wrong shape of system for your problem is one of the most expensive mistakes we see.

Here’s the simple taxonomy we use when we’re scoping work. Four levels. Pick the lowest one that solves your problem.

Level 0: AI in the seat of a human

The model writes; a human sends. The model summarizes; a human reads. The model drafts; a human reviews and edits.

This is where most teams should start, especially in regulated or brand-sensitive contexts. It’s the lowest-risk way to get value from LLMs because the model never takes an action on its own. A bad output costs you a few seconds of editing.

Examples we’ve shipped at this level:

  • Auto-drafting product descriptions from spec sheets
  • Summarizing weekly support trends from raw ticket data
  • Generating first-draft responses for sales reps to personalize

If you can solve your problem at Level 0, do it. The economics are great, the failure modes are obvious, and you build organizational muscle for what comes next.

Level 1: AI inside a deterministic workflow

The model does one well-defined task as a step in a larger automated pipeline. Everything else around it is regular code.

Trigger fires → fetch data → call model to classify / extract / summarize → conditional logic → take action.

The model’s scope is narrow and the surrounding code constrains it. If the model returns garbage, the workflow can detect it (schema validation, confidence scores) and route to a human or retry.
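For the technically curious, here is a minimal sketch of that pattern in Python. It is an illustration under stated assumptions, not a production implementation: `call_model` stands in for whatever LLM client you use, and the intent labels, prompt wording, and queue names are all hypothetical.

```python
import json
from typing import Callable

# Hypothetical intent labels; a real system would define its own.
ALLOWED_INTENTS = {"refund", "product_question", "other"}


def classify_ticket(
    ticket_text: str,
    call_model: Callable[[str], str],
    max_retries: int = 1,
) -> str:
    """Ask the model for an intent and accept it only if it validates.

    `call_model` is an assumed stand-in for your LLM client: it takes a
    prompt string and returns the model's raw text reply.
    """
    prompt = (
        'Classify this support ticket. Reply with JSON like {"intent": "refund"}. '
        f"Allowed intents: {sorted(ALLOWED_INTENTS)}.\n\nTicket: {ticket_text}"
    )
    for _ in range(max_retries + 1):
        raw = call_model(prompt)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # Not valid JSON: retry.
        intent = parsed.get("intent") if isinstance(parsed, dict) else None
        if isinstance(intent, str) and intent in ALLOWED_INTENTS:
            return intent
    # The model never produced a valid answer: hand off to a person.
    return "needs_human"


def route(ticket_text: str, call_model: Callable[[str], str]) -> str:
    """Plain conditional logic around the single model call."""
    intent = classify_ticket(ticket_text, call_model)
    if intent == "refund":
        return "refund_flow"
    if intent == "product_question":
        return "kb_search_and_reply"
    return "human_queue"
```

The model is called in exactly one place, and everything it returns is checked before the workflow acts on it. Anything that fails the check lands in a human queue rather than triggering an action.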

This is where the real ROI is hiding for most mid-market companies. It’s not glamorous and it’s not what gets covered in keynotes, but it’s the workhorse pattern for 2026.

Examples:

  • Inbound ticket → classify intent → if “refund” route to refund flow, if “product question” search KB and reply
  • Resume submitted → extract structured fields → compare to JD → score and shortlist
  • Vendor invoice arrives → extract line items → match to PO → flag discrepancies

The failure mode is bounded because the model’s job is bounded. You don’t need any of the agentic infrastructure (memory, tool selection, planning) to make this work. Plain code plus one or two model calls.

Level 2: Single-purpose agent with constrained tools

Now we’re in proper agent territory. The model decides, within a tightly scoped domain, which tool to call, in what order, and with what arguments. There’s a goal (“resolve this ticket,” “qualify this lead,” “onboard this customer”) and a fixed set of tools the agent can use to pursue it.

The key word here is constrained. A well-scoped Level 2 agent has:

  • Five to ten tools, each with a precise contract
  • Hard limits on what each tool can do (max refund amount, allowed countries, etc.)
  • Confidence thresholds for autonomous action vs. human escalation
  • Full audit logging of every decision and tool call
  • Explicit tone and behavior rules
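To make the guardrail items concrete, here is a rough Python sketch of a single tool wrapped with a hard limit, a confidence threshold, and an audit log. Every name and number is a hypothetical chosen for illustration (`GuardedRefundTool`, the 100.00 cap, the 0.8 threshold), and it assumes your system has some way of producing a confidence estimate for each proposed action.

```python
from dataclasses import dataclass, field

# Hypothetical policy values; set these from your own risk tolerance.
MAX_REFUND = 100.00
MIN_CONFIDENCE = 0.8


@dataclass
class GuardedRefundTool:
    """A refund tool the agent can call, wrapped in hard-coded guardrails."""

    audit_log: list = field(default_factory=list)

    def request(self, amount: float, confidence: float) -> str:
        # The hard limit is enforced in code, not left to the model's judgment.
        if amount <= 0 or amount > MAX_REFUND:
            decision = "escalate: amount outside hard limit"
        # Below the confidence threshold, a human decides instead.
        elif confidence < MIN_CONFIDENCE:
            decision = "escalate: confidence below threshold"
        else:
            decision = "approved"
        # Every call is recorded, whether it was approved or escalated.
        self.audit_log.append(
            {"tool": "refund", "amount": amount,
             "confidence": confidence, "decision": decision}
        )
        return decision


tool = GuardedRefundTool()
small_confident = tool.request(amount=40.00, confidence=0.95)
too_large = tool.request(amount=500.00, confidence=0.99)
unsure = tool.request(amount=40.00, confidence=0.50)
```

In this sketch only the first request is approved. The other two are escalated, and all three appear in the audit log. The limits live in ordinary code outside the model, so they hold even when the model's reasoning goes wrong.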

This is where the support agent in our Northwind case study lives. It’s also the right architecture for things like SDR-style outbound (research → personalize → send → log), automated returns processing, and most internal ops agents.

When this works, the leverage is real. When it doesn’t, it’s usually because someone gave the agent more tools than it could reason about, or skipped the guardrails to ship faster.

Level 3: Multi-agent / open-ended

Multiple agents collaborate. Each has different tools, different contexts. They negotiate, hand work back and forth, sometimes spawn sub-agents. The boundaries of the system are deliberately fuzzy.

This is where most of the hype lives. It’s also where most of the failed demos live.

We have shipped Level 3 systems. They can be magical. They’re also far harder to debug, evaluate, and operate than anything at the lower levels. The compute cost is real, the latency is real, the failure modes are creative, and observability is a research-grade problem.

Our rule of thumb: don’t go to Level 3 unless you’ve genuinely outgrown Level 2. If you have, you usually need a senior engineer dedicated to the system; it can’t be run as a side project.

Most companies asking for “multi-agent” really want a well-built Level 2 system with better tooling.

How to pick

Three questions:

  1. What’s the cost of a wrong autonomous action? If high, stay at Level 0 or 1, or invest heavily in Level 2 guardrails.
  2. How well-defined is the goal? If you can write the goal in one sentence and list the tools needed to pursue it, you’re a Level 2 candidate. If the goal needs to be discovered as the work unfolds, you’re looking at Level 3 — and you should triple-check whether you actually need it.
  3. How much engineering can you afford to maintain this thing? Level 0/1 systems need very little upkeep once they’re running. Level 2 needs ongoing tuning. Level 3 is its own engineering org.

Pick the lowest level that solves your problem. Resist the gravitational pull toward sexier architectures. The companies winning with this technology in 2026 are not the ones with the most agents — they’re the ones with the most boring, reliable, well-scoped agents running in the background.

That’s the un-sexy truth, and it’s also the one with the highest ROI.
