# Method: what it costs an AI to read a Mendix™ microflow

> **Reproducible.** This is the full method behind the "~29 → 0" / "94.4% fewer tokens" claim, on one
> *real* flow, so a technical buyer can re-run it and check the numbers themselves. It measures the two
> things that actually determine whether an AI can work with a Mendix model: **how many tokens** the
> representation costs, and **how many blind inference steps** the AI must perform before it understands
> the logic.

## The flow under test
`ClaudiusCore.ACT_ApproveRequest`: a real microflow from the Claudius app (in the data pack).
Logic: *if status is Submitted or InReview → set Reviewer, approve, commit, write an audit entry,
close the popup; else show a validation warning.* 10 nodes, 9 edges.

## The three representations compared
1. **The model's native JSON graph** (what an AI gets today): the flow as a bag of nodes wired by
   opaque UUID edges, no execution order, no inline names. (The data pack ships this exact graph.)
2. **mxto typed IR (MIR)**: the same flow lifted into typed, ordered records.
3. **Axon**, mxto's read-optimized projection: the flow as ~13 lines read top-to-bottom, references
   inline by name. (The data pack ships the MIR + Axon for this flow.)

## Axis 1: tokens (measurable, exact)
Tokeniser: `tiktoken` `o200k_base` (the GPT-4o family encoding; `cl100k_base` agrees within 0.3 pp).

| Representation | Tokens | vs raw graph |
|---|---:|:---:|
| Raw model JSON graph | 2,592 | (baseline) |
| MIR (typed IR) | 1,148 | −55.7% |
| **Axon** | **145** | **−94.4% (17.9 : 1)** |

The per-flow raw graph is already a *cleaned* extraction (nodes/edges pre-separated, expressions
pre-stringified), so 2,592 is a conservative, graph-favourable baseline; the raw model object graph
is heavier still.
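
The percentage and ratio columns follow directly from the three token counts. A minimal check, using only the numbers reported in the table above (the helper names `reduction_pct` and `compression_ratio` are illustrative, not part of mxto):

```python
# Token counts copied from the table above.
tokens = {"raw graph": 2592, "MIR": 1148, "Axon": 145}
baseline = tokens["raw graph"]


def reduction_pct(count, base):
    """Percentage reduction relative to the baseline, rounded to one decimal."""
    return round((1 - count / base) * 100, 1)


def compression_ratio(count, base):
    """Baseline tokens per token of the compared representation."""
    return round(base / count, 1)


for name, count in tokens.items():
    print(
        f"{name}: {count} tokens, "
        f"-{reduction_pct(count, baseline)}%, "
        f"{compression_ratio(count, baseline)} : 1"
    )
```

This reproduces the −55.7% figure for MIR and the −94.4% (17.9 : 1) figure for Axon.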

## Axis 2: inference steps (the part that breaks agents)
Counted from the graph's own structure. These are the discrete operations an AI must perform to recover the
program before it can reason about it:

| Operation (on the raw graph) | Count | Why |
|---|:---:|---|
| UUID cross-reference resolutions | 21 | 9 edges × 2 node-UUIDs each (18) + 1 start + 2 ends; each is resolved against the node array to recover the wiring |
| Branch disambiguations | 2 | read each split edge's `caseValue` to know which path runs |
| Nested-action unwraps | 6 | each action wraps an inner object whose type+properties must be read |
| **Total** | **≈ 29** | …to comprehend one ~13-line flow |

In Axon the same flow requires **0** of these operations: references are inline names, control flow is
lexical, and operations are named verbs read in execution order.

## Reproduce it
```bash
python3 - <<'PY'
import tiktoken
enc = tiktoken.get_encoding('o200k_base')
# point these at the three files shipped in the data pack:
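for name, path in [('raw graph','sdk-graph.json'), ('MIR','flow.mir.yaml'), ('Axon','flow.axon')]:
    # read as UTF-8 explicitly so the count does not depend on the platform default encoding
    with open(path, encoding='utf-8') as f:
        print(name, len(enc.encode(f.read())), 'tokens')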
PY
```
Inference-step counts are computed directly from the graph JSON. Resolutions: 2 × the number of
`edges`, plus 1 for `startNodeId`, plus 1 per `endNodeId`. Branches: the edges that carry a
`caseValue`. Unwraps: the action-wrapper objects.
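
The counting rule above can be sketched as a small function. This is a sketch under stated assumptions, not the script behind the published counts: the key names `nodes`, `endNodeIds` and `action`, and the shape of the toy graph, are hypothetical and will likely need adapting to the actual layout of `sdk-graph.json`. Only `edges`, `startNodeId` and `caseValue` are named in this document.

```python
def count_inference_steps(graph):
    """Apply the counting rule to a graph dict.

    Assumed (hypothetical) shape:
      graph["edges"]       list of edge objects; split edges carry "caseValue"
      graph["startNodeId"] a single node id
      graph["endNodeIds"]  list of node ids
      graph["nodes"]       list of node objects; action nodes wrap an "action" dict
    """
    edges = graph.get("edges", [])
    nodes = graph.get("nodes", [])

    # Each edge names two node UUIDs; the start and end ids are resolved too.
    resolutions = (
        2 * len(edges)
        + (1 if graph.get("startNodeId") is not None else 0)
        + len(graph.get("endNodeIds", []))
    )
    # A branch is disambiguated by reading the edge's caseValue.
    branches = sum(1 for edge in edges if "caseValue" in edge)
    # Each action node wraps an inner object that has to be opened.
    unwraps = sum(1 for node in nodes if isinstance(node.get("action"), dict))

    return {
        "resolutions": resolutions,
        "branches": branches,
        "unwraps": unwraps,
        "total": resolutions + branches + unwraps,
    }


# Made-up toy graph (NOT the ACT_ApproveRequest flow): start -> split -> two actions -> two ends.
toy_graph = {
    "startNodeId": "n0",
    "endNodeIds": ["n4", "n5"],
    "nodes": [
        {"id": "n0", "kind": "start"},
        {"id": "n1", "kind": "split"},
        {"id": "n2", "kind": "action", "action": {"type": "ChangeObject"}},
        {"id": "n3", "kind": "action", "action": {"type": "ShowMessage"}},
        {"id": "n4", "kind": "end"},
        {"id": "n5", "kind": "end"},
    ],
    "edges": [
        {"from": "n0", "to": "n1"},
        {"from": "n1", "to": "n2", "caseValue": "true"},
        {"from": "n1", "to": "n3", "caseValue": "false"},
        {"from": "n2", "to": "n4"},
        {"from": "n3", "to": "n5"},
    ],
}

counts = count_inference_steps(toy_graph)
print(counts)
```

On the toy graph this gives 13 resolutions (5 edges × 2 + 1 start + 2 ends), 2 branches and 2 unwraps, 17 in total. To check the published figures, load the shipped graph with `json.load` and pass it in once the key names match.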

## Honest scope (what this is, and isn't)
- This measures **cost and error-surface** (tokens + blind resolution steps): the structural reasons
  the raw graph is hard for an AI and Axon is easy. Both axes are deterministic and re-runnable.
- It is **not** an end-to-end AI-accuracy A/B (feed an LLM each representation, score comprehension
  accuracy). That is a stronger, separable experiment we can run on request; this file does not claim
  to be one.
- It compounds at estate scale: one flow is 2,592 → 145 tokens / ≈29 → 0 steps; a large estate carries
  thousands of flows, where the raw-graph representation saturates the context window with bookkeeping.