Proportion of steps in each high-level category. Normalised so models are comparable despite different step counts.
Frequency per 100 steps, compared across the selected models.
Cumulative share of runs that finished within N steps. Dashed line marks the 250-step cap.
Stacked area chart: how the mix of actions evolves from start to end of the average trajectory. Panels ordered by resolve rate (descending).