Why Affinity Diagrams Fail Product Managers (And What to Do Instead)

Affinity diagrams feel productive. You walk out of a synthesis session with a wall of color-coded sticky notes, clean clusters, labels that sound like insight: "Users want more control" or "Onboarding is confusing." Someone took a photo. It's in Notion. Everyone nods.

Ask the PM two days later what those clusters mean for the roadmap and you often get a shrug — or worse, a confident but fuzzy statement that can't be traced back to any specific participant or decision.

That shrug is not a PM problem. It's a method problem.

What Affinity Diagrams Are Actually Good At

To be clear about the limit of this argument: affinity diagrams are not bad research. In the right context — exploratory generative work, early discovery when you genuinely don't know what you're looking for — they're a reasonable way to organize a large volume of messy, heterogeneous data. Karen Holtzblatt's contextual design tradition built serious products on affinity methods. The technique has earned its place.

The problem is what product teams are usually trying to do when they use them. By the time a PM is synthesizing interview data to prepare for a roadmap review, they are not in an exploratory phase. They have an existing set of bets, jobs, or themes they're tracking. They need evidence for or against specific propositions. Affinity diagramming is the wrong instrument for that task — the same way a kitchen thermometer is a legitimate tool but not the right one for measuring your tire pressure.

The Translation Tax

Here's the mechanical failure mode. You run eight customer interviews over two weeks. You and a researcher do a synthesis session: two hours, sticky notes, clusters emerge. You label them. You photograph them. You walk away with something like: "Cluster 4 — Handoff friction: users struggle with handoffs to other teams."

Now the roadmap review is Thursday. Your CPO asks: "Do we have evidence that the integration problem is more urgent than the notification problem?" You go back to Cluster 4. You can see that six stickies live there. But what exactly did those six people say? Were they all senior ICs or did they include managers? Were they current customers or churned accounts? Were any of them on the target segment for next quarter's roadmap item?

You can't answer those questions from the cluster, because the cluster threw that information away. That's the translation tax: as it's usually practiced, the affinity method strips participant context in the act of organizing data. What's left is a category that stands in for the data, not the data itself.

Tony Ulwick's argument in Outcome-Driven Innovation is relevant here. Ulwick spent decades studying why product teams fail to build things customers actually want, and a core finding is that teams conflate a customer's solution request with their underlying job. The affinity cluster "Handoff friction" has the same flaw: it bundles what people complained about without anchoring to what they were trying to accomplish. You can't build a roadmap argument on it because it doesn't tell you whether the underlying job is important or underserved.

Why JTBD Theming Produces Different Output

The alternative isn't to abandon synthesis — it's to change what the synthesis is anchored to. Jobs-to-be-done theming starts from the other end: you enter a synthesis session with explicit job statements your team has already agreed on. Something like: "When I'm preparing a quarterly roadmap review, I want to present prioritization decisions as evidence-backed, so I'm seen as credible by engineering leadership." Or: "When I've completed a round of discovery interviews, I want to share the findings with a PM who wasn't in the room, without that PM misinterpreting the data."

Every interview utterance gets evaluated against those jobs. The synthesis question isn't "what theme does this belong to?" — it's "which job does this provide evidence for, and how strong is that evidence?" The output is a set of quotes, each mapped to a job, each traceable back to a participant and a session timestamp.
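The job-mapped output described above can be sketched as a small data structure. This is a minimal Python illustration under stated assumptions, not a description of any particular tool: the `QuoteCard` fields, the job IDs, the 1-to-3 strength scale, the helper names `tag` and `evidence_for`, and the sample dates are all hypothetical, and the quote text is left as a placeholder.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class QuoteCard:
    """One interview utterance, tagged to a job and traceable to its source."""
    participant_id: str  # stable pseudonymous ID such as "P07", never a name
    session_date: date
    timestamp: str       # transcript position as "HH:MM:SS"
    text: str
    job_id: str          # key of an agreed job statement in JOBS
    strength: int        # hypothetical rating: 1 (weak) to 3 (strong) evidence

# Hypothetical job IDs; the statements are the ones agreed on before synthesis.
JOBS = {
    "roadmap-review": "When I'm preparing a quarterly roadmap review, ...",
    "handoff-to-pm": "When I've completed a round of discovery interviews, ...",
}

def tag(participant_id, session_date, timestamp, text, job_id, strength):
    """Create a card, rejecting job tags that aren't in the agreed framework."""
    if job_id not in JOBS:
        raise ValueError(f"unknown job: {job_id}")
    if strength not in (1, 2, 3):
        raise ValueError("strength must be 1, 2 or 3")
    return QuoteCard(participant_id, session_date, timestamp, text, job_id, strength)

def evidence_for(cards, job_id, min_strength=1):
    """All cards supporting job_id at or above min_strength, strongest first."""
    matches = [c for c in cards if c.job_id == job_id and c.strength >= min_strength]
    return sorted(matches, key=lambda c: (-c.strength, c.session_date, c.timestamp))

# Hypothetical sample cards with placeholder quote text.
cards = [
    tag("P03", date(2024, 3, 5), "00:14:32", "(quote text)", "handoff-to-pm", 3),
    tag("P05", date(2024, 3, 7), "00:22:10", "(quote text)", "handoff-to-pm", 2),
    tag("P05", date(2024, 3, 7), "00:31:48", "(quote text)", "roadmap-review", 1),
]

# Answering "which job does this support, and who said it?" is a lookup, not a recollection.
for card in evidence_for(cards, "handoff-to-pm"):
    print(card.participant_id, card.session_date, card.timestamp)
```

Rejecting unknown job IDs at tagging time is the design choice that keeps the synthesis anchored to the agreed jobs rather than letting ad hoc themes creep back in.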

Bob Moesta, who worked with Clayton Christensen to develop the demand-side version of JTBD theory, contrasts supply-side thinking, which starts from the product and what it offers, with demand-side thinking, which starts from what the customer is trying to accomplish and what's getting in the way. The cluster "Handoff friction" sits on the supply side: it's organized around what users reported about the product. The JTBD mapping "five participants in the 'handoff to PM' job said they abandon the process at the attachment step" is demand-side: it's organized around the job and the obstacle.

A Concrete Failure Pattern

A 60-person B2B SaaS company ran their quarterly discovery cycle: ten interviews with enterprise customers, standard synthesis process, affinity diagram delivered as a Miro board. The research team presented their findings to the PM group. Three clusters were labeled as "top themes." The PMs took notes.

Four weeks later, a product decision came up that required evidence from those interviews. The researcher was no longer available — she had moved to a different project. The Miro board existed but nobody could tell which specific customers had expressed which specific concerns, because the stickies had been anonymized during the synthesis session ("makes things go faster"). There was no quote-level attribution, no participant IDs, no session timestamps.

The PM made the decision based on engineering estimates and a gut read on the Miro clusters. Not because the research was bad — the interviews were good — but because the synthesis method had made the underlying data irretrievable. Two months later, the shipped feature got negative feedback from exactly the segment that had flagged the problem in those original interviews. The evidence had been in the room. The synthesis process had buried it.

What Synthesis That Survives a Roadmap Review Looks Like

Synthesis output that holds up in a roadmap review has three properties that affinity clusters rarely have.

First, it's traceable. Every claim has a source: a participant identifier (not a name, but a stable ID), a session date, a transcript timestamp. If your CPO asks "who said that?", you can answer in 30 seconds.

Second, it's mapped to a decision-relevant frame. Not a generic theme, but a job your roadmap is actually tracking. The finding isn't just "users struggle with handoffs." It's "four participants in the cross-team coordination job expressed abandonment at the attachment step; this is the highest-density obstacle for that job in this data set."

Third, it reflects sample density. Four people expressing the same concern about the same job is different from one person saying something interesting. Affinity diagrams make this visible through sticky-note count, which is a crude proxy: one talkative participant can fill a cluster on their own. Job-mapped synthesis makes it precise: you know how many distinct participants and sessions generated evidence for each job, and whether that count is still growing or has plateaued.
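A hypothetical Python sketch of the density point: it contrasts an affinity-style note count with a count of distinct participants per job, and adds a running tally of sessions that produced evidence for a job, which is one simple way to see whether that evidence is still growing. The tuple layout, the function names, and the sample records are assumptions made for illustration, and the sketch assumes each session is a one-on-one interview listed in the order it ran.

```python
from collections import Counter, defaultdict

# Hypothetical quote record: (participant_id, session_id, job_id).
Quote = tuple[str, str, str]

def sticky_count(quotes: list[Quote]) -> Counter:
    """Affinity-style proxy: raw number of notes per job."""
    return Counter(job for _participant, _session, job in quotes)

def participant_density(quotes: list[Quote]) -> dict[str, int]:
    """Number of distinct participants who produced evidence for each job."""
    people: dict[str, set[str]] = defaultdict(set)
    for participant, _session, job in quotes:
        people[job].add(participant)
    return {job: len(ids) for job, ids in people.items()}

def running_session_count(quotes: list[Quote], job: str,
                          session_order: list[str]) -> list[int]:
    """After each session, how many sessions so far produced evidence for `job`.
    A flat tail means recent sessions added nothing new for this job."""
    hits = {session for _participant, session, j in quotes if j == job}
    counts, running = [], 0
    for session in session_order:
        if session in hits:
            running += 1
        counts.append(running)
    return counts

# Hypothetical sample: P01 contributes three notes to the same job in one session.
quotes = [
    ("P01", "S1", "handoff-to-pm"),
    ("P01", "S1", "handoff-to-pm"),
    ("P01", "S1", "handoff-to-pm"),
    ("P02", "S2", "roadmap-review"),
    ("P03", "S3", "handoff-to-pm"),
]
sessions = ["S1", "S2", "S3", "S4"]

print(sticky_count(quotes)["handoff-to-pm"])         # 4
print(participant_density(quotes)["handoff-to-pm"])  # 2
print(running_session_count(quotes, "handoff-to-pm", sessions))  # [1, 1, 2, 2]
```

In the sample, P01's three notes inflate the note count to 4 while only two people actually raised the job, and the running tally shows that session S4 added no new evidence for it.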

When you run synthesis through GetWhys, the output isn't a cluster map — it's quote cards, each tagged with a job statement from your imported framework, each carrying a participant ID, session date, and transcript timestamp. The cluster has been replaced by a structured evidence artifact. That artifact can be put directly into a roadmap review doc without a translation step.

The Cost of the Status Quo

The affinity diagram persists in product teams partly because it's taught as the default synthesis method in UX curricula, and partly because it produces something that looks like analysis — a wall of organized notes feels like progress. But the PM community is getting more sophisticated about evidence. Stakeholders who ask "how many users said that?" are asking the right question, and "we had a whole cluster about it" is not a satisfying answer.

The research teams that are earning the most credibility in roadmap reviews right now are the ones who show up with a quote, a participant ID, a session date, and a JTBD job tag. That combination is citable, auditable, and defensible. It took a different synthesis method to produce it — but it took the same interviews you were already running.

The bottleneck has never been the interviews. It's always been what happens to them afterward.