
How Medical Professionals Verify AI, and What That Means for Product Design

The Brief

A medical publisher was exploring whether an AI assistant could help healthcare professionals navigate complex scientific books faster. The prototype, an AI chatbot embedded alongside a digital library, was still at an early stage. The stakeholder question was simple: Can users trust this AI, and would it actually be useful in their workflow?

My role was to design and run the research, determine whether the prototype delivered real value, and translate the findings into a strategic direction for the product team.

Research Design

I recruited 9 medical scientists at a publisher event and conducted prototype testing using a comparative format: participants experienced both a standalone AI chatbot version and an embedded side-by-side version where the AI panel and the source document were visible simultaneously.

I used a mix of task-based observation and qualitative interview probes, not just to measure usability, but to understand the mental model users brought to AI-assisted reading.

The embedded side-by-side layout: source document alongside the AI assistant

The Discovery: Preference for the Embedded Interaction Model

The comparative testing produced a striking result: 9 out of 9 participants chose the embedded side-by-side layout.

Initial feedback centered on practical benefits: "I can multitask," "I don't have to switch screens." But as I dug deeper in the qualitative follow-up, a more fundamental driver emerged. Users weren't just optimizing for convenience. They needed something more critical: visual evidence.

They needed to see the source and the AI response at the same time, not because the interface was awkward without it, but because they couldn't trust the AI without it.

The Insight: The Verification Loop

This is what I found most significant, and what I believe has the broadest implications for AI product design in expert domains.

Medical professionals do not consume AI outputs passively. Because the stakes of medical misinformation are so high, every participant engaged in what I call a Verification Loop: a real-time auditing behavior where they cross-referenced the AI's response against the source document, even when they rated the AI's speed and accuracy highly.

Even when the AI gave a correct answer, participants refused to close the source document. They used the open book as the "ground truth," scanning back and forth, verifying the AI's summary against the original text before acting on it.

For expert users, trust is not a prerequisite for use: it is something that must be earned continuously, in context, with every interaction.

The Strategic Recommendation: From Chat Interface to Grounding UI

This finding fundamentally reframed the product direction. The Verification Loop wasn't a usability problem to be solved by making the AI more accurate. It was a behavioral signal pointing to a missing design layer.

I recommended moving beyond a simple chat interface toward what I called a "Grounding UI": a design pattern where the AI doesn't just respond, but actively points to the evidence behind its response.

This led to two concrete feature requirements:

Inline Citations

Every AI response should reference the specific section, chapter, or paragraph it's drawing from, visible directly in the chat panel.

Auto-Scrolling

When a user asks a question, the source document should automatically scroll to the relevant passage, so the AI and the source are always in sync. The user should never have to hunt for the evidence themselves.
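To make the pattern concrete, here is a minimal sketch in TypeScript of how the two requirements could fit together. Every name in it (GroundedResponse, Citation, the anchor ids, the highlight class) is hypothetical, illustrating one possible implementation rather than the publisher's actual system: each AI answer carries machine-readable pointers into the source text, and the source pane follows those pointers automatically.

```typescript
// Hypothetical shapes for a grounded AI response. Field names are
// illustrative, not an actual product schema.
interface Citation {
  chapterId: string;  // e.g. "chapter-12"
  anchorId: string;   // element id of the cited passage in the source pane
  snippet: string;    // short verbatim quote shown inline in the chat panel
}

interface GroundedResponse {
  answer: string;        // the AI's natural-language reply
  citations: Citation[]; // one entry per claim the answer relies on
}

// Auto-scroll: when a response arrives, bring the first cited passage
// into view so the answer and its evidence are visible side by side.
// Assumes the source pane renders each paragraph with a stable id.
function showEvidence(response: GroundedResponse): void {
  const first = response.citations[0];
  if (!first) return; // nothing to ground the answer in

  const target = document.getElementById(first.anchorId);
  if (!target) return; // cited passage isn't in the loaded chapter

  target.scrollIntoView({ behavior: "smooth", block: "center" });

  // Briefly highlight the passage so the reader's eye lands on it
  // without having to hunt for the evidence.
  target.classList.add("citation-highlight");
  window.setTimeout(() => target.classList.remove("citation-highlight"), 2000);
}
```

The design choice worth noting is that the citations are structured data rather than prose, so the same pointers can drive both the inline links in the chat panel and the scroll behavior in the source pane.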

Together, these features reduce cognitive load by removing the manual work of auditing AI outputs, not the auditing itself. By making verification effortless, the product earns trust through transparency, rather than asking users to simply take the AI's word for it.