ENGINEERING

How We Stopped Our AI from Making Things Up

Andres Muguira · February 17, 2026 · 6 min read
AI · Hallucination · Safety · Data Integrity · Trust

The Problem: AI That Invents Data

Ask a generic AI chatbot to "email John at Acme" and it will confidently generate a plausible-looking email address — john.smith@acme.com — that may not exist. The format looks right. The domain is real. But the specific address is fabricated. In a CRM, that means sending real business emails to fake addresses, getting bounces, and damaging your sender reputation. Or worse — the AI invents a phone number, a deal amount, or a contract date that looks real but is completely fabricated, and you make business decisions based on fiction.

This is the hallucination problem, and it is fundamentally different in a CRM context than in a general-purpose chatbot. When ChatGPT hallucinates a historical fact, the worst case is that someone learns something incorrect. When a CRM AI hallucinates a contact's email address, the worst case is that you send a confidential pricing proposal to a stranger. When it hallucinates a deal value, you might commit to a revenue forecast that has no basis in reality. The stakes are categorically higher because CRM data drives real business actions.

We built three layers of hallucination prevention into SalesSheet's AI assistant to ensure that every piece of data the AI references is grounded in your actual CRM records. The system is designed to be conservative — it would rather tell you it does not know something than confidently present invented data.

What Hallucinations Look Like in a CRM

Before diving into our approach, it is worth understanding the specific ways AI hallucinations manifest in CRM data. We identified four common patterns during development that guided our detection strategy:

- Fabricated contact details: plausible-looking email addresses or phone numbers that exist nowhere in your records.
- Invented deal data: amounts, close dates, or contract terms estimated rather than retrieved.
- Phantom events: meetings, calls, or follow-ups that were never logged.
- Mixed-up records: attributes from two similar contacts blended into a single answer.

Each of these patterns is dangerous specifically because the output looks correct. A sales rep who trusts the AI's response will act on it — calling the wrong number, quoting the wrong price, or showing up to a meeting that does not exist. The hallucination problem in CRM is not about obviously wrong answers; it is about subtly wrong answers that pass the smell test.

[Image: AI catching a potential hallucination and asking for clarification]

Safeguard One: Org-Scoped Data Grounding

The first and most important safeguard is architectural: the AI never answers questions about your data from its training knowledge. Every query that references contacts, deals, activities, or emails triggers a real database search against your organization's actual CRM records. When you type "email John at Acme," the AI does not generate an email address — it searches your contacts table for records matching "John" at a company matching "Acme" and returns the email address stored in your database.

This is implemented through Claude's tool-use capability. We define a set of tools that the AI can call — search_contacts, search_deals, get_recent_activities, get_email_threads — and each tool executes a real query against your Supabase database. The AI receives structured results from these queries and uses them to compose its response. It cannot reference data that does not come from one of these tool calls, because its system prompt explicitly instructs it to use tools for any data lookup rather than relying on general knowledge.
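To make the shape of this concrete, here is a minimal sketch of a tool definition and dispatcher. The tool names match the post, but the contact row shape, the in-memory table standing in for the Supabase query, and the `runTool` helper are all illustrative assumptions, not SalesSheet's actual code:

```typescript
// Hypothetical contact row shape; real rows live in Supabase.
type Contact = { name: string; email: string; company: string };

// Stand-in for an org-scoped database table (illustrative data only).
const contactsTable: Contact[] = [
  { name: "John Doe", email: "jdoe@acme.example", company: "Acme" },
];

// A Claude-style tool definition: the model can request this search,
// but the query itself runs against real records, never model memory.
const tools = [
  {
    name: "search_contacts",
    description: "Search CRM contacts by name and company",
    input_schema: {
      type: "object",
      properties: {
        name: { type: "string" },
        company: { type: "string" },
      },
      required: ["name"],
    },
  },
];

// Executes a tool call and returns structured results for the model.
function runTool(name: string, input: Record<string, string>): Contact[] {
  if (name !== "search_contacts") throw new Error(`unknown tool: ${name}`);
  return contactsTable.filter(
    (c) =>
      c.name.toLowerCase().includes((input.name ?? "").toLowerCase()) &&
      c.company.toLowerCase().includes((input.company ?? "").toLowerCase()),
  );
}
```

The key property is that the model's answer can only contain what `runTool` returns; an empty result array means the honest answer is "no match found."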

The org-scoping is critical here. Every database query is filtered by the user's organization ID, enforced at the Row Level Security (RLS) policy level in Supabase. Even if the AI somehow constructed a query for a different organization's data, the database would return empty results. This is the same organization sharing infrastructure that powers our team collaboration features, repurposed as a security boundary for AI data access.
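The application-level mirror of that RLS policy is trivial but worth seeing, because it shows why a mis-targeted query fails closed rather than open. This is a sketch under assumed field names (`org_id` is our guess at the column), not the actual policy:

```typescript
// Hypothetical row shape; org_id is an assumed column name.
type Row = { org_id: string; name: string };

// Mirror of the RLS policy: rows outside the caller's org never surface,
// so a query scoped to the wrong org returns empty rather than leaking.
function orgScoped(rows: Row[], orgId: string): Row[] {
  return rows.filter((r) => r.org_id === orgId);
}
```

In production the database enforces this, so even a bug in the AI layer cannot widen the scope.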

[Image: AI searching real organization data before composing a response]

Safeguard Two: Hallucination Detection and Confidence Scoring

Even with data grounding, hallucinations can still occur. The AI might receive search results for three contacts named "John" and then mix attributes between them in its response. Or it might extrapolate from partial data — seeing that a deal was last updated two weeks ago and incorrectly stating that a meeting happened two weeks ago. Our second safeguard catches these cases through post-generation validation.

After the AI generates a response, a validation layer compares every data point in the response against the raw tool results. If the AI mentions an email address, the validator checks whether that exact email appears in the search results. If the AI states a deal value, the validator confirms the number matches a real record. Any data point that cannot be traced back to a tool result is flagged as a potential hallucination.
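A stripped-down version of that validation pass might look like the following. The extraction regexes and function names are our own illustration of the idea, assuming the tool results are available as serialized text:

```typescript
// Pull out the kinds of data points worth verifying: email addresses
// and dollar amounts (regexes are illustrative, not exhaustive).
function extractClaims(text: string): string[] {
  const emails = text.match(/[\w.+-]+@[\w-]+\.[\w.]+/g) ?? [];
  const amounts = text.match(/\$[\d,]+/g) ?? [];
  return [...emails, ...amounts];
}

// Any claim that does not appear verbatim in the raw tool results
// is flagged as a potential hallucination.
function findHallucinations(response: string, toolResults: string): string[] {
  return extractClaims(response).filter((claim) => !toolResults.includes(claim));
}
```

A verbatim-match check like this is deliberately strict: it will occasionally flag a harmless paraphrase, but it never lets an unverifiable email or amount through unmarked.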

When a potential hallucination is detected, the system does not silently correct it or ignore it. Instead, it surfaces the uncertainty to the user. If you ask "what is Sarah's phone number?" and the search returns several contacts named Sarah whose attributes the response may have blended, the reply includes a clarification: "I found multiple contacts named Sarah. Could you specify which one you mean?" This is a deliberate design choice — we would rather ask a follow-up question than present possibly wrong data with false confidence.

Safeguard Three: Fallback Text Builder

The third safeguard handles a different class of failure: when the AI returns malformed output. Large language models occasionally return broken JSON, incomplete tool calls, or responses that our parser cannot interpret. In a traditional implementation, this results in an error message or a blank screen. In SalesSheet, a fallback text builder catches these failures and produces clean, safe text output.

The fallback builder works by extracting whatever usable content exists in the malformed response and presenting it as plain text, stripped of any data claims that cannot be verified. If the AI was trying to show contact details but the JSON was corrupted, the fallback says something like "I found a contact matching your query but encountered an issue formatting the details. Let me try searching again." It then retries the query with a simplified prompt that is less likely to trigger a formatting error.
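The core of a fallback builder is a guarded parse: try the structured path, and on failure degrade to safe text with no data claims. This sketch uses hypothetical message wording and a simplified contact payload:

```typescript
// Render the model's structured output, falling back to safe plain text
// when the JSON is malformed (payload shape and wording are illustrative).
function renderWithFallback(raw: string): string {
  try {
    const parsed = JSON.parse(raw);
    return `Found contact: ${parsed.name} <${parsed.email}>`;
  } catch {
    // No data claims here: nothing in this string can be a hallucination.
    return "I found a match but hit a formatting issue. Let me try searching again.";
  }
}
```

The retry with a simplified prompt would follow the fallback message; the important invariant is that the user always sees coherent text, never a stack trace or a blank panel.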

This matters more than it might seem. Sales reps use the AI assistant during live conversations with prospects — on calls, in meetings, between back-to-back demos. If the AI breaks and shows an error, they lose trust in the tool and stop using it. The fallback builder ensures that the AI always produces something useful, even when the underlying model has a hiccup. Reliability builds trust, and trust is what drives adoption.

Technical Implementation with the Claude API

Our AI pipeline uses Claude as the underlying model, accessed through Anthropic's API via a Supabase Edge Function. The system prompt is carefully structured to reinforce data grounding. It explicitly tells the model: never fabricate contact information, always use the provided tools to look up data, and if a query returns no results, say so rather than guessing. We also include examples of correct behavior — showing the model what a grounded response looks like versus a hallucinated one.
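The shape of such a prompt is roughly the following. This wording is our guess at the structure the post describes, not SalesSheet's actual system prompt:

```typescript
// Illustrative grounding prompt: the rules mirror the post's description,
// but the exact text is hypothetical.
const SYSTEM_PROMPT = `
You are a CRM assistant. Rules:
- Never fabricate contact information, deal values, or dates.
- Always call a tool (search_contacts, search_deals, ...) to look up data.
- If a tool returns no results, say so plainly. Do not guess.
`.trim();
```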

The tool definitions include strict schemas that constrain what the model can return. The search_contacts tool, for example, returns a structured object with specific fields: name, email, phone, company, and deal_count. The model can only reference these fields in its response, which prevents it from inventing additional attributes like "last meeting date" that are not part of the search result. This schema-based constraint is one of the most effective anti-hallucination techniques we have found — it limits the surface area for fabrication.
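A JSON-Schema-style sketch of that constraint, using the field list from the post (the `additionalProperties: false` line is the part doing the anti-hallucination work, since it leaves the model no extra fields to invent):

```typescript
// Strict result schema for search_contacts: only these five fields exist,
// so an invented "last_meeting_date" has nowhere to live.
const searchContactsResultSchema = {
  type: "object",
  additionalProperties: false,
  properties: {
    name: { type: "string" },
    email: { type: "string" },
    phone: { type: "string" },
    company: { type: "string" },
    deal_count: { type: "number" },
  },
  required: ["name", "email"],
} as const;
```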

We also implement what we call "citation chaining." When the AI references a specific data point, it internally tags which tool call and which result index the data came from. This chain is not visible to the user, but it powers the validation layer described above. If a data point has no citation chain — meaning the model produced it without a corresponding tool result — the validator catches it before it reaches the user.
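In data-structure terms, citation chaining is just an optional provenance tag on every data point, and the validator's job reduces to finding points without one. Type and function names here are our own:

```typescript
// Each data point records which tool call and result index produced it.
type Citation = { toolCall: string; resultIndex: number };
type DataPoint = { value: string; citation?: Citation };

// A data point with no citation chain was produced by the model alone,
// so the validator flags it before it reaches the user.
function uncited(points: DataPoint[]): DataPoint[] {
  return points.filter((p) => p.citation === undefined);
}
```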

Comparison: Guarded vs. Unguarded AI in CRM

To understand why these safeguards matter, consider what happens without them. We tested an unguarded AI — same model, same prompts, but without data grounding or hallucination detection — and asked it common CRM questions. The results were instructive. Asked "what is Maria's email?" it confidently generated maria@company.com — an address that does not exist in the CRM. Asked "when is my next meeting with the Acme team?" it invented a date and time. Asked "what is the total pipeline value this quarter?" it estimated a number based on the conversation context rather than actual deal data.

Every one of these responses looked plausible. A rep in a hurry would trust them. That is exactly the failure mode we designed against. Our guarded AI, given the same questions, either returns verified data from the CRM or asks for clarification. It is a less impressive demo — the responses take a fraction of a second longer and sometimes include follow-up questions instead of instant answers. But it is trustworthy, and in a CRM, trust is the only thing that matters.

Wrong data in a CRM cascades. A fabricated email leads to a bounced message, a confused prospect, and a lost deal. Hallucination prevention is not a nice-to-have — it is essential for any AI that touches business data.

Why This Matters for Sales Teams

Sales teams operate on trust — trust between the rep and the prospect, trust between the rep and their manager, and trust between the rep and their tools. If a CRM AI gives wrong information once, the rep stops trusting it. If the rep stops trusting the AI, they stop using it. If they stop using it, you have paid for an AI-powered CRM that nobody uses for AI. The entire value proposition collapses.

Our approach to AI-native CRM design is built on the principle that AI should augment the rep's judgment, not replace it. The AI finds data faster than the rep could manually search for it. It drafts emails in the rep's personal writing style. It summarizes call recordings so the rep does not have to take notes. But it never presents fabricated data as fact, and it always makes its uncertainty visible. That is the line between a useful AI assistant and a liability.
