How We Rebuilt Email Sync from Scratch (And Wrote 90 Tests)

By Andres Muguira | September 8, 2025 | 7 min read

Email sync is the hardest feature in a CRM. It sounds simple: pull emails from Gmail and show them on a timeline. In practice it involves threading, deduplication, progressive loading, quoted-content stripping, read/unread state tracking, attachment handling, multi-user attribution, and a long tail of edge cases that only surface when you have real users with real inboxes.

Our first email sync implementation worked, but it was fragile. It would occasionally miss emails during rapid syncs, show duplicates when the same thread arrived from different queries, and slow down noticeably for contacts with long email histories. So we rebuilt it from scratch.

This post describes the architecture of the new system and the testing strategy that gives us confidence it actually works.

Email timeline with threaded messages, notes, and the relationship pill at the bottom

Progressive Loading: Show What You Have, Fetch What You Need

When you open a contact's timeline in SalesSheet, the system needs to show you something useful immediately while continuing to load older history in the background. The old implementation would fetch all emails upfront and display a loading spinner until everything arrived. For contacts with hundreds of emails, this meant waiting several seconds for the initial render.

The rebuilt system uses a three-tier loading strategy. First, it renders any activities already cached locally — notes, calls, and recently synced emails appear instantly. Second, it triggers a sync for the latest emails from Gmail using the contact's email address as a server-side filter. Third, it sets up infinite scroll loading so older emails are fetched on demand as the user scrolls down the timeline.

The infinite scroll implementation uses a debounced trigger at the 60% scroll threshold. When the user has scrolled past 60% of the current content, we start pre-fetching the next batch. This means the content appears before the user reaches the bottom, creating a seamless experience. A one-second cooldown prevents rapid-fire triggers from overloading the API.
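As a rough illustration, the trigger logic could look like the TypeScript sketch below. Every name here (`ScrollMetrics`, `createPrefetchTrigger`, `loadMore`) is hypothetical, and the sketch makes two assumptions the post does not spell out: scroll progress is measured as the bottom edge of the viewport divided by the total content height, and the rate limit is modeled as a simple cooldown rather than a full debounce.

```typescript
// Hypothetical sketch: decide when to prefetch the next batch of emails.
interface ScrollMetrics {
  scrollTop: number;    // pixels scrolled from the top of the list
  clientHeight: number; // height of the visible viewport
  scrollHeight: number; // total height of the rendered content
}

const PREFETCH_THRESHOLD = 0.6; // 60% of the current content
const COOLDOWN_MS = 1000;       // one-second cooldown between triggers

function createPrefetchTrigger(
  loadMore: () => void,
  now: () => number = Date.now, // injectable clock, handy for tests
): (metrics: ScrollMetrics) => boolean {
  let lastFired = Number.NEGATIVE_INFINITY;

  return (metrics: ScrollMetrics): boolean => {
    if (metrics.scrollHeight <= 0) return false;

    // Assumed definition of progress: how far the viewport's bottom edge
    // has travelled through the content.
    const progress =
      (metrics.scrollTop + metrics.clientHeight) / metrics.scrollHeight;
    if (progress < PREFETCH_THRESHOLD) return false;

    // Ignore triggers that arrive inside the cooldown window.
    const current = now();
    if (current - lastFired < COOLDOWN_MS) return false;

    lastFired = current;
    loadMore();
    return true;
  };
}
```

In a component, the returned function would be called from a scroll handler with the container's current metrics. Keeping it free of DOM types makes the threshold and cooldown easy to unit test.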

Server-Side Filtering

The Gmail API supports querying emails by participant address. Instead of pulling all emails and filtering on the client, we pass the contact's email as a query parameter to the API. This reduces the data transferred by orders of magnitude for users with large inboxes. A single contact might have 50 emails, but the inbox might contain 50,000. Fetching all 50,000 and filtering client-side is not a viable approach.

The server-side filter uses Gmail's search syntax to match both the from and to fields. We also filter out drafts, spam, and trash to ensure the timeline only shows relevant correspondence. The query is constructed dynamically and includes the correct encoding for email addresses with special characters.
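To make the query construction concrete, here is a minimal sketch. The function names are hypothetical and the exact query shape is an assumption, not our production string. It relies only on Gmail search operators (`from:`, `to:`, `OR`, `-in:`) and the `q` and `maxResults` parameters of the `users.messages.list` endpoint.

```typescript
// Hypothetical sketch: build a Gmail search query for one contact.
function buildContactQuery(contactEmail: string): string {
  // Quote the address so it is treated as a single term, and strip embedded
  // double quotes so the address cannot break out of the quoted term.
  const address = `"${contactEmail.trim().replace(/"/g, "")}"`;

  // Match the contact as sender or recipient; exclude drafts, spam, trash.
  return `(from:${address} OR to:${address}) -in:draft -in:spam -in:trash`;
}

// Hypothetical sketch: build the request URL for users.messages.list.
function buildListUrl(contactEmail: string, maxResults: number = 20): string {
  // encodeURIComponent handles "+", "@", quotes, and spaces in the query.
  const q = encodeURIComponent(buildContactQuery(contactEmail));
  return (
    "https://gmail.googleapis.com/gmail/v1/users/me/messages" +
    `?q=${q}&maxResults=${maxResults}`
  );
}
```

Note that `users.messages.list` already omits spam and trash unless `includeSpamTrash` is set, so the last two exclusions in this sketch are defensive rather than strictly required.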

The Diff/Merge Strategy

Email sync has a fundamental deduplication challenge. When you fetch the latest 20 emails, some of them might already exist in the local database from a previous sync. The old system used a simple check-and-skip approach, but it broke down when emails arrived out of order or when the Gmail API returned overlapping result sets across pagination boundaries.

The rebuilt system uses a diff/merge strategy. Each email has a unique Gmail message ID that serves as the primary key. During sync, we fetch a batch of message IDs from the API, compare them against the set of IDs already stored locally, and only download the full message content for new IDs. This reduces API calls significantly because fetching a list of message IDs is much cheaper than fetching full message bodies.
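The diff step boils down to a set difference over message IDs. The sketch below shows the idea under the assumption that remote IDs arrive as an array and local IDs are held in a `Set`; `diffMessageIds` is a hypothetical name.

```typescript
// Hypothetical sketch: return the remote IDs that are not stored locally.
// The order of the API response is preserved, and IDs that repeat across
// overlapping pages are returned only once.
function diffMessageIds(
  remoteIds: readonly string[],
  localIds: ReadonlySet<string>,
): string[] {
  const seen = new Set<string>();
  const missing: string[] = [];

  for (const id of remoteIds) {
    if (localIds.has(id) || seen.has(id)) continue; // already known
    seen.add(id);
    missing.push(id);
  }
  return missing;
}
```

Only the IDs returned here would need a follow-up request for the full message body. Deduplicating within the batch also covers the overlapping-pagination case described above.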

The merge step handles updates to existing emails — read/unread state changes, label modifications, and thread relationship updates. Each local email record includes a sync timestamp, and we only update records where the Gmail API reports a more recent modification time.
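A minimal sketch of the merge rule follows. The record shapes and the `modifiedAt` field are hypothetical: this post does not describe how the remote modification time is obtained, so treat that field as a stand-in for whatever signal the sync layer derives from the API. Read state is modeled through Gmail's `UNREAD` label.

```typescript
// Hypothetical record shapes for illustration only.
interface LocalEmail {
  id: string;         // Gmail message ID (primary key)
  threadId: string;
  labelIds: string[]; // a message is unread while "UNREAD" is present
  syncedAt: number;   // local sync timestamp, epoch milliseconds
}

interface RemoteEmail {
  id: string;
  threadId: string;
  labelIds: string[];
  modifiedAt: number; // assumed remote modification time, epoch milliseconds
}

// Update the local record only when the remote copy changed after the
// last sync; otherwise return the local record untouched.
function mergeEmail(
  local: LocalEmail,
  remote: RemoteEmail,
  now: number,
): LocalEmail {
  if (local.id !== remote.id) {
    throw new Error("mergeEmail called with mismatched message IDs");
  }
  if (remote.modifiedAt <= local.syncedAt) return local;

  return {
    ...local,
    threadId: remote.threadId,
    labelIds: remote.labelIds.slice(), // copy to avoid shared mutation
    syncedAt: now,
  };
}
```

Returning the original object when nothing changed lets callers skip a database write with a simple reference comparison.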

Threaded Email Display

Gmail groups emails into threads, but the threading model is more complex than most people realize. A single thread can include emails from multiple participants, forwarded messages, and inline replies. Our ThreadedEmailItem component handles this by extracting the main content from each message while stripping quoted sections.

The content extraction logic uses multiple strategies. For HTML emails, it removes blockquote elements, Gmail-specific quote dividers, and content that appears after "On ... wrote:" patterns. For plain-text emails, it scans line by line and stops when it encounters a line starting with ">" or the quote attribution pattern. The result is a clean display of just the new content in each email, without the cascading quotes that make threaded emails unreadable.
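The plain-text branch is the easiest to illustrate. The sketch below is a simplified version of the line-by-line scan, not the actual ThreadedEmailItem code. The function name and the attribution regular expression are assumptions, and the pattern only catches attributions that fit on a single line.

```typescript
// Assumed single-line attribution pattern, e.g.
// "On Mon, Sep 1, 2025 at 9:00 AM Jane <jane@example.com> wrote:"
const ATTRIBUTION_PATTERN = /^On .+ wrote:$/;

// Hypothetical sketch: keep only the new content of a plain-text email.
function stripQuotedText(body: string): string {
  const kept: string[] = [];

  for (const line of body.split(/\r?\n/)) {
    const isQuotedLine = line.charAt(0) === ">";
    const isAttribution = ATTRIBUTION_PATTERN.test(line.trim());
    if (isQuotedLine || isAttribution) break; // everything below is quoted
    kept.push(line);
  }

  // Drop the blank lines that usually precede the quoted section.
  return kept.join("\n").replace(/\s+$/, "");
}
```

For example, a reply consisting of "Sounds good." followed by an attribution line and a quoted question is reduced to just "Sounds good.". A body with no quoted section is returned unchanged apart from trailing whitespace.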

90 tests across 4 test suites with 94% statement coverage

90 Tests: What We Test and Why

Email sync is exactly the kind of feature that needs comprehensive automated testing. The logic is complex, the edge cases are numerous, and manual testing is painfully slow because it requires real Gmail accounts with specific email patterns.

We wrote 90 tests across four test suites, covering the full stack of email sync logic.

The test suite achieves 94% statement coverage and runs in 3.2 seconds. We run it on every commit and every pull request. If a test fails, the deploy is blocked.

Auto-Triggered Deep Sync

One of the more interesting engineering challenges was the "deep sync" feature. When the user scrolls to the bottom of the timeline and the local database is exhausted (hasMoreEmails becomes false), the system automatically triggers a deeper Gmail sync to fetch older history. This happens transparently: the user sees a loading pill at the bottom of the timeline while the sync runs in the background.

The deep sync uses a deduplication key composed of the contact email and the current email count. This prevents the same deep sync from being triggered multiple times for the same state. When the sync completes and new emails are added to the database, the timeline reactively updates and the user can continue scrolling.
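The guard can be sketched as a small class that remembers which keys have already triggered a sync. `DeepSyncGate` and its methods are hypothetical names, and the key format (lowercased email, a colon, then the count) is an assumption; the post only states that the key combines the contact email with the current email count.

```typescript
// Hypothetical sketch: allow one deep sync per (contact, email count) state.
class DeepSyncGate {
  private readonly attempted = new Set<string>();

  // Assumed key format: normalized contact email plus current email count.
  static key(contactEmail: string, emailCount: number): string {
    return `${contactEmail.trim().toLowerCase()}:${emailCount}`;
  }

  shouldTrigger(
    contactEmail: string,
    emailCount: number,
    hasMoreEmails: boolean,
  ): boolean {
    // Local history is not exhausted yet, so no deep sync is needed.
    if (hasMoreEmails) return false;

    const key = DeepSyncGate.key(contactEmail, emailCount);
    if (this.attempted.has(key)) return false; // already tried in this state

    this.attempted.add(key);
    return true;
  }
}
```

Because the email count is part of the key, a sync that adds new emails changes the key, so the next time the user exhausts the local data a further deep sync is allowed. A sync that finds nothing leaves the key unchanged and is not retried.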

What We Got Right (And What We Would Do Differently)

The progressive loading strategy was the biggest improvement. Users no longer stare at a spinner while the system fetches hundreds of emails. The timeline renders instantly with cached data and fills in progressively.

The testing investment was worth it many times over. We caught at least a dozen edge-case bugs during development that would have been extremely difficult to debug in production — race conditions in concurrent syncs, off-by-one errors in pagination cursors, and incorrect merge behavior when emails were modified between sync cycles.

If we were starting over, we would build the diff/merge layer as a standalone module with its own comprehensive test suite from day one, rather than evolving it from the simpler check-and-skip approach. The migration was painful because the data model had to change mid-flight while maintaining backward compatibility with existing synced emails. Starting with the right abstraction saves time in the long run, even if it feels like over-engineering at the beginning.

For more on how we use the email timeline in the product, see our post on the relationship pill, a small UX detail that sits at the bottom of every email timeline and changes how you think about your contacts.
