ENGINEERING

How We Made Gmail Sync 50x Faster

Andres Muguira | February 26, 2026 | 8 min read
Gmail, Performance, API, Email

The Sync That Took 47 Seconds

When we first launched Gmail sync in SalesSheet, it worked like this: every time a user opened their email view, we called the Gmail API's messages.list endpoint to get the latest 100 message IDs. Then we called messages.get for each message to fetch the full content. Then we compared the fetched messages against our database, inserted new ones, and updated any that had changed labels or read status.

For a user with a busy inbox, this process took 47 seconds. Not on first sync — on every sync. Every time they navigated to their email view, they waited 47 seconds for their inbox to appear. This was unacceptable.

47 seconds is not a performance problem. It is a product failure. No user will tolerate waiting 47 seconds to see their email in a CRM when Gmail itself loads in under 2 seconds.

Figure: Before, the full-sync architecture re-fetched 100 messages on every cycle (47 seconds total, 3.2s per contact).

Why Full Sync Was Broken

The fundamental problem was architectural. We were doing a full sync every time when we only needed an incremental sync. A typical inbox might receive 2-3 new messages in the 10 minutes between sync cycles. But our sync engine re-fetched and re-compared 100 messages every time, doing 97 redundant operations to find the 3 that were new.
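To make the redundancy concrete, here is a minimal sketch of the compare step in a full sync. The `SyncedMessage` shape and the function names are hypothetical and chosen for illustration; they are not SalesSheet's actual schema or code.

```typescript
// Hypothetical message shape: just enough to detect new and relabeled messages.
interface SyncedMessage {
  id: string; // Gmail message ID
  labelIds: string[]; // e.g. ["INBOX", "UNREAD"]
}

interface DiffResult {
  toInsert: SyncedMessage[];
  toUpdate: SyncedMessage[];
  unchanged: number;
}

// Labels are treated as an unordered set of unique strings.
function sameLabels(a: string[], b: string[]): boolean {
  if (a.length !== b.length) return false;
  const seen = new Set<string>(a);
  return b.every((label) => seen.has(label));
}

// Compare a freshly fetched page of messages against what is already stored.
function diffAgainstStore(
  fetched: SyncedMessage[],
  stored: Map<string, SyncedMessage>,
): DiffResult {
  const result: DiffResult = { toInsert: [], toUpdate: [], unchanged: 0 };
  for (const message of fetched) {
    const existing = stored.get(message.id);
    if (existing === undefined) {
      result.toInsert.push(message);
    } else if (!sameLabels(existing.labelIds, message.labelIds)) {
      result.toUpdate.push(message);
    } else {
      result.unchanged += 1; // fetched and compared, but nothing to do
    }
  }
  return result;
}
```

With 100 fetched messages of which 3 are new, `unchanged` ends up at 97: every one of those messages still cost an API call and a comparison.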

The overhead broke down like this:

Fix 1: The History API Migration

Gmail provides a History API specifically designed for incremental sync. Instead of fetching all messages and comparing them, you store a history ID from your last sync and ask the API: "what changed since this history ID?" The API returns only the new messages, deleted messages, and label changes that occurred since then.

The migration was straightforward in concept but required careful handling of edge cases:

Figure: After, History API incremental sync processes only 2-5 changes per cycle (0.7 seconds total, 67x faster).

The Happy Path

On each sync cycle, we call history.list with the stored history ID. The response contains a list of changes: messagesAdded, messagesDeleted, and labelsAdded/labelsRemoved. We process only these changes. For a typical 10-minute window, this means processing 2-5 changes instead of 100 messages. The sync completes in under 1 second.
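A rough sketch of how the records in a history.list response might be reduced to a small set of message IDs to act on. The `HistoryRecord` type covers only the subset of the Gmail History resource mentioned above; `ChangeSet` and `collectChanges` are hypothetical names, and the rules for collapsing overlapping changes are one possible choice, not necessarily what SalesSheet does.

```typescript
// Subset of the Gmail API History resource used in this sketch.
interface MessageRef {
  id: string;
  threadId?: string;
  labelIds?: string[];
}

interface HistoryRecord {
  id: string;
  messagesAdded?: { message: MessageRef }[];
  messagesDeleted?: { message: MessageRef }[];
  labelsAdded?: { message: MessageRef; labelIds: string[] }[];
  labelsRemoved?: { message: MessageRef; labelIds: string[] }[];
}

// Hypothetical summary of what the sync has to do for this window.
interface ChangeSet {
  added: Set<string>; // fetch and insert
  deleted: Set<string>; // remove locally
  relabeled: Set<string>; // refresh labels / read status
}

function collectChanges(records: HistoryRecord[]): ChangeSet {
  const changes: ChangeSet = {
    added: new Set<string>(),
    deleted: new Set<string>(),
    relabeled: new Set<string>(),
  };
  for (const record of records) {
    for (const { message } of record.messagesAdded ?? []) changes.added.add(message.id);
    for (const { message } of record.messagesDeleted ?? []) changes.deleted.add(message.id);
    for (const { message } of record.labelsAdded ?? []) changes.relabeled.add(message.id);
    for (const { message } of record.labelsRemoved ?? []) changes.relabeled.add(message.id);
  }
  // A message deleted within the window needs no fetch and no label refresh.
  changes.deleted.forEach((id) => {
    changes.added.delete(id);
    changes.relabeled.delete(id);
  });
  // A new message is fetched in full anyway, so its label changes are covered.
  changes.added.forEach((id) => {
    changes.relabeled.delete(id);
  });
  return changes;
}
```

Collapsing the records into sets first means a message that was added and then relabeled within the same window is fetched once rather than handled twice.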

The History Expiration Edge Case

Gmail keeps history records for a limited time, typically about a week. If a user does not open SalesSheet for longer than that, their stored history ID is no longer valid and history.list returns an HTTP 404 error. When this happens, we fall back to a one-time full sync to re-establish the baseline, then resume incremental syncing from the new history ID. We also added a background job that refreshes the history ID daily, even if the user is not active, to prevent this fallback from occurring during an active session.
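The fallback can be sketched as follows. The `SyncDeps` functions are hypothetical stand-ins injected so the example runs without a Gmail client, and the assumption that the thrown error carries the HTTP status as a numeric `code` or `status` property depends on the client library in use.

```typescript
interface HistoryPage {
  historyId: string; // history ID to store for the next cycle
  records: unknown[];
}

// Hypothetical dependencies, injected so the sketch is testable in isolation.
interface SyncDeps {
  listHistory(startHistoryId: string): Promise<HistoryPage>;
  applyChanges(records: unknown[]): Promise<void>;
  fullSync(): Promise<string>; // resolves to the new baseline history ID
}

interface SyncOutcome {
  mode: "incremental" | "full";
  historyId: string;
}

// Assumption: the client surfaces the HTTP status on the error object.
function isHistoryExpired(error: unknown): boolean {
  if (typeof error !== "object" || error === null) return false;
  const e = error as { code?: unknown; status?: unknown };
  return e.code === 404 || e.status === 404;
}

async function syncMailbox(
  storedHistoryId: string | null,
  deps: SyncDeps,
): Promise<SyncOutcome> {
  if (storedHistoryId !== null) {
    try {
      const page = await deps.listHistory(storedHistoryId);
      await deps.applyChanges(page.records);
      return { mode: "incremental", historyId: page.historyId };
    } catch (error) {
      // Only an expired history ID justifies a full resync; rethrow the rest.
      if (!isHistoryExpired(error)) throw error;
    }
  }
  // No stored ID, or it expired: rebuild the baseline once.
  const historyId = await deps.fullSync();
  return { mode: "full", historyId };
}
```

Rethrowing everything except the 404 matters here: treating a rate limit or a network error as "expired" would trigger the expensive full sync exactly when the API is already struggling.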

The Partial Failure Edge Case

What happens if the sync process crashes midway through processing a history batch? We might have processed the first 3 changes but not the last 2, yet the history ID would have been updated to reflect all 5. On the next sync, those 2 unprocessed changes would be skipped. We solved this by updating the history ID only after all changes in the batch have been successfully committed to the database. If the process crashes, the next sync will re-fetch and re-process the same batch from the beginning. This is idempotent because our message storage uses upserts keyed on the Gmail message ID.
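A minimal sketch of that ordering, using an in-memory `MailboxStore` as a hypothetical stand-in for the database. The `failAfter` parameter exists only to simulate a crash partway through a batch.

```typescript
interface StoredMessage {
  id: string; // Gmail message ID
  subject: string;
}

// Hypothetical in-memory stand-in for the message table and sync cursor.
class MailboxStore {
  messages = new Map<string, StoredMessage>();
  historyId = "0";

  // Upsert keyed on the Gmail message ID: replaying a message overwrites
  // the same entry instead of creating a duplicate.
  upsert(message: StoredMessage): void {
    this.messages.set(message.id, message);
  }
}

function applyBatch(
  store: MailboxStore,
  batch: StoredMessage[],
  newHistoryId: string,
  failAfter: number = Infinity, // test hook: throw once this many rows are written
): void {
  batch.forEach((message, index) => {
    if (index >= failAfter) throw new Error("simulated crash");
    store.upsert(message);
  });
  // Reached only when every upsert succeeded, so the cursor never
  // moves past changes that were not stored.
  store.historyId = newHistoryId;
}
```

If `applyBatch` throws after three of five messages, `historyId` still holds the old value, so the next cycle asks for the same five changes again and the upserts make the replay harmless.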

Fix 2: The Hardcoded Null Bug

While profiling the remaining performance bottlenecks after the History API migration, we discovered a bug that had been silently degrading performance since launch. In our email parsing code, there was a function that extracted the sender's name and email from the Gmail message headers. When the "From" header was missing (rare but possible for draft messages and system notifications), the function returned null. But the null was not handled downstream.

The contact matching logic that associates an email with a CRM contact was comparing the sender email against every contact's email field. When the sender was null, the comparison null === contact.email always returned false, which is correct. But the query performing this comparison included a COALESCE(sender_email, '') expression in its WHERE clause, which prevented the database from using an index. Every email with a null sender triggered a full table scan of the contacts table.

The fix was two lines: add an early return when the sender is null (since there is nothing to match), and remove the COALESCE from the query. But finding this bug required tracing a 47ms query that should have been 0.7ms, which only became visible after the History API migration reduced the noise from the much larger sync overhead.
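The shape of that fix might look like the sketch below. The parser is deliberately simplified (it is not a full RFC 5322 address parser), the `contacts` table and `email` column are hypothetical names, and `QueryFn` is a synchronous stand-in for a real database call so the example stays short.

```typescript
interface Sender {
  name: string | null;
  email: string;
}

// Simplified parser for `Display Name <user@example.com>` or a bare address.
function parseSender(fromHeader: string | undefined): Sender | null {
  if (fromHeader === undefined || fromHeader.trim() === "") return null;
  const bracketed = /^\s*(.*?)\s*<([^<>\s]+@[^<>\s]+)>\s*$/.exec(fromHeader);
  if (bracketed !== null) {
    const name = bracketed[1].replace(/^"|"$/g, "");
    return { name: name === "" ? null : name, email: bracketed[2] };
  }
  const bare = fromHeader.trim();
  return /^[^<>\s]+@[^<>\s]+$/.test(bare) ? { name: null, email: bare } : null;
}

// Plain equality on the column, with no COALESCE wrapped around it.
const MATCH_CONTACT_SQL = "SELECT id FROM contacts WHERE email = $1 LIMIT 1";

// Stand-in for a database client: returns the matching contact ID or null.
type QueryFn = (sql: string, params: string[]) => string | null;

function matchContact(sender: Sender | null, query: QueryFn): string | null {
  // Early return: with no sender there is nothing to match, so skip the query.
  if (sender === null) return null;
  return query(MATCH_CONTACT_SQL, [sender.email]);
}
```

For example, `matchContact(parseSender(undefined), query)` returns null without ever calling `query`, which is the behavior the early return is meant to guarantee.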

Figure: Code diff for the hardcoded null bug fix, from WHERE contact_id = null (full table scan) to WHERE contact_id = $1 (indexed lookup).

The worst performance bugs are the ones hidden behind bigger performance problems. Fix the big problem first, then measure again. There is always another layer.

Fix 3: Batch Query Optimization

The original sync wrote each message to the database individually. 100 messages meant 100 INSERT or UPDATE statements, each with its own round trip to the database. After the History API migration, the typical batch was only 2-5 messages, so this was less catastrophic. But during the initial full sync for new users or history ID expiration, we still needed to process 100+ messages efficiently.

We replaced individual upserts with batch operations:

The batch approach reduced database time for a 100-message sync from 12 seconds to 340 milliseconds. For the typical 2-5 message incremental sync, database time dropped to under 15 milliseconds.
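One common way to batch upserts is a single multi-row INSERT with a conflict clause. The sketch below builds such a statement; it assumes PostgreSQL's ON CONFLICT syntax and a unique constraint on `gmail_id`, and the table, column, and function names are hypothetical rather than SalesSheet's actual schema.

```typescript
interface MessageRow {
  gmailId: string;
  subject: string;
  isRead: boolean;
}

interface BatchStatement {
  text: string; // SQL with $1, $2, ... placeholders
  values: (string | boolean)[]; // parameters in placeholder order
}

// Build one statement that upserts every row in a single round trip.
function buildBatchUpsert(rows: MessageRow[]): BatchStatement {
  if (rows.length === 0) throw new Error("buildBatchUpsert needs at least one row");
  const values: (string | boolean)[] = [];
  const tuples = rows.map((row, i) => {
    values.push(row.gmailId, row.subject, row.isRead);
    const base = i * 3; // three parameters per row
    return `($${base + 1}, $${base + 2}, $${base + 3})`;
  });
  const text =
    "INSERT INTO messages (gmail_id, subject, is_read) VALUES " +
    tuples.join(", ") +
    " ON CONFLICT (gmail_id) DO UPDATE SET" +
    " subject = EXCLUDED.subject, is_read = EXCLUDED.is_read";
  return { text, values };
}
```

One caveat with this approach: PostgreSQL rejects an ON CONFLICT DO UPDATE statement that would touch the same row twice, so a batch should be deduplicated by message ID before it is built.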

Fix 4: Parse Cache

MIME parsing is surprisingly expensive. A single email with HTML content, inline images, and two attachments takes 20-50ms to parse. For the full sync path, we added a parse cache that stores the parsed result alongside the raw message. On subsequent syncs, if the raw message has not changed (same internal date and history ID), we skip parsing entirely and use the cached result.

This optimization had no impact on incremental syncs (where every message is new by definition) but reduced the full sync path from 7 seconds of parsing to under 200ms, since most messages in a full sync have already been parsed in previous syncs.
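A parse cache along these lines could be sketched as below. It keeps entries in memory for brevity, whereas the cache described above is stored alongside the raw message; the `ParseCache` class, its counters, and the `ParsedMessage` shape are hypothetical. `internalDate` and `historyId` are fields of the Gmail Message resource.

```typescript
interface RawMessage {
  id: string;
  internalDate: string;
  historyId: string;
  raw: string;
}

// Hypothetical parse result; a real one would hold headers, body, attachments.
interface ParsedMessage {
  subject: string;
  bodyLength: number;
}

interface CacheEntry {
  fingerprint: string;
  parsed: ParsedMessage;
}

class ParseCache {
  hits = 0;
  misses = 0;
  private entries = new Map<string, CacheEntry>();
  private parse: (raw: string) => ParsedMessage;

  constructor(parse: (raw: string) => ParsedMessage) {
    this.parse = parse;
  }

  get(message: RawMessage): ParsedMessage {
    // Same internal date and history ID means the message has not changed.
    const fingerprint = message.internalDate + ":" + message.historyId;
    const cached = this.entries.get(message.id);
    if (cached !== undefined && cached.fingerprint === fingerprint) {
      this.hits += 1;
      return cached.parsed; // skip the expensive MIME parse
    }
    this.misses += 1;
    const parsed = this.parse(message.raw);
    this.entries.set(message.id, { fingerprint, parsed });
    return parsed;
  }
}
```

Keying on the message ID and comparing a fingerprint means a changed message replaces its old entry instead of leaving a stale one behind.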

The Results

After all four optimizations, the sync performance looks like this:

The improvement from 47 seconds to 0.7 seconds is a 67x speedup (47 / 0.7 ≈ 67). We conservatively call it 50x because the full sync path, which some users still hit occasionally, averages 4 seconds. But for the daily experience, it is closer to 70x.

The User Impact

Before the optimization, users complained about email sync. It was the second most common negative feedback after authentication issues (which we also fixed recently). After the optimization, email sync complaints dropped to zero. Not reduced — eliminated entirely. When sync takes under a second, users do not even notice it is happening. The emails are simply there when they navigate to the email view, as if they had always been there.

That is the goal of performance work: not to make things feel fast, but to make the wait disappear entirely. When the technology gets out of the way, users can focus on what they came to do — sell.
