There is a persistent myth in startup culture that tests slow you down. The argument goes like this: writing tests takes time, maintaining tests takes time, running tests takes time, therefore tests are a tax on velocity. This reasoning is seductive because it is half true. Tests do take time to write. But the conclusion is backwards. Tests do not slow you down. They are the reason you can go fast.
SalesSheet has 1,626 automated tests as of this writing. Our CI pipeline runs all of them in under 90 seconds. Every commit that lands on main has passed every test. In the last 30 days, our tests have caught 23 regressions that would have reached production without them. Each regression catch saved us an estimated 2-4 hours of debugging, user-reported issue triage, and hotfix deployment. That is 46 to 92 hours of firefighting avoided in a single month.
Tests are not a tax on velocity. They are the compound interest. Every test you write today saves debugging time tomorrow, next week, and next year. The payoff grows as the codebase grows.
The single most important testing decision we made was investing in test factories. A test factory is a function that generates realistic test data with all required fields populated and sensible defaults. Instead of writing test data by hand for every test case, you call the factory and override only the fields that matter for your specific test.
Here is why this matters. A contact in SalesSheet has 28 fields. A deal has 22 fields. An email message has 34 fields. If every test that involves a contact had to manually specify all 28 fields, test files would be 80% boilerplate data and 20% actual test logic. Worse, when we add a new required field to a contact, every single test that creates a contact would break.
Each factory function returns a complete, valid object with randomized but realistic data. The contact factory generates a first name, last name, email, phone number, company, title, and all other required fields using deterministic random generation seeded by the test name. This means the same test always generates the same data, making failures reproducible.
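The deterministic-seeding idea can be sketched in a few lines. This is an illustrative reconstruction, not SalesSheet's actual factory code: the field names, the word lists, and the `mulberry32`/FNV-hash choices are all assumptions standing in for whatever the real implementation uses.

```typescript
// Sketch of a deterministic test factory. A tiny seeded PRNG (mulberry32)
// is seeded by hashing the test name, so the same test always gets the
// same data and failures are reproducible. Field names are illustrative.

// mulberry32: minimal seeded PRNG returning floats in [0, 1)
function mulberry32(seed: number): () => number {
  return () => {
    seed |= 0; seed = (seed + 0x6D2B79F5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// FNV-1a hash of the test name -> numeric seed
function seedFrom(testName: string): number {
  let h = 2166136261;
  for (const ch of testName) {
    h ^= ch.charCodeAt(0);
    h = Math.imul(h, 16777619);
  }
  return h >>> 0;
}

interface Contact {
  id: string;
  firstName: string;
  lastName: string;
  email: string;
  company: string;
  // ...the remaining required fields would get defaults the same way
}

const FIRST = ["Ada", "Grace", "Alan", "Edsger"];
const LAST = ["Lovelace", "Hopper", "Turing", "Dijkstra"];

function makeContact(testName: string, overrides: Partial<Contact> = {}): Contact {
  const rand = mulberry32(seedFrom(testName));
  const pick = <T>(xs: T[]): T => xs[Math.floor(rand() * xs.length)];
  const firstName = pick(FIRST);
  const lastName = pick(LAST);
  const base: Contact = {
    id: `contact-${Math.floor(rand() * 1e9)}`,
    firstName,
    lastName,
    email: `${firstName}.${lastName}@example.com`.toLowerCase(),
    company: pick(["Acme", "Globex", "Initech"]),
  };
  // Override only the fields that matter for this specific test
  return { ...base, ...overrides };
}
```

A test that cares only about duplicate emails would call `makeContact("dedup test", { email: "dup@example.com" })` and ignore everything else.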
Factories compose. The deal factory automatically creates an associated contact using the contact factory. The email factory creates an associated contact and thread. The activity factory creates an associated contact and user. When you need a complete scenario with a contact, three deals, five emails, and two call logs, you call the factories in sequence and they wire up all the foreign key relationships automatically.
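The composition pattern looks roughly like this. Again a simplified sketch with hypothetical types: each factory accepts an optional pre-built association and otherwise creates one itself, so foreign keys are always wired up.

```typescript
// Sketch of composing factories. Types and field names are illustrative.
interface Contact { id: string; name: string }
interface Deal { id: string; contactId: string; stage: string; amount: number }
interface Email { id: string; contactId: string; threadId: string; subject: string }

let nextId = 0;
const id = (prefix: string) => `${prefix}-${++nextId}`;

function makeContact(overrides: Partial<Contact> = {}): Contact {
  return { id: id("contact"), name: "Ada Lovelace", ...overrides };
}

function makeDeal(overrides: Partial<Deal> = {}, contact?: Contact): Deal {
  const owner = contact ?? makeContact(); // auto-create the association if absent
  return { id: id("deal"), contactId: owner.id, stage: "open", amount: 1000, ...overrides };
}

function makeEmail(overrides: Partial<Email> = {}, contact?: Contact): Email {
  const sender = contact ?? makeContact();
  return { id: id("email"), contactId: sender.id, threadId: id("thread"), subject: "Hello", ...overrides };
}

// A complete scenario: one contact with three deals and two emails, all linked.
function makeScenario() {
  const contact = makeContact();
  const deals = Array.from({ length: 3 }, () => makeDeal({}, contact));
  const emails = Array.from({ length: 2 }, () => makeEmail({}, contact));
  return { contact, deals, emails };
}
```

Passing the shared `contact` explicitly is what lets a single scenario own all its children; calling `makeDeal()` bare still works because the factory fills in the missing association.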
When we added the WhatsApp message schema last month, creating the WhatsApp factory took 20 minutes. Writing the first 15 tests for WhatsApp message handling took another hour. Without factories, that same work would have taken a full day of crafting test data by hand.
Smoke tests are the broadest, simplest tests in our suite. They do not test business logic. They test that things exist and respond. Does the login page render? Does the contact list endpoint return a 200? Does the deal pipeline component mount without throwing? Can the email composer open and close?
We have 142 smoke tests. They run in 8 seconds. They catch an outsized number of issues because the most common regression in a web application is not a subtle logic error — it is a missing import, a renamed component, or a broken route. Smoke tests catch all of these instantly.
Every page in SalesSheet has a smoke test that renders it with mock data and asserts that it does not throw an error. Every API endpoint has a smoke test that sends a minimal valid request and asserts a successful response. Every React component that accepts props has a smoke test that renders it with factory-generated props and asserts it produces non-empty output.
The discipline is simple: no component ships without a smoke test. The smoke test is the first test written for any new feature, before any business logic tests. It takes 30 seconds to write and runs in milliseconds. The return on that 30-second investment is enormous, because it guarantees that the component at least renders. You would be surprised how many bugs are just "this component does not render at all."
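The discipline can be captured in a framework-agnostic helper. In the real suite the renderer would be something like React Testing Library's `render`; here plain string-returning functions stand in so the shape of the check is visible: does it throw, and does it produce non-empty output?

```typescript
// Sketch of the smoke-test pattern: render every registered component with
// default props and assert it (a) does not throw and (b) produces non-empty
// output. Plain functions stand in for real component renderers.
type SmokeCase = { name: string; render: () => string };
type SmokeResult = { name: string; ok: boolean; error?: string };

function runSmokeTests(cases: SmokeCase[]): SmokeResult[] {
  return cases.map(({ name, render }) => {
    try {
      const output = render();
      return output.trim().length > 0
        ? { name, ok: true }
        : { name, ok: false, error: "rendered empty output" };
    } catch (e) {
      return { name, ok: false, error: String(e) };
    }
  });
}

// Hypothetical registry: one healthy component, one with a missing import
const results = runSmokeTests([
  { name: "ContactList", render: () => "<ul><li>Ada</li></ul>" },
  { name: "DealPipeline", render: () => { throw new Error("missing import"); } },
]);
```

The broken `DealPipeline` case is exactly the class of regression described above: not a logic error, just a component that no longer renders at all.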
The most controversial decision in our testing strategy is testing against a real database. Most testing advice says to mock your database layer. We tried that. It did not work for us.
The problem with mocking Supabase is that our queries use Supabase-specific features: row-level security policies, database functions, real-time subscriptions, and PostgREST filtering syntax. Mocking these features means reimplementing them in test code, which defeats the purpose of testing. You end up testing your mock, not your query.
Instead, we run a local Supabase instance using the Supabase CLI. The local instance mirrors our production schema exactly, including all tables, views, functions, triggers, and row-level security policies. Before each test run, we reset the database to a clean state and seed it with factory-generated data.
This approach gives us true confidence that our queries work correctly. When we test that a user can only see their own contacts, the test runs against real row-level security policies, not a mock that pretends to enforce them. When we test that a database function calculates deal probability correctly, it runs the actual PostgreSQL function.
The tradeoff is speed. Database tests are slower than mocked tests. A single database test takes 50-200ms compared to 1-5ms for a mocked test. We mitigate this by running database tests in parallel across 4 worker threads and by keeping each test focused on a single query or operation. Our 180 database tests complete in about 12 seconds total, which is acceptable for our CI pipeline.
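The parallelization described above might be configured like this, assuming Vitest as the test runner. The thread counts mirror the setup in the text; the setup-file path and its contents are hypothetical.

```typescript
// vitest.config.ts -- a sketch, assuming Vitest 1.x; not actual SalesSheet config.
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    // Run tests in parallel across 4 worker threads
    pool: "threads",
    poolOptions: {
      threads: { minThreads: 4, maxThreads: 4 },
    },
    // Hypothetical global setup that resets and seeds the local Supabase
    // instance before the run (e.g. by shelling out to `supabase db reset`)
    globalSetup: "./test/reset-db.ts",
  },
});
```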
Mock tests tell you that your code calls the right function with the right arguments. Database tests tell you that the right function returns the right data. Only one of those prevents production bugs.
Our CI pipeline runs on every push to any branch. The full sequence takes 87 seconds on average. If any step fails, the pipeline stops and the commit is blocked from merging. We do not have a "tests are flaky, merge anyway" culture. If a test fails, it is either a real regression or a broken test, and both need to be fixed before merging.
The number itself is not the point. What matters is the confidence it creates. When we shipped the passwordless auth migration, we changed every authentication flow in the app. The test suite told us, within 90 seconds, that nothing else broke. When we rewrote the Gmail sync engine, the email tests confirmed that thread grouping, deduplication, and HTML rendering still worked correctly.
This confidence is what lets a small team ship 97 commits in 8 days. Each commit goes through the full test suite. Each commit is either green or red. There is no ambiguity, no "I think it works," no "let me manually test this real quick." The tests are the truth, and the truth runs in 90 seconds.
We are actively working toward 2,000 tests, focusing on the areas where coverage is currently thinnest.
Every test we add makes the next feature safer to ship. Every feature we ship without breaking existing functionality proves that the testing investment was worth it. The compound interest keeps compounding.