Ed25519 Webhook Verification: The Soft-Fail to Hard-Fail Journey

The Incident

On a Tuesday morning, every inbound call to SalesSheet stopped working. No call events were being processed. No call recordings were saved. No call activities appeared in the timeline. The calling feature was effectively dead, and we had no idea why.

The error logs showed a single line repeated thousands of times: "Webhook signature verification failed - rejecting request". We had deployed a security upgrade the night before. It was supposed to be a routine hardening change. Instead, it silently broke the most critical real-time integration in our system.

Security upgrades that break production at 2 AM are not security upgrades. They are incidents that happen to involve security code.

Background: How Webhook Signatures Work

When our telephony provider sends us a webhook (call started, call ended, recording ready), the request includes two headers: X-Signature and X-Timestamp. The signature is an Ed25519 digital signature computed over the timestamp concatenated with the raw request body. To verify, we combine the timestamp and body, then use the provider's public key to check the signature.

HTTP request diagram showing the X-SalesSheet-Signature and X-SalesSheet-Timestamp headers sent with every webhook

Ed25519 is an elliptic curve signature scheme that is fast, deterministic, and resistant to timing attacks. Unlike HMAC-SHA256 (which uses a shared secret), Ed25519 uses asymmetric keys. The provider holds the private key and we hold the public key. This means even if our server is compromised, an attacker cannot forge webhook signatures because they do not have the private key.
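
As a minimal sketch of the scheme described above, the following signs and verifies the timestamp concatenated with the body, using a throwaway keypair. It uses Node's built-in node:crypto module rather than the @noble/ed25519 library so that it has no dependencies. The signedMessage and verifyWebhook names, the base64 signature encoding, and the concatenation with no separator are assumptions for illustration, not the provider's documented format.

```typescript
import { Buffer } from "node:buffer";
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Assumption: the signed message is the timestamp immediately followed by the raw body.
function signedMessage(timestamp: string, rawBody: string): Uint8Array {
  return new TextEncoder().encode(timestamp + rawBody);
}

// Returns true only if the signature matches timestamp + body under the given public key.
function verifyWebhook(
  publicKey: KeyObject,
  signatureB64: string,
  timestamp: string,
  rawBody: string,
): boolean {
  const signature = Buffer.from(signatureB64, "base64");
  // For Ed25519 keys, node:crypto expects the algorithm argument to be null.
  return verify(null, signedMessage(timestamp, rawBody), publicKey, signature);
}

// Throwaway keypair standing in for the provider's keys.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const timestamp = "1700000000";
const body = '{"event":"call.started"}';

// Only the holder of the private key (the provider) can produce this value.
const signatureB64 = sign(null, signedMessage(timestamp, body), privateKey).toString("base64");

const okGenuine = verifyWebhook(publicKey, signatureB64, timestamp, body);
const okTampered = verifyWebhook(publicKey, signatureB64, timestamp, body + " ");
const okEmptyBody = verifyWebhook(publicKey, signatureB64, timestamp, "");
```

The receiving side only ever touches publicKey, which is why a leak of the receiver's configuration does not let anyone mint valid signatures. The okEmptyBody case is the one that matters later in this story.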

Soft-Fail vs. Hard-Fail

Our original implementation used soft-fail verification. If the signature check failed, we logged a warning but still processed the webhook. This was intentional during development -- it allowed us to test with curl and local tunnels without worrying about signatures. The plan was always to switch to hard-fail before launch.

The "security upgrade" was exactly that switch: changing the verification from soft-fail (log and continue) to hard-fail (reject with 401). The code change was three lines.

Before/After: soft-fail (log warning and continue) vs. hard-fail (reject with 401 Unauthorized)
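
The difference between the two modes can be sketched as a single decision function. This is an illustration under assumptions: the FailMode type, the Decision shape, and the soft-fail warning text are hypothetical, and only the hard-fail log line is taken from the incident logs quoted earlier.

```typescript
type FailMode = "soft" | "hard";

interface Decision {
  status: number;        // HTTP status returned to the provider
  runHandler: boolean;   // whether the event is processed
  warning?: string;      // log line emitted, if any
}

// Decide what happens to a webhook given the verification result and the configured mode.
function decide(signatureValid: boolean, mode: FailMode): Decision {
  if (signatureValid) {
    return { status: 200, runHandler: true };
  }
  if (mode === "soft") {
    // Soft-fail: log a warning but keep processing the event.
    return {
      status: 200,
      runHandler: true,
      warning: "Webhook signature verification failed - continuing", // hypothetical text
    };
  }
  // Hard-fail: reject before the handler runs.
  return {
    status: 401,
    runHandler: false,
    warning: "Webhook signature verification failed - rejecting request",
  };
}
```

The two branches differ by only a few lines, which is what makes the switch look routine: the same failing check that was a harmless warning in soft mode becomes a rejected request in hard mode.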

What Went Wrong

The verification logic was correct for 90% of webhooks. But there was a subtle bug in how we read the request body. Our Supabase Edge Function runs on the Deno runtime, where the request body is a ReadableStream that can be consumed only once. Our middleware chain looked like this:

1. Logger middleware reads the body to log the payload
2. Verification middleware reads the body to check the signature
3. Handler reads the body to process the event

Step 1 consumed the stream. Step 2 received an empty string. Ed25519 verification of an empty string against a signature computed over the actual body will always fail. Under soft-fail, the warning was buried in thousands of other log lines. Under hard-fail, it became a 100% failure rate.
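
A minimal sketch of this failure mode, using the standard Request class. In the Fetch API, a second call to text() on a consumed body rejects with a TypeError rather than returning an empty string, so the readBodyOrEmpty helper below is a hypothetical stand-in for whatever fallback turned that error into an empty string in the original chain. The URL is a placeholder.

```typescript
// Hypothetical helper: read the body as text, falling back to "" if the read fails.
async function readBodyOrEmpty(req: Request): Promise<string> {
  try {
    return await req.text();
  } catch {
    // Reading an already-consumed body rejects with a TypeError.
    return "";
  }
}

async function demoDoubleRead(): Promise<{ first: string; second: string; bodyUsed: boolean }> {
  const req = new Request("https://example.invalid/webhooks/calls", {
    method: "POST",
    body: '{"event":"call.ended"}',
  });
  const first = await readBodyOrEmpty(req);  // what the logger middleware sees
  const second = await readBodyOrEmpty(req); // what the verification middleware sees
  return { first, second, bodyUsed: req.bodyUsed };
}
```

The first read returns the payload and flips bodyUsed to true; every later read sees nothing useful. Checking req.bodyUsed before reading is a cheap way to detect this ordering mistake.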

Why It Only Affected Call Events

Not all our webhooks went through the same Edge Function. Email sync webhooks used a different function without the logger middleware. Contact enrichment webhooks used a different function with the body pre-cached. Only the calling webhook function had the logger-then-verify middleware order. That is why calls broke while everything else continued working -- making the incident harder to diagnose.

The Fix

The fix had two parts:

Immediate: Cache the Raw Body

We added a body-caching step at the very beginning of the middleware chain. The first thing the function does is read the body into a string and attach it to the request context. All downstream middleware reads from the cached string instead of the stream. This ensures the verification middleware always sees the complete body.
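
One way to sketch the caching step is shown below. The WebhookContext shape and the function names are assumptions for illustration, not the actual context object used in the Edge Function.

```typescript
// Hypothetical context shape passed to every downstream step.
interface WebhookContext {
  rawBody: string;
  headers: Headers;
}

// Read the stream exactly once, before any other middleware runs.
async function cacheRawBody(req: Request): Promise<WebhookContext> {
  const rawBody = await req.text();
  return { rawBody, headers: req.headers };
}

// Downstream steps receive the context, never the Request, so they cannot touch the stream.
function payloadLength(ctx: WebhookContext): number {
  return ctx.rawBody.length;
}

function parseEvent(ctx: WebhookContext): unknown {
  return JSON.parse(ctx.rawBody);
}
```

Signature verification should run against rawBody exactly as received, not against a re-serialized copy of the parsed JSON, because re-serializing can change whitespace or key order and invalidate the signature.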

Structural: Verification-First Architecture

We restructured the middleware chain so that signature verification is always the first step, before any other middleware touches the request. If verification fails, the function returns 401 immediately without logging, parsing, or processing anything. This is not just a correctness fix -- it is also a security improvement. A forged webhook should be rejected before any server resources are spent on it.

The new middleware order:

1. Cache raw body into context
2. Verify Ed25519 signature (hard-fail with 401)
3. Parse JSON body
4. Log event metadata (not the full body)
5. Route to handler
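
The ordering above can be sketched as a small pipeline. This is a simplified illustration: runPipeline, PipelineResult, and the injected verifySignature callback are hypothetical names, and the signature check is passed in as a function so the sketch stays independent of any particular crypto library.

```typescript
interface PipelineResult {
  status: number;
  steps: string[]; // names of the steps that ran, in order
}

function runPipeline(
  rawBody: string, // step 1 has already cached the body as a string
  headers: Record<string, string>,
  verifySignature: (rawBody: string, headers: Record<string, string>) => boolean,
): PipelineResult {
  const steps = ["cache"];

  steps.push("verify");
  if (!verifySignature(rawBody, headers)) {
    // Hard-fail: no parsing, logging, or routing happens for an unverified request.
    return { status: 401, steps };
  }

  steps.push("parse");
  const parsed = JSON.parse(rawBody) as { type?: string };

  steps.push("log");
  // Metadata only: the event type and payload length, never the full body.
  console.log("webhook accepted", { type: parsed.type, length: rawBody.length });

  steps.push("route");
  return { status: 200, steps };
}
```

Because the early return sits directly after the verify step, a forged request costs one signature check and nothing else.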

Ed25519 verification function: timestamp validation + signature check using @noble/ed25519

Testing Webhook Signatures

The incident exposed a gap in our test coverage. We had unit tests for the Ed25519 verification function itself, but we did not have integration tests that verified the complete middleware chain with a real signature. After the fix, we added:

Signed request test -- generates a real Ed25519 signature using a test keypair and sends it through the full middleware chain. Asserts 200 response.
Invalid signature test -- sends a request whose body was tampered with after signing, so the signature no longer matches. Asserts 401 response.
Missing signature test -- sends a request without signature headers. Asserts 401 response.
Replay attack test -- sends a valid signed request with a timestamp older than 5 minutes. Asserts 401 response (timestamp validation prevents replay attacks).
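
The four cases above can be sketched against a verification function with a test keypair. This exercises only the verification step, not a deployed middleware chain, and it uses Node's built-in node:crypto module in place of @noble/ed25519. The verifyRequest and signFor names, the base64 signature encoding, and the Unix-seconds timestamp format are assumptions for illustration.

```typescript
import { Buffer } from "node:buffer";
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

const MAX_AGE_SECONDS = 5 * 60; // replay window described in the text

interface WebhookInput {
  signature?: string; // base64 (assumed encoding)
  timestamp?: string; // Unix seconds (assumed format)
  rawBody: string;
}

// Returns the HTTP status the verification step would answer with.
function verifyRequest(input: WebhookInput, publicKey: KeyObject, nowSeconds: number): number {
  if (!input.signature || !input.timestamp) return 401; // missing headers
  const ts = Number(input.timestamp);
  if (!Number.isFinite(ts) || nowSeconds - ts > MAX_AGE_SECONDS) return 401; // malformed or too old
  const message = Buffer.from(input.timestamp + input.rawBody, "utf8");
  const ok = verify(null, message, publicKey, Buffer.from(input.signature, "base64"));
  return ok ? 200 : 401;
}

// Test keypair, plus a helper that signs the way the provider is assumed to.
const keys = generateKeyPairSync("ed25519");
function signFor(timestamp: string, rawBody: string): string {
  return sign(null, Buffer.from(timestamp + rawBody, "utf8"), keys.privateKey).toString("base64");
}

const now = 1700000000;
const body = '{"event":"recording.ready"}';
const fresh = String(now - 10);
const stale = String(now - 6 * 60);

const results = {
  signed: verifyRequest({ signature: signFor(fresh, body), timestamp: fresh, rawBody: body }, keys.publicKey, now),
  tampered: verifyRequest({ signature: signFor(fresh, body), timestamp: fresh, rawBody: body + "x" }, keys.publicKey, now),
  missing: verifyRequest({ rawBody: body }, keys.publicKey, now),
  replayed: verifyRequest({ signature: signFor(stale, body), timestamp: stale, rawBody: body }, keys.publicKey, now),
};
```

Passing nowSeconds in as a parameter keeps the replay case deterministic; a test that reads the system clock would need to generate its timestamps relative to the current time instead.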

The test you do not write is the bug you will ship. We had perfect unit test coverage for Ed25519 verification and zero integration test coverage for the middleware that called it.

Lessons for Your Webhook Security

If you are building webhook integrations, here is what this incident taught us:

Never switch from soft-fail to hard-fail in one step. Add monitoring first. Run both modes in parallel for a week, logging every case where soft-fail would have allowed a request that hard-fail would reject. If there are zero such cases, the switch is safe. If there are any, investigate before flipping.
Request body streams are single-use. In serverless environments (Deno, Cloudflare Workers, Vercel Edge Functions), the request body is a ReadableStream. Cache it early or you will lose it.
Integration tests beat unit tests for middleware. The Ed25519 math was never wrong. The plumbing around it was. Test the plumbing.
Deploy security changes during business hours. We deployed at 11 PM to "minimize impact." Instead, we maximized time-to-detection because nobody was watching the dashboards.
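
The parallel-run advice in the first lesson can be sketched as a counter that records what hard-fail would have done while soft-fail stays in effect. The ShadowStats shape, the function names, and the minimumSamples threshold are hypothetical; in practice these counts would feed a metrics system rather than live in memory.

```typescript
interface ShadowStats {
  total: number;       // webhooks observed
  wouldReject: number; // webhooks that hard-fail would have rejected
}

// Soft-fail with bookkeeping: the request is still processed, but the outcome is counted.
function recordShadowResult(stats: ShadowStats, signatureValid: boolean): ShadowStats {
  return {
    total: stats.total + 1,
    wouldReject: stats.wouldReject + (signatureValid ? 0 : 1),
  };
}

// The switch is safe only after enough traffic has been seen with no would-be rejections.
function safeToEnforce(stats: ShadowStats, minimumSamples: number): boolean {
  return stats.total >= minimumSamples && stats.wouldReject === 0;
}
```

A would-be rejection rate that is anything other than zero is the signal to investigate before flipping the mode.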

Our calling feature has been running on hard-fail Ed25519 verification for six months now with zero false rejections. The security is real, the performance overhead is negligible (Ed25519 verification takes under 1 millisecond), and the incident that got us here made us permanently better at deploying security changes.
