On a Tuesday morning, every inbound call to SalesSheet stopped working. No call events were being processed. No call recordings were saved. No call activities appeared in the timeline. The calling feature was effectively dead, and we had no idea why.
The error logs showed a single line repeated thousands of times: "Webhook signature verification failed - rejecting request". We had deployed a security upgrade the night before. It was supposed to be a routine hardening change. Instead, it silently broke the most critical real-time integration in our system.
Security upgrades that break production at 2 AM are not security upgrades. They are incidents that happen to involve security code.
When our telephony provider sends us a webhook (call started, call ended, recording ready), the request includes two headers: X-Signature and X-Timestamp. The signature is an Ed25519 digital signature computed over the timestamp concatenated with the raw request body. To verify, we combine the timestamp and body, then use the provider's public key to check the signature.
Ed25519 is an elliptic curve signature scheme that is fast, deterministic, and resistant to timing attacks. Unlike HMAC-SHA256 (which uses a shared secret), Ed25519 uses asymmetric keys. The provider holds the private key and we hold the public key. This means even if our server is compromised, an attacker cannot forge webhook signatures because they do not have the private key.
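The check described above can be sketched with the Web Crypto API, which Deno exposes as `crypto.subtle` and which accepts "Ed25519" as an algorithm name. This is a minimal sketch, not the provider's reference implementation: the hex encoding of the signature header and the helper names `hexToBytes` and `verifyWebhookSignature` are assumptions, and a real provider may use base64 or put a separator between timestamp and body.

```typescript
// Assumption: the X-Signature header carries the signature as hex.
// Returns null for malformed input instead of throwing.
function hexToBytes(hex: string) {
  if (hex.length % 2 !== 0 || /[^0-9a-fA-F]/.test(hex)) return null;
  const bytes = new Uint8Array(hex.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return bytes;
}

// Verifies an Ed25519 signature over timestamp + raw body.
// `publicKey` is the provider's public key, already imported as a CryptoKey.
async function verifyWebhookSignature(
  publicKey: CryptoKey,
  signatureHex: string,
  timestamp: string,
  rawBody: string,
): Promise<boolean> {
  const signature = hexToBytes(signatureHex);
  if (signature === null) return false;
  // The signed message is the timestamp concatenated with the unparsed body.
  const message = new TextEncoder().encode(timestamp + rawBody);
  return crypto.subtle.verify("Ed25519", publicKey, signature, message);
}
```

The public key can be imported once at startup with `crypto.subtle.importKey("raw", keyBytes, "Ed25519", false, ["verify"])`. Note that the body must be the exact bytes the provider signed: verifying a re-serialized or empty body fails even when the signature is genuine.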
Our original implementation used soft-fail verification. If the signature check failed, we logged a warning but still processed the webhook. This was intentional during development -- it allowed us to test with curl and local tunnels without worrying about signatures. The plan was always to switch to hard-fail before launch.
The "security upgrade" was exactly that switch: changing the verification from soft-fail (log and continue) to hard-fail (reject with 401). The code change was three lines.
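The difference between the two modes is small enough to sketch in a few lines. The names `VerificationMode` and `onVerificationResult` and the soft-fail warning text are hypothetical; only the hard-fail log message and the 401 status come from the incident itself.

```typescript
type VerificationMode = "soft-fail" | "hard-fail";

// Decides what to do with a verification result.
// Returns a Response when the request must be rejected, or null to continue.
function onVerificationResult(
  valid: boolean,
  mode: VerificationMode,
): Response | null {
  if (valid) return null;
  if (mode === "soft-fail") {
    // Development behavior: note the failure, keep processing the webhook.
    console.warn("Webhook signature verification failed - continuing");
    return null;
  }
  // Production behavior: stop here with 401.
  console.error("Webhook signature verification failed - rejecting request");
  return new Response("invalid signature", { status: 401 });
}
```

Under soft-fail a verification bug only produces a warning, so a change this small can turn a long-standing silent failure into an outage.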
The verification logic was correct for 90% of webhooks. But there was a subtle bug in how we read the request body. Our Supabase Edge Function runs on the Deno runtime, where the request body is a ReadableStream that can only be consumed once. In our middleware chain, the request logger ran first (step 1) and signature verification ran second (step 2), and both read the body from the request.
Step 1 consumed the stream. Step 2 received an empty string. Ed25519 verification of an empty string against a signature computed over the actual body will always fail. Under soft-fail, the warning was buried in thousands of other log lines. Under hard-fail, it became a 100% failure rate.
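The failure described above can be reproduced with the standard `Request` object. The function names here are hypothetical, and so is the guard on `bodyUsed`: the original middleware code is not shown, so this sketch only assumes some mechanism that hands the verifier an empty string once the stream has been consumed. An unguarded second `req.text()` would reject with a TypeError in the Fetch API instead.

```typescript
// Hypothetical logger step: reading the body here consumes the stream.
async function logRequest(req: Request): Promise<void> {
  const body = await req.text();
  console.log(`webhook received (${body.length} bytes)`);
}

// Hypothetical body read for the verifier.
// Assumption: once the stream is consumed, the verifier gets "".
async function readBodyForVerification(req: Request): Promise<string> {
  return req.bodyUsed ? "" : await req.text();
}

function makeCallWebhook(): Request {
  return new Request("https://example.invalid/webhooks/calls", {
    method: "POST",
    body: '{"event":"call.started"}',
  });
}

// Logger first, verifier second: the verifier sees nothing.
async function brokenOrder(): Promise<string> {
  const req = makeCallWebhook();
  await logRequest(req);
  return readBodyForVerification(req);
}

// Verifier reads first: it sees the full payload.
async function verifierFirst(): Promise<string> {
  return readBodyForVerification(makeCallWebhook());
}
```

In this sketch `brokenOrder()` resolves to an empty string while `verifierFirst()` resolves to the full JSON payload, which is the whole difference between a valid and an invalid signature check.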
Not all our webhooks went through the same Edge Function. Email sync webhooks used a different function without the logger middleware. Contact enrichment webhooks used a different function with the body pre-cached. Only the calling webhook function had the logger-then-verify middleware order. That is why calls broke while everything else continued working -- making the incident harder to diagnose.
The fix had two parts:
First, we added a body-caching step at the very beginning of the middleware chain. The first thing the function does is read the body into a string and attach it to the request context. All downstream middleware reads from the cached string instead of the stream. This ensures the verification middleware always sees the complete body.
Second, we restructured the middleware chain so that signature verification always runs immediately after the body is cached, before any other middleware touches the request. If verification fails, the function returns 401 immediately without logging, parsing, or processing anything. This is not just a correctness fix; it is also a security improvement. A forged webhook should be rejected before any server resources are spent on it.
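Both parts of the fix can be sketched together. The types and names below (`WebhookContext`, `SignatureVerifier`, `WebhookStep`, `handleWebhook`) are hypothetical and not the actual SalesSheet code; the verifier is passed in as a function so the ordering logic stays independent of the crypto details.

```typescript
interface WebhookContext {
  rawBody: string; // cached once, shared by every later step
  headers: Headers;
}

type SignatureVerifier = (
  signature: string,
  timestamp: string,
  rawBody: string,
) => Promise<boolean>;

type WebhookStep = (ctx: WebhookContext) => Promise<void>;

async function handleWebhook(
  req: Request,
  verify: SignatureVerifier,
  steps: WebhookStep[],
): Promise<Response> {
  // 1. Read the stream exactly once; nothing downstream touches req.body.
  const rawBody = await req.text();

  // 2. Verify before any logging, parsing, or processing.
  const signature = req.headers.get("X-Signature");
  const timestamp = req.headers.get("X-Timestamp");
  if (signature === null || timestamp === null) {
    return new Response("missing signature headers", { status: 401 });
  }
  if (!(await verify(signature, timestamp, rawBody))) {
    return new Response("invalid signature", { status: 401 });
  }

  // 3. Only verified requests reach the remaining steps (logger, parser, ...).
  const ctx: WebhookContext = { rawBody, headers: req.headers };
  for (const step of steps) {
    await step(ctx);
  }
  return new Response("ok", { status: 200 });
}
```

Because later steps receive `rawBody` as a plain string, their order no longer matters for correctness: a logger can run anywhere after verification without affecting what the verifier saw.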
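The new middleware order is body caching first, then signature verification, then everything else (logging, parsing, processing).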
The incident exposed a gap in our test coverage. We had unit tests for the Ed25519 verification function itself, but we did not have integration tests that verified the complete middleware chain with a real signature. After the fix, we added:
The test you do not write is the bug you will ship. We had perfect unit test coverage for Ed25519 verification and zero integration test coverage for the middleware that called it.
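An integration test of the whole chain needs a request that carries a genuinely valid signature. One way to get one, sketched here under assumptions, is to generate a throwaway Ed25519 key pair inside the test, sign the payload with the private half, and configure the handler under test with the public half. The helper names, the URL, and the hex encoding of the signature are all hypothetical.

```typescript
function toHex(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

// Generates a disposable key pair for tests; never use it in production.
async function generateTestKeyPair(): Promise<CryptoKeyPair> {
  const pair = await crypto.subtle.generateKey("Ed25519", true, [
    "sign",
    "verify",
  ]);
  return pair as CryptoKeyPair;
}

// Builds a webhook request whose X-Signature is valid for timestamp + body.
async function buildSignedWebhook(
  privateKey: CryptoKey,
  timestamp: string,
  body: string,
): Promise<Request> {
  const message = new TextEncoder().encode(timestamp + body);
  const signature = await crypto.subtle.sign("Ed25519", privateKey, message);
  return new Request("https://example.invalid/webhooks/calls", {
    method: "POST",
    headers: {
      "X-Signature": toHex(new Uint8Array(signature)),
      "X-Timestamp": timestamp,
    },
    body,
  });
}
```

A test can then send this request through the full middleware chain, logger included, and assert on the response status. That is the kind of test that would have failed before the hard-fail switch, since it exercises the body the verifier actually receives and not just the verification function in isolation.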
If you are building webhook integrations, here is what this incident taught us:
Our calling feature has been running on hard-fail Ed25519 verification for six months now with zero false rejections. The security is real, the performance overhead is negligible (Ed25519 verification takes under 1 millisecond), and the incident that got us here made us permanently better at deploying security changes.