We identified the problem. An index in our database was mistakenly removed in preparation for a maintenance task*. Without it, regular queries became very slow, which slowed everything down. We recreated the index immediately, and as soon as it was in place everything returned to normal.
The downtime affected our biggest region and impacted all clients in that region. We were down for approximately 24 minutes, with intermittent uptime, from 09:21 CEST until 09:45 CEST.
*We are creating new indices in preparation for bigger architectural work, but one of the newly created indices was invalid, and this was missed during validation of the indices. Because the replacement was invalid, cleaning up the old index left us without a working index. We're writing up an internal process to make this impossible in the future.
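A minimal sketch of the kind of check that could guard the cleanup step, assuming a PostgreSQL database (the database engine is not stated above). In PostgreSQL, an index whose concurrent build failed still exists but is flagged with `indisvalid = false` in the `pg_index` catalog and is ignored by queries. The function name `can_drop_old_index` and the index names are hypothetical and only serve as illustration.

```python
# Assumption: PostgreSQL. pg_index.indisvalid is false for an index whose
# concurrent build failed; such an index exists but is ignored by queries.
INDEX_VALIDITY_SQL = """
SELECT c.relname AS index_name, i.indisvalid AS is_valid
FROM pg_index AS i
JOIN pg_class AS c ON c.oid = i.indexrelid
"""


def can_drop_old_index(replacement: str, validity: dict[str, bool]) -> bool:
    """Return True only if the replacement index exists and is valid.

    `validity` maps index name -> is_valid, for example built from the
    rows returned by INDEX_VALIDITY_SQL. A missing replacement counts
    as invalid, so the old index is kept.
    """
    return validity.get(replacement, False)


# Hypothetical catalog snapshot: the replacement exists but is invalid.
snapshot = {"documents_old_idx": True, "documents_new_idx": False}
drop_allowed = can_drop_old_index("documents_new_idx", snapshot)
```

With this snapshot, `drop_allowed` is false, so the old index would stay in place until the replacement has been rebuilt and reports as valid.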
Posted Sep 10, 2021 - 09:47 CEST
Silverfin is currently down. First signs point to a lock in our database holding up several queries. We're investigating, and will update in 15 minutes or earlier when we know more.