Status | Blacksmith - GitHub webhooks degraded causing job queueing – Incident details

GitHub webhooks degraded causing job queueing

Resolved
Degraded performance
Started 3 days ago
Lasted about 9 hours

Affected

Blacksmith Managed Runners

Degraded performance from 2:50 PM to 11:39 PM, Operational from 9:22 PM to 11:39 PM

eu-central ARM

Degraded performance from 2:50 PM to 9:22 PM, Operational from 9:22 PM to 11:39 PM

eu-central x86

Degraded performance from 2:50 PM to 9:22 PM, Operational from 9:22 PM to 11:39 PM

us-west ARM

Degraded performance from 2:50 PM to 9:22 PM, Operational from 9:22 PM to 11:39 PM

us-west x86

Degraded performance from 2:50 PM to 11:39 PM

eu-west x86

Degraded performance from 2:50 PM to 9:22 PM, Operational from 9:22 PM to 11:39 PM

Updates
  • Resolved
    This incident has been resolved.
  • Update

    We are nearing the end of the backlog of queued tasks and are seeing full recovery in certain runner pools.

  • Update

    We're still working through a substantial backlog of queued tasks that accumulated while webhook arrivals were delayed.

  • Update

    The system is still working through a large backlog of queued jobs caused by the incident. We're exploring mitigations.

  • Monitoring

    We're seeing upstream recovery for GitHub webhook deliveries. Jobs may queue as we process GitHub's backlog of webhook events.

  • Investigating

    We're seeing evidence of degraded webhook delivery from GitHub. We're investigating.