Delays in sending scheduled email and text messages
Incident Report for GOV.UK Notify
Postmortem

Delays in processing some scheduled tasks

After the timezone change from BST to GMT at 2am BST/1am GMT on Sunday 30th October, we saw intermittent issues with some of our regularly scheduled tasks failing to run. The most prominent impact was that some scheduled bulk email/text message sends may not have gone out exactly on time, instead going out 15 or 30 minutes later.

We use a piece of software called celery to schedule and run our regular tasks, such as processing bulk notification jobs. There are some open issues on celery, which we weren’t previously aware of, around jobs failing to be scheduled when a timezone change (such as a daylight saving time change) happens. As these issues have not yet been fixed in celery, we cannot simply upgrade the dependency to resolve the problem for the future, so we need to explore other options.
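As a rough illustration of why the BST to GMT change is awkward for anything scheduled by local wall-clock time, the sketch below uses only the Python standard library to print the London local time at four consecutive half-hour instants on 30 October 2022. It does not reproduce the celery issues themselves; the helper name `local_times_around_change` is made up for this example.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

LONDON = ZoneInfo("Europe/London")


def local_times_around_change(start_utc, steps, step_minutes=30):
    """Return (UTC time, London local time) pairs at fixed UTC intervals.

    Hypothetical helper for illustration only.
    """
    rows = []
    for i in range(steps):
        utc = start_utc + timedelta(minutes=step_minutes * i)
        local = utc.astimezone(LONDON)
        rows.append((utc.strftime("%H:%M"), local.strftime("%H:%M %Z")))
    return rows


# Four half-hour steps starting at midnight UTC on the day the clocks went back.
rows = local_times_around_change(
    datetime(2022, 10, 30, 0, 0, tzinfo=timezone.utc), steps=4
)
for utc, local in rows:
    print(utc, "UTC ->", local)
```

The local times 01:00 and 01:30 each appear twice, once in BST and once in GMT, so a wall-clock time in that hour does not identify a single moment. UTC keeps moving forward steadily throughout.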

To fix the immediate problem of tasks not being scheduled, celery just needs to be restarted. Jobs will then start to be scheduled and run on time again. Our fix during the incident was to re-deploy our celery application, and we will make sure that happens automatically after future timezone changes until the issue is fixed permanently.

We will also improve our monitoring and alerting around scheduled tasks (specifically bulk email/text message sends) running on time, so that we can proactively handle the situation in the future before users notice any significant impact.
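A minimal sketch of the kind of check such monitoring could perform is shown below. It is an assumption for illustration, not a description of our actual monitoring: the `ScheduledJob` record, the `overdue_jobs` function and the five-minute grace period are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class ScheduledJob:
    # Hypothetical record of a scheduled bulk send; times are in UTC.
    job_id: str
    scheduled_for: datetime
    started_at: Optional[datetime] = None


def overdue_jobs(
    jobs: List[ScheduledJob],
    now: datetime,
    grace: timedelta = timedelta(minutes=5),
) -> List[ScheduledJob]:
    """Return jobs that have not started and are past their time plus grace."""
    return [
        job
        for job in jobs
        if job.started_at is None and now - job.scheduled_for > grace
    ]


now = datetime(2022, 10, 31, 10, 30, tzinfo=timezone.utc)
jobs = [
    # Not started, 30 minutes late: should be flagged.
    ScheduledJob("late", now - timedelta(minutes=30)),
    # Started on time: should not be flagged.
    ScheduledJob("ran", now - timedelta(minutes=30), now - timedelta(minutes=30)),
    # Not started, but still within the grace period.
    ScheduledJob("due-soon", now - timedelta(minutes=2)),
]
flagged = [job.job_id for job in overdue_jobs(jobs, now)]
print(flagged)
```

Comparing times in UTC keeps the check itself unaffected by timezone changes. An alert would fire whenever the returned list is not empty.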

Posted Nov 09, 2022 - 11:54 GMT

Resolved
There have been no further issues with delays since we restarted the scheduling service at 10:45 this morning. We will continue to keep an eye on the system over the next day or two to ensure it keeps working.

Apologies for any delays this may have caused to your services yesterday and this morning.
Posted Oct 31, 2022 - 17:01 GMT
Monitoring
We have restarted the affected scheduling service, which should resolve the delays in processing bulk emails and text messages that have occurred since the timezone change.

We will continue to monitor the situation to ensure all tasks are being processed correctly and on time throughout the morning and take further action if needed.
Posted Oct 31, 2022 - 11:29 GMT
Investigating
Since the change from British Summer Time (BST) to Greenwich Mean Time (GMT) at 2am BST on Sunday 30th October, some of our regularly scheduled tasks have not been processed.

This may cause delays in the sending of bulk emails and text messages that are scheduled to go out at a specific time.

This does not affect emails or text messages that are sent immediately, for example all notifications sent via the API and any notifications sent via the web interface that don't use scheduling.
Posted Oct 31, 2022 - 11:00 GMT
This incident affected: Text message sending and Email sending.