This issue was moved to a discussion.

Celery Worker doesn't start consuming tasks until 15 min after restart on Debian 12 #8445

Closed
houmie opened this issue Aug 18, 2023 · 2 comments

Comments

@houmie

houmie commented Aug 18, 2023

I have come across a strange situation on Debian 12 with Celery 5.3.1. I even downgraded to Celery 5.2.7, but it didn't help.
I then reinstalled the VPS with Debian 11 and Celery 5.3.1, and it works instantly. Same code, no difference.
I keep wondering how to break the problem down further. How can this be the fault of Debian 12? Has something perhaps changed in Debian 12 that affects running Celery as a service? I'm a bit speechless. For the time being I have to delay our migration to Debian 12 until I figure out what is affecting Celery.

On Debian 12, when I try to restart Celery, it takes 5 minutes to shut down and restart. There is no error in the logs.
After the restart I see this in the logs:

[2023-08-18 16:48:10,159: DEBUG/MainProcess] | Worker: Preparing bootsteps.
[2023-08-18 16:48:10,161: DEBUG/MainProcess] | Worker: Building graph...
[2023-08-18 16:48:10,161: DEBUG/MainProcess] | Worker: New boot order: {Beat, StateDB, Timer, Hub, Pool, Autoscaler, Consumer}
[2023-08-18 16:48:10,171: DEBUG/MainProcess] | Consumer: Preparing bootsteps.
[2023-08-18 16:48:10,172: DEBUG/MainProcess] | Consumer: Building graph...
[2023-08-18 16:48:10,186: DEBUG/MainProcess] | Consumer: New boot order: {Connection, Events, Heart, Agent, Mingle, Tasks, Control, Gossip, event loop}
[2023-08-18 16:48:10,196: DEBUG/MainProcess] | Worker: Starting Hub
[2023-08-18 16:48:10,196: DEBUG/MainProcess] ^-- substep ok
[2023-08-18 16:48:10,196: DEBUG/MainProcess] | Worker: Starting Pool
[2023-08-18 16:48:10,281: DEBUG/MainProcess] ^-- substep ok
[2023-08-18 16:48:10,282: DEBUG/MainProcess] | Worker: Starting Consumer
[2023-08-18 16:48:10,286: DEBUG/MainProcess] | Consumer: Starting Connection
[2023-08-18 16:52:31,271: INFO/MainProcess] Connected to redis://localhost:6379/0
[2023-08-18 16:52:31,272: DEBUG/MainProcess] ^-- substep ok
[2023-08-18 16:52:31,272: DEBUG/MainProcess] | Consumer: Starting Events
[2023-08-18 16:54:42,336: DEBUG/MainProcess] ^-- substep ok
[2023-08-18 16:54:42,338: DEBUG/MainProcess] | Consumer: Starting Heart
[2023-08-18 16:56:53,405: DEBUG/MainProcess] ^-- substep ok
[2023-08-18 16:56:53,406: DEBUG/MainProcess] | Consumer: Starting Tasks
[2023-08-18 16:56:53,414: DEBUG/MainProcess] ^-- substep ok
[2023-08-18 16:56:53,415: DEBUG/MainProcess] | Consumer: Starting Control
[2023-08-18 17:01:15,550: DEBUG/MainProcess] ^-- substep ok
[2023-08-18 17:01:15,551: DEBUG/MainProcess] | Consumer: Starting event loop
[2023-08-18 17:01:15,552: DEBUG/MainProcess] | Worker: Hub.register Pool...
[2023-08-18 17:01:15,553: INFO/MainProcess] worker@DE-FSN-XY-1 ready.
[2023-08-18 17:01:15,553: DEBUG/MainProcess] basic.qos: prefetch_count->4
[2023-08-18 17:01:15,560: INFO/MainProcess] Task notify.tasks.push[456cb981-7295-4f73-89aa-45dc855a6dbf] received

As you can see, it takes 15 minutes for the worker to become available. Once it's online, it consumes all the outstanding tasks within milliseconds:

[2023-08-18 17:43:22,754: INFO/MainProcess] Task notify.tasks.push[9efb499c-046e-4584-b6e7-46606afa7fc7] received
[2023-08-18 17:43:22,768: INFO/MainProcess] Task notify.tasks.push[ef7fd747-2857-4206-9cda-c9dd0fcb4b23] received
[2023-08-18 17:43:22,776: INFO/MainProcess] Task notify.tasks.push[3f047bc7-b158-4862-a947-1a9c4e211ca5] received
[2023-08-18 17:43:22,780: INFO/MainProcess] Task notify.tasks.push[668c1f8c-e890-4ed9-940c-cfabd96010d8] received
[2023-08-18 17:43:22,788: INFO/MainProcess] Task notify.tasks.push[c01576a4-1828-42e9-85bc-7f86f644154c] received
[2023-08-18 17:43:22,792: INFO/MainProcess] Task notify.tasks.push[936c6192-0ebe-4c54-a5bb-7e5aa570cb6c] received
[2023-08-18 17:43:22,804: INFO/MainProcess] Task notify.tasks.push[e26daea2-0796-4f89-8a1c-6ddf688ec6dc] received
[2023-08-18 17:43:22,807: INFO/MainProcess] Task notify.tasks.push[bf51497c-d49a-42d8-85d7-d37eea6c192f] received
[2023-08-18 17:43:22,812: INFO/MainProcess] Task notify.tasks.push[85ab3f68-d25d-46b3-ba32-49bf502f733c] received
[2023-08-18 17:43:22,816: INFO/MainProcess] Task notify.tasks.push[c1f75cc3-348a-4feb-884e-748850d43b1a] received
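One way to break the problem down further is to time a bare TCP connection to the broker outside of Celery, since the multi-minute gaps above start at "Consumer: Starting Connection". This is a stdlib-only sketch (the host and port are taken from the redis://localhost:6379/0 URL in the logs; the probe helper is ours, not a Celery API). If "localhost" stalls here but "127.0.0.1" does not, the delay lives in name resolution or IPv6 fallback rather than in Celery itself:

```python
import socket
import time

def probe(host, port=6379, timeout=300):
    """Time a bare TCP connect to the broker, bypassing Celery entirely."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            status = "connected"
    except OSError as exc:
        status = f"failed: {exc}"
    return status, time.monotonic() - start

# Compare the hostname (goes through name resolution) with the
# IPv4 literal (skips it entirely):
for host in ("localhost", "127.0.0.1"):
    status, elapsed = probe(host, timeout=5)
    print(f"{host}: {status} in {elapsed:.2f}s")
```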

Celery Beat also has a hard time starting, but once it is up, it doesn't take long for the first beats to come in:

[2023-08-18 17:55:45,113: INFO/MainProcess] Scheduler: Sending due task run-push-every-40-seconds (notify.tasks.push)
[2023-08-18 17:56:25,113: INFO/MainProcess] Scheduler: Sending due task run-push-every-40-seconds (notify.tasks.push)

However, these tasks won't be consumed until the Celery worker is up and running, which takes 15 minutes. After that initial delay everything runs smoothly, but a 15+ minute delay is not acceptable in case something happens in production and we need to restart the service.

It is the same code, the same task, and the same Celery version; how come it works perfectly fine on Debian 11 but hangs on Debian 12?

Could it be that my way of creating the Celery service is no longer appropriate for Debian 12? This is my setup script:

echo
echo "--- Configuring Celery ---"
echo
sudo useradd -m --system -s /bin/bash celery
sudo usermod -a -G admin celery

echo "mkdir /var/run/celery/
chown celery:celery /var/run/celery/
" | sudo tee /root/startup.sh >/dev/null

sudo mkdir /etc/celery/

echo "CELERY_APP=\"notify.celery:app\"
CELERYD_NODES=\"worker\"
CELERYD_OPTS=\"--without-mingle --without-gossip\"
CELERY_BIN=\"/home/admin/.pyenv/versions/venv/bin/celery\"
CELERYD_PID_FILE=\"/var/run/celery/%n.pid\"
CELERYD_LOG_FILE=\"/var/log/celery/%n%I.log\"
CELERYD_LOG_LEVEL=\"INFO\"
CELERYBEAT_PID_FILE=\"/var/run/celery/beat.pid\"
CELERYBEAT_LOG_FILE=\"/var/log/celery/beat.log\"
" | sudo tee /etc/celery/celery >/dev/null

sudo chown -R celery:celery /etc/celery/
sudo mkdir /var/run/celery/
sudo mkdir /var/log/celery/
sudo chown celery:celery /var/run/celery/
sudo chown celery:celery /var/log/celery/

echo "[Unit]
Description=Celery Service
After=network.target

[Service]
Type=forking
User=celery
Group=celery
EnvironmentFile=/etc/celery/celery
WorkingDirectory=$HOME/project/app/
ExecStart=/bin/sh -c '\${CELERY_BIN} -A \$CELERY_APP multi start \$CELERYD_NODES --pidfile=\${CELERYD_PID_FILE} --logfile=\${CELERYD_LOG_FILE} --loglevel=\"\${CELERYD_LOG_LEVEL}\" \$CELERYD_OPTS'
ExecStop=/bin/sh -c '\${CELERY_BIN} multi stopwait \$CELERYD_NODES --pidfile=\${CELERYD_PID_FILE} --logfile=\${CELERYD_LOG_FILE}'
ExecReload=/bin/sh -c '\${CELERY_BIN} -A \$CELERY_APP multi restart \$CELERYD_NODES --pidfile=\${CELERYD_PID_FILE} --logfile=\${CELERYD_LOG_FILE} --loglevel=\"\${CELERYD_LOG_LEVEL}\" \$CELERYD_OPTS'
Restart=always

[Install]
WantedBy=multi-user.target
" | sudo tee /etc/systemd/system/celery.service >/dev/null

sudo systemctl daemon-reload
sudo systemctl enable celery.service

echo "[Unit]
Description=Celery Beat Service
After=network.target

[Service]
Type=simple
User=celery
Group=celery
EnvironmentFile=/etc/celery/celery
WorkingDirectory=$HOME/project/app/
ExecStart=/bin/sh -c '\${CELERY_BIN} -A \${CELERY_APP} beat --pidfile=\${CELERYBEAT_PID_FILE} --logfile=\${CELERYBEAT_LOG_FILE} --loglevel=\${CELERYD_LOG_LEVEL} --schedule=/home/celery/celerybeat-schedule'
Restart=always

[Install]
WantedBy=multi-user.target
" | sudo tee /etc/systemd/system/celerybeat.service >/dev/null

sudo systemctl daemon-reload
sudo systemctl enable celerybeat.service

Thank you; any advice is much appreciated.

@auvipy
Member

auvipy commented Aug 21, 2023

From your comments it seems to be an issue with Debian, not Celery, right?

@houmie
Author

houmie commented Aug 21, 2023

Hello Asif,

Thank you for your reply. That is exactly the problem, though: if I take this to the Debian team, they will say to check with the Celery team.

IPv6 is disabled on our VPS. I suspect the initial lag has something to do with Celery trying to connect via IPv6; only after failing for a long time does it switch to IPv4. This could be a bug that didn't surface on Debian 11.

Is there an option in Celery to force it to use IPv4 when connecting to Redis?
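One way to test the IPv6 suspicion without waiting on a Celery option is to check what the broker hostname resolves to. This is a minimal stdlib sketch (the candidate_families helper is ours, not a Celery API; the port matches the redis://localhost:6379/0 URL from the logs):

```python
import socket

# Candidate addresses getaddrinfo returns for the broker host. If
# "localhost" lists an IPv6 entry (::1) ahead of 127.0.0.1 while IPv6
# is disabled on the box, a client that tries candidates in order can
# stall before falling back to IPv4.
def candidate_families(host, port=6379):
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return [family.name for family, *_ in infos]

print(candidate_families("localhost"))

# The IPv4 literal skips name resolution and can only yield AF_INET:
print(candidate_families("127.0.0.1"))
```

If "localhost" does resolve to ::1 first, a low-risk experiment is to point the broker and backend URLs at redis://127.0.0.1:6379/0 instead of redis://localhost:6379/0, which sidesteps IPv6 entirely.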

I will then test more and report back.
Thank you

@celery celery locked and limited conversation to collaborators Aug 31, 2023
@auvipy auvipy converted this issue into discussion #8467 Aug 31, 2023

