This issue was moved to a discussion.

Celery Worker doesn't start consuming tasks until 15 min after restart on Debian 12 #8445

Closed
houmie opened this issue Aug 18, 2023 · 2 comments

Comments

@houmie

houmie commented Aug 18, 2023

I have come across a strange situation on Debian 12 with Celery 5.3.1. I even downgraded to Celery 5.2.7, but it didn't help.
I then reinstalled the VPS with Debian 11 and Celery 5.3.1, and it works instantly. Same code, no difference.
I keep wondering how to break the problem down further. How can this be the fault of Debian 12? Has something perhaps changed in Debian 12 that affects running Celery as a service? I'm a bit speechless. For the time being I have to delay our migration to Debian 12 until I figure out what is affecting Celery.

On Debian 12, when I try to restart Celery, it takes 5 minutes to shut down and restart. There is no error in the logs.
After the restart I see this in the logs:

[2023-08-18 16:48:10,159: DEBUG/MainProcess] | Worker: Preparing bootsteps.
[2023-08-18 16:48:10,161: DEBUG/MainProcess] | Worker: Building graph...
[2023-08-18 16:48:10,161: DEBUG/MainProcess] | Worker: New boot order: {Beat, StateDB, Timer, Hub, Pool, Autoscaler, Consumer}
[2023-08-18 16:48:10,171: DEBUG/MainProcess] | Consumer: Preparing bootsteps.
[2023-08-18 16:48:10,172: DEBUG/MainProcess] | Consumer: Building graph...
[2023-08-18 16:48:10,186: DEBUG/MainProcess] | Consumer: New boot order: {Connection, Events, Heart, Agent, Mingle, Tasks, Control, Gossip, event loop}
[2023-08-18 16:48:10,196: DEBUG/MainProcess] | Worker: Starting Hub
[2023-08-18 16:48:10,196: DEBUG/MainProcess] ^-- substep ok
[2023-08-18 16:48:10,196: DEBUG/MainProcess] | Worker: Starting Pool
[2023-08-18 16:48:10,281: DEBUG/MainProcess] ^-- substep ok
[2023-08-18 16:48:10,282: DEBUG/MainProcess] | Worker: Starting Consumer
[2023-08-18 16:48:10,286: DEBUG/MainProcess] | Consumer: Starting Connection
[2023-08-18 16:52:31,271: INFO/MainProcess] Connected to redis://localhost:6379/0
[2023-08-18 16:52:31,272: DEBUG/MainProcess] ^-- substep ok
[2023-08-18 16:52:31,272: DEBUG/MainProcess] | Consumer: Starting Events
[2023-08-18 16:54:42,336: DEBUG/MainProcess] ^-- substep ok
[2023-08-18 16:54:42,338: DEBUG/MainProcess] | Consumer: Starting Heart
[2023-08-18 16:56:53,405: DEBUG/MainProcess] ^-- substep ok
[2023-08-18 16:56:53,406: DEBUG/MainProcess] | Consumer: Starting Tasks
[2023-08-18 16:56:53,414: DEBUG/MainProcess] ^-- substep ok
[2023-08-18 16:56:53,415: DEBUG/MainProcess] | Consumer: Starting Control
[2023-08-18 17:01:15,550: DEBUG/MainProcess] ^-- substep ok
[2023-08-18 17:01:15,551: DEBUG/MainProcess] | Consumer: Starting event loop
[2023-08-18 17:01:15,552: DEBUG/MainProcess] | Worker: Hub.register Pool...
[2023-08-18 17:01:15,553: INFO/MainProcess] worker@DE-FSN-XY-1 ready.
[2023-08-18 17:01:15,553: DEBUG/MainProcess] basic.qos: prefetch_count->4
[2023-08-18 17:01:15,560: INFO/MainProcess] Task notify.tasks.push[456cb981-7295-4f73-89aa-45dc855a6dbf] received

As you can see, it takes 15 minutes for the worker to become available. Once it's online, it consumes all the outstanding tasks within milliseconds:

[2023-08-18 17:43:22,754: INFO/MainProcess] Task notify.tasks.push[9efb499c-046e-4584-b6e7-46606afa7fc7] received
[2023-08-18 17:43:22,768: INFO/MainProcess] Task notify.tasks.push[ef7fd747-2857-4206-9cda-c9dd0fcb4b23] received
[2023-08-18 17:43:22,776: INFO/MainProcess] Task notify.tasks.push[3f047bc7-b158-4862-a947-1a9c4e211ca5] received
[2023-08-18 17:43:22,780: INFO/MainProcess] Task notify.tasks.push[668c1f8c-e890-4ed9-940c-cfabd96010d8] received
[2023-08-18 17:43:22,788: INFO/MainProcess] Task notify.tasks.push[c01576a4-1828-42e9-85bc-7f86f644154c] received
[2023-08-18 17:43:22,792: INFO/MainProcess] Task notify.tasks.push[936c6192-0ebe-4c54-a5bb-7e5aa570cb6c] received
[2023-08-18 17:43:22,804: INFO/MainProcess] Task notify.tasks.push[e26daea2-0796-4f89-8a1c-6ddf688ec6dc] received
[2023-08-18 17:43:22,807: INFO/MainProcess] Task notify.tasks.push[bf51497c-d49a-42d8-85d7-d37eea6c192f] received
[2023-08-18 17:43:22,812: INFO/MainProcess] Task notify.tasks.push[85ab3f68-d25d-46b3-ba32-49bf502f733c] received
[2023-08-18 17:43:22,816: INFO/MainProcess] Task notify.tasks.push[c1f75cc3-348a-4feb-884e-748850d43b1a] received
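One way to break the problem down further is to time a bare TCP connection to the broker outside of Celery, since the multi-minute gaps above start at "Consumer: Starting Connection". This is a stdlib-only sketch (the host and port are taken from the redis://localhost:6379/0 URL in the logs; the probe helper is ours, not a Celery API). If "localhost" stalls here but "127.0.0.1" does not, the delay lives in name resolution or IPv6 fallback rather than in Celery itself:

```python
import socket
import time

def probe(host, port=6379, timeout=300):
    """Time a bare TCP connect to the broker, bypassing Celery entirely."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            status = "connected"
    except OSError as exc:
        status = f"failed: {exc}"
    return status, time.monotonic() - start

# Compare the hostname (goes through name resolution) with the
# IPv4 literal (skips it entirely):
for host in ("localhost", "127.0.0.1"):
    status, elapsed = probe(host, timeout=5)
    print(f"{host}: {status} in {elapsed:.2f}s")
```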

Celery Beat also has a hard time starting, but once it is up, it doesn't take long for the first beats to come in:

[2023-08-18 17:55:45,113: INFO/MainProcess] Scheduler: Sending due task run-push-every-40-seconds (notify.tasks.push)
[2023-08-18 17:56:25,113: INFO/MainProcess] Scheduler: Sending due task run-push-every-40-seconds (notify.tasks.push)

However, these tasks won't be consumed until the Celery worker is up and running, which takes 15 minutes. After that initial delay everything runs smoothly, but a 15+ minute delay is not acceptable in case something happens in production and we need to restart the service.

It is the same code, the same task, and the same Celery version; how come it works perfectly fine on Debian 11 but hangs on Debian 12?

Could it be that my way of creating the Celery service is no longer appropriate for Debian 12? This is my setup script:

echo
echo "--- Configuring Celery ---"
echo
sudo useradd -m --system -s /bin/bash celery
sudo usermod -a -G admin celery

echo "mkdir /var/run/celery/
chown celery:celery /var/run/celery/
" | sudo tee /root/startup.sh >/dev/null

sudo mkdir /etc/celery/

echo "CELERY_APP=\"notify.celery:app\"
CELERYD_NODES=\"worker\"
CELERYD_OPTS=\"--without-mingle --without-gossip\"
CELERY_BIN=\"/home/admin/.pyenv/versions/venv/bin/celery\"
CELERYD_PID_FILE=\"/var/run/celery/%n.pid\"
CELERYD_LOG_FILE=\"/var/log/celery/%n%I.log\"
CELERYD_LOG_LEVEL=\"INFO\"
CELERYBEAT_PID_FILE=\"/var/run/celery/beat.pid\"
CELERYBEAT_LOG_FILE=\"/var/log/celery/beat.log\"
" | sudo tee /etc/celery/celery >/dev/null

sudo chown -R celery:celery /etc/celery/
sudo mkdir /var/run/celery/
sudo mkdir /var/log/celery/
sudo chown celery:celery /var/run/celery/
sudo chown celery:celery /var/log/celery/

echo "[Unit]
Description=Celery Service
After=network.target

[Service]
Type=forking
User=celery
Group=celery
EnvironmentFile=/etc/celery/celery
WorkingDirectory=$HOME/project/app/
ExecStart=/bin/sh -c '\${CELERY_BIN} -A \$CELERY_APP multi start \$CELERYD_NODES --pidfile=\${CELERYD_PID_FILE} --logfile=\${CELERYD_LOG_FILE} --loglevel=\"\${CELERYD_LOG_LEVEL}\" \$CELERYD_OPTS'
ExecStop=/bin/sh -c '\${CELERY_BIN} multi stopwait \$CELERYD_NODES --pidfile=\${CELERYD_PID_FILE} --logfile=\${CELERYD_LOG_FILE}'
ExecReload=/bin/sh -c '\${CELERY_BIN} -A \$CELERY_APP multi restart \$CELERYD_NODES --pidfile=\${CELERYD_PID_FILE} --logfile=\${CELERYD_LOG_FILE} --loglevel=\"\${CELERYD_LOG_LEVEL}\" \$CELERYD_OPTS'
Restart=always

[Install]
WantedBy=multi-user.target
" | sudo tee /etc/systemd/system/celery.service >/dev/null

sudo systemctl daemon-reload
sudo systemctl enable celery.service

echo "[Unit]
Description=Celery Beat Service
After=network.target

[Service]
Type=simple
User=celery
Group=celery
EnvironmentFile=/etc/celery/celery
WorkingDirectory=$HOME/project/app/
ExecStart=/bin/sh -c '\${CELERY_BIN} -A \${CELERY_APP} beat --pidfile=\${CELERYBEAT_PID_FILE} --logfile=\${CELERYBEAT_LOG_FILE} --loglevel=\${CELERYD_LOG_LEVEL} --schedule=/home/celery/celerybeat-schedule'
Restart=always

[Install]
WantedBy=multi-user.target
" | sudo tee /etc/systemd/system/celerybeat.service >/dev/null

sudo systemctl daemon-reload
sudo systemctl enable celerybeat.service

Thank you; any advice is much appreciated.

@auvipy
Member

auvipy commented Aug 21, 2023

From your comments it seems to be an issue with Debian, not Celery, right?

@houmie
Author

houmie commented Aug 21, 2023

Hello Asif,

Thank you for your reply. That is exactly the problem, though: if I take this to the Debian team, they will say to check with the Celery team.

IPv6 is disabled on our VPS. I suspect the initial lag has something to do with Celery trying to connect via IPv6; only after failing for a long time does it switch to IPv4. This could be a bug that didn't surface on Debian 11.

Is there an option in Celery to force it to use IPv4 when connecting to Redis?
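One way to test the IPv6 suspicion without waiting on a Celery option is to check what the broker hostname resolves to. This is a minimal stdlib sketch (the candidate_families helper is ours, not a Celery API; the port matches the redis://localhost:6379/0 URL from the logs):

```python
import socket

# Candidate addresses getaddrinfo returns for the broker host. If
# "localhost" lists an IPv6 entry (::1) ahead of 127.0.0.1 while IPv6
# is disabled on the box, a client that tries candidates in order can
# stall before falling back to IPv4.
def candidate_families(host, port=6379):
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return [family.name for family, *_ in infos]

print(candidate_families("localhost"))

# The IPv4 literal skips name resolution and can only yield AF_INET:
print(candidate_families("127.0.0.1"))
```

If "localhost" does resolve to ::1 first, a low-risk experiment is to point the broker and backend URLs at redis://127.0.0.1:6379/0 instead of redis://localhost:6379/0, which sidesteps IPv6 entirely.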

I will then test more and report back.
Thank you

@celery celery locked and limited conversation to collaborators Aug 31, 2023
@auvipy auvipy converted this issue into discussion #8467 Aug 31, 2023

