How do you run a worker with AWS Elastic Beanstalk?

django amazon-web-services celery amazon-elastic-beanstalk

15,229

Solution 1

As @chris-wheadon suggested in his comment, you should try to run celery as a daemon in the background. AWS Elastic Beanstalk already uses supervisord to run some daemon processes, so you can leverage it to run celeryd and avoid creating a custom AMI. It works nicely for me.

What I do is programmatically add a celeryd config file to the instance after EB has deployed the app to it. The tricky part is that the file needs to set the environment variables the daemon requires (such as AWS access keys if your app uses S3 or other AWS services).

Below is a copy of the script I use. Add it as a .config file to the .ebextensions folder that configures your EB environment.

The setup script creates a file in the /opt/elasticbeanstalk/hooks/appdeploy/post/ folder (documentation), which exists on all EB instances. Any shell script in that folder is executed after deployment. The shell script placed there works as follows:

- The celeryenv variable stores the app's environment variables (read from /opt/python/current/env) in supervisord notation, i.e. as a comma-separated list of env variables.
- The script then creates a variable celeryconf that holds the configuration file as a string, including the previously parsed env variables.
- This variable is written to /opt/python/etc/celery.conf, a supervisord configuration file for the celery daemon.
- The new config file is added to the main supervisord.conf through an [include] section, if one is not already there.
- Finally, supervisorctl rereads the configuration, applies the changes, and restarts celeryd.
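To see what the first step produces, here is a minimal sketch that runs the same tr/sed pipeline against a small sample file. The file contents are hypothetical (I'm assuming the real /opt/python/current/env holds one export KEY="value" line per variable); only the pipeline itself is taken from the script.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for /opt/python/current/env
envfile=$(mktemp)
cat > "$envfile" <<'EOF'
export DJANGO_SETTINGS_MODULE="myappname.settings"
export PATH="/opt/python/run/venv/bin:$PATH"
export PYTHONPATH="/opt/python/current/app/:$PYTHONPATH"
EOF

# Same steps as the hook: join lines with commas, drop "export ",
# hand $PATH over to supervisord's %(ENV_PATH)s expansion, and drop
# the $PYTHONPATH / $LD_LIBRARY_PATH references as the original does.
celeryenv=$(tr '\n' ',' < "$envfile" \
  | sed 's/export //g' \
  | sed 's/\$PATH/%(ENV_PATH)s/g' \
  | sed 's/\$PYTHONPATH//g' \
  | sed 's/\$LD_LIBRARY_PATH//g')
celeryenv=${celeryenv%?}   # remove the trailing comma produced by tr
rm -f "$envfile"

echo "environment=$celeryenv"
```

The resulting string is what ends up after environment= in celery.conf. $PATH becomes %(ENV_PATH)s, which supervisord expands from its own environment, while $PYTHONPATH references are simply removed.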

Here is a copy of the script:

```
files:
  "/opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh":
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/usr/bin/env bash

      # Get django environment variables
      celeryenv=$(cat /opt/python/current/env | tr '\n' ',' | sed 's/export //g' | sed 's/$PATH/%(ENV_PATH)s/g' | sed 's/$PYTHONPATH//g' | sed 's/$LD_LIBRARY_PATH//g')
      # Strip the trailing comma left by tr
      celeryenv=${celeryenv%?}

      # Create celery configuration script
      celeryconf="[program:celeryd]
      ; Set full path to celery program if using virtualenv
      command=/opt/python/run/venv/bin/celery worker -A myappname --loglevel=INFO

      directory=/opt/python/current/app
      user=nobody
      numprocs=1
      stdout_logfile=/var/log/celery-worker.log
      stderr_logfile=/var/log/celery-worker.log
      autostart=true
      autorestart=true
      startsecs=10

      ; Need to wait for currently executing tasks to finish at shutdown.
      ; Increase this if you have very long running tasks.
      stopwaitsecs = 600

      ; When resorting to send SIGKILL to the program to terminate it
      ; send SIGKILL to its whole process group instead,
      ; taking care of its children as well.
      killasgroup=true

      ; if rabbitmq is supervised, set its priority higher
      ; so it starts first
      priority=998

      environment=$celeryenv"

      # Create the celery supervisord conf script
      echo "$celeryconf" | tee /opt/python/etc/celery.conf

      # Add configuration script to supervisord conf (if not there already)
      if ! grep -Fxq "[include]" /opt/python/etc/supervisord.conf
          then
          echo "[include]" | tee -a /opt/python/etc/supervisord.conf
          echo "files: celery.conf" | tee -a /opt/python/etc/supervisord.conf
      fi

      # Reread the supervisord config
      supervisorctl -c /opt/python/etc/supervisord.conf reread

      # Update supervisord in cache without restarting all services
      supervisorctl -c /opt/python/etc/supervisord.conf update

      # Start/Restart celeryd through supervisord
      supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd
```
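Because the hook runs on every deployment, the [include] step has to be idempotent. The sketch below runs the same grep -Fxq check twice to show that the section is appended only once. A temporary file stands in for /opt/python/etc/supervisord.conf, and the function name add_celery_include is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in for /opt/python/etc/supervisord.conf
conf=$(mktemp)
printf '[supervisord]\nlogfile=/tmp/supervisord.log\n' > "$conf"

add_celery_include() {
  # Same check as the hook: append only when no exact "[include]" line exists
  if ! grep -Fxq "[include]" "$1"; then
    echo "[include]" >> "$1"
    echo "files: celery.conf" >> "$1"
  fi
}

add_celery_include "$conf"   # first deployment appends the section
add_celery_include "$conf"   # later deployments find it and do nothing

include_count=$(grep -Fxc "[include]" "$conf")
files_count=$(grep -Fxc "files: celery.conf" "$conf")
rm -f "$conf"
echo "[include] sections: $include_count, celery includes: $files_count"
```

One limitation of this check: it only looks for the [include] header. If your supervisord.conf already has an [include] section for other files, celery.conf will never be added, and you would have to extend that section's files line instead.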

Solution 2

I was trying to do something similar in PHP; however, for whatever reason I couldn't keep the worker running. I switched to an AMI on an EC2 server and have had success ever since.

Solution 3

For those using Elastic Beanstalk with Rails and Sidekiq, here's a collection of ebextensions that ultimately did the trick for me:

https://gist.github.com/ctrlaltdylan/f75b2e38bbbf725acb6d48283fc2f174


Maxime P

Updated on June 06, 2022

Comments

Maxime P about 2 years
I am launching a Django application on AWS Elastic Beanstalk. I'd like to run a background task or worker in order to run celery.

I cannot find out whether this is possible. If it is, how can it be achieved?

Here is what I am doing right now, but it produces an event type error every time.
```
container_commands:
  01_syncdb:
    command: "django-admin.py syncdb --noinput"
    leader_only: true
  50_sqs_email:
    command: "./manage.py celery worker --loglevel=info"
    leader_only: true
```
- EsseTi over 11 years
  
  what kind of error do you have?
- Chris Wheadon over 11 years
  
  I suspect you need to run celery in daemon mode: docs.celeryproject.org/en/latest/tutorials/… which would require a custom AMI for your beanstalk. This is not for the fainthearted as suggested here: docs.aws.amazon.com/elasticbeanstalk/latest/dg/…
- Zaar Hai over 11 years
  
  I think you can find an answer here: stackoverflow.com/questions/12813586/…
- DataGreed about 4 years
  
  If you want something lighter than celery, you can try pypi.org/project/django-eb-sqs-worker package - it uses Amazon SQS for queueing tasks.
Admin almost 10 years

Thank you for posting this! Celery and EB have been a challenge, but your solution seems to work! I found an issue however: if there's a % sign in an environment variable supervisord throws a formatting error. I believe % is escaped by adding an additional %, like %%. Is there any way to format the env vars to add that extra % to all %? github.com/Supervisor/supervisor/issues/291
yellowcap almost 10 years

In that case you could add an additional find/replace step to the part where the environment variables are parsed. For instance, sed 's/%/%%/g' will replace any % with %%. The command chain at the beginning of the script does a series of string replacements to make the env var list supervisord-compatible, so try adding it right after the first command: cat /opt/python/current/env | tr '\n' ',' | sed 's/%/%%/g' | ...
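The order of the replacements matters here: % has to be doubled before the script inserts its own %(ENV_PATH)s expansion, otherwise that expansion gets escaped too. A small sketch with a hypothetical value (SECRET_KEY="abc%def") shows both orders:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical env list after the tr step; one value contains a literal %
raw='SECRET_KEY="abc%def",PATH="/usr/bin:$PATH"'

# Escape % first, then introduce supervisord's own %(ENV_PATH)s expansion
good=$(printf '%s' "$raw" | sed 's/%/%%/g' | sed 's/\$PATH/%(ENV_PATH)s/g')

# Reversed order: the expansion that was just inserted is escaped as well
bad=$(printf '%s' "$raw" | sed 's/\$PATH/%(ENV_PATH)s/g' | sed 's/%/%%/g')

echo "good: $good"
echo "bad:  $bad"
```

With the reversed order supervisord would see %%(ENV_PATH)s, which, as I understand its escaping rules, it passes through as literal text instead of expanding PATH.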
neurix almost 9 years

@yellowcap Thank you for the great and detailed answer!
AliBZ almost 8 years

This definitely works, but there are some issues with it. If you do this, your web and worker instances are tied to each other, so if the load on your workers increases, you end up scaling both your web and worker instances. The other issue is that if you have a celery beat task, you will end up with duplicate tasks when you scale up. You must have only one instance running celery beat. I know the second issue is not related to this question, but a project with celery workers can have celery beat as well.
yellowcap almost 8 years

Yes, of course, ideally you would have two separate instances running! The above setup is useful if you don't have the resources to pay for several servers and want to squeeze as much as you can out of each instance. I am running a low-traffic Django app on a single small instance, and for that it works great. And even if you have several instances, you might not want to "reserve" one just for the worker. That depends entirely on the use case. Agreed on the celery beat side: it would duplicate tasks, so this is not a good solution for celery beat if you have multiple instances.
Cagatay Barin almost 8 years

I've created a script named "99-celery.config" and copied your script but it didn't work. Can you help me? Should I configure anything about supervisor on my local computer? stackoverflow.com/questions/38566456/…
Evan Chu almost 8 years

Somehow supervisorctl is not available as a command on my EC2 instance... but I got it working, thanks a bunch. The OP should accept this answer.
Dr Manhattan almost 8 years

For the duplicate tasks, use a central cache server like Redis or Memcached and create a lock so that other instances don't run the same task twice.
smentek over 7 years

This is a great help, but as mentioned, scalability requires running it on the leader node only. So container_commands should be used instead, since it supports the leader_only option. I used two commands: the first creates the bash file, and the second executes it. This is my solution for a Django app: stackoverflow.com/questions/41161691/…
Paul Wasson over 7 years

Your code worked fine until I decided to move some variables from my settings.py into my Elastic Beanstalk environment properties. Now I get the following error when the script is called: for \'environment\' is badly formatted'>: file: /usr/lib64/python2.7/xmlrpclib.py line: 800 celeryd: ERROR (no such process) Thanks for the help.
Yasser Sinjab over 4 years

I did the same too