Q & A on connecting Apache Kafka and Django
Problem #1
The Kafka consumer should always be listening for new messages in the queue, and it should run in parallel to the GraphSpace app (the producer).
Solution
There are multiple ways to do this.
- We could create another Django project, say "GraphSpace notification consumer", which starts alongside the GraphSpace application and establishes a connection with Kafka. This is overkill for a simple consumer.
- We could use Celery for multi-threading. Celery itself uses Redis or RabbitMQ as a queue for tasks. This setup might work for a large-scale multi-threaded system, but for simply running a consumer it is overkill: we would be running two queues (Redis and Kafka), one for queuing tasks and another for queuing content. I tried this setup with the following architecture:
- I recommend keeping the architecture simple: run the consumer code on a separate thread in daemon mode (see the sketch after this list).
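As a rough sketch of the daemon-thread approach (the "notifications" topic name, the KAFKA_SERVER setting, and the message handling are assumptions for illustration, not GraphSpace's actual code), the consumer loop can live in a plain Python thread using the kafka-python client:

```python
# consumer.py -- minimal sketch of a Kafka consumer running on a daemon thread.
# Topic name and the KAFKA_SERVER setting are hypothetical.
import threading

from django.conf import settings
from kafka import KafkaConsumer  # pip install kafka-python


def _consume():
    # Blocks forever, iterating over new messages as they arrive in the topic.
    consumer = KafkaConsumer(
        "notifications",                          # hypothetical topic name
        bootstrap_servers=settings.KAFKA_SERVER,  # e.g. "localhost:9092"
        value_deserializer=lambda v: v.decode("utf-8"),
    )
    for message in consumer:
        # In GraphSpace this is where the notification model would be saved;
        # printing stands in for that here.
        print("received:", message.value)


def start_consumer_thread():
    # daemon=True so the consumer thread never blocks server shutdown.
    thread = threading.Thread(target=_consume, daemon=True)
    thread.start()
    return thread
```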
Problem #2
Where to start this consumer thread in Django?
Solution
The current setup of the GraphSpace app with notifications is as follows.
The consumer thread is started in the WSGI setup. Django's internal server runs the wsgi.py file on loading the project; the location of this file is given in the settings file under the WSGI_APPLICATION variable. We could have started the consumer thread in the settings.py* file, but this might lead to a circular dependency problem, because the consumer thread has to save models, which in turn requires all of the settings files (settings.py, local.py/production.py) to be loaded. To start the consumer we also need settings such as the Kafka server location, which differs between the local and production environments. By starting the thread in the wsgi.py file we keep the code modular.
*the settings.py file can be a folder, i.e. multiple configuration/settings files for different environments
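A minimal sketch of what this wiring in wsgi.py could look like (the graphspace.settings module path and the consumer module from the sketch above are assumptions):

```python
# wsgi.py -- sketch of starting the consumer thread once Django is configured.
import os

from django.core.wsgi import get_wsgi_application

# The settings module name is an assumption; GraphSpace's may differ.
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "graphspace.settings")

# Loading the WSGI application initialises settings and apps first...
application = get_wsgi_application()

# ...so the consumer can safely import models and read KAFKA_SERVER from settings.
from consumer import start_consumer_thread  # hypothetical module from the sketch above

start_consumer_thread()
```

The KAFKA_SERVER value read by the consumer would then simply be defined differently in local.py and production.py, which is what keeps the thread start-up environment-agnostic.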
Great write-up, really informative.
I have tried to implement a similar solution, but after starting the consumer thread in wsgi.py I keep getting a 504 HTTP response via the API endpoint. Do you have or know of a workaround?
A bit too late for a reply, but have a look at: https://github.com/addu390/django-kafka