
Django analytics middleware


Server-side tracking with Django, Celery and the Google Measurement Protocol.

By Kostas Koutsogiannopoulos

Introduction

The most widely used analytics implementation is to include a tracking code in every web page you want to track. Your analytics provider generates the code and a tracking ID that identifies your application. When a client program (such as a web browser) visits your page, the tracking code runs and sends data about that tracking ID to the provider. This is the client-side way of user tracking.

Why server-side?

  1. Proxy servers and "ad blocker" browser plugins such as ABP and uBlock usually block access to your analytics provider. In effect, your users block analytics depending on how sensitive they are about their privacy. Consequently, the data is not reliable.
  2. You probably want to give your analytics provider only the data required for your own analysis, not necessarily all the data it can gather by running code on your user's machine.
  3. Your application may not involve a browser running JavaScript at all. You may want to gather data about API usage, mobile requests, custom events etc. You may want to follow your own tracking logic.

Django middleware

Django middleware is a good place to do the tracking because it sits between every HttpRequest and HttpResponse. It is basically a regular Python class with some methods called in the request phase and others called in the response phase. For details about Django middleware, check the documentation: https://docs.djangoproject.com/en/
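As a rough, framework-free illustration of the two phases, here is a toy dispatcher. This is not Django's actual implementation: the handle() function, the LoggingMiddleware class and the dict-based request and response are hypothetical stand-ins, and only the hook names are borrowed from Django.

```python
class LoggingMiddleware:
    """Toy middleware using Django-style hook names."""

    def __init__(self):
        self.events = []

    def process_request(self, request):
        # Request phase: runs before the view.
        self.events.append('request ' + request['path'])
        return None  # None means "continue to the view"

    def process_response(self, request, response):
        # Response phase: runs after the view and must return a response.
        self.events.append('response %d' % response['status'])
        return response


def handle(request, view, middleware_list):
    """Simplified dispatcher: request hooks in order, response hooks in reverse."""
    response = None
    for mw in middleware_list:
        response = mw.process_request(request)
        if response is not None:
            break  # a middleware short-circuited the view
    if response is None:
        response = view(request)
    for mw in reversed(middleware_list):
        response = mw.process_response(request, response)
    return response


def hello_view(request):
    return {'status': 200, 'body': 'hello'}


mw = LoggingMiddleware()
result = handle({'path': '/'}, hello_view, [mw])
print(mw.events)
```

The point is only that one object sees both the incoming request and the outgoing response, which is what makes it a convenient place for tracking.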

Tracking logic, Requirements, Dependencies

So our middleware needs to do the following things:

  1. Generate a unique tracking id for each user making requests
  2. Ensure that every request from one user is tracked with the same tracking id
  3. Exclude some preset web pages from tracking (for example, admin pages or rss pages)
  4. Exclude error HTTP responses from tracking (track only HTTP status code 200)
  5. Do the tracking asynchronously (obviously we do not want our users to wait for an external resource)

We are going to implement requirement 2 using cookies, so another requirement arises. We are required by law to:

    1. Inform the user that we are using cookies to track them.

The "Django messages framework" (https://docs.djangoproject.com/en/2.0/ref/contrib/messages/) is handy for implementing requirement 2.1.

We will also use the Celery infrastructure described in another article to implement requirement 5 (asynchronous tracking).

For this demonstration we will use Google Analytics as the analytics provider, via its Measurement Protocol.
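To give a feel for what the Measurement Protocol expects, here is a minimal sketch of one pageview hit encoded as a form payload. The parameter names (v, tid, cid, t, dp) are the protocol's own; the tracking ID, client ID and path are placeholder values, and nothing is actually sent.

```python
from urllib.parse import urlencode

# Placeholder values for illustration only.
hit = {
    'v': '1',                                       # protocol version
    'tid': 'UA-xxxxxxxx-x',                         # tracking ID
    'cid': '35009a79-1a05-49d7-b876-2b884d0f825b',  # anonymous client ID
    't': 'pageview',                                # hit type
    'dp': '/blog/some-post/',                       # document path
}

# The provider receives the hit as a URL-encoded form body.
payload = urlencode(hit)
print(payload)
```

Posting this body to the provider's collect endpoint is all that "tracking" amounts to on the wire, which is why it is easy to do from the server.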

Of course, we suggest that you use your own analytics infrastructure and own your data. Both proprietary and open source products are available for this. If you use Google Analytics, be aware that your data is retained "for ever" and may be used by Google and/or its associates.

Analytics application

Assuming that you already have a project in the current directory, create the new app:

$ python manage.py startapp analytics

Create a file named "tracker.py". We will use it as our tracking library:

 ./analytics/tracker.py

import random

from django.conf import settings

VERSION = settings.ANALYTICS_API_VERSION
COOKIE_NAME = settings.ANALYTICS_COOKIE_NAME
COOKIE_PATH = settings.ANALYTICS_COOKIE_PATH
COOKIE_AGE = settings.ANALYTICS_COOKIE_AGE
ANALYTICS_ID = settings.ANALYTICS_ID


def cookie_exists(request):
    """Return True if the visitor already carries our tracking cookie."""
    return bool(request.COOKIES.get(COOKIE_NAME))


def set_cookie(visitor_id, response):
    response.set_cookie(
        COOKIE_NAME,
        value=visitor_id,
        max_age=COOKIE_AGE,
        path=COOKIE_PATH,
    )
    return response


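def build_params(request, path=None, event=None, referer=None, visitor_id=None, site=None):
    """Build the Measurement Protocol payload for one pageview."""
    meta = request.META
    referer = referer or request.GET.get('r', '')
    path = path or request.GET.get('p', '/')
    user_agent = meta.get('HTTP_USER_AGENT', 'Unknown')
    # Fall back to the tracking cookie when no visitor id is passed in.
    visitor_id = visitor_id or request.COOKIES.get(COOKIE_NAME)
    # Behind a reverse proxy REMOTE_ADDR is the proxy's address, not the client's.
    visitor_ip = meta.get('REMOTE_ADDR', '')
    try:
        # request.current_page is only set by some CMS middleware (e.g. django CMS).
        pagetitle = request.current_page.get_page_title()
    except Exception:
        pagetitle = None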

    params = {
        'v': VERSION,
        'z': str(random.randint(0, 0x7fffffff)),
        't': 'pageview',
        'dt': pagetitle,
        'dh': site,
        'dr': referer,
        'dp': path,
        'tid': ANALYTICS_ID,
        'cid': visitor_id,
        'uip': visitor_ip,
        'ua': user_agent,
    }

    return params
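
The cookie helpers above serve requirements 1 and 2, which boil down to: reuse the ID from the cookie when there is one, otherwise mint a new one. A framework-free sketch of that rule follows. resolve_visitor_id is a hypothetical helper that is not part of tracker.py, and a plain dict stands in for request.COOKIES.

```python
import uuid

COOKIE_NAME = 'project_stats'  # the cookie name used in this article's settings


def resolve_visitor_id(cookies):
    """Return (visitor_id, is_new) for a dict of request cookies."""
    existing = cookies.get(COOKIE_NAME)
    if existing:
        return existing, False
    # uuid4 is random, so the ID itself carries no information about the user.
    return str(uuid.uuid4()), True


# First visit: no cookie, so a new ID is generated.
first_id, is_new = resolve_visitor_id({})
# Later visit: the cookie is present, so the same ID is reused.
same_id, is_new_again = resolve_visitor_id({COOKIE_NAME: first_id})
print(is_new, is_new_again, first_id == same_id)
```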

 

Let's create the Celery task that will submit every pageview to our analytics provider:

 ./analytics/tasks.py

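import requests
from celery import shared_task


# shared_task replaces the deprecated celery.decorators.task import.
@shared_task(ignore_result=True)
def submit_tracking(params, provider_url):
    # A timeout keeps a slow provider from blocking the worker indefinitely.
    response = requests.post(provider_url, data=params, timeout=10)
    response.raise_for_status()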

 

The middleware class contains two methods:

  • process_request(), which runs in the request phase, and
  • process_response(), which runs in the response phase.

 ./analytics/middleware.py

import uuid

from django.conf import settings
from django.contrib import messages
from django.contrib.sites.shortcuts import get_current_site

from analytics.tasks import submit_tracking
from analytics.tracker import COOKIE_NAME, build_params, cookie_exists, set_cookie

provider_url = settings.ANALYTICS_PROVIDER_URL


class AnalyticsMiddleware(object):
    def process_request(self, request):
        # First visit: no tracking cookie yet, so notify the user and
        # remember a fresh visitor id in the session.
        if not cookie_exists(request):
            site = get_current_site(request)
            messages.info(
                request,
                '<strong>Welcome!</strong> ' + site.domain +
                ' is using <strong>cookie</strong> technology'
                ' for tracking everything you do.',
                extra_tags='alert-info')
            request.session['visitor_id'] = str(uuid.uuid4())
            request.session['site'] = site.domain
        return None

    def process_response(self, request, response):
        # Track successful responses only.
        if response.status_code != 200:
            return response

        # Skip excluded paths (admin pages, feeds etc.).
        ignored = getattr(settings, 'ANALYTICS_IGNORE_PATH', [])
        if any(request.path.startswith(p) for p in ignored):
            return response

        path = request.path
        referer = request.META.get('HTTP_REFERER', '')
        # The session may have expired while the cookie survives, so fall
        # back to the cookie value and to the current site.
        visitor_id = (request.session.get('visitor_id')
                      or request.COOKIES.get(COOKIE_NAME))
        site = request.session.get('site') or get_current_site(request).domain
        params = build_params(request, path=path, referer=referer, visitor_id=visitor_id, site=site)
        response = set_cookie(visitor_id, response)
        submit_tracking.delay(params, provider_url)
        return response
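
The two exclusion rules in process_response() (requirements 3 and 4) can also be pulled out into a small pure function, which makes them easy to test without a running Django project. This is only a sketch: should_track is a hypothetical helper, not part of the middleware above, and the prefixes are example values.

```python
def should_track(path, status_code, ignored_prefixes=()):
    """Return True if a response for `path` should be reported."""
    # Requirement 4: only successful responses are tracked.
    if status_code != 200:
        return False
    # Requirement 3: str.startswith accepts a tuple of prefixes.
    return not path.startswith(tuple(ignored_prefixes))


ignored = ['/admin/', '/rss/']
print(should_track('/blog/', 200, ignored))
print(should_track('/admin/login/', 200, ignored))
print(should_track('/blog/', 404, ignored))
```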

 

Now, all we have to do is register the middleware class, the new application and some variables in our project's settings file:

 ./project/settings.py

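MIDDLEWARE_CLASSES = (
   # ...
   # Must come after SessionMiddleware and MessageMiddleware,
   # because it uses request.session and the messages framework.
    'analytics.middleware.AnalyticsMiddleware',
   # ...
)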

# ...

INSTALLED_APPS = (
   # ...
   'analytics',
   # ...
)

# ...

# Analytics configuration
ANALYTICS_COOKIE_NAME = 'project_stats'
ANALYTICS_COOKIE_PATH = '/'
ANALYTICS_COOKIE_AGE = 31556926 # 1 year in seconds
ANALYTICS_ID = 'UA-xxxxxxxx-x'
ANALYTICS_API_VERSION = '1'
ANALYTICS_IGNORE_PATH = ['/page1/', '/page2/']
ANALYTICS_PROVIDER_URL = 'https://www.google-analytics.com/collect'

 

To render messages in your base template, you can use something like this:

 ./project/templates/base.html

{% if messages %}
    {% for message in messages %}
    <div class="alert {{ message.extra_tags }} alert-dismissible" role="alert">
        <button type="button" class="close" data-dismiss="alert" aria-label="Close">
            <span aria-hidden="true">&times;</span>
        </button>
       {{ message|safe }}
    </div>
   {% endfor %}
{% endif %}

 

Notice that we are using Bootstrap alerts to display our messages to the user. The specific Bootstrap alert class is passed through the messages framework's extra_tags argument. This way, we can characterize our messages as "info", "warning" etc. and render them in the corresponding color.

Downside

There are some things you have to consider when you are thinking about server-side tracking:

  • You will need some extra processing in every request-response cycle. This will have an impact on performance.
  • In this demo we used a tracking cookie to identify clients. This introduces some caching-related complications. For example, if you want to use Varnish or a CDN service in front of Django, your pages are no longer cacheable. If you cache using Django’s cache framework there is no problem, but it is slower than a front-end caching solution. Of course, client-side tracking (when the user allows it) works well in both of these cases.
