Remove metadata from user-uploaded images in Django

When working with user-generated content in the form of uploaded images, it's a good idea to strip the metadata that is often embedded in the image files. This metadata can include camera information and settings as well as geotags that describe where a photo was taken. To protect users' privacy, this data should be removed before the photo is stored and shown to other users. This is fairly straightforward to do with the well-known exiftool command-line utility, which can be integrated into a Django web application through the standard forms framework or a Django REST Framework serializer.

The code below was developed and tested on Python 3.x / Django 1.8.

models.py

from django.db import models


class Image(models.Model):
	image = models.ImageField()

image_utils.py

from io import BytesIO
import subprocess


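def strip_metadata(fp):  # fp is a Django UploadedFile
	# '-All=' deletes every metadata tag; the trailing '-' makes exiftool
	# read the image from stdin and write the result to stdout.
	args = ['exiftool', '-All=', '-']
	p = subprocess.Popen(args, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
	out, _ = p.communicate(input=fp.read())  # stderr is not piped, so it is None
	if p.returncode != 0:
		raise RuntimeError('exiftool exited with status %d' % p.returncode)
	return BytesIO(out)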

forms.py

from django import forms
from django.core.files import File

from . import models
from .image_utils import strip_metadata


class ImageForm(forms.ModelForm):
    class Meta:
        model = models.Image
        fields = ['image']  # Django 1.8 requires 'fields' or 'exclude' on a ModelForm

    image = forms.ImageField()

    def clean_image(self):
        f_orig = self.cleaned_data['image']
        fn = f_orig.name
        sanitized_image = strip_metadata(f_orig)
        f_new = File(sanitized_image)
        f_new.name = fn
        return f_new

serializers.py (if using Django REST Framework)

from django.core.files import File

from rest_framework import serializers

from . import models
from .image_utils import strip_metadata


class ImageSerializer(serializers.ModelSerializer):
    class Meta:
        model = models.Image
        fields = ('image',)  # newer DRF releases require 'fields' or 'exclude'

    image = serializers.ImageField()

    def save(self, **kwargs):
        f_orig = self.validated_data['image']
        fn = f_orig.name
        sanitized_image = strip_metadata(f_orig)
        f_new = File(sanitized_image)
        f_new.name = fn
        self.validated_data['image'] = f_new
        return super().save(**kwargs)

Because the image data is piped through exiftool and processed entirely in memory, there is no need to create temporary files on the filesystem: the scrubbed image transparently replaces the original upload and is ready to go. However, since this processing can become a bottleneck and a vector for denial-of-service attacks using very large files, you should explicitly cap the maximum upload/POST body size at a reasonable value in your web server and/or check the file size with a validator function on the ImageField.
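
A size check of that kind could look roughly like the sketch below. The names validate_image_size and MAX_UPLOAD_SIZE, as well as the 5 MiB limit, are assumptions made for illustration and are not part of the code above. The ImportError fallback exists only so the snippet can be tried outside a Django project.

```python
try:
    from django.core.exceptions import ValidationError
except ImportError:
    # Fallback so the sketch can be run without Django installed.
    ValidationError = ValueError

# Assumed limit for illustration; pick a value that suits your application.
MAX_UPLOAD_SIZE = 5 * 1024 * 1024  # 5 MiB


def validate_image_size(f):
    """Reject uploads larger than MAX_UPLOAD_SIZE bytes."""
    # Django's UploadedFile exposes the length of the upload in bytes as .size.
    if f.size > MAX_UPLOAD_SIZE:
        raise ValidationError(
            'Image is too large (%d bytes); the limit is %d bytes.'
            % (f.size, MAX_UPLOAD_SIZE)
        )
```

One way to wire this in is to pass validators=[validate_image_size] to the ImageField declared on the form or on the serializer. Field validators run before clean_image() and before save(), so an oversized file should be rejected before it is ever handed to exiftool.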