Friday, September 23, 2011

Profiles: Breaking Normalization

In the summer of 2010 I either saw this pattern or cooked it up myself. It is specific to the Django profiles system and helps me get around some of the limitations/features of django.contrib.auth. I like to use it on my own projects because it makes so many things (like performance) so much simpler. The idea is to replicate some of the fields and methods of the django.contrib.auth.models.User model on your user profile object(s). I usually do this for the email, first_name, and last_name fields and the get_full_name method. Sometimes I also do it for the username field, but then I make sure the duplicated username is un-editable in any context.

Sure, this breaks normalization, but the scale of this break is tiny. Duplicating four fields, each with a max of 30 characters, for a total of 120 characters per record is nothing in terms of data compared to the mess of doing lots of profile-to-user joins on very large data sets.
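To make the trade-off concrete, here is a minimal sketch using plain sqlite3 rather than the Django ORM. The table layout, the city column, and the sample rows are assumptions made up for illustration; only the idea of copying last_name onto the profile comes from the post. It shows that the denormalized lookup returns the same rows as the join, without touching the user table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auth_user (
        id INTEGER PRIMARY KEY,
        username TEXT UNIQUE,
        last_name TEXT
    );
    -- Hypothetical profile table that duplicates last_name from auth_user.
    CREATE TABLE profiles_profile (
        id INTEGER PRIMARY KEY,
        user_id INTEGER UNIQUE REFERENCES auth_user(id),
        last_name TEXT,
        city TEXT
    );
""")
conn.executemany(
    "INSERT INTO auth_user VALUES (?, ?, ?)",
    [(1, "alice", "Smith"), (2, "bob", "Jones")],
)
conn.executemany(
    "INSERT INTO profiles_profile VALUES (?, ?, ?, ?)",
    [(1, 1, "Smith", "Springfield"), (2, 2, "Jones", "Springfield")],
)

# Normalized lookup: the name lives only on auth_user, so a join is needed.
joined = conn.execute(
    "SELECT p.id FROM profiles_profile p "
    "JOIN auth_user u ON u.id = p.user_id "
    "WHERE u.last_name = ? ORDER BY p.id",
    ("Jones",),
).fetchall()

# Denormalized lookup: the copied column answers the same question alone.
flat = conn.execute(
    "SELECT id FROM profiles_profile WHERE last_name = ? ORDER BY id",
    ("Jones",),
).fetchall()
```

Whether skipping the join is measurably faster depends on the database, the indexes, and the data size, so this only shows that the two queries are interchangeable, not that one wins.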

One more thing, I've found that most users don't care about or for the division between their accounts and profiles. They are more than happy with a single form, and if they aren't, well you can still use this profile model to build both account and profile forms.

Alright, enough talking, let me show you how my Profile models tend to look:

from django.contrib.auth.models import User
from django.db import models
from django.utils.translation import ugettext_lazy as _

class Profile(models.Model):
    """ Normalization breaking profile model authored by Daniel Greenfeld """
    user = models.OneToOneField(User)
    email = models.EmailField(_("Email"), help_text=_("Never given out!"), max_length=30)
    first_name = models.CharField(_("First Name"), max_length=30)
    last_name = models.CharField(_("Last Name"), max_length=30)

    # username field notes:
    #     used to improve speed, not editable! 
    #     Never changed after original auth.User and profiles.Profile creation!
    username = models.CharField(_("User Name"), max_length=30, editable=False)

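    def save(self, *args, **kwargs):
        """ Override save to always propagate changes to the auth.User model """
        user_obj = User.objects.get(username=self.user.username)
        user_obj.first_name = self.first_name
        user_obj.last_name = self.last_name
        user_obj.email = self.email
        user_obj.save()
        # Keep the read-only username copy in sync with auth.User
        self.username = user_obj.username
        super(Profile, self).save(*args, **kwargs)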

    def get_full_name(self):
        """ Convenience duplication of the auth.User method """
        return "{0} {1}".format(self.first_name, self.last_name)

    @models.permalink
    def get_absolute_url(self):
        # permalink reverses the (view name, args, kwargs) tuple into a URL
        return ("profile_detail", (), {"username": self.username})

    def __unicode__(self):
        return self.username

All of this is good, but you have to be careful with emails. Django doesn't let you duplicate existing emails in the django.contrib.auth.models.User model, so we want to catch that early and display an elegant error message. Hence this Profile form:

from django import forms
from django.contrib.auth.models import User
from django.utils.translation import ugettext_lazy as _

from profiles.models import Profile

class ProfileForm(forms.ModelForm):
    """ Email validation form authored by Daniel Greenfeld """
    def clean_email(self):
        """ Custom email clean method to make sure the user doesn't use the same email as someone else """
        email = self.cleaned_data.get("email", "").strip()

        # Look for any other account that already uses this address
        others = User.objects.filter(email=email).exclude(username=self.instance.user.username)
        if others.exists():
            raise forms.ValidationError("%s is already in use in the system" % email)

        return email

    class Meta:
        model = Profile
        fields = ("email", "first_name", "last_name")


Cezar said...

Would it be prudent to just inherit from the User model for the profile model and still do the same save method? Cut off some code that way.

pyDanny said...

Cezar - I should have put this in my post, and that is "Don't inherit from the auth.User model." Yeah, it is how we should be able to do it, but you end up with weirdness and problems. Hence the get_profile() method and this blog post.

Chris Adams said...

Two notes:
* I'd add a post_save signal handler on User which would propagate changes forward if there's any possibility that something else will change those values directly
* in the save() method you can simply do a User.objects.filter(… denormalized fields…) to save a database query.
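
One reading of the second note is Django's QuerySet.update(): filtering on the user and calling update() issues a single SQL UPDATE, instead of the SELECT done by get() followed by the UPDATE done by save(). The sketch below imitates that difference with plain sqlite3 so it runs without a Django project. The run() helper, the two save_via_* functions, and the sample row are hypothetical names invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE auth_user ("
    "id INTEGER PRIMARY KEY, username TEXT, first_name TEXT, last_name TEXT)"
)
conn.execute("INSERT INTO auth_user VALUES (1, 'alice', 'Alice', 'Old')")

queries = []  # every statement sent through run() is recorded here


def run(sql, params=()):
    """Execute a statement and record it so round trips can be counted."""
    queries.append(sql)
    return conn.execute(sql, params)


def save_via_fetch(user_id, first_name, last_name):
    """Fetch the row first, then write it back (the get() + save() shape)."""
    row = run("SELECT id FROM auth_user WHERE id = ?", (user_id,)).fetchone()
    run(
        "UPDATE auth_user SET first_name = ?, last_name = ? WHERE id = ?",
        (first_name, last_name, row[0]),
    )


def save_via_update(user_id, first_name, last_name):
    """Write the denormalized fields directly (the filter().update() shape)."""
    run(
        "UPDATE auth_user SET first_name = ?, last_name = ? WHERE id = ?",
        (first_name, last_name, user_id),
    )


queries.clear()
save_via_fetch(1, "Alice", "New")
fetch_count = len(queries)

queries.clear()
save_via_update(1, "Alice", "Newer")
update_count = len(queries)

final_last_name = conn.execute(
    "SELECT last_name FROM auth_user WHERE id = 1"
).fetchone()[0]
```

One design caveat: in Django, update() works at the SQL level, so it does not call the model's save() method and does not send pre_save or post_save signals.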

pyDanny said...

Chris - I use signals as little as possible. I keep getting bit by them. They are generally documented poorly and can make system migrations unreasonably difficult. And in this case, they can cause infinite loops because you will change the User, which will change the Profile, which will change the User, ad infinitum...
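
The loop can be reproduced in plain Python. This is only a sketch: the Signal class is a tiny stand-in and not Django's signal implementation, and User, Profile, naive_sync, and guarded_sync are hypothetical names. It shows the ping-pong between the two save() methods, plus one possible guard that stops once both copies already agree.

```python
class Signal:
    """Tiny stand-in for a post_save-style signal (not Django's own class)."""

    def __init__(self):
        self.receivers = []

    def send(self, instance):
        for receiver in list(self.receivers):
            receiver(instance)


user_saved = Signal()


class User:
    def __init__(self, first_name):
        self.first_name = first_name
        self.profile = None
        self.save_calls = 0

    def save(self):
        self.save_calls += 1
        user_saved.send(self)


class Profile:
    def __init__(self, user):
        self.user = user
        self.first_name = user.first_name
        user.profile = self

    def save(self):
        # Same direction as the post's Profile.save(): push the copy to User.
        self.user.first_name = self.first_name
        self.user.save()


def naive_sync(user):
    """Propagate User changes to the Profile with no stopping condition."""
    user.profile.first_name = user.first_name
    user.profile.save()


def guarded_sync(user):
    """Same propagation, but stop when the two copies already match."""
    if user.profile.first_name == user.first_name:
        return
    user.profile.first_name = user.first_name
    user.profile.save()


user = User("Ann")
Profile(user)

# Unguarded: User.save -> handler -> Profile.save -> User.save -> ...
user_saved.receivers = [naive_sync]
try:
    user.save()
    looped = False
except RecursionError:
    looped = True

# Guarded: the second pass sees matching values and returns.
user_saved.receivers = [guarded_sync]
user.first_name = "Anne"
user.save_calls = 0
user.save()
```

With the guard in place the user is still saved twice per change, so the guard removes the loop but not the extra write.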

I do like the User.objects.filter(… denormalized fields…) idea. :)

Malcolm Tredinnick said...

Ok, I have issues. Enough that I'm motivated to post a comment for once. Sorry.

And I'll say up front: I found the article clear and it's good to see code that is being used in practice put out there. Take the rest as some old guy griping "I wouldn't do it that way and why are you on my lawn?!"

Firstly, which version of Django are you using that doesn't allow duplicate emails in the User model? The email field is deliberately not marked as unique because it's hardly uncommon for people to share emails (which is also why emails as username is often a flawed concept. Django originally prevented that and only recently let people shoot themselves in the foot on that front).

I'd also quibble with the claim of joining to the users table being a mess. Turns out databases are really good at joins. It's part of the whole "relational" thing: they optimise for it. In rare cases this might be a measurable impact, but I'm skeptical that it's as common as people claim (hasn't been in cases where I've profiled it deliberately).

Finally, the one form per user thing seems a bit weird. You can do that regardless of your particular model layout. Model representation in the database is (should be) separate from any visual representation that the user sees. I'm concerned that this is driven by some over reliance on automatic form for model creation. In which case: urgh!

I'm not saying this is a universally bad approach, but I'm not sure it's a "first tool to reach for" strictly good approach, either. The trade-offs that are being made should be deliberate, particularly when denormalising. Would have been nice to see some timings from before and after setups on a properly tuned "real" database (e.g. PostgreSQL).