Programmatically populating Django with random and real data



Introduction

One of the best features of Django is how quickly it lets you build prototypes. That does not mean it is a framework only for prototypes, but it is very helpful when you need one fast. It is my first choice when I need to build a proof of concept or want to quickly prepare something for a presentation. Yet one feature I have always missed is the ability to quickly populate the database with some random (or real) data, just to see how it all fits together while you are still shaping your ideas.

You might argue that Django has fixtures and that they solve this problem, but I find fixtures a bit too cumbersome, and keeping them consistent becomes a chore when you change your models quickly. I wanted something that let me load data programmatically, so that I could create data on the fly: calculating different types of pricing, creating users, creating images and adding them to the database, and plain old load testing by creating a lot of data.

I have been using this approach for a few years now. Recently I was working with a team that had been waiting a long time for data to be provided. When I advised them to just fill in random data (for now) to see how it all fits together, they were surprised that this was an option. I hope this will help people who have not tried this approach before.

Setup

There are different ways to structure this, but my preferred way is to store all the files in a '__generator' directory inside the project root. I mark all non-essential folders that could be deleted without affecting the project with double underscores. This way they stand out from the rest of the apps, and I always know that I can safely delete all such folders. This is a place where dirty code and unsafe practices are allowed, and it MUST be deleted before the project goes into its initial production release. This code is unsafe and must never be used with production or preproduction code.

Inside the __generator folder I usually create ‘images’ and ‘data’ folders. The images folder is used as a pool of images to load during each cycle, and the data folder mostly stores text files and Excel files that hold real information to load. Then I add a ‘scratch.sh’ bash script to run the scripts, and usually one py file per app or per task. I used a bash file because I was not building a system. The main goal was to automate some small tasks; at the time it was a good way to run a couple of commands, and it never grew out of that. You could easily use a Python script with all the good practices instead, but then it becomes a chore again rather than a quick and small system. The bash approach is also very beginner friendly.

mkdir -p __generator/{data,images}
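To illustrate how the images folder can act as a pool, here is a minimal sketch that picks a few random files from it. The helper name `pick_images` and the list of extensions are assumptions made for this example, not part of the original scripts.

```python
import random
from pathlib import Path

# Assumed set of extensions; adjust to whatever the pool actually contains.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def pick_images(pool_dir, count, rng=random):
    """Return up to `count` distinct image paths chosen at random from pool_dir."""
    pool = sorted(
        path for path in Path(pool_dir).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )
    # sample() raises ValueError when asked for more items than exist, so cap it.
    return rng.sample(pool, min(count, len(pool)))
```

Passing a seeded `random.Random` instance as `rng` would make the selection repeatable between runs, which can help when comparing two versions of a page.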

Python scripts

Each Python script is just a simple file that does all the data loading for a specific model or app. Unfortunately, you cannot simply import models from your Django apps in a standalone script; it won’t work, because Django's settings and app registry have not been loaded yet. The scripts need a small setup that lets them call Django and your apps directly. I use this header in each file; even though it is not DRY, I found it more appropriate for quick scripts like these. Here is what the header looks like:

import os
import sys
import django

current_file = os.path.realpath(__file__)
base_path = os.path.dirname(current_file)
project_main_path = os.path.realpath(base_path + '/../')


sys.path.append(project_main_path)
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'project_main.settings')
django.setup()

After you have added this header, you can write regular Django code as if this file were a regular view file, with the only difference being that it is called directly. For example, let's add users to our system. After the header, add code similar to this:

from django.contrib.auth import get_user_model

User = get_user_model()

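# phone, name, lastname and address assume a custom user model with those fields.
user = User(
    is_staff=True,
    is_superuser=True,
    username='alfred',
    phone='123123123',
    name='Alfred',
    email='alfred@test.com',
    lastname='Salahov',
    address='Sezame Street 42',
)
# set_password() hashes the password instead of storing it in plain text.
user.set_password('Sdio89sDKwopkkj')
user.save()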


# Create 100 regular users with predictable, index-based values.
for index in range(100):
    user = User(
        is_staff=False,
        is_superuser=False,
        username='user{}'.format(index),
        phone='123999{:03d}'.format(index),
        name='John{}'.format(index),
        email='john{}@example.com'.format(index),
        lastname='Smith{}'.format(index),
        address='Backe Street {}'.format(index),
    )
    user.set_password('abcXYZhome{}'.format(index))
    user.save()
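The loop above produces predictable values. For more varied records, one option is to build the field values with the standard `random` module first and pass them to the model afterwards. This is only a sketch: the name pools and the `random_user_fields` helper are invented for illustration, and the field names simply mirror the user model used above.

```python
import random

# Hypothetical name pools, purely for illustration.
FIRST_NAMES = ["John", "Anna", "Peter", "Maria", "Ivan"]
LAST_NAMES = ["Smith", "Brown", "Miller", "Wilson", "Taylor"]


def random_user_fields(index, rng=random):
    """Build keyword arguments for one fake user; index keeps usernames unique."""
    name = rng.choice(FIRST_NAMES)
    lastname = rng.choice(LAST_NAMES)
    return {
        "is_staff": False,
        "is_superuser": False,
        "username": "user{}".format(index),
        "phone": "123999{:03d}".format(index),
        "name": name,
        "email": "{}.{}{}@example.com".format(name.lower(), lastname.lower(), index),
        "lastname": lastname,
        "address": "Backe Street {}".format(rng.randint(1, 200)),
    }


# Inside a generator script, after the header, it could be used like this:
#     user = User(**random_user_fields(index))
#     user.set_password('abcXYZhome{}'.format(index))
#     user.save()
```

Keeping the random part free of Django calls means it can be tried out in a plain Python shell before any database is involved.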

This approach can be applied to any app or model, even when you need complex logic that loads models from different apps and generates data that depends on each other. You are now in the land of programming and can do pretty much anything: load data from your Google Drive, from an API, from a local Excel file, or from any file format you work with.
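As one example of loading real data, the sketch below reads rows from a CSV file in the data folder using only the standard library. The file name `users.csv` and its column names are assumptions for this example; an Excel file would need a third-party reader instead.

```python
import csv
from pathlib import Path


def read_rows(csv_path):
    """Return the rows of a CSV file as a list of dicts keyed by the header line."""
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        # Strip stray whitespace so " Alfred " and "Alfred" load the same way.
        return [
            {
                key.strip(): (value or "").strip()
                for key, value in row.items()
                if key is not None  # ignore cells beyond the header width
            }
            for row in csv.DictReader(handle)
        ]


# Inside a generator script, after the header (file and columns are assumed):
#     for row in read_rows("data/users.csv"):
#         user = User(username=row["username"], email=row["email"])
#         user.set_password(row["password"])
#         user.save()
```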

Now you can launch this script with a regular

python users.py

from within the __generator directory. Just make sure you have activated your virtual environment first so that the script can find Django.
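If the environment is not active, the script fails with an import error on its very first lines. A small guard like the one below could turn that into a clearer message. The helper names `module_available` and `require_django` are hypothetical; `importlib.util.find_spec` is the standard-library call doing the work.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if a top-level module can be imported in this interpreter."""
    return importlib.util.find_spec(name) is not None


def require_django():
    """Exit with a hint if Django is not importable, e.g. venv not activated."""
    if not module_available("django"):
        sys.exit("Django is not importable; activate the project's environment first.")
```

To be useful, `require_django()` would have to run before the `import django` line of the header.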

Bash script automation

The next step in automating boring tasks is a bash script, so that you don’t have to launch the same tasks by hand again and again. I want to repeat that you can easily ditch bash and do all of this from Python, but for me it is more comfortable to launch these tasks from a bash script. Here is a version of the script:

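# Run from inside __generator; step up to the project root.
pushd ../

# Delete generated migration files but keep each package's __init__.py.
find . -path "*/migrations/*.py" -not -name "__init__.py" -delete
find . -path "*/migrations/*.pyc"  -delete

# -f: do not fail when the database does not exist yet.
rm -f db.sqlite3
./manage.py makemigrations
./manage.py migrate
popd

# Only wipe the media folder if changing into it actually succeeded;
# otherwise rm would run in the current directory.
if pushd ../../media; then
    rm -rf -- ./*
    popd
fi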

python users.py
# python products.py
python products_random.py
python pages.py
python banners.py
python product_collection.py
python configurations.py

First of all, since we are inside the ‘__generator’ directory, I go up one directory to be in the root directory of the project. From there I delete all migrations and the database (remember I told you not to use this with anything production related?) and recreate them from scratch. I do this because I make a lot of crazy changes to the models that would sometimes break migrations, and it used to cost me time working out how to bring the app migrations back to a consistent state. That is something I don’t want to spend time on when I am just exploring.
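One caveat with the `find` commands: they match every `migrations` directory under the project root, which would include Django's own migrations if a virtual environment lives inside the project. A Python version that skips such directories might look like the sketch below. The function name and the list of skipped directory names are assumptions for this example.

```python
from pathlib import Path

# Assumed names of directories that must never be touched.
SKIP_DIRS = {"venv", ".venv", "env", "node_modules", ".git"}


def delete_migrations(project_root, skip_dirs=SKIP_DIRS):
    """Delete migration .py/.pyc files under project_root, keeping __init__.py.

    Returns the list of deleted paths.
    """
    root = Path(project_root)
    deleted = []
    # Materialise the listing first so deleting does not disturb the walk.
    for path in list(root.rglob("*")):
        relative = path.relative_to(root)
        if any(part in skip_dirs for part in relative.parts):
            continue
        # Mirror the find commands: any file somewhere below a migrations folder.
        in_migrations = "migrations" in relative.parts[:-1]
        if not (path.is_file() and in_migrations):
            continue
        is_pyc = path.suffix == ".pyc"
        is_py = path.suffix == ".py" and path.name != "__init__.py"
        if is_pyc or is_py:
            path.unlink()
            deleted.append(path)
    return deleted
```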

Then I go to my media folder, which lives outside of the root directory, and remove all the media files. Each script invocation creates random data that will be obsolete on the next rerun, and since I run this script quite often it would otherwise quickly fill my drive.
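The same cleanup can be written in Python, which makes it easy to refuse to run when the folder is missing. A minimal sketch, with a hypothetical function name `clear_directory`:

```python
import shutil
from pathlib import Path


def clear_directory(directory):
    """Remove everything inside `directory` but keep the directory itself."""
    target = Path(directory)
    if not target.is_dir():
        # Refuse to guess: a wrong path should never delete anything.
        raise NotADirectoryError("not a directory: {}".format(target))
    for entry in list(target.iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
```

Raising an error on a missing folder is a deliberate choice here: with destructive scripts, stopping is usually preferable to deleting from the wrong place.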

After that I just run all the scripts that generate data. Most of the data is random, but the commented-out products.py line shows that sometimes I load real data instead, stored for example in an Excel file. This way I can quickly switch between random and real data exploration.
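For readers who prefer to stay in Python, the last part of the bash script could be replaced by a small runner built on `subprocess`. This is a sketch rather than the setup described here; the `run_scripts` name is an assumption, and the script list is copied from the bash version.

```python
import subprocess
import sys

SCRIPTS = [
    "users.py",
    "products_random.py",
    "pages.py",
    "banners.py",
    "product_collection.py",
    "configurations.py",
]


def run_scripts(scripts, cwd=None):
    """Run each script with the current interpreter, stopping at the first failure.

    Returns the list of scripts that completed successfully.
    """
    completed = []
    for script in scripts:
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run([sys.executable, script], cwd=cwd, check=True)
        completed.append(script)
    return completed
```

Calling `run_scripts(SCRIPTS)` from inside the __generator directory would run them in order. Unlike the bash version, this sketch stops at the first script that fails, so later scripts do not run against half-loaded data.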

Video

Here is a short video I created for my friends to illustrate how I use it. It uses slightly different scripts, as I separated them into real and fake data. The fake data has fewer products but takes more time to generate, because it uses quite a few images per product and generates some images for product-specific filters. The real data does not have those and thus loads a bit faster.

Clearing PostgreSQL

Sometimes the limitations of SQLite do not allow you to use it even in debug versions of your app, because your app might rely on SQL features that it does not support, for example easy searching over multiple tables that could contain data in different languages. In those cases you have to use something other than SQLite. If you use PostgreSQL for your experimental project, you can replace the step that removes the SQLite database with this script:

sudo systemctl restart postgresql

DB_NAME="some_database_name"
DB_USER="some_db_user"
DB_PASS="1231))(*)S(Df3"

# "if exists" keeps the first run from failing when nothing has been created yet.
sudo -u postgres psql -U postgres -c "drop database if exists $DB_NAME;"
sudo -u postgres psql -U postgres -c "drop user if exists $DB_USER;"
sudo -u postgres psql -U postgres -c "create user $DB_USER;"
sudo -u postgres psql -U postgres -c "create database $DB_NAME;"
# The password belongs to the role (user), and SQL needs it in single quotes.
sudo -u postgres psql -U postgres -c "alter role $DB_USER with password '$DB_PASS';"
sudo -u postgres psql -U postgres -c "grant all privileges on database $DB_NAME to $DB_USER;"
sudo -u postgres psql -U postgres -c "alter database $DB_NAME owner to $DB_USER;"

**PLEASE DON’T USE** this on production setups, as you will most definitely lose your valuable data, and I will not take any responsibility for that. You have been warned!
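Passwords and names containing quotes or other special characters are easy to get wrong when SQL is assembled in a shell. As a hedged sketch, the statements could be built in Python with explicit quoting and then passed to `psql -c` one by one. The helper names are hypothetical, the quoting assumes PostgreSQL's default `standard_conforming_strings` setting, and the same warning about production data applies.

```python
def quote_ident(name):
    """Quote an SQL identifier by doubling any embedded double quotes."""
    return '"{}"'.format(name.replace('"', '""'))


def quote_literal(value):
    """Quote an SQL string literal by doubling any embedded single quotes."""
    return "'{}'".format(value.replace("'", "''"))


def reset_statements(db_name, db_user, db_pass):
    """Return the SQL statements that drop and recreate a database and its owner."""
    db, user = quote_ident(db_name), quote_ident(db_user)
    return [
        "DROP DATABASE IF EXISTS {};".format(db),
        "DROP ROLE IF EXISTS {};".format(user),
        "CREATE ROLE {} LOGIN PASSWORD {};".format(user, quote_literal(db_pass)),
        "CREATE DATABASE {} OWNER {};".format(db, user),
    ]
```

Note that quoted identifiers are case-sensitive in PostgreSQL, so this sketch fits best with all-lowercase database and user names.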

Conclusion

The programmatic approach to adding data to your Django application is a lot of fun and speeds up development quite a bit. If you are still not using something like this for your testing and explorations, you should definitely try it at least once to see if you like it.

I think everything I wrote should be simple and self-explanatory, but if you still could not make it work, just email me and I will upload a simplified working version to GitHub to play around with.