Tag Archives: django

Spawning Django Subprocesses

We have some long-running maintenance tasks that we’d like to launch from the web browser. Programmatically, the most natural thing is to spawn a process that performs the task and completes asynchronously. The results are recorded in the database for later harvesting.

As far as I can tell, the same general rules for forking apply when forking from within Django: close database connections, close open file handles, and release other resources that cannot be shared across process boundaries.

Django, apparently, will automatically re-connect to the database if the connection is closed. This makes the job much simpler. Some web sites say that the parent process should close its database connection; others say that the child process should close its own.

In the face of this conflicting information, I chose to close the parent process’ database connection before calling os.fork(). Reöpening the database connection incurs a small penalty, which is not a concern because it is done only once.

Before forking:

[python]
import os

from django.db import connection

# Don't fork with a database connection open.
connection.close()

new_pid = os.fork()
if not new_pid:
    # Child process
    ...

[/python]

Thus far there seem to be no side effects from taking this approach. As always, additional information is welcomed.
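For readers who want a fuller picture, here is a minimal sketch of the pattern; it is not the code from the project. The names spawn_task and before_fork are hypothetical. In a Django view one would presumably pass connection.close as before_fork. The child leaves through os._exit() so that it never returns into the web server’s own request handling.

```python
import os
import sys


def spawn_task(task, before_fork=None):
    """Run task() in a forked child; return the child's PID to the parent.

    before_fork is an optional callable that releases resources which must
    not be shared across the fork (assumption: e.g. connection.close).
    """
    if before_fork is not None:
        before_fork()

    # Flush buffered output so it is not written twice after the fork.
    sys.stdout.flush()
    sys.stderr.flush()

    pid = os.fork()
    if pid:
        return pid  # Parent process: carry on serving the request.

    # Child process: do the work, then exit without unwinding the parent's stack.
    status = 0
    try:
        task()
    except BaseException:
        status = 1
    finally:
        os._exit(status)
```

The parent is still responsible for reaping the child at some point (for example with os.waitpid), otherwise finished children linger as zombies.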

Django 1.7, PyCharm, and Apache

I’ve had some ongoing problems with PyCharm support for Django since the 1.7 release. Here is a summary.

WSGI Broke

The entire app failed to run under Apache. This caused some moments of terror. The WSGI file needed to be edited.

This code started on version 1.3. The old WSGI configuration worked until 1.7. This discussion thread described the behaviour, which bites projects started pre-1.4. See also Issue #23437, which describes the WSGI problem, and the app-loading changes section of the official release notes.

The correct call to initialize WSGI is:

[python]
from django.core.wsgi import get_wsgi_application
application = get_wsgi_application()
[/python]

PyCharm Testing Broke

Tests in PyCharm broke. This is a case of being bitten by older code again. Django uses the standard Python unittest module now. I had to go through dozens of files and tweak the import statements.

Update: LH asked for more information.

In my unit testing, I used the old-style Django test classes and had import statements like:

[python]
from django.test import TestCase
[/python]

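which I changed to:

[python]
from unittest import TestCase
[/python]

in order to use the built-in Python unit testing core. Note that unittest.TestCase has none of Django’s test database handling and no test client, so this only suits tests that need neither.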

PyCharm Django Console Broke

This is kind of annoying. The problem isn’t so much PyCharm itself but that Django “fixed” a long-standing “bug”, and what used to work fine no longer does. Per the app-loading changes in the 1.7 release notes, every time one starts the Django console from within PyCharm, one must execute the following commands:

[python]
import django
django.setup()
[/python]

This became annoying enough that I hacked the Django helper

/Applications/PyCharm.app/helpers/pycharm/django_manage_shell.py

and added the two lines to that file. Problem resolved.

Django and AngularJS POST

I’m converting some code that uses jQuery to post data to Django over to AngularJS. The POSTed data wasn’t coming across as expected.

By default, AngularJS and jQuery use two different methods of packaging data. According to what I’ve gleaned from googling:

  • jQuery posts using a content type of application/x-www-form-urlencoded and serializes the data using the foo=bar&fuz=baz method.
  • AngularJS posts using a content type of application/json and serializes the data in the body.
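The difference between the two request bodies can be reproduced with the Python standard library alone. This is only an illustration of the two encodings, not of either JavaScript library; the data dict is made up.

```python
import json
from urllib.parse import parse_qs, urlencode

data = {"foo": "bar", "fuz": "baz"}

# What jQuery sends by default: application/x-www-form-urlencoded
form_body = urlencode(data)
print(form_body)  # foo=bar&fuz=baz

# What AngularJS sends by default: application/json
json_body = json.dumps(data)
print(json_body)  # {"foo": "bar", "fuz": "baz"}

# Each body needs its own decoder on the receiving side.
decoded_form = parse_qs(form_body)    # values come back as lists
decoded_json = json.loads(json_body)  # values come back as sent
```

Django fills request.POST only from form-encoded bodies, which is why the JSON body has to be read from request.body instead.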

Because this is an Angular single-page application, rather than using Django’s templating, I’m making calls to a web-based API (implemented in Django). I use a simple helper class for the repetitive task of extracting POSTed data, converting it to Python objects, and validating it.

I made a simple change to this object, and voilà — Django started receiving AngularJS data.

[python]
# CONTEXT:
# This function extracts the received data and
# returns it as a Python dict.
if 'CONTENT_TYPE' in request.META and request.META['CONTENT_TYPE'].startswith('application/json'):
    return json.loads(request.body)
elif request.method == 'POST':
    return request.POST.dict()
elif …
[/python]

If you wish to use Django’s unit testing to exercise this, use something like the following. Remember to send a JSON serialized string into post() as it will not do this for you.

[python]
from json import dumps
from django.test.client import Client


client = Client()
self.assertTrue(client.login(username=user, password=password))
response = client.post(url, dumps(data), content_type='application/json')
[/python]
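The content-type dispatch can also be tried without Django installed. The sketch below is an assumption-laden stand-in: extract_posted_data and FakeQueryDict are hypothetical names, the request objects are simple stubs that mimic only the attributes used here, and returning an empty dict for other requests is my own choice rather than what the helper class does.

```python
import json
from types import SimpleNamespace


def extract_posted_data(request):
    """Return the posted data as a dict, whichever way it was encoded."""
    content_type = request.META.get("CONTENT_TYPE", "")
    if content_type.startswith("application/json"):
        body = request.body
        if isinstance(body, bytes):
            body = body.decode("utf-8")
        return json.loads(body)
    if request.method == "POST":
        return request.POST.dict()
    return {}  # assumption: nothing to extract for other requests


class FakeQueryDict(dict):
    """Stub for Django's QueryDict; only .dict() is imitated."""

    def dict(self):
        return dict(self)


angular_request = SimpleNamespace(
    META={"CONTENT_TYPE": "application/json; charset=UTF-8"},
    body=b'{"foo": "bar"}',
    method="POST",
    POST=FakeQueryDict(),
)
jquery_request = SimpleNamespace(
    META={"CONTENT_TYPE": "application/x-www-form-urlencoded"},
    body=b"foo=bar&fuz=baz",
    method="POST",
    POST=FakeQueryDict(foo="bar", fuz="baz"),
)
```

Using startswith() rather than an equality test matters because the header often carries a charset suffix.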

Updates

2014-02-21
Fixed bug in sample code.
2014-06-05
Added unit testing example.

Adding Sphinx Documentation to a Django Project

I had not used Sphinx to generate documentation for a Django project before. Unfortunately, I found the documentation on the web to be in the no man’s land between cryptic and sparse. After a number of mistakes, I was successful in producing auto-generated documentation from the Python code.

These are my notes for creating a basic set of documentation derived from the source code.

Caveat

I would prefer to keep all of Sphinx’s files in a subdirectory relative to the project root, but it appears to my untrained eye that Sphinx requires its configuration file and root document in the project root. Sphinx was happy to put its intermediaries and final product into a subdirectory, however.

Initial Installation

Running the Sphinx configuration script was straightforward. By default, autodoc (automatic documentation generation) is not enabled. Be sure to answer “y” to it.

Because I was trying (unsuccessfully) to get Sphinx to operate exclusively out of a subdirectory, I had to manually move the conf.py and index.rst files to the project root, then hack the config file so that templates_path, exclude_patterns, and html_static_path contained the subdirectory path.

Only at this point was I able to perform a build on this bare-bones set of documentation.
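As a rough illustration, the relevant part of such a conf.py might look like the following. This is a sketch, not my actual file: the subdirectory name “docs” is hypothetical, and the _templates, _static, and _build names are simply the ones the configuration script normally generates.

```python
# conf.py, kept in the project root (sketch; "docs" is a hypothetical name)
import os
import sys

# autodoc imports the modules it documents, so the project root
# has to be on the import path.
sys.path.insert(0, os.path.abspath("."))

# Answering "y" to autodoc in the configuration script adds this entry.
extensions = ["sphinx.ext.autodoc"]

# Point Sphinx's own files at the subdirectory.
templates_path = ["docs/_templates"]
html_static_path = ["docs/_static"]
exclude_patterns = ["docs/_build"]
```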

Overall Hierarchy

The structure that I got to work was a Sphinx index.rst file in the project root and in every Python package directory. This forms a tree of index.rst files corresponding to packages.

Contents of the index.rst File

Header

Every index.rst file must contain a header. For example:

[code]
Foo
===

[/code]

Sub-Packages

Each index.rst file contains the following if the package contains a sub-package:

[code]

.. toctree::
   :maxdepth: 2

   {sub-package-name-1}/index.rst
   {sub-package-name-2}/index.rst

   {sub-package-name-n}/index.rst

[/code]

Indentation is significant in a Python sense. Additionally, blank lines are significant.

Modules

Each index.rst file contains the following for each Python module in the same directory as the index.rst file:

[code]

.. automodule:: {full Python module name}
   :members:

[/code]

Remember to ensure this is isolated from its neighbours by a blank line.

The autodoc process actually imports the Python module to perform introspection. From what I understand of the Sphinx documentation, the full dotted Python module name is required. For example, given the following file hierarchy:

[code]

package1/__init__.py
package1/foo.py
package1/package2/__init__.py
package1/package2/bar.py
package1/package2/index.rst
package1/index.rst

[/code]

The Sphinx file package1/index.rst would contain:

[code]

.. automodule:: package1.foo
   :members:

[/code]

The Sphinx file package1/package2/index.rst would contain:

[code]

.. automodule:: package1.package2.bar
   :members:

[/code]
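Since every index.rst follows the same recipe (header, toctree of sub-packages, one automodule per module), the files could be generated rather than typed. The function below is a hypothetical sketch of that idea, not something I used; render_index is a made-up name and it only returns the text, leaving the writing of files to the caller.

```python
import os


def render_index(package_dir, package_name):
    """Return index.rst text for one package directory.

    package_name is the full dotted name, e.g. "package1.package2".
    """
    title = package_name.rsplit(".", 1)[-1]
    lines = [title, "=" * len(title), ""]

    entries = sorted(os.listdir(package_dir))
    subpackages = [
        entry for entry in entries
        if os.path.isfile(os.path.join(package_dir, entry, "__init__.py"))
    ]
    modules = [
        entry[:-3] for entry in entries
        if entry.endswith(".py") and entry != "__init__.py"
    ]

    if subpackages:
        # Directive options and entries are indented; blank lines separate them.
        lines += [".. toctree::", "   :maxdepth: 2", ""]
        lines += ["   {}/index.rst".format(name) for name in subpackages]
        lines.append("")

    for module in modules:
        lines += [
            ".. automodule:: {}.{}".format(package_name, module),
            "   :members:",
            "",
        ]

    return "\n".join(lines)
```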

Non-Generated Documentation

Documentation that is not automatically generated can be kept in subdirectories and included in the .. toctree:: section in the same manner as sub-package index.rst files.

Update

I encountered a nice write-up here, which includes much more helpful information than I provide in these notes.

Resolving a Django “Memory Leak” Problem on Long-Running Intensive Processes

Data conversions tend to be straightforward but detail-oriented processes that push systems in unexpected places. When importing several hundred thousand records into a Django development environment — entailing reshaping a number of important data sets — memory use continually escalated until the system slowed to a crawl.

The root cause is that the environment is in debug mode, and in debug mode Django connections log every query that is executed. The import executes millions of database statements and runs for nearly half an hour. I prefer not to continually fiddle with the configuration file in my development environment, so I did the following.

For any given data dump, the import prints its progress every 60 seconds: number of records completed, total number of records in the dump, and percent completed. The SQL statement log is not interesting and is merely consuming memory, so it’s flushed at that point, thusly:

[python]
from django import db

db.reset_queries()
[/python]

This has resolved the creeping memory problem. If there were any other unusual circumstances requiring releasing resources, this would be an opportune time.
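The shape of the loop is roughly as follows. This is a sketch under assumptions, not the import script itself: run_import, save, clock, and report are hypothetical names, and reset_queries is passed in as a callable so the example runs without Django (in the real script it would be django.db.reset_queries).

```python
import time


def run_import(records, save, reset_queries, interval=60.0,
               clock=time.monotonic, report=print):
    """Save each record; every `interval` seconds report progress and flush."""
    total = len(records)
    last_report = clock()
    for done, record in enumerate(records, start=1):
        save(record)
        now = clock()
        if now - last_report >= interval:
            report("{} of {} records ({:.1f}%)".format(
                done, total, 100.0 * done / total))
            reset_queries()  # drop the accumulated SQL statement log
            last_report = now
```

Tying the flush to the status report keeps the log bounded by about a minute’s worth of statements without adding a call to every iteration.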

Django QuerySet Filtering Notes

These are notes that condense the Django 1.5 filter annotation documentation (mostly Field lookups), along with various points of detail I’ve encountered.

General Matching

Annotation                   Notes
x__exact=value               Exact match
x__exact=None                IS NULL
x__iexact=value              Case insensitive string match
x__isnull=True               x IS NULL
x__isnull=False              x IS NOT NULL
x__contains=value            Case sensitive substring match
x__icontains=value           Case insensitive substring match
x__in=[value1, value2, …]    Match any in a list
x__gt=value                  Greater than
x__gte=value                 Greater than or equal
x__lt=value                  Less than
x__lte=value                 Less than or equal
x__startswith=value          LIKE 'value%'
x__istartswith=value         Case insensitive LIKE 'value%'
x__endswith=value            LIKE '%value'
x__iendswith=value           Case insensitive LIKE '%value'

Inverting Logic (NOT)

The logical inverse is provided by using the .exclude() method instead of .filter().

Chaining Filters

Complex queries are formed by chaining .filter(), .exclude(), and other operations.

Ranges (BETWEEN)

Ranges use the “__range” annotation set to a 2-tuple that contains the beginning and ending values. For example:

[code lang=”python”]
results = A.objects.filter(measurement__range=(10, 20))
[/code]

will translate into

[code lang=”sql”]
SELECT … WHERE measurement BETWEEN 10 AND 20;
[/code]

Dates

Dates always have the potential to be a little tricky.

First of all, the Django 1.5 docs warn against mixing dates with a DateTimeField in a range lookup. The end date is given an implicit time of “00:00:00”, so records from the end date are effectively skipped unless they fall exactly on midnight.

Date Ranges

Date ranges use the __range annotation with a pair of datetime.date objects.

[code lang=”python”]
import datetime

start_date = datetime.date(2005, 1, 1)
end_date = datetime.date(2005, 3, 31)
results = Entry.objects.filter(pub_date__range=(start_date, end_date))
[/code]
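When the field is a DateTimeField, one way around the midnight problem is to build explicit bounds that cover the whole of the last day. The helper below is a sketch with a made-up name (datetime_bounds); it uses naive datetimes, so time zone handling is left out and would need attention in a project with time zone support enabled.

```python
import datetime


def datetime_bounds(start_date, end_date):
    """Return (lower, upper) datetimes covering both dates in full.

    lower is midnight at the start of start_date; upper is midnight at the
    start of the day after end_date, meant as an exclusive bound.
    """
    lower = datetime.datetime.combine(start_date, datetime.time.min)
    upper = datetime.datetime.combine(
        end_date + datetime.timedelta(days=1), datetime.time.min
    )
    return lower, upper


lower, upper = datetime_bounds(
    datetime.date(2005, 1, 1), datetime.date(2005, 3, 31)
)
# With a DateTimeField, the filter would then be written as (not run here):
#   Entry.objects.filter(pub_date__gte=lower, pub_date__lt=upper)
```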

Date Precision

Django allows querying by year, month, or day precision. This is done by appending an extra annotation to the field name.

  • __year
  • __month
  • __day
  • __week_day – 1=Sunday, 7=Saturday

For example,

[code lang=”python”]
results = Entry.objects.filter(pub_date__year=2005)
[/code]

will match all records with pub_date between 1 January 2005 and 31 December 2005.
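The __week_day numbering (1=Sunday through 7=Saturday) differs from Python’s own conventions, where date.weekday() starts at Monday=0 and date.isoweekday() at Monday=1. A small conversion helper, with a hypothetical name, avoids off-by-one mistakes when building such a filter from a Python date:

```python
import datetime


def django_week_day(day):
    """Map a date to the __week_day numbering (1 = Sunday ... 7 = Saturday)."""
    # isoweekday() gives Monday=1 ... Sunday=7; rotate so Sunday becomes 1.
    return day.isoweekday() % 7 + 1


saturday = django_week_day(datetime.date(2005, 1, 1))
sunday = django_week_day(datetime.date(2005, 1, 2))
# Usage in a filter would look like (not run here):
#   Entry.objects.filter(pub_date__week_day=sunday)
```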

Subqueries

Subqueries are formed with the “__in” annotation. Instead of providing a list of values, provide another inner QuerySet.

The inner QuerySet should yield either model objects or a single field value. A single field value can be obtained by using the .values() method.

Example

This performs a subquery on model objects.

[code lang=”python”]
inner = A.objects.filter(foo__exact='bar')
result = B.objects.filter(a__in=inner)
[/code]

Example

This performs a subquery against a field value.

[code lang=”python”]
inner = A.objects.filter(foo__exact='bar').values('name')
result = B.objects.filter(a__name__in=inner)
[/code]

Master/Detail Joins

These I’m still a little fuzzy on. The Django docs have some information on how they are handled in Following relationships “backward”.

One query I wished to perform was to get a list of suppliers with orders, ignoring those suppliers from whom no orders were placed. The two models are related this way:

[code lang=”python”]
from django.db import models


class Supplier(models.Model):
    pass


class Order(models.Model):
    supplier = models.ForeignKey(Supplier)

[/code]

Note that Django creates an implicit reverse relation from Supplier to Order, which is named “order” in filter lookups. This can be used for filtering, with the caveat that the query joins the two tables and returns a supplier once for every matching order, so one must run the result through distinct().

[code lang=”python”]
results = Supplier.objects.filter(order__isnull=False).distinct()
[/code]
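The need for distinct() is easy to see in plain SQL. The following uses the standard library’s sqlite3 module with made-up table names and data; the SQL is my approximation of the kind of join involved, not the exact statement Django generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_supplier (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE app_order (
        id INTEGER PRIMARY KEY,
        supplier_id INTEGER REFERENCES app_supplier (id)
    );
    INSERT INTO app_supplier (id, name)
        VALUES (1, 'Acme'), (2, 'Bolt'), (3, 'Cog');
    -- Acme has two orders, Bolt has one, Cog has none.
    INSERT INTO app_order (supplier_id) VALUES (1), (1), (2);
""")

join = """
    SELECT {distinct} s.name
    FROM app_supplier AS s
    JOIN app_order AS o ON o.supplier_id = s.id
    WHERE o.id IS NOT NULL
    ORDER BY s.name
"""
without_distinct = [row[0] for row in conn.execute(join.format(distinct=""))]
with_distinct = [row[0] for row in conn.execute(join.format(distinct="DISTINCT"))]

print(without_distinct)  # ['Acme', 'Acme', 'Bolt']
print(with_distinct)     # ['Acme', 'Bolt']
```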

Credits

Thanks to FunkyBob on the #django IRC channel on freenode for his insights.

Setting MySQL to Default to Unicode

When running unit tests in Django, I was getting a strange MySQL failure when attempting to insert non-ASCII Unicode characters into the database, for example:

[code light=”true”]
Warning: Incorrect string value: '\xE2\x89\xA5 %' for column 'value' at row 1
[/code]

What is happening is that Django creates a new schema from scratch for testing. This new schema picks up the MySQL defaults. All my test tables ended up with Latin-1 encoding instead of UTF-8 encoding.
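The bytes in the warning can be decoded in Python to see what MySQL was objecting to. Python’s latin-1 codec is used here only as a stand-in for the MySQL table encoding, which is an approximation.

```python
raw = b"\xe2\x89\xa5 %"

# The three leading bytes are the UTF-8 encoding of a single character.
text = raw.decode("utf-8")
print(text)  # ≥ %

# That character has no Latin-1 representation, hence the failed insert.
try:
    text.encode("latin-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False
```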

I needed to change mysqld to default to Unicode internally so that Django’s unit tests involving non-ASCII data run correctly.

In /etc/my.cnf I added the following:

[code lang=”c” light=”true”]
[client]
default-character-set=utf8

[mysql]
default-character-set=utf8

[mysqld]
collation-server = utf8_unicode_ci
init-connect='SET NAMES utf8'
character-set-server = utf8
[/code]

h/t stackoverflow