Resolving a Django “Memory Leak” Problem on Long-Running Intensive Processes

Data conversions tend to be straightforward but detail-heavy processes that stress systems in unexpected places. While importing several hundred thousand records into a Django development environment (a job that also entailed reshaping several important data sets), memory use climbed steadily until the system slowed to a crawl.

The root cause: the environment runs in debug mode, and when debug mode is on, each Django database connection logs every query it executes. The import runs millions of database statements over nearly half an hour, so that log grows without bound. Since I prefer not to continually fiddle with the configuration file in my development environment, I did the following.
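The logging is tied to the standard DEBUG flag; as a reminder, the relevant line in a typical settings.py (file name and layout assumed here) looks like this:

[python]
# settings.py -- Django appends every executed SQL statement to
# connection.queries only while DEBUG is True
DEBUG = True
[/python]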

For any given data dump, the import prints a status report every 60 seconds: the number of records completed, the total number of records in the dump, and the percent complete. The SQL statement log is of no interest here and merely consumes memory, so it is flushed at the same point:

[python]
from django import db

db.reset_queries()  # clear each connection's accumulated query log
[/python]
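Put together, the import loop might look like the sketch below. The function and record-loading callable names (run_import, load_record) are my own illustration, not from the original code, and the Django import falls back to a no-op so the sketch runs standalone:

[python]
import time

try:
    # In a real Django project this clears connection.queries,
    # which grows without bound while DEBUG is True.
    from django.db import reset_queries
except ImportError:  # allow the sketch to run outside Django
    def reset_queries():
        pass

def run_import(records, load_record, status_interval=60):
    """Import records, printing progress and flushing the query
    log roughly every status_interval seconds.

    load_record is a hypothetical callable that writes one
    record to the database.
    """
    total = len(records)
    last_report = time.monotonic()
    for done, record in enumerate(records, start=1):
        load_record(record)
        if time.monotonic() - last_report >= status_interval:
            print(f"{done}/{total} records ({done / total:.1%})")
            reset_queries()  # drop the accumulated SQL log
            last_report = time.monotonic()
    print(f"{total}/{total} records (100.0%)")
[/python]

Tying the flush to the existing status tick keeps the bookkeeping in one place: the log never grows for more than one reporting interval.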

This resolved the creeping memory problem. If any other unusual circumstances required releasing resources, this periodic checkpoint would be an opportune place to do so.