Data conversions tend to be straightforward but detail-heavy processes that stress systems in unexpected places. While importing several hundred thousand records into a Django development environment, a job that also reshaped several important data sets, memory use climbed steadily until the system slowed to a crawl.
The root cause: the environment runs with DEBUG = True, and in debug mode every Django database connection records each query it executes in connection.queries. The import runs millions of database statements over nearly half an hour, so that log grows without bound. Since I prefer not to continually fiddle with the configuration file in my development environment, I did the following instead.
For any given data dump, the importer prints a status line every 60 seconds: the number of records completed, the total number of records in the dump, and the percentage done. The SQL statement log is not interesting and merely consumes memory, so it is flushed at the same time:
from django import db
...
db.reset_queries()
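The loop around that call can be sketched roughly like this. This is a minimal illustration of the pattern rather than the original script: the run_import name, the process callback, and the flush_query_log stand-in (which in the real import is django.db.reset_queries) are all assumptions.

```python
import time

def run_import(records, total, process, flush_query_log, interval=60):
    """Process each record; every `interval` seconds, print a progress
    line and flush the accumulated SQL log so memory stays flat."""
    last_report = time.monotonic()
    for done, record in enumerate(records, start=1):
        process(record)  # stand-in for the real per-record import step
        now = time.monotonic()
        if now - last_report >= interval:
            print(f"{done:,} of {total:,} records ({done / total:.0%}) complete")
            # Under DEBUG = True, Django appends every executed statement
            # to connection.queries; django.db.reset_queries() clears it.
            flush_query_log()
            last_report = now
```

Tying the flush to the existing 60-second progress report keeps the bookkeeping in one place: the log never holds more than a minute's worth of statements, and there is no extra timer to manage.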
This resolved the creeping memory problem. If any other unusual circumstances required releasing resources, this periodic checkpoint would be an opportune place to do it.