Resource Management in Python Generators
September 03, 2022
Wow, I never thought I’d struggle with resource management in Python of all languages. But I did, while writing the HTML preprocessor for this website a couple of days ago.
As my preprocessor parses a file, the source text is sometimes meant to be directly written as output and sometimes saved in memory for later. I have two functions, parse_file and parse_with, to handle the two contexts. The source file is passed between them. For example, once parse_file finds {with}, a directive that indicates the start of a block, on a line, it’ll call parse_with to process the text until the end of that block.
However, if I simply iterated over the file with a for loop in either function, I could only start with the line after the one containing the {with} directive. I needed to pass the remainder of the directive line as a parameter to parse_with and have it process that remainder before iterating through the file. Unfortunately, this addition turned a simple for loop into an unwieldy combination of a while loop and a try-except clause:
def parse_with(file, remainder):
    line = remainder
    while True:
        ...  # process line
        try:
            line = next(file)
        except StopIteration:
            return
Never until then had I longed for a do-while loop in any language… Even then, it probably would’ve still been clunky, as iterators always throw an exception upon exhaustion. This was where generators came in. All I needed was an iterator that “prepends” the remainder before the items generated by the file iterator. Here’s the first generator function I wrote:
def prepend_iter(remainder, file):
    if remainder:
        yield remainder
    yield from file
Simple, isn’t it? Just yield the remainder if any, then delegate the rest of the work to the file with a fancy “transparent bidirectional connection” (as this Stack Overflow answer puts it)! Then, I could have a nice clean for loop in parse_with:
def parse_with(file, remainder):
    for line in prepend_iter(remainder, file):
        ...  # process line
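For instance, with a plain iterator standing in for the file (a made-up snippet, not from the actual preprocessor), it behaved exactly as hoped:

fake_file = iter(["line 2\n", "line 3\n"])
for line in prepend_iter("rest of line 1\n", fake_file):
    print(line, end="")
# prints: rest of line 1, line 2, line 3, each on its own line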
But then, my joy was suddenly cut short by this:
Traceback (most recent call last):
  File "/usr/share/nginx/html/jcfp.site/scripts/parse-file", line 166, in <module>
    parse_file(file, sys.stdout)
  File "/usr/share/nginx/html/jcfp.site/scripts/parse-file", line 109, in parse_file
    line = parse_with(src, include_subs, subs, line)
  File "/usr/share/nginx/html/jcfp.site/scripts/parse-file", line 57, in parse_with
    line = parse_file(src, str_buf, subs, match[1] + "}",
  File "/usr/share/nginx/html/jcfp.site/scripts/parse-file", line 121, in parse_file
    line = parse_file(src, str_buf, subs, "{endq}", line)
  File "/usr/share/nginx/html/jcfp.site/scripts/parse-file", line 74, in parse_file
    for line in prepend_iter(rem, src):
  File "/usr/share/nginx/html/jcfp.site/scripts/parse-file", line 26, in prepend_iter
    yield from src
ValueError: I/O operation on closed file.
How could I possibly have closed the file? My topmost parse_file hadn’t even returned, so Python couldn’t have closed it for being out of scope. After hours of reading Stack Overflow threads and scouring PEPs, I found the answer in this little section of PEP 380. (I had actually skipped it the first time because its first paragraph seemed to imply the opposite of what it says.) Essentially, a yield from statement closes the subiterator if the outer generator gets closed, even if the subiterator hasn’t been exhausted. So when parse_with returns, the generator created by prepend_iter gets freed, closing the file iterator used by the generator.
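Here’s a minimal sketch of that mechanism in isolation (my own toy example, with io.StringIO standing in for the real file):

import io

def wrap(it):
    yield from it

f = io.StringIO("line 1\nline 2\n")
gen = wrap(f)
print(next(gen), end="")  # line 1
gen.close()  # close() propagates through yield from to the subiterator
print(f.closed)  # True: the underlying "file" got closed too

Freeing the generator, as happens when the last reference to it disappears, triggers the same close() under the hood.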
The solution, fortunately, was just one extra line:
def prepend_iter(remainder, file):
    if remainder:
        yield remainder
    for line in file:
        yield line
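Rerunning the earlier toy check against the fixed version (again with io.StringIO as the stand-in file) shows the difference:

import io

f = io.StringIO("line 2\nline 3\n")
gen = prepend_iter("rest of line 1\n", f)
print(next(gen), end="")  # rest of line 1
print(next(gen), end="")  # line 2 (now iterating over f)
gen.close()  # GeneratorExit just stops the for loop...
print(f.closed)  # False: f stays open for its other consumers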
Clearly, yield from does much more than delegate iteration. The Stack Overflow answer I linked before shows some more sophisticated usages. It seems like magic that just works, but that also means more unexpected pitfalls. Be especially careful when yielding from a shared iterator!