Resource Management in Python Generators

September 03, 2022

Wow, I never thought I’d struggle with resource management in Python, of all languages. But I did, while writing the HTML preprocessor for this website a couple of days ago.

As my preprocessor parses a file, the source text is sometimes meant to be written directly as output and sometimes saved in memory for later. I have two functions, parse_file and parse_with, to handle the two contexts. The source file is passed between them. For example, once parse_file finds a {with} directive, which marks the start of a block, on a line, it calls parse_with to process the text until the end of that block.

However, if I simply iterated over the file with a for loop in either function, I could only start with the line after the line with the {with} directive. I need to pass the remainder of the directive line as a parameter to parse_with and have it process the remainder before iterating through the file. Unfortunately, this addition turned a simple for loop into an unwieldy combination of a while loop and a try-except clause:

def parse_with(file, remainder):
    line = remainder
    while True:
        ...  # process line
        try:
            line = next(file)
        except StopIteration:
            return

Never until then had I longed for a do-while loop in any language… Even with one, it probably would’ve still been clunky, as iterators always raise StopIteration upon exhaustion. This was where generators came in. All I needed was an iterator that “prepends” the remainder before the items generated by the file iterator. Here’s the first generator function I wrote:

def prepend_iter(remainder, file):
    if remainder:
        yield remainder
    yield from file

Simple, isn’t it? Just yield the remainder if any, then delegate the rest of the work to the file with a fancy transparent bidirectional connection! (as this Stack Overflow answer puts it) Then, I could have a nice clean for loop in parse_with:

def parse_with(file, remainder):
    for line in prepend_iter(remainder, file):
        ...  # process line
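
As a quick sanity check of the ordering, here’s a small sketch that feeds prepend_iter a plain list iterator instead of a real file. The sample lines are made up for illustration:

```python
def prepend_iter(remainder, file):
    if remainder:
        yield remainder
    yield from file


# Made-up sample lines; a list iterator stands in for the open file.
with_remainder = list(prepend_iter(" tail of line\n", iter(["a\n", "b\n"])))
without_remainder = list(prepend_iter("", iter(["a\n", "b\n"])))
print(with_remainder)
print(without_remainder)
```

The remainder comes out first when there is one, and an empty remainder is simply skipped.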

But then, my joy was suddenly cut short by this:

Traceback (most recent call last):
  File "/usr/share/nginx/html/jcfp.site/scripts/parse-file", line 166, in <module>
    parse_file(file, sys.stdout)
  File "/usr/share/nginx/html/jcfp.site/scripts/parse-file", line 109, in parse_file
    line = parse_with(src, include_subs, subs, line)
  File "/usr/share/nginx/html/jcfp.site/scripts/parse-file", line 57, in parse_with
    line = parse_file(src, str_buf, subs, match[1] + "}",
  File "/usr/share/nginx/html/jcfp.site/scripts/parse-file", line 121, in parse_file
    line = parse_file(src, str_buf, subs, "{endq}", line)
  File "/usr/share/nginx/html/jcfp.site/scripts/parse-file", line 74, in parse_file
    for line in prepend_iter(rem, src):
  File "/usr/share/nginx/html/jcfp.site/scripts/parse-file", line 26, in prepend_iter
    yield from src
ValueError: I/O operation on closed file.

How could I possibly have closed the file? My topmost parse_file hasn’t even returned, so Python couldn’t have closed it for being out of scope. After hours of reading Stack Overflow threads and scouring PEPs, I found the answer in this little section of PEP 380. (I had actually skipped it the first time because its first paragraph seemed to imply the opposite of what it says.) Essentially, a yield from expression closes the subiterator if the outer generator gets closed, even if the subiterator hasn’t been exhausted. So when parse_with returns, the generator created by prepend_iter gets freed and thereby closed, and that in turn calls close() on the file it was delegating to, even though other callers are still reading from it. The solution, fortunately, was just one extra line:

def prepend_iter(remainder, file):
    if remainder:
        yield remainder
    for line in file:
        yield line
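
To see the difference in isolation, here’s a minimal sketch that puts both versions side by side. It uses io.StringIO as a stand-in for the real source file, and the names prepend_with_yield_from, prepend_with_loop, and advance_then_close are made up for this demo. Calling close() on the generator by hand mimics what happens when it gets freed partway through the file:

```python
import io


def prepend_with_yield_from(remainder, file):
    # Original version: delegates to the file with yield from.
    if remainder:
        yield remainder
    yield from file


def prepend_with_loop(remainder, file):
    # Fixed version: iterates over the file explicitly.
    if remainder:
        yield remainder
    for line in file:
        yield line


def advance_then_close(make_iter):
    # Read the remainder plus one line from the file, then close the
    # generator early and report whether the file got closed with it.
    src = io.StringIO("first\nsecond\nthird\n")
    gen = make_iter("rest of directive line\n", src)
    next(gen)  # yields the remainder
    next(gen)  # yields "first\n"; the generator is now paused partway through the file
    gen.close()
    return src.closed


closed_by_yield_from = advance_then_close(prepend_with_yield_from)
closed_by_loop = advance_then_close(prepend_with_loop)
print(closed_by_yield_from, closed_by_loop)
```

This should print True False: only the yield from version takes the file down with it, while the explicit loop leaves it open for whoever reads next.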

Clearly, yield from does much more than delegate iteration. The Stack Overflow answer I linked before shows some more sophisticated uses. It seems like magic that just works, but that also means more unexpected pitfalls. Be especially careful when yielding from a shared iterator!