
It's a simple question, but since I didn't find any answers to it, I assume the answer is no. Still, to make sure, I'm asking:

Does it make the python code more efficient to set the variables to None after we're done with them in a function?

So as an example:

def foo(fname):
    temp_1, temp_2 = load_file_data(fname)

    # do some processing on temp_1, temp_2

    temp_1 = None
    temp_2 = None

    # continue with the rest of the function

Does the answer change if we do this at the end of the function (since I assume python itself would do it at that point)?

oxtay
  • How much other work are you doing in the function? How big are the objects you are deleting? You could just use `del temp_1, temp_2` here, btw. – Martijn Pieters Sep 25 '14 at 19:51
  • "Changed in version 2.4: Assignments to None are illegal and raise a SyntaxError." Is this relevant? – Sterling Archer Sep 25 '14 at 19:52
  • But on the whole: sounds like a premature optimisation. Locals are cleared anyway when the function is done. – Martijn Pieters Sep 25 '14 at 19:52
  • @SterlingArcher: what's illegal is `None = something`, not `something = None`. – DSM Sep 25 '14 at 19:52
  • @DSM ah sorry, I was thinking opposite. [Was reading this](https://docs.python.org/2/library/constants.html) – Sterling Archer Sep 25 '14 at 19:53
  • Both of Martijn's comments are right: `del temp_1` is almost always better than `temp_1 = None` (it expresses what you're attempting better), and this is very likely to be a premature optimization—plus, even if it isn't, you'd probably do better by refactoring the function into separate pieces, so these locals wouldn't be in scope longer than necessary in the first place… But yes, this will work. – abarnert Sep 25 '14 at 19:54
  • Thank you. The comments helped me very much. I had forgotten about the `del` statement. It seems from the comments that this might not add much to memory efficiency. But I'm not sure yet. @Martijn, to answer your first comment, these are rather large lists and dictionaries, and in the rest of the function there is a `for` loop that creates another set of class instances and new dictionaries. – oxtay Sep 25 '14 at 20:45
  • @oxtay: then why not create a separate function to handle the large data handling? Or refactor to use iterators and not build the whole thing in memory in the first place. – Martijn Pieters Sep 25 '14 at 20:47
  • @MartijnPieters, I believe I have already done that (one can never be too sure though :p). But I was curious to know if I can add more efficiency to it or if it will not have any effects. – oxtay Sep 25 '14 at 21:05
  • @oxtay: Did you understand the second half of his comment? In many cases, the only thing you're doing with a giant list like `temp_1` is iterating over it once and then throwing it away—no random access, no repeated iteration, etc. In that case, you shouldn't be building the list in the first place; just return an iterator. Not allocating the memory is always going to be better than freeing it as nicely as possible… – abarnert Sep 25 '14 at 21:58

2 Answers


It depends on what you mean by "more efficient".

Setting the variables to None, assuming they're the only references to their values, will allow the garbage collector to collect them. And in CPython (which uses ref counting for its garbage collector), it will even do so right away.
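As a rough illustration (CPython-specific, and not from the original question—the `Payload` class is a made-up stand-in for the large objects), a weak reference lets you observe the object dying the instant the last strong reference is dropped:

```python
import weakref

class Payload:
    """Stand-in for a large object such as the lists from load_file_data."""

temp_1 = Payload()
probe = weakref.ref(temp_1)   # observes the object without keeping it alive

assert probe() is not None    # still alive: temp_1 holds the only reference

temp_1 = None                 # drop the last strong reference
# In CPython the refcount hits zero immediately, so the object is already
# gone and the weak reference is dead.
assert probe() is None
```

On implementations without refcounting (PyPy, Jython), the weak reference may stay alive until the collector happens to run.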

But on the other hand, you're also adding more bytecodes to the function that have to be executed by the interpreter, and that make the code object harder to keep in cache, and so on.
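The extra bytecode is easy to see with the `dis` module. In this sketch (the toy functions are mine, not from the question), the two functions differ only in whether they clear the local, and the cleared version compiles to strictly more instructions:

```python
import dis

def with_clearing():
    temp = [0] * 10
    temp = None        # compiles to an extra LOAD_CONST/STORE_FAST pair
    return 1

def without_clearing():
    temp = [0] * 10
    return 1

n_with = len(list(dis.get_instructions(with_clearing)))
n_without = len(list(dis.get_instructions(without_clearing)))
assert n_with > n_without  # clearing the local costs extra bytecode
```

CPython's compiler does not eliminate dead stores, so the `temp = None` assignment is executed every time the function runs.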

And keep in mind that freeing up memory almost never means actually freeing memory to the OS. Most Python implementations have multiple levels of free lists, and it usually sits on top of something like malloc that does as well. So, if you were about to allocate enough additional memory to increase your peak memory size, having a lot of stuff on the free list may prevent that; if you've already hit your peak, releasing values is unlikely to make any difference. (And that's assuming peak memory usage is what matters to your app—just because it's by far the easiest thing to measure doesn't mean it's what's most relevant to every problem.)

In almost all real-life code, this is unlikely to make any difference either way. If it does, you'll need to test, and to understand how things like memory pressure and cache locality are affecting your application. You may be making your code better, you may be making it worse (at least assuming that some particular memory measurement is not the only thing you care about optimizing), most likely you're having no effect but to make it longer and therefore less readable. This is a perfect example of the maxim "premature optimization is the root of all evil".


Does the answer change if we do this at the end of the function (since I assume python itself would do it at that point)?

You're right that Python frees local variables when the function returns. So yes, in that case, you're still getting almost all of the negatives with almost none of the positives, which probably changes the answer.


But, all those caveats aside, there are cases where this could improve things.* So, if you've profiled your app and discovered that holding onto that memory too long is causing a real problem, by all means, fix it!

Still, note that `del temp_1` will have the same effect you're looking for, and it's a lot more explicit in what you're doing and why. And in most cases, it would probably be better to refactor your code into smaller functions, so that `temp_1` and friends go out of scope naturally as soon as you're done with them, without the need for any extra work.
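To sketch that refactoring (with made-up `load_file_data` and `summarize` stubs, since the real versions aren't shown in the question):

```python
def load_file_data(fname):
    # Stub standing in for the real loader from the question.
    return list(range(1000)), {i: str(i) for i in range(1000)}

def summarize(data, index):
    # Stub for whatever small result the heavy processing produces.
    return len(data) + len(index)

def parse_file(fname):
    """The large temporaries live only inside this function; when it
    returns they go out of scope and (in CPython) are freed at once,
    with no need for `del` or `= None`."""
    temp_1, temp_2 = load_file_data(fname)
    return summarize(temp_1, temp_2)

def foo(fname):
    result = parse_file(fname)
    # continue with the rest of the function using only the small result
    return result
```

The large lists and dicts never outlive `parse_file`, so the question of when to clear them simply disappears.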

* For example, imagine that the rest of the function is just an exact copy of the first half, with three new values. Having a perfect set of candidates at the top of the free lists is probably better than having to search the free lists more deeply—and definitely better than having to allocate more memory and possibly trigger swapping…

abarnert
  • "If it does, you'll need to test, and to understand how things like memory pressure and cache locality are affecting your application." → this is wrong. Please show me an example (*any example*) where the speed of an application is noticeably affected (more than ~1%). The only thing you should care about is memory usage, and that does not need any kind of special analysis. I can only believe you're overthinking this. – Veedrac Sep 25 '14 at 20:26
  • @Veedrac I'm sorry, but I don't see why you mention speed in your comment... the quote doesn't mention speed. – SethMMorton Sep 25 '14 at 20:31
  • @Veedrac: Write `def load_file_data(_): return 0, 0`, then run the code above. With 64-bit CPython 3.4.1 on my laptop, %timeit gives me 339ns per loop. Comment out the `= None` lines, and now it gives me 271ns. That's a 20% improvement. – abarnert Sep 25 '14 at 20:32
  • @SethMMorton Why would you care about cache locality if not for speed? – Veedrac Sep 25 '14 at 20:34
  • @SethMMorton: No, he's right on that part. Unless you're on a 32-bit platform or don't have VM, the main reason to care about memory use is that it causes things like swap thrash, page table churn, etc.—that is, slowness. And the only reason _anyone_ cares about cache locality is speed. – abarnert Sep 25 '14 at 20:35
  • @Veedrac Don't ask me... I don't care. I just didn't understand where your comment came from. – SethMMorton Sep 25 '14 at 20:35
  • @abarnert Thanks for the explanation. I didn't know how the two were correlated. – SethMMorton Sep 25 '14 at 20:36
  • @abarnert I'm only getting a 5% increase in time, but it still begs the question of why it matters. If the only point is "it's 5% the cost of unpacking a tuple and a function call", I struggle to see the point. If we cared about such meaningless times (I admit 5% is a little more than ~1%), we shouldn't be using Python. The arguments you have made regarding the time it takes to run are irrelevant, and the only arguments that matter are those of readability and memory usage. – Veedrac Sep 25 '14 at 20:43
  • @Veedrac: If you're arguing that micro-optimizations rarely make any difference in Python, I agree with you 100%; I even said as much in the answer. I'm sure I could find a contrived case that gave an even bigger increase, but it would be even less likely to be relevant to the OP's real code, so why bother? Until you have an actual problem to solve, you shouldn't be worrying about what's "optimal" here, you should write what's readable and maintainable. – abarnert Sep 25 '14 at 20:46
  • And my point is that you are dismissing a potential macro-optimization in memory usage because of a potential micro-optimization in time. Instead of talking about caches (which I still don't believe you're ever going to affect with this), you should mention that unless the object is large it's not worth the readability and maintainability cost (and even then it's rarely so). I just think your points about free lists and caches and whatnot are pretending there's an effect where there is not, which is at best confusing and at worst misleading. – Veedrac Sep 25 '14 at 20:51
  • @abarnert, "it would probably be better to refactor your code into smaller functions" sounds like a good solution. I have done most of that, but now it appears to me that if my function is such that I have to free the variables in the middle of it, the structure of the function is under question and I should break it into smaller functions. Is that what you meant? – oxtay Sep 25 '14 at 21:08
  • @oxtay: Yes, that's what I meant. _Usually_ it will make more sense to write a smaller function that uses and immediately disposes of the temporary variables and returns the stuff you built out of them. But sometimes it won't, and then (again, if it matters) you have to decide whether doing so artificially is an improvement over explicit `del`. – abarnert Sep 25 '14 at 21:57

I disagree that it would be faster, unless you are running into a situation where you are running out of memory.

In a normal application, as soon as the variables in your function go out of scope they will be flagged as no longer used and freed (or whatever the specific Python interpreter does). Setting them to None actually means slightly more work for Python: it allows the memory the variable points to to be freed, but the variable itself remains.

Also, in general Python uses reference counting, not garbage collection, so the object is freed as soon as its reference count falls to zero.
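You can watch the count change with `sys.getrefcount` (a CPython-specific sketch; note that the reported number includes the temporary reference created by the call itself, which is why the example compares two readings rather than absolute values):

```python
import sys

a = [1, 2, 3]   # a brand-new list; `a` is its only reference
b = a           # add a second reference to the same list

before = sys.getrefcount(a)   # counts a, b, plus getrefcount's own argument
b = None                      # drop one reference; the list survives via `a`
after = sys.getrefcount(a)

assert before - after == 1    # exactly one reference went away
# Rebinding `a` as well would drop the count to zero, at which point
# CPython frees the list immediately.
```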

Clarus
  • -1. OP asks about the impact to memory while your question talks about speed of execution. So you have not answered the question. Also, your statement that "python uses reference counting, not garbage collection" is a distinction without a difference because reference counting is one strategy of garbage collection. – Steven Rumbalski Sep 25 '14 at 20:01
  • "in general python uses reference counting, not garbage collection" is not true; PyPy, Iron, and Jython don't use refcounting (except maybe as a component within a larger scheme in some of PyPy's experimental alternative collectors, maybe?). It's also misleading, because refcounting _is_ a form of garbage collection. – abarnert Sep 25 '14 at 20:12
  • CPython uses reference counting and it is the predominant python interpreter, thus the statement "in general". Reference counting is a very different form of generalized GC as it tightly controls when and where things are free'd. Reference counting does not imply GC, although it may be a form of GC. – Clarus Sep 25 '14 at 20:18
  • I don't know what you think "general" means. That's like saying, "In general, operating systems provide UTF-16 versions of their APIs" just because Windows is the most predominant operating system. – abarnert Sep 25 '14 at 20:36
  • Also, ref counting does imply GC. What else would it mean? (OK, I suppose you could be counting references to something other than the objects that may become garbage, but then a mark&sweep collector could be marking those other things just as easily…) Of course refcounting is a very different form of GC from other forms. And copying collectors are very different from non-copying collectors. So what? – abarnert Sep 25 '14 at 20:43