4

What's an accurate way of checking whether an object can be atomically pickled? When I say "atomically pickled", I mean without considering other objects it may refer to. For example, this list:

l = [threading.Lock()]

is not a pickleable object, because it refers to a Lock, which is not pickleable. But atomically, the list itself is pickleable.

So how do you check whether an object is atomically pickleable? (I'm guessing the check should be done on the class, but I'm not sure.)

I want it to behave like this:

>>> is_atomically_pickleable(3)
True
>>> is_atomically_pickleable(3.1)
True
>>> is_atomically_pickleable([1, 2, 3])
True
>>> is_atomically_pickleable(threading.Lock())
False
>>> is_atomically_pickleable(open('whatever', 'r'))
False

Etc.

Ram Rachum
  • not possible. "atomically pickleable" is ambiguous - you cannot define it in this manner. – vonPetrushev Nov 16 '10 at 23:58
  • I think you should give details on what you are actually trying to do. That way we can judge whether this is actually a good solution. (It looks like a strange request right now). If not we can propose a better solution. – Winston Ewert Nov 26 '10 at 16:30
  • @Winston Ewert: This is a sub-task for this: http://stackoverflow.com/questions/4080688/python-pickling-a-dict-with-some-unpicklable-items I am following the winning solution and I want a better implementation for my `persistent_id` function. – Ram Rachum Nov 26 '10 at 17:48
  • Ok, after you figure out that you can't pickle say a file, what are you going to do then? You've got to return something to use as the persistent_id. I don't see where identifying the object as unpicklable really helps you there. – Winston Ewert Nov 26 '10 at 18:16
  • I will just put some `FilteredObject` thing instead of the file, saying "There was a file object here but it couldn't be pickled." This helps me a lot because then only the file object will be cut out of the pickle instead of having the entire pickle operation for the `GuiProject` object fail. – Ram Rachum Nov 26 '10 at 18:29
  • But when your objects try to use the file then things will fail regardless. Essentially, you are going to be creating broken objects. What are you doing that makes that a useful proposition? – Winston Ewert Nov 26 '10 at 18:37
  • As I said in the other question: I'm letting the user define his own objects in the Python shell that comes with the program. I want as many of these objects preserved for his next session. If some of them can't be, cutting them out would be alright, but I want to preserve as many objects as possible. – Ram Rachum Nov 26 '10 at 19:26

5 Answers

3

Given that you're willing to break encapsulation, I think this is the best you can do:

from pickle import Pickler
import os

class AtomicPickler(Pickler):
  def __init__(self, protocol):
    # You may want to replace this with a fake file object that just
    # discards writes.
    blackhole = open(os.devnull, 'w')

    Pickler.__init__(self, blackhole, protocol)
    self.depth = 0

  def save(self, o):
    self.depth += 1
    if self.depth == 1:
      return Pickler.save(self, o)
    self.depth -= 1
    return

def is_atomically_pickleable(o, protocol=None):
  pickler = AtomicPickler(protocol)
  try:
    pickler.dump(o)
    return True
  except:
    # Hopefully this exception was actually caused by dump(), and not
    # something like a KeyboardInterrupt
    return False

In Python the only way you can tell if something will work is to try it. That's the nature of a language as dynamic as Python. The difficulty with your question is that you want to distinguish between failures at the "top level" and failures at deeper levels.

Pickler.save is essentially the control-center for Python's pickling logic, so the above creates a modified Pickler that ignores recursive calls to its save method. Any exception raised while in the top-level save is treated as a pickling failure. You may want to add qualifiers to the except statement. Unqualified excepts in Python are generally a bad idea as exceptions are used not just for program errors but also for things like KeyboardInterrupt and SystemExit.

This can give what are arguably false negatives for types with odd custom pickling logic. For example, suppose a custom list-like class pickles its elements on its own instead of causing Pickler.save to be called recursively. If an instance of that class contains an element that its custom logic cannot pickle, is_atomically_pickleable would return False for that instance, even though removing the offending element would leave an object that is pickleable.

Also, note the protocol argument to is_atomically_pickleable. Theoretically an object could behave differently when pickled with different protocols (though that would be pretty weird) so you should make this match the protocol argument you give to dump.
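For readers on Python 3, here is a rough sketch of the same idea under one important assumption: `pickle.Pickler` is normally the C implementation there, which does not route through an overridden `save`, so the sketch subclasses the pure-Python `pickle._Pickler` instead. That class is a private CPython detail and may change between versions. The name `TopLevelOnlyPickler` is made up for illustration.

```python
import io
import pickle
import threading


class TopLevelOnlyPickler(pickle._Pickler):
    """Runs the pickling logic for the top-level object only (illustrative name)."""

    def __init__(self, protocol=None):
        # An in-memory buffer stands in for the "black hole" file.
        super().__init__(io.BytesIO(), protocol)
        self._depth = 0

    def save(self, obj, *args, **kwargs):
        if self._depth >= 1:
            # Nested call: ignore objects the top-level object refers to.
            return
        self._depth += 1
        try:
            super().save(obj, *args, **kwargs)
        finally:
            self._depth -= 1


def is_atomically_pickleable(obj, protocol=None):
    try:
        TopLevelOnlyPickler(protocol).dump(obj)
    except Exception:
        # Exception (not a bare except) lets KeyboardInterrupt propagate.
        return False
    return True


print(is_atomically_pickleable([threading.Lock()]))
print(is_atomically_pickleable(threading.Lock()))
```

The bytes written to the buffer are not a valid pickle, since nested objects are skipped. The sketch only cares whether the top-level `save` raises.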

Laurence Gonsalves
  • @Laurence Gonsalves: Don't you think this can be solved more simply and much more cheaply by using `.__reduce__` and `.__reduce_ex__` on the object? – Ram Rachum Nov 22 '10 at 14:38
  • (Also, you can do `except Exception:` to skip `SystemExit` and `KeyboardInterrupt`.) – Ram Rachum Nov 22 '10 at 14:38
  • @cool-RR: There are a bunch of cases where pickling doesn't go through __reduce__ or __reduce_ex__. eg: `[].__reduce__()` will incorrectly tell you "TypeError: can't pickle list objects". Rather than replicating the logic in `Pickler.save` it seems safer to just delegate to it. – Laurence Gonsalves Nov 22 '10 at 23:20
  • I think this is just a bunch of special cases that the `pickle` module is aware of. I prefer to imitate the logic in `Pickler` rather than to delegate to it, because I don't want to do any actual pickling. I don't think there is any reason to manipulate the actual object in question, only its type. – Ram Rachum Nov 24 '10 at 12:47
  • @cool-RR If you mean just checking if `__reduce__` or `__reduce_ex__` exist, then your results will be much less accurate. If you're actually invoking them then I doubt it'd be much more efficient than what I proposed. You're right that it's a "bunch of special cases that the `pickle` module is aware of" which is exactly why it's better to delegate than to replicate. – Laurence Gonsalves Nov 24 '10 at 23:02
  • @Laurence Gonsalves: I think that invoking `__reduce__` or `__reduce_ex__` is not the expensive part. They just return a `(function, parameters)` tuple. But `Pickler` calls its `.save_reduce` method which actually pickles this tuple, which I think is the expensive part. – Ram Rachum Nov 24 '10 at 23:31
1

Given the dynamic nature of Python, I don't think there's really a well-defined way to do what you're asking aside from heuristics or a whitelist.

If I say:

x = object()

is x "atomically pickleable"? What if I say:

x.foo = threading.Lock()

? is x "atomically pickleable" now?

What if I made a separate class that always had a lock attribute? What if I deleted that attribute from an instance?
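To make the point concrete, here is a small Python 3 sketch showing that whether an object pickles depends on its current state, not only on its type. Two assumptions: `types.SimpleNamespace` stands in for `x`, because a bare `object()` does not accept new attributes, and `fully_pickleable` is a hypothetical helper that tests ordinary (non-atomic) pickling.

```python
import pickle
import threading
import types


def fully_pickleable(obj):
    """Hypothetical helper: True if pickle.dumps(obj) succeeds."""
    try:
        pickle.dumps(obj)
    except Exception:
        return False
    return True


x = types.SimpleNamespace()  # stand-in for an object that accepts attributes
before = fully_pickleable(x)

x.foo = threading.Lock()     # same type, different state
with_lock = fully_pickleable(x)

del x.foo                    # back to the original state
after = fully_pickleable(x)

print(before, with_lock, after)
```

The type of `x` never changes here, so a check based only on the type cannot tell these three states apart.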

Laurence Gonsalves
  • Your `x` is atomically pickleable anyway. It is the lock attribute which is not atomically pickleable, according to my definition. – Ram Rachum Nov 18 '10 at 22:18
  • @cool-RR: So all Python objects are "atomically pickleable" then? – Laurence Gonsalves Nov 18 '10 at 23:47
  • @Laurence Gonsalves: No. Locks aren't atomically pickleable. Files aren't either. You know that you try to pickle something and get a `pickle.PicklingError: Can't pickle 'lock' object: `? (Even if you were pickling something that only referred to the lock object.) Then the object type that appears in the error message is the atomically unpickleable type. – Ram Rachum Nov 20 '10 at 15:10
  • @cool-RR: Sorry if I wasn't clear, but when I said "Python objects" I meant objects defined in Python. Files and locks aren't Python objects. In any case, my point is that if you consider a Lock to be "atomically unpickleable" but a (Python) object that has a lock attribute to be "atomically pickleable" then your definition breaks encapsulation. What if I have a "MyLock" class that exists only to hold onto a lock? What good is it that it is "atomically pickleable" when removing the lock effectively destroys the object's purpose? – Laurence Gonsalves Nov 20 '10 at 23:18
  • @Laurence Gonsalves: Breaking encapsulation is fine. I have no problem that `y` will be non-atomically-pickleable, and that `x` will refer to `y`, but `x` *will* be atomically-pickleable. If you fed `x` to `dumps`, it would choke on `y`, not `x`. "Atomically pickleable" is a weaker condition than "pickleable". – Ram Rachum Nov 21 '10 at 18:50
1

I think the persistent_id interface is a poor match for what you are attempting to do. It is designed to be used when your object should refer to equivalent objects in the new program rather than copies of the old ones. You are attempting to filter out every object that cannot be pickled, which is different. Why are you attempting to do this?

I think this is a sign of a problem in your code. The fact that you want to pickle objects which refer to GUI widgets, files, and locks suggests that you are doing something strange. The kind of objects you typically persist shouldn't be related to or hold references to that sort of object.

Having said that, I think your best option is the following:

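from pickle import Pickler, PicklingError

class MyPickler(Pickler):
    def save(self, obj):
        try:
            Pickler.save(self, obj)
        except PicklingError:
            # FilteredObject is the placeholder class from the question's
            # comments; it must itself be picklable.
            Pickler.save(self, FilteredObject(obj))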

This should work for the Python implementation; I make no guarantees as to what will happen in the C implementation. Every object which gets saved will be passed to the save method. This method will raise a PicklingError when it cannot pickle the object. At that point, you can step in and call the function again, asking it to pickle your own object, which should pickle just fine.
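As a rough Python 3 illustration of this approach, the sketch below subclasses the pure-Python `pickle._Pickler` (a private CPython class, used here because the C `Pickler` does not call an overridden `save`). It substitutes a plain string for anything that fails. `FilteringPickler` and `dumps_filtered` are hypothetical names, and a string replaces `FilteredObject` only to keep the example self-contained. It also catches `TypeError`, since that is what locks raise on Python 3.

```python
import io
import pickle
import threading


class FilteringPickler(pickle._Pickler):
    """Replaces objects that fail to pickle with a marker string (sketch)."""

    def save(self, obj, *args, **kwargs):
        try:
            super().save(obj, *args, **kwargs)
        except (pickle.PicklingError, TypeError):
            # Assumes the failure happened before any bytes for obj were
            # written. That holds for locks, but not for every type.
            super().save("<filtered %s>" % type(obj).__name__)


def dumps_filtered(obj):
    buffer = io.BytesIO()
    FilteringPickler(buffer).dump(obj)
    return buffer.getvalue()


restored = pickle.loads(dumps_filtered({"n": 1, "lock": threading.Lock()}))
print(restored)
```

Because the innermost failing `save` call handles the error, only the offending object is replaced and the enclosing container is still written normally.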

EDIT

From my understanding, you have essentially a user-created dictionary of objects. Some objects are picklable and some aren't. I'd do this:

import cPickle
from cPickle import PicklingError

class saveable_dict(dict):
    def __getstate__(self):
        data = {}
        for key, value in self.items():
            try:
                encoded = cPickle.dumps(value)
            except PicklingError:
                # Unpicklable is a placeholder class defined elsewhere.
                encoded = cPickle.dumps(Unpicklable())
            data[key] = encoded
        return data

    def __setstate__(self, state):
        for key, value in state.items():
            self[key] = cPickle.loads(value)

Then use that dictionary when you want to hold that collection of objects. The user should be able to get any picklable objects back, but everything else will come back as the Unpicklable() object. The difference between this and the previous approach lies in objects which are themselves picklable but hold references to unpicklable objects. But those objects are probably going to come back broken regardless.

This approach also has the benefit that it remains entirely within the defined API and thus should work in either cPickle or pickle.
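The per-value idea can be sketched in Python 3 with plain functions, using only the public `pickle` API. The names `encode_values`, `decode_values`, and `PLACEHOLDER` are made up for this example. One consequence worth seeing: because every value gets its own `dumps` call, no memo is shared between them, so two keys that referred to the same object come back as separate copies.

```python
import pickle
import threading

PLACEHOLDER = "<unpicklable>"  # hypothetical marker value


def encode_values(mapping):
    """Pickle each value on its own; substitute a marker on failure."""
    encoded = {}
    for key, value in mapping.items():
        try:
            encoded[key] = pickle.dumps(value)
        except Exception:
            encoded[key] = pickle.dumps(PLACEHOLDER)
    return encoded


def decode_values(encoded):
    return {key: pickle.loads(blob) for key, blob in encoded.items()}


shared = [1, 2, 3]
restored = decode_values(
    encode_values({"a": shared, "b": shared, "lock": threading.Lock()})
)
print(restored["lock"])
print(restored["a"] is restored["b"])
```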

Winston Ewert
  • (Regarding why I want this, I answered in a comment on the question itself.) – Ram Rachum Nov 26 '10 at 19:29
  • Wow, I didn't think of this approach. It sounds much more elegant than the `persistent_id` thing. The only question is whether it works. And compatibility with `cPickle` is a must. (I know I need to encapsulate rather than inherit, but will that be enough?) – Ram Rachum Nov 26 '10 at 19:31
  • @cool-RR sadly, only persistent_id is implemented in cPickle to make that work. – Winston Ewert Nov 26 '10 at 19:40
  • @cool-RR, second solution added in edit which I think will actually work better for your case. – Winston Ewert Nov 26 '10 at 19:50
  • @Winston Ewert: Your second solution will not work, because the existing memo will not be used in the sub-pickling, so we'll have recursion problems. (I mentioned this in my previous question.) – Ram Rachum Nov 26 '10 at 20:16
  • @cool-RR, in that case the only thought I have left is why you need to use cPickle? – Winston Ewert Nov 26 '10 at 21:11
  • Because it's much faster. This will be used in "Save" and "Load" in a GUI program and with `pickle` it's too slow. Don't worry though, I think the `persistent_id` approach will work out. I'm working on it. But I'd still like to have a smart `is_atomically_pickleable` implementation... – Ram Rachum Nov 26 '10 at 21:38
  • @cool-RR If you want an efficient implementation of is_atomically_pickleable, you should look at the save method in pickle.py in the standard library. You should duplicate that logic if you want to determine if something can be pickled. – Winston Ewert Nov 26 '10 at 21:46
  • @Winston Ewert: Yes, that's exactly what I was thinking. I've been debugging around this function for a while now, and a few minutes ago I got the `persistent_id` approach to work and I can save and load :) There's still much work to be done, and probably many bugs to uncover, but now I'm optimistic about it. So regarding duplicating the logic: I was hoping someone who's more familiar with `__reduce__` and `__reduce_ex__` will help, since there are many things I don't understand about them and there is almost no documentation about them! – Ram Rachum Nov 26 '10 at 22:11
0

dill has the pickles method for such a check.

>>> import threading
>>> l = [threading.Lock()]
>>> 
>>> import dill
>>> dill.pickles(l)
True
>>> 
>>> dill.pickles(threading.Lock())
True
>>> f = open('whatever', 'w') 
>>> f.close()
>>> dill.pickles(open('whatever', 'r'))
True

Well, dill atomically pickles all of your examples, so let's try something else:

>>> l = [iter([1,2,3]), xrange(5)]
>>> dill.pickles(l)
False

Ok, this fails. Now, let's investigate:

>>> dill.detect.trace(True)
>>> dill.pickles(l)
T4: <type 'listiterator'>
False
>>> map(dill.pickles, l)
T4: <type 'listiterator'>
Si: xrange(5)
F2: <function _eval_repr at 0x106991cf8>
[False, True]

Ok. we can see the iter fails, but the xrange does pickle. So, let's replace the iter.

>>> l[0] = xrange(1,4)
>>> dill.pickles(l)
Si: xrange(1, 4)
F2: <function _eval_repr at 0x106991cf8>
Si: xrange(5)
True

Now our object atomically pickles.

Mike McKerns
0

I ended up coding my own solution to this.

Here's the code. Here are the tests. It's part of GarlicSim, so you can use it by installing garlicsim and doing from garlicsim.general_misc import pickle_tools.

If you want to use it on Python 3 code, use the Python 3 fork of garlicsim.

Here is an excerpt from the module (may be outdated):

import re
import cPickle as pickle_module
import pickle # Importing just to get dispatch table, not pickling with it.
import copy_reg
import types

from garlicsim.general_misc import address_tools
from garlicsim.general_misc import misc_tools


def is_atomically_pickleable(thing):
    '''
    Return whether `thing` is an atomically pickleable object.

    "Atomically-pickleable" means that it's pickleable without considering any
    other object that it contains or refers to. For example, a `list` is
    atomically pickleable, even if it contains an unpickleable object, like a
    `threading.Lock()`.

    However, the `threading.Lock()` itself is not atomically pickleable.
    '''
    my_type = misc_tools.get_actual_type(thing)
    return _is_type_atomically_pickleable(my_type, thing)


def _is_type_atomically_pickleable(type_, thing=None):
    '''Return whether `type_` is an atomically pickleable type.'''
    try:
        return _is_type_atomically_pickleable.cache[type_]
    except KeyError:
        pass

    if thing is not None:
        assert isinstance(thing, type_)

    # Sub-function in order to do caching without crowding the main algorithm:
    def get_result():

        # We allow a flag for types to painlessly declare whether they're
        # atomically pickleable:
        if hasattr(type_, '_is_atomically_pickleable'):
            return type_._is_atomically_pickleable

        # Weird special case: `threading.Lock` objects don't have `__class__`.
        # We assume that objects that don't have `__class__` can't be pickled.
        # (With the exception of old-style classes themselves.)
        if not hasattr(thing, '__class__') and \
           (not isinstance(thing, types.ClassType)):
            return False

        if not issubclass(type_, object):
            return True

        def assert_legit_pickling_exception(exception):
            '''Assert that `exception` reports a problem in pickling.'''
            message = exception.args[0]
            segments = [
                "can't pickle",
                'should only be shared between processes through inheritance',
                'cannot be passed between processes or pickled'
            ]
            assert any((segment in message) for segment in segments)
            # todo: turn to warning

        if type_ in pickle.Pickler.dispatch:
            return True

        reduce_function = copy_reg.dispatch_table.get(type_)
        if reduce_function:
            try:
                reduce_result = reduce_function(thing)
            except Exception, exception:
                assert_legit_pickling_exception(exception)
                return False
            else:
                return True

        reduce_function = getattr(type_, '__reduce_ex__', None)
        if reduce_function:
            try:
                reduce_result = reduce_function(thing, 0)
                # (The `0` is the protocol argument.)
            except Exception, exception:
                assert_legit_pickling_exception(exception)
                return False
            else:
                return True

        reduce_function = getattr(type_, '__reduce__', None)
        if reduce_function:
            try:
                reduce_result = reduce_function(thing)
            except Exception, exception:
                assert_legit_pickling_exception(exception)
                return False
            else:
                return True

        return False

    result = get_result()
    _is_type_atomically_pickleable.cache[type_] = result
    return result

_is_type_atomically_pickleable.cache = {}
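For comparison, a much-reduced Python 3 sketch of the same probing idea follows. It is not the GarlicSim code: it skips the caching, the opt-in flag, and the exception-message checks, and it relies on the private `pickle._Pickler.dispatch` table (a CPython detail). The name `probe_atomically_pickleable` is hypothetical.

```python
import copyreg
import os
import pickle
import threading


def probe_atomically_pickleable(obj, protocol=pickle.DEFAULT_PROTOCOL):
    """Ask how obj would be reduced, without serializing anything."""
    type_ = type(obj)

    # Types the pure-Python pickler handles directly (int, list, dict, ...).
    if type_ in pickle._Pickler.dispatch:
        return True

    # Reducers registered through copyreg take precedence over __reduce_ex__.
    reducer = copyreg.dispatch_table.get(type_)
    try:
        if reducer is not None:
            reducer(obj)
        else:
            obj.__reduce_ex__(protocol)
    except Exception:
        return False
    return True


print(probe_atomically_pickleable([threading.Lock()]))
print(probe_atomically_pickleable(threading.Lock()))
```

Since only the reduction step runs, the contents of containers are never visited, which is what makes the check "atomic" in the sense used in the question.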
Ram Rachum