Overview
I want to serialize my complex objects. It looks simple but every step creates a different problem.
In the end, other programmers must also be able to create a complex object that inherits from my parent object, and this object should be picklable under both Python 2.7 and Python 3.x.
I started with a simple object and used pickle.dump and pickle.load with success.
I then created multiple complex objects (similar but not identical), some of which can be dumped, and a few cannot.
Debugging
The pickle library knows which objects can and cannot be pickled. In theory this means pdb could be customized to support pickle debugging.
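One low-tech way to use that knowledge, without touching pdb, is to try pickling each attribute of an object separately and collect the ones that fail. The following is only a minimal sketch: find_unpicklable and the Example class are hypothetical names made up for illustration, and a lock is used as a stand-in for any value that pickle rejects.

```python
import pickle
import threading


def find_unpicklable(obj):
    """Return (attribute name, exception) pairs for attributes that fail to pickle."""
    failures = []
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception as exc:  # pickle raises several different exception types
            failures.append((name, exc))
    return failures


class Example(object):
    """Hypothetical stand-in for one of the complex objects."""

    def __init__(self):
        self.name = 'path'            # a plain string can be pickled
        self.lock = threading.Lock()  # a lock cannot be pickled


for name, exc in find_unpicklable(Example()):
    print('{0}: {1}'.format(name, exc))
```

This only inspects the first level of attributes; a nested problem shows up under the name of the top-level attribute that contains it.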
Alternative serialization libraries
I wanted a reliable serialization independent of the content of the object. So I searched for other serialization tools:
- Cerealizer, whose self-test failed and which seems to be outdated.
- MessagePack, which is not available for Python 3.
- I tried JSON and got the error:
builtins.TypeError: <lib.scan.Content object at 0x7f37f1e5da50> is not JSON serializable
- I looked at Marshal and Shelve, but both refer to Pickle.
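For context on the JSON error above: json.dumps only knows the basic types (dict, list, str, numbers, booleans, None) and raises TypeError for anything else unless a default function is supplied. A minimal sketch, assuming a hypothetical Content class in place of lib.scan.Content and a hypothetical helper to_jsonable:

```python
import json


class Content(object):
    """Hypothetical stand-in for lib.scan.Content."""

    def __init__(self, name):
        self.name = name


def to_jsonable(obj):
    """Fallback that json calls for objects it cannot encode itself."""
    try:
        return vars(obj)  # use the instance attributes as a dict
    except TypeError:
        raise TypeError('{0!r} is not JSON serializable'.format(obj))


try:
    json.dumps(Content('path'))  # no fallback given: raises TypeError
except TypeError as exc:
    print(exc)

text = json.dumps(Content('path'), default=to_jsonable)
print(text)
```

Note that this loses the class information: loading the text back gives a plain dict, not a Content object, so it is not a drop-in replacement for pickle.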
Digging into using pickle
I have read "How to check if an object is pickleable", which did not give me an answer.
The closest I found was "How to find source of error in Python Pickle on massive object".
I adjusted this to:
import pickle
import sys

if sys.version_info[0] >= 3:
    # pickle._Pickler is the pure-Python pickler, so save() can be overridden
    class MyPickler(pickle._Pickler):
        def save(self, obj):
            try:
                pickle._Pickler.save(self, obj)
            except Exception:
                # report the failing object; the exception is not re-raised
                print('pick(3.x) {0} of type {1}'.format(obj, type(obj)))
else:
    class MyPickler(pickle.Pickler):
        def save(self, obj):
            try:
                pickle.Pickler.save(self, obj)
            except Exception:
                print('pick(2.x)', obj, 'of type', type(obj))
I call this code using:
import platform

def save(obj, file):
    if platform.python_implementation() == 'CPython':
        myPickler = MyPickler(file)
        myPickler.save(obj)
I expect save to run until an exception is raised, and the content of obj to be printed so I can see exactly where the error occurs. But the result is:
pick(3.x) <class 'module'> of type <class 'type'>
pick(3.x) <class 'module'> of type <class 'type'>
pick(3.x) <class 'Struct'> of type <class 'type'>
pick(3.x) <class 'site.setquit.<locals>.Quitter'> of type <class 'type'>
pick(3.x) <class 'site.setquit.<locals>.Quitter'> of type <class 'type'>
pick(3.x) <class 'module'> of type <class 'type'>
pick(3.x) <class 'sys.int_info'> of type <class 'type'>
...
This is just a small part of the output. I do not understand it: it does not tell me which detail of my object cannot be pickled, or how to solve this.
I have seen http://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled, but it does not help me much if I cannot detect which line in my code produces something that cannot be pickled.
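What I am really after is the chain of containers that leads to the failing object. A possible variation of the pickler above, sketched here for Python 3 only, records the nesting path and re-raises the exception so that pickling stops at the first error. TracingPickler, path and failed_path are hypothetical names, and the sketch relies on the private pure-Python class pickle._Pickler, which may change between Python versions.

```python
import io
import pickle
import sys


class TracingPickler(pickle._Pickler):
    """Pure-Python pickler that remembers where the first failure happened."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.path = []           # type names of the objects currently being saved
        self.failed_path = None  # copy of path at the innermost failure

    def save(self, obj, *args, **kwargs):
        self.path.append(type(obj).__name__)
        try:
            super().save(obj, *args, **kwargs)
        except Exception:
            if self.failed_path is None:
                self.failed_path = list(self.path)
            raise  # re-raise so pickling stops at the first error
        finally:
            self.path.pop()


# Hypothetical data: a dict holding a list that holds a module object.
data = {'name': 'path', 'items': [1, sys]}

pickler = TracingPickler(io.BytesIO())
try:
    pickler.dump(data)
except Exception as exc:
    print('{0}: {1}'.format(' -> '.join(pickler.failed_path), exc))
```

Because the exception is re-raised instead of swallowed, only one path is reported, rather than one line for every object that fails after the first.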
The code in my complex object works as expected, in the end running generated code such as:
sys.modules['unum']
But when pickling it seems the 'module' is not read as expected.
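A small experiment suggests what may be going on: the value of sys.modules[...] is a module object, and module objects themselves cannot be pickled, while the module name can. This is a sketch under assumptions: it uses the already imported pickle module instead of unum (which may not be installed), and whether my object really stores such a module object is a guess.

```python
import pickle
import sys

module = sys.modules['pickle']  # any imported module; stands in for 'unum'

try:
    pickle.dumps(module)
    picklable = True
except Exception as exc:  # the exact exception type differs between Python versions
    picklable = False
    print('{0}: {1}'.format(type(exc).__name__, exc))

# The module name is a plain string, so it survives a round trip
# and can be used to look the module up again afterwards.
name = pickle.loads(pickle.dumps(module.__name__))
restored = sys.modules[name]
print(restored is module)
```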
Attempt at a solution
Some background to clarify what I mean. I have had programs that worked and then suddenly stopped working. The cause might be an update or some other changed resource. There are also programs that work for others but not for me, and the opposite.
This is a general problem, so I want to develop a program that checks all kinds of resources. The number of different kinds of resources is huge, so I have one parent class with all the general behaviour, and detail classes, each as small as possible, for the specific resources.
The resource-specific part is done in my child resource classes.
These resources have to be checked against different versions, e.g. Python 2.7 or Python 3.3. If you run with Python 2.7.5, the resource is valid if Python 2.7 or higher is required, so the check must be a bit more than a test for an equal value. This is specified as a single statement in the custom config file. There is a specific config file for each program, which must be as small as possible to be usable. One resource is checked with a single statement in the config file.
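To illustrate the kind of "at least this version" check meant here, version strings can be compared as tuples of integers rather than as text. This is only a sketch and not my actual implementation; parse_version and satisfies are hypothetical names, and it assumes purely numeric, dot-separated versions.

```python
import sys


def parse_version(text):
    """Turn '2.7.5' into (2, 7, 5) so versions compare numerically."""
    return tuple(int(part) for part in text.split('.'))


def satisfies(installed, required):
    """True if the installed version is the required version or higher."""
    return parse_version(installed) >= parse_version(required)


print(satisfies('2.7.5', '2.7'))  # installed is newer than required
print(satisfies('2.6.9', '2.7'))  # installed is older than required

# The running interpreter can be checked the same way.
print(tuple(sys.version_info[:3]) >= parse_version('2.7'))
```

Comparing tuples avoids the trap of comparing strings, where '2.10' would sort before '2.7'.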
The general class is about 98% of the code. The specific resources and config files are just about 2% of the code. So it is very easy to add new resources to check, and new config files for new programs.
This child resource class:
import sys
import r_base  # my module that contains the parent class R_Base

class R_Sys(r_base.R_Base):
    '''
    doc : http://docs.python.org/3/library/sys.html#module-sys
    sys.modules returns only a list of imported modules
    statement :
        sys.modules['psutil']  # may return false (installed but not imported)
    but the statements :
        import psutil
        sys.modules['psutil']  # will return true, now psutil is imported
    '''
    allowed_names = ('modules', 'path', 'builtin_module_names', 'stdin')
    allowed_keys_in_dict_config = ('name',)
    allowed_operators = ("R_NONE", "=", 'installed')  # installed only for modules
    class_group = 'Sys'
    module_used = sys

    def __init__(self, check_type, group, name):
        super(R_Sys, self).__init__(check_type, group, name)
When it is called by this config statement:
sc.analyse(r.R_Sys, c.ct('DETECT'), dict(name='path'))
it can be pickled successfully. But with this config statement:
sc.analyse(r.R_Sys, c.ct('DETECT'),
           dict(name='modules', tuplename='unum'))
it fails.
In my opinion this means that the 98% main code should be OK, otherwise the first statement would fail as well.
There are class attributes in the child class. These are required for it to function properly. And again, in the first call the dump executes well. I have not tried a load yet.