Serializing QGraphicsScene, and JSON
-
@SGaist
Then if I understand right, your `my_method_to_get_the_data(my_scene)` is precisely what I called creating my "standalone" semi-copied object, which I have to construct by copying what I actually do want out of each node and reproducing the desired nesting?

> I'm guessing you are traversing your nodes, construct a dictionary with everything you want in it and then generate the XML, correct?
No, not for this. I would (probably) do (imaginary language):
```
xmlStream.writeStartElement("MyGraphicsItem")
xmlStream.writeElementWithValue("saveThisProperty", 999)
xmlStream.writeStartElement("saveThisObject")
...
xmlStream.writeEndElement("saveThisObject")
xmlStream.writeEndElement("MyGraphicsItem")
```
So you see I am doing it "incrementally" and "selectively" to stream. I do not bother to create a dictionary or in-memory object for what I want to save.
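For reference, a runnable equivalent with Qt's real `QXmlStreamWriter` (a sketch using PyQt5; note the real `writeEndElement()` takes no element name, unlike the imaginary API above):

```python
from PyQt5.QtCore import QBuffer, QIODevice, QXmlStreamWriter

buf = QBuffer()
buf.open(QIODevice.WriteOnly)

xml = QXmlStreamWriter(buf)
xml.setAutoFormatting(True)
xml.writeStartElement("MyGraphicsItem")
xml.writeTextElement("saveThisProperty", "999")   # element with a value
xml.writeStartElement("saveThisObject")
# ... nested properties get streamed here, selectively ...
xml.writeEndElement()   # closes saveThisObject
xml.writeEndElement()   # closes MyGraphicsItem

print(bytes(buf.data()).decode())
```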
That is precisely what I am asking about JSON serialization: can I do it bit-by-bit like for an XML stream, or do I need to create a complete in-memory object with just what I want in it, in order to call a single JSON serialize on the top-level object? If you could just answer that, I'll know how to proceed.
P.S.
For example, as I start my Googling, a question without a solution, https://stackoverflow.com/questions/46895020/python-serialize-only-specific-fields-to-json:

> This gives me a json with all the fields. For one operation I need this json with all the fields, for another I only need `name` and `age`. Is there a way I can specify which fields to ignore and reuse the same class without having to recreate?
Or this one https://groups.google.com/forum/#!topic/django-users/w7PINeiSAVE:
> is there any way to serialize models and remove some fields? I.e. I would like to serialize User for example, but I definitely don't want the email to be there.

> you can just create your own dict variable, and then using simplejson to convert it to json

Of course, but that's exactly what I'm trying to avoid...
Sure enough, I'm finding plenty of examples of JSON serialization from Python which want to serialize the whole object, but no luck on how you go about serializing only some of its properties/sub-objects.... :(
-
Well, if you have only a set of reduced properties, I'd go with having a method that returns said data that you feed to the selected serialiser so you separate your concerns.
-
@SGaist
I believe I understand you correctly, and this is the conclusion I came to earlier today, having done some reading. So now each object type in the hierarchy has a serialization method which is responsible for returning a reduced, "shadow" object which is a copy of those properties it actually wants saved. A single top-level object, a "shadow hierarchy", is produced which can be passed to `json.dump(object, stream)` to actually serialize at the end. Feels to me like a kludgy way to serialize, but that does seem to be the Python way to do it.

In C++ you have the `<<` & `>>` archive/serialization operators. Am I right: for any class you can override them and write whatever you like to the stream for the object serialization? None of this "you have to produce another object containing just what you want serialized"? That's the approach I'm used to (C#/.NET serialization).
-
If you are thinking about the streaming operators, you will have the same issue. For example, with the QTextStream operators, you pass them what you want from the object, but if you want to serialise to various different formats like XML or JSON, you'll have to do that yourself.

So what would be more generic is to define a serialiser base class and then implement serialisers for the different formats. You would then pass the serialiser to the class and use it to dump whatever data it wants to dump.

You can take inspiration from Django's REST Framework. You define your views and serialisers and decide what renderer to use. So the format used is independent of the data. Note that in the case of this project the "view + serialiser" would be your class and the renderer matches your serialiser.
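If it helps, that base-class suggestion could look something like this minimal sketch (all class and method names here are mine, purely illustrative):

```python
import json
from abc import ABC, abstractmethod

class Serialiser(ABC):
    """Format-agnostic collector: classes dump selected data through this."""
    @abstractmethod
    def add(self, name, value): ...
    @abstractmethod
    def render(self) -> str: ...

class JsonSerialiser(Serialiser):
    def __init__(self):
        self._data = {}
    def add(self, name, value):
        self._data[name] = value
    def render(self) -> str:
        return json.dumps(self._data, indent=4)

class MyGraphicsItem:
    def __init__(self):
        self.save_this_property = 999
        self.runtime_only = object()      # deliberately not serialized
    def serialize(self, serialiser):
        # the class decides what it wants to dump; the output format is the
        # serialiser's business, so XML would just be another subclass
        serialiser.add("saveThisProperty", self.save_this_property)

s = JsonSerialiser()
MyGraphicsItem().serialize(s)
print(s.render())   # {"saveThisProperty": 999}
```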
-
@SGaist
Yep, I think I get that. I have only ever needed to support one serialization format at a time, and this time I know it will be JSON and JSON only. It's not worth coding for flexible alternatives here.

The point nonetheless is that the JSON way of serializing, at least via the standard Python `json.dump`, which requires you to produce a copy of your object with just the bits you want serialized, is different from the others, where a custom object simply writes its serialization to the stream.
If I'm not boring you yet :), you wrote that you would expect me to be doing:
```python
scene_dump = my_method_to_get_the_data(my_scene)
with open("some_file.json", "wt") as json_file:
    json.dump(scene_dump, json_file)
```
It is the way I have now done it for JSON. But that's not the way I'm used to for serialization. That would be:
with open("some_file.json", "wt") as json_file): method_to_walk_objects_serializing_what_they_want_to_stream(my_scene, json_file)
Anyway, I have a direction to proceed for now.
-
You can go on with the second method too. You have `json.dumps`, which takes a dictionary and dumps the content to a string. Since each widget will be its own "entity" you can then prepend it to the file content.
-
@SGaist
I'm not quite sure what you mean. I use `json.dump()` to dump to a stream. The docs for that state:

> Note: Unlike `pickle` and `marshal`, JSON is not a framed protocol, so trying to serialize multiple objects with repeated calls to `dump()` using the same `fp` will result in an invalid JSON file.

I'm not sure what they mean by (not) a "framed" protocol, but aren't they saying I cannot call `dump()` recursively at each level in the recursive descent? Or perhaps they just mean I cannot do a second, separate `dump()` to append another object to the final output, because there isn't a top-level, enclosing node?

Meanwhile, I have raised https://stackoverflow.com/questions/59103160/python-json-serialize-excluding-certain-fields. There are suggestions there that I can achieve this via `json.dump(object, default=serialization_handler)`, but I don't get how yet...

I found no problem with C# or C++ serialization. I'm finding Python/JSON wilfully obscure, with no examples for something I would have thought I would not be the only person to want... :(
-
Warning: sneaky one-char difference: `json.dumps` <- see the `s`. It's not the same functionality. This one returns a valid JSON string from your dictionary.

I currently don't know how the dump is implemented in the JSON module, but my guess would be that if you call `dump` twice you will have:

```
{"first_object": "test"}{"second_object": "other_test"}
```

which is not valid JSON. You will have to write the start and end of the document yourself, as well as proper separation between the different dumped objects.
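In other words, repeated `dump()` calls only produce valid JSON if you write the framing yourself; a minimal illustration:

```python
import json

objects = [{"first_object": "test"}, {"second_object": "other_test"}]

with open("some_file.json", "wt") as f:
    f.write("[")                 # document start, written by hand
    for i, obj in enumerate(objects):
        if i:
            f.write(",")         # separation between the dumped objects
        json.dump(obj, f)        # each dump() emits one complete JSON value
    f.write("]")                 # document end
# some_file.json now holds: [{"first_object": "test"},{"second_object": "other_test"}]
```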
-
@SGaist
Yeah, I get that bit, if that is what they mean by "so trying to serialize multiple objects with repeated calls to `dump()`".

As for `dumps()` vs `dump()`: that should just serialize to a string instead of a stream; I take it to be a special case which just uses a string stream instead of a file.

I do not have one single dictionary for my hierarchy. I have a whole various-classes hierarchy which needs to be descended to produce the serialization. The top-level caller of `dump()` does not even know what classes will be encountered, can't `import` all the classes, and is not the place to write the code for serializing each class anyway. At each class level I need some method in that class to be called which does know how to serialize that class's properties, and calls to recurse into its sub-objects. Like, I don't know, say some `__toJson__()` method in each class. And have `dump`/`dumps()` know to call that as it goes. Like C++ would know to call `<<`/`>>` on each class object as it serializes.
-
I know that you don't have one dict for your whole hierarchy.

What I was suggesting was that you could build that dict up traversing your hierarchy and at the end call `json.dump` on the returned object. That way, if you need to change the output at some point you don't have to re-implement the traversal, just the "dict to output" part.
-
@SGaist
And that is precisely what I implemented on Friday, because I don't know any other way of doing it!

I require all my serializable classes to offer a `def json_dump_obj(self) -> dict` method. And the recursive-descent walker goes `if hasattr(obj, 'json_dump_obj'): serialized_obj = obj.json_dump_obj()`, and uses that in the serialized object it `return`s.

So I do a pre-pass complete traversal, returning a "shadow, serializable" tree as a single object which at the end can be passed to `json.dump(shadow_obj)`.

My "uneasiness" is that this does not scale nicely (memory-wise) when my object tree has 1,000,000 nodes!

In C++, serialization could have worked this way too. But it does not. For `<<`/`>>` you would override the operator in each class, and each object would serialize itself directly to the archiving stream, not return some "shadow" object for later serialization in one go. No "one pass to build a serialization representation in 'shadow' objects (`json_dump_obj()`), and then a second call/pass (`json.dump()`) to serialize that to stream". This is the nub of my question about approach....
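A condensed, runnable sketch of that two-pass approach (the `json_dump_obj` hook is from the thread; the `Node` class and `to_shadow` walker are illustrative):

```python
import json

class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.selected = False                 # runtime state, not saved
    def json_dump_obj(self) -> dict:
        # return only the properties wanted in the serialization
        return {"name": self.name, "children": self.children}

def to_shadow(obj):
    """Pass 1: recursively build the 'shadow' tree of plain dicts/lists."""
    if hasattr(obj, "json_dump_obj"):
        obj = obj.json_dump_obj()
    if isinstance(obj, dict):
        return {k: to_shadow(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_shadow(v) for v in obj]
    return obj                                # already JSON-serializable

scene = Node("root", [Node("child1"), Node("child2")])
print(json.dumps(to_shadow(scene)))           # pass 2: serialize in one go
```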
-
The dict is nothing JSON specific. Just call that method `to_dict` or something like that; that will keep its purpose generic.

Since you may have that many items, it might indeed be unfriendly. What about writing a small serialiser/marshaller class that would manage the file and its content? During the traversal you would pass that object along to a `serialize` method of your class. That would follow your original design more closely.

Note that with a small example we could devise a nice way to do that more easily. Can you write down a dummy small version of your current use case?
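Such a marshaller might be as small as this sketch (names are mine; the framing is the same manual start/separator/end discussed earlier):

```python
import json

class JsonMarshaller:
    """Owns the output stream; objects dump themselves through it, one by one."""
    def __init__(self, stream):
        self._stream = stream
        self._first = True
    def __enter__(self):
        self._stream.write("[")          # document start
        return self
    def __exit__(self, *exc):
        self._stream.write("]")          # document end
        return False
    def dump(self, data: dict):
        if not self._first:
            self._stream.write(",")      # separation between entities
        self._first = False
        json.dump(data, self._stream)    # written incrementally, no big copy

# traversal passes the marshaller along; each class serializes itself:
#   with open("scene.json", "wt") as f, JsonMarshaller(f) as m:
#       for node in walk(my_scene):      # walk() is hypothetical
#           m.dump(node.json_dump_obj())
```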
-
@SGaist
The `dict` may not be JSON specific, but it is serialization specific, since it is only populated with some subset of properties which are to be serialized. And to work for JSON all its properties/sub-objects must be JSON-serializable. That's why I have at least named the required serialization method `json_dump_obj`.

I don't really have a million items, or classes :) I will have, say, a dozen items. But there are various classes for the nodes, and various other classes for their sub-objects. I want to keep the serialization of each class inside each class.
When I have time, I will show what I have. Thank you kindly for looking, I will reply to your name here when available :)
-
@SGaist
OK, I believe I have finally achieved what I wanted/expected for the serialization approach. The key is the (optional) `default=global_serialization_method` argument to `json.dump()` or `dumps()`.

Remember that, for my one million items :), I want an approach which serializes as it descends, rather than a first pass which returns some complete object hierarchy, followed by a call to `json.dump()` to dump that produced hierarchy.

Briefly, the code outline is now like:
```python
class ModelScene(QGraphicsScene):

    # Serialize whole scene to JSON into stream
    def json_serialize(self, stream) -> None:
        # Get `json.dump()` to call `ModelScene.json_serialize_dump_obj()`
        # on every object it cannot serialize natively
        json.dump(self, stream, indent=4, default=ModelScene.json_serialize_dump_obj)

    # Static method passed as `json.dump(default=ModelScene.json_serialize_dump_obj)`
    # This method is called on every non-basic object to be dumped/serialized
    @staticmethod
    def json_serialize_dump_obj(obj):
        # if the object has a `json_dump_obj()` method, call that...
        if hasattr(obj, "json_dump_obj"):
            return obj.json_dump_obj()
        # ...else refuse: `default` is only invoked for objects the encoder
        # cannot already handle, so returning `obj` unchanged would not work
        raise TypeError(f"{type(obj).__name__} is not JSON serializable")

    # Return dict object suitable for serialization via `json.dump()`
    # This one is in the `ModelScene(QGraphicsScene)` class
    def json_dump_obj(self) -> dict:
        return {
            "_classname_": self.__class__.__name__,
            "node_data": self.node_data
        }


class CanvasModelData(QAbstractListModel):

    # Return dict object suitable for serialization via `json.dump()`
    # This one is in the `CanvasModelData(QAbstractListModel)` class
    def json_dump_obj(self) -> dict:
        _data = {}
        for key, value in self._data.items():
            _data[key] = value
        return {
            "_classname_": self.__class__.__name__,
            "data_type": self.data_type,
            "_data": _data
        }
```
The point here is:

- Every "complex" class defines a `def json_dump_obj(self) -> dict:` method.
- That method returns just the properties/sub-objects wanted in the serialization.
- The top-level `json.dump(self, stream, default=ModelScene.json_serialize_dump_obj)` causes every node visited to be incrementally serialized to `stream`, via the static method `ModelScene.json_serialize_dump_obj`, which calls my `obj.json_dump_obj()` where available; basic types never reach `default` and get the standard JSON serialization. A usage sketch follows this list.
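Usage then collapses to a single call (a sketch, assuming a `scene` instance of `ModelScene`):

```python
# one pass: json.dump() walks the hierarchy and streams it out as it goes
with open("scene.json", "wt") as stream:
    scene.json_serialize(stream)
```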
Interestingly, I came across someone with the same concerns as me. From "What is the difference between json.dump() and json.dumps() in python?", solution https://stackoverflow.com/a/57087055/489865:

> In memory usage and speed.
>
> When you call `jsonstr = json.dumps(mydata)` it first creates a full copy of your data in memory and only then do you `file.write(jsonstr)` it to disk. So this is a faster method but can be a problem if you have a big piece of data to save.
>
> When you call `json.dump(mydata, file)` -- without the `s` -- new memory is not used, as the data is dumped by chunks. But the whole process is about 2 times slower.
>
> Source: I checked the source code of `json.dump()` and `json.dumps()` and also tested both the variants measuring the time with `time.time()` and watching the memory usage in htop.
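The difference is easy to demonstrate with a string stream (a quick check, not from the thread):

```python
import io
import json

data = {"name": "node-1", "sizes": list(range(5))}

jsonstr = json.dumps(data)        # builds the whole document in memory first
buf = io.StringIO()
json.dump(data, buf)              # writes to the stream chunk by chunk
assert buf.getvalue() == jsonstr  # same output, different memory profile
```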