Hello, Nim community!

I'd like to introduce my little serialization library, NESM, which lets you [de]serialize objects in two steps:

Step 1: annotate your object type as serializable, or convert an existing type into a serializable one with the toSerializable macro

import nesm, streams

serializable:
  type MyObject = object
    my_string: string
    my_int: int16

Step 2: use the procedures generated by the macro in your code:

let testObject = MyObject(my_string: "Hello, world!", my_int: 1024)
var targetStream = newStringStream()
testObject.serialize(targetStream) # serialize testObject and write the result to targetStream
...
targetStream.setPosition(0) # rewind the stream to read from the beginning
let targetObject = MyObject.deserialize(targetStream) # read data from targetStream and represent it as MyObject

More examples can be found in the documentation and tests.

Fields are serialized back to back, without any gaps or padding. The serialized data is not human readable, because basic types are written using their actual in-memory representation.
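
To illustrate what "back to back" means, here is a hand-written sketch using only Nim's standard `streams` module (it does not use NESM, and the `Pair` type and the `writePacked`/`readPacked` procs are hypothetical). Each field is written in native byte order directly after the previous one, so the stream ends up holding 6 bytes even though the in-memory object usually occupies 8 because of alignment padding.

```nim
import std/streams

# Hypothetical type used only for this illustration.
type Pair = object
  a: int16
  b: int32

proc writePacked(s: Stream, p: Pair) =
  # Raw in-memory bytes of each field, one after another, no padding.
  s.write(p.a)
  s.write(p.b)

proc readPacked(s: Stream): Pair =
  # Fields must be read back in exactly the same order.
  result.a = s.readInt16()
  result.b = s.readInt32()

let p = Pair(a: 1024, b: -7)
let s = newStringStream()
s.writePacked(p)
echo s.data.len          # 6: 2 bytes for `a` plus 4 bytes for `b`
s.setPosition(0)
let q = s.readPacked()
doAssert q == p
```

Because nothing but the raw values is stored, the reader has to know the exact field order and types in advance.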

For those who are already familiar with NESM, here is a list of changes made since the library first appeared in the nimble package list.

  • Changed syntax of serialization settings

The previous syntax caused a compilation error when applied to tuples, so it has been replaced with this:

serializable:
  type TestTuple = tuple
    set: {endian: bigEndian} # here we set endian to bigEndian
    testValue: seq[int32]
    set: bool                # the name 'set' can still be used as a field name
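
As a rough illustration of what a big-endian setting changes in the output, here is a plain-Nim sketch (not NESM's generated code; `writeBigEndian32` is a hypothetical helper) that writes an int32 in big-endian byte order using the standard `endians` module, so the most significant byte comes first regardless of the host's native byte order.

```nim
import std/[streams, endians]

proc writeBigEndian32(s: Stream, x: int32) =
  # Hypothetical helper: convert to big-endian (most significant byte first)
  # before writing, whatever the host's native byte order is.
  var src = x
  var dst: int32
  bigEndian32(addr dst, addr src)
  s.write(dst)

let s = newStringStream()
s.writeBigEndian32(0x01020304'i32)
doAssert s.data == "\x01\x02\x03\x04"
```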

  • Added support for enums and sets

After the first version was released, I got a request for enum support, so here it comes. Set support is a bonus =)

  • Ability to convert existing types into serializables

This can be useful to avoid redefining types that already exist in the standard library.

  • Optional size definition for basic types

In most cases the previous change would be useless, because almost all standard library types use fields without an explicit size (int rather than int32) and therefore cannot be converted to serializables. To improve the situation a bit, the -d:allow_undefined_type_size compiler switch was added. This switch disables the strict size-specifier check and lets the size of such a field equal <type>.sizeof. As a consequence, NESM can no longer guarantee that an object will be [de]serialized correctly across different architectures. Use this switch at your own risk.
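
The portability problem can be seen without NESM at all. In this plain-Nim sketch (the `Record` type and `writeRecord` proc are hypothetical), an `int` field contributes `sizeof(int)` bytes, which follows the pointer size of the target, so data written by a 64-bit build has a different layout from data written by a 32-bit build, while the `int32` field is the same size everywhere.

```nim
import std/streams

# Hypothetical record mixing a platform-sized and a fixed-size field.
type Record = object
  count: int      # same size as a pointer: 4 bytes on 32-bit, 8 on 64-bit
  fixed: int32    # always 4 bytes

proc writeRecord(s: Stream, r: Record) =
  # Writes each field's raw bytes, sized by its sizeof.
  s.write(r.count)
  s.write(r.fixed)

let s = newStringStream()
s.writeRecord(Record(count: 1, fixed: 2))
# 8 bytes on a 32-bit target, 12 bytes on a 64-bit target
doAssert s.data.len == sizeof(int) + 4
```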


There are also a few other changes and bug fixes. The latest version at the moment is v0.3.1.

2017-05-09 10:30:41

Looks like this is an intrusive serializer. I usually avoid intrusive serializers and prefer non-intrusive ones. But I believe your library could be modified to support a non-intrusive mode too: you are already using macros, and Nim macros can easily handle a non-intrusive serializer.

Why do I prefer a non-intrusive serializer?

Imagine I am already using a huge library with tons of objects. Then one day I decide that I need to serialize those objects. If the serializer only had an intrusive mode, it would be painful, because I would have to annotate each of the objects. But if the serializer also supports a non-intrusive mode, I don't need to modify my huge library. I just serialize the objects and I'm done.
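
To make the distinction concrete, here is a minimal sketch of the non-intrusive approach in plain Nim. It is not NESM's or msgpack4nim's code; `pack` and `Point` are hypothetical names, and the int32 length prefix for strings is an arbitrary choice for this example. A generic proc walks the object's fields at compile time with the standard `fields` iterator, so the serialized type itself needs no annotation.

```nim
import std/streams

proc pack[T: SomeNumber](s: Stream, x: T) =
  # Fixed-size numbers are written as raw bytes.
  s.write(x)

proc pack(s: Stream, x: string) =
  # Strings get an int32 length prefix (an arbitrary choice for this sketch).
  s.write(int32(x.len))
  s.write(x)

proc pack[T: object](s: Stream, x: T) =
  # Iterate over every field at compile time; the type needs no annotation.
  for value in x.fields:
    s.pack(value)

# Stands in for a type from "someone else's library", left untouched.
type Point = object
  x, y: int32
  label: string

let s = newStringStream()
s.pack(Point(x: 1, y: 2, label: "ab"))
doAssert s.data.len == 4 + 4 + 4 + 2
```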

Anyway, what you have done is awesome.

2017-05-09 11:23:40

But if the serializer also support non-intrusive mode, I don't need to modify my huge library. I just need to serialize it and done.

But you need to ensure it's actually serializable anyway (for example, that it doesn't have a field of type Socket/File/stream/callbacks), so you gain very little. In fact, I can hardly imagine a huge library with tons of objects that happens to be serializable out of the box.

2017-05-09 20:01:04
Awesome work, I hope I can try it soon.
2017-05-09 20:02:36

(for example, doesn't have a field of type Socket/File/stream/callbacks)

In this case user intervention is needed, but with a non-intrusive serializer the intervention is minimal.

And the code that handles this special case can live alongside the other serializer routines; there is no need to modify the library being serialized.

I have done this before using only specialized generics, which is why I believe handling this special case with macros will be no problem.
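
A minimal sketch of that idea in plain Nim (not NESM or msgpack4nim; `pack` and `Conn` are hypothetical names): a generic fallback serializes ordinary objects field by field, and a concrete overload for the awkward type, written next to the serializer routines rather than in the library, takes precedence because an exact non-generic match beats a generic one in Nim's overload resolution.

```nim
import std/streams

proc pack[T: SomeNumber](s: Stream, x: T) = s.write(x)

proc pack[T: object](s: Stream, x: T) =
  # Generic fallback: serialize every field in declaration order.
  for value in x.fields:
    s.pack(value)

# Pretend this type comes from a library we cannot modify:
# its `handle` field is not meaningfully serializable.
type Conn = object
  port: int32
  handle: pointer

# Special case, kept with the serializer routines rather than in the
# library: write only the port and skip the raw handle.
proc pack(s: Stream, c: Conn) =
  s.pack(c.port)

let s = newStringStream()
s.pack(Conn(port: 8080, handle: nil))
doAssert s.data.len == 4
```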

2017-05-10 01:04:36

@Araq thanks! Your feedback is highly appreciated!

@jangko Well, as far as I understand, you want something like this:

from mycode import MyType
from nesm import serialize
... # no serialize procedure is generated for MyType here
let obj = MyType(...)
obj.serialize(target_stream) # serialize would be a template whose actual body is generated on demand

It looks quite feasible. Although I would prefer to keep explicit declarations of serializable objects, I will add a non-intrusive mode to my TODO list for the next version.

2017-05-10 15:59:48
Honest question: Why not just use msgpack? E.g.

You could argue that a serializer retains object types, but since Nim is not a dynamic language, I don't see the value of that. And I see tremendous value in using a well-known, simple, fairly efficient, easily-parsed binary format.

2017-05-14 19:42:29

@cdunn2001 msgpack just cannot cover all the use cases that NESM does, for example deserializing a third-party file format or modeling an IP packet.

Here is a little demo of an NTP packet model created by @FedericoCeratto. Can msgpack do something like this? I'm not very familiar with it, but something tells me that this is not what msgpack was made for.

In short: msgpack is a protocol, while NESM is a protocol maker. They are made for different purposes.
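
For readers unfamiliar with the format, here is a minimal plain-Nim sketch of what such a packet model involves (it is neither NESM nor @FedericoCeratto's demo; `NtpHeader`, `writeBE` and `writeNtp` are hypothetical names). The NTP header described in RFC 5905 is 48 bytes of fixed-layout, big-endian fields with no type tags or length prefixes, which is the kind of externally defined layout a self-describing format does not produce.

```nim
import std/[streams, endians]

type NtpHeader = object
  leapVersionMode: uint8        # LI (2 bits) | VN (3 bits) | Mode (3 bits)
  stratum: uint8
  poll: int8
  precision: int8
  rootDelay: uint32
  rootDispersion: uint32
  referenceId: uint32
  timestamps: array[4, uint64]  # reference, origin, receive, transmit

proc writeBE[T: uint32 | uint64](s: Stream, x: T) =
  # Convert to network (big-endian) byte order before writing.
  var src = x
  var dst: T
  when sizeof(T) == 4: bigEndian32(addr dst, addr src)
  else: bigEndian64(addr dst, addr src)
  s.write(dst)

proc writeNtp(s: Stream, h: NtpHeader) =
  # Single-byte fields need no byte swapping.
  s.write(h.leapVersionMode)
  s.write(h.stratum)
  s.write(h.poll)
  s.write(h.precision)
  s.writeBE(h.rootDelay)
  s.writeBE(h.rootDispersion)
  s.writeBE(h.referenceId)
  for t in h.timestamps: s.writeBE(t)

# Client request: LI = 0, VN = 4, Mode = 3 (client), everything else zero.
let req = NtpHeader(leapVersionMode: (0'u8 shl 6) or (4'u8 shl 3) or 3'u8)
let s = newStringStream()
s.writeNtp(req)
doAssert s.data.len == 48
doAssert s.data[0] == '\x23'
```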

2017-05-17 20:14:58
@jangko, msgpack4nim is your work, right? Could you explain the difference between that and NESM?

https://github.com/jangko/msgpack4nim

@xomachine, I have to say, even if your architecture is better in some way, I still wish that the NESM binary format were simple msgpack. You could encode type-info as extra fields in msgpack to create a full RPC protocol.

I worked with serialization a lot at Amazon, and I really appreciate being able to parse binary data by eyeball. (E.g. google-protobuf encoded integers are unreadable, which means that string-lengths are also unreadable.) I also love a simple parser, and the greatest strength of msgpack is that the prefix of any element includes the length of that element, so you can always pre-allocate exact buffer sizes for efficiency.
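
To illustrate the length-prefix property mentioned above, here is a minimal sketch of how the MessagePack specification encodes a string header (fixstr, str 8, str 16 and str 32 formats). `msgpackStr` is a hypothetical helper written for this example, not msgpack4nim's API.

```nim
import std/strutils  # only for `repeat` in the usage lines below

proc msgpackStr(s: string): string =
  ## Encodes `s` as a MessagePack string: a type/length header, then the raw bytes.
  let n = s.len
  if n < 32:
    result.add char(0xa0 or n)                # fixstr: length in the low 5 bits
  elif n < 256:
    result.add char(0xd9)                     # str 8: one length byte
    result.add char(n)
  elif n < 65536:
    result.add char(0xda)                     # str 16: big-endian uint16 length
    result.add char((n shr 8) and 0xff)
    result.add char(n and 0xff)
  else:
    result.add char(0xdb)                     # str 32: big-endian uint32 length
    for shift in [24, 16, 8, 0]:
      result.add char((n shr shift) and 0xff)
  result.add s

doAssert msgpackStr("hi") == "\xa2hi"
doAssert msgpackStr("a".repeat(40))[0 .. 1] == "\xd9\x28"
```

A reader sees the header byte first and knows the exact payload size before touching the payload, which is what makes exact pre-allocation possible.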

2017-05-21 15:33:51

@cdunn2001, well, it looks like I haven't made my point clear. Let me show you another example. One day I needed to repack the Unity3d assets file of a third-party game, so I wrote a Unity3d assets packer that uses NESM to deserialize the file. The Unity3d assets spec can be found here. As you can see, there is no type information before the fields, but the format itself is already known and described in the spec.

NESM is made exactly for cases like this, when you have a third-party file/packet/whatever format and need to [de]serialize it easily. I see no point in making another clone of msgpack or protobuf, because they already exist. The possibility of using NESM instead of them is just a side feature; it was not developed as a replacement.

2017-05-21 18:39:01