Kaitai Struct: declarative binary format parsing language (kaitai.io)
91 points by mpweiher on April 22, 2017 | 13 comments


The format gallery has some lovely examples. I was initially skeptical, but this is quite nice.

For other people who might not look as far, it seems that the main advantage of this over other solutions (PB, Cap'n Proto, Flatbuffers, etc.) is that you can describe types that don't originate in the Kaitai ecosystem, and come out with code that can correctly read and manipulate those datatypes.

One slightly unfortunate item -- I don't see anything about using the generated code in a zero-copy situation. The docs are not very detailed about the generated API, so perhaps this capability is present and just not documented in an easy-to-find way. This is not fatal, just slightly less than optimal.

Reading the stream API ([1]), it seems this is correct. There is nothing like Cap'n Proto's direct-access semantics, nor even anything on the level of ZeroCopyInputStream [2].

[1]: https://github.com/kaitai-io/kaitai_struct_cpp_stl_runtime/b...

[2]: https://developers.google.com/protocol-buffers/docs/referenc...
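To make "zero-copy" concrete, here is a minimal Python sketch of the difference. It is not Kaitai's API or anything its compiler generates; the record layout (a u16 tag, a u32 payload length, then the payload) and the function names are made up for illustration.

```python
import struct

# Hypothetical layout for illustration: little-endian u16 tag, u32 payload length.
HEADER = struct.Struct("<HI")


def parse_copying(buf: bytes):
    """Return (tag, payload); slicing a bytes object copies the payload."""
    tag, length = HEADER.unpack_from(buf, 0)
    payload = buf[HEADER.size:HEADER.size + length]
    return tag, payload


def parse_zero_copy(buf):
    """Return (tag, payload); the payload is a memoryview into buf, not a copy."""
    view = memoryview(buf)
    tag, length = HEADER.unpack_from(view, 0)
    payload = view[HEADER.size:HEADER.size + length]
    return tag, payload


data = bytearray(HEADER.pack(7, 5) + b"hello")
tag, payload = parse_zero_copy(data)

# Changing the underlying buffer is visible through the view,
# which shows that no copy of the payload was made.
data[HEADER.size] = ord("J")
print(tag, payload.tobytes())
```

A parser with direct-access semantics hands out views like `payload` above; a copying parser materializes each field as it reads.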


Reminds me of Emacs Lisp's "bindat" library which truly rocks. Scroll down to "The following is an example" on this page for a tangible example:

https://www.gnu.org/software/emacs/manual/html_node/elisp/Bi...


Sounds like Erlang's bitstring operations.

I wrote a set of OCaml macros to do the same thing: https://people.redhat.com/rjones/bitstring/html/Bitstring.ht...


Erlang's bitstrings are one of the things I love about the language. I've never seen an equivalent in other languages (not saying they don't exist, just that I haven't seen any).
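For anyone who hasn't used them: an Erlang pattern along the lines of `<<Version:4, IHL:4, Rest/bitstring>>` pulls bit-sized fields out of a binary in one step. Below is a rough sketch of the same extraction done by hand in Python. The `read_bits` helper is hypothetical, handles only unsigned big-endian fields, and is nowhere near as general as the Erlang feature.

```python
def read_bits(data: bytes, widths):
    """Split the leading bits of data into unsigned big-endian fields."""
    total = sum(widths)
    nbytes = (total + 7) // 8
    if len(data) < nbytes:
        raise ValueError("not enough data for the requested fields")
    value = int.from_bytes(data[:nbytes], "big")
    # Drop unused low-order bits when the widths do not fill the last byte.
    value >>= nbytes * 8 - total
    fields = []
    # Peel fields off from the least significant end, then restore the order.
    for width in reversed(widths):
        fields.append(value & ((1 << width) - 1))
        value >>= width
    return fields[::-1]


# First two bytes of an IPv4 header:
# version (4 bits), IHL (4 bits), DSCP (6 bits), ECN (2 bits).
version, ihl, dscp, ecn = read_bits(bytes([0x45, 0x00]), [4, 4, 6, 2])
print(version, ihl, dscp, ecn)
```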


Looks very nice. Too bad there isn't a C API.


What would it take to make a C API to a binary format parsing package? If you have an itch, you are the best one to know how to scratch it. Everyone else might not see the simplicity that you see.


This is very useful for forensics. I was impressed that they already had a module for reading Windows registry hives. I have written that parser by hand myself (twice) and remember how difficult it was.

Their registry hive module looked correct and extremely simple. Was also impressed with the ISO support to parse/extract files. All in all, seems like a very solid option for parsing binary files.


I'm generally skeptical of YAML-based languages (YAMLangs, as I call them), but this seems to be one of the more readable YAMLangs I've encountered. I especially like how the author(s) solved the variable-length/repeated fields situation.
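For comparison, here is roughly what that situation looks like when written by hand. As far as I can tell from the docs, Kaitai expresses it with keys such as `size` and `repeat: expr` plus `repeat-expr`. The layout below (a u8 count followed by that many length-prefixed entries) is invented for illustration and does not come from the format gallery.

```python
import struct


def parse_records(buf: bytes):
    """Parse a u8 count, then `count` entries of (u16 LE length, `length` bytes)."""
    (count,) = struct.unpack_from("<B", buf, 0)
    offset = 1
    items = []
    for _ in range(count):
        # Each entry carries its own length, so the offset must be tracked by hand.
        (length,) = struct.unpack_from("<H", buf, offset)
        offset += 2
        end = offset + length
        if end > len(buf):
            raise ValueError("truncated entry")
        items.append(buf[offset:end])
        offset = end
    return items


blob = bytes([2]) + struct.pack("<H", 3) + b"abc" + struct.pack("<H", 0)
entries = parse_records(blob)
print(entries)
```

The offset bookkeeping and the bounds check are exactly the parts a declarative description lets you leave out.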


I was looking for something like this, to autogenerate an HTTP/2 and a gRPC parser/serializer in Swift!


Can't all these data structures be described just as well using ASN.1, which has the advantage of being thoroughly documented, tried and tested, and just as declarative?


You could do this, but it would be more trouble than it is worth. ASN.1 ECN can encode arbitrary protocols, but the process is fairly complicated [0]. A paper on an ASN.1 compiler for embedded/space systems [1] mentions that there is only one implementation of ECN.

[0] https://www.itu.int/en/ITU-T/asn1/Documents/ECN_Introduction...

[1] http://web1.see.asso.fr/erts2012/Site/0P2RUC89/7C-4.pdf

Edited to add the second link


There is nobody that I hate enough to ever recommend that they subject themselves to the Hell that is ASN.1!


Hmm, I don't think it's _that_ bad...



