New data format

I finally got around implementing the new data format, which decreased the file sizes by 69-78%. For example, the Sibenik dataset went from 1.6GiB to 483MiB. The rendering time increased from 153ms to 160ms, however, the variation between scenes varies from 93ms to 210ms.

The format has become the following:

struct octree {
    uint32_t avgcolor:24;
    uint32_t bitmask : 8;
    uint32_t child[0];

The child array with length 0 denotes a variable length array. As it has size 0, the size of the struct is 4 bytes, plus 4 bytes for every child. Which child nodes exist is stored in the bitmask. Usually, the child array stores the index to the child nodes, such that given an array octree[N] a child node is stored at, for example, octree[node.child[2]]. In the last layer I don’t store pointers to the leaf nodes, but just store the colors of the leaf nodes (or’ed with 0xff000000, so they are kind of hard coded leaf nodes).

For those who want to test the new file format, I’ve pushed the source code to github. It still includes the two small test files ‘sponge’ and ‘sign’, which have been converted to the new format.

Edit: I’ve updated the git-repo, such that the master branch points to the version with the new storage format. I’ve also updated the readme file.


10 thoughts on “New data format

    • There has been a windows port by someone else, but I would have to look it up and update it to the newest version. I’ve been thinking about porting it to windows (initially gcc), which should not be too difficult, given the code of the previous port. Though at home I rarely boot windows. If I can get the engine to be more than just ‘promising’, I might put some time into porting it.

      • Great, I just see windows release from github. I tried to run make on macos, but I must be rusty because I was not able to run it…

        On Windows I will be able to make a Visual Studio project. I’m currently testing Euclideon engine (which is amazing) for my company, but it’s not open source…

        Let me know if you need help, I’m ready to support you.

      • I’m looking into it, though, porting it to msvc is not that easy. I have to rewrite the core to use SSE2 directly instead of through GCC’s vector extension.

        What do you intend to use it for?

  1. Even though its slower, i believe its worth looking for a smaller data representation. Of what use would a renderer be thats fast, but requires 128gb RAM?
    Now, why didnt your rendering time go down? I wonder how the structs are acually stored. I guess they are put in places where there is space. They are probably scattered in memory without any order. The good thing about Morton code is that the locality is preserved. This way more relevant points are cached at the same time. Maybe this effect doesnt apply here?

    • The decrease in rendering performance is likely because of the increased instruction count in the traversal code. For each layer in the octree, the structs are ordered following a 3d-hilbert-curve, which has a slightly better spatial data locality than Morton code. This has always been the case.

      It is unlikely that the memory bandwidth is a problem. One of the datasets is a menger-sponge which fits in 448 bytes and the rendering speed is comparable (5-10ms slower in the new format).

      • oh ok, you are even using hilbert curve… well then i guess you are right. its probably not the memory bandwith. i just cant believe uclideon uses so much less instructions.
        have you considered SIMD (SSE)?

      • There are some vector instructions in my code, though it is all integer based, so I’m not sure whether there exist equivalent SIMD instructions.

        Still, don’t let Euclideon fool you. They do have less instructions, because they render all points at atleast 2×2 pixels, which effectively reduces the resolution. Also, they have multi threading.

      • from what i have read so far, most simd instruction sets support integers. MMX is even integer only.

        i know about the 2×2 Pixels. how much does that help your algorithm? have you tried how fast it would be?
        where is the information Form that they use more than one thread? have you thought about multithreading? the way i implemented the renderer, im not sure how to parallelize it.

      • I found out that SSE2 has exactly the instructions that I’m currently using through GCC’s vector extensions, though I’ll try porting my code to use them directly. Maybe I can still get some increase in speed out of it.

        Parallelization is conceptually easy, just split up the viewport in multiple rectangles and let each rectangle be rendered by a separate thread. Though, I might have to do double buffering to get rid of most of the inter-thread communication overhead.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s