New data format

I finally got around implementing the new data format, which decreased the file sizes by 69-78%. For example, the Sibenik dataset went from 1.6GiB to 483MiB. The rendering time increased from 153ms to 160ms, however, the variation between scenes varies from 93ms to 210ms.

The format has become the following:

struct octree {
    uint32_t avgcolor:24;
    uint32_t bitmask : 8;
    uint32_t child[0];

The child array with length 0 denotes a variable length array. As it has size 0, the size of the struct is 4 bytes, plus 4 bytes for every child. Which child nodes exist is stored in the bitmask. Usually, the child array stores the index to the child nodes, such that given an array octree[N] a child node is stored at, for example, octree[node.child[2]]. In the last layer I don’t store pointers to the leaf nodes, but just store the colors of the leaf nodes (or’ed with 0xff000000, so they are kind of hard coded leaf nodes).

For those who want to test the new file format, I’ve pushed the source code to github. It still includes the two small test files ‘sponge’ and ‘sign’, which have been converted to the new format.

Edit: I’ve updated the git-repo, such that the master branch points to the version with the new storage format. I’ve also updated the readme file.


Next steps

Its been a while since my last post. I’ve been working on changing the octree storage format, to obtain better compression. The current format stores 8 child pointers per internal node (leaf nodes are not stored), using the following 64 byte struct:

struct octree { 
    uint32_t child[8];
    int32_t avgcolor[8];

However, this is a rather wasteful format if nodes have 4 children on average or less. Therefore I considered changing it to the variable length struct:

struct octree {
    uint32_t bitmask: 8;
    uint32_t color  :24;
    uint32_t child[0];

This struct is only 4 bytes per node + 4 bytes per child pointer. With this, the data file size decreases with about 75% 60%. The drawback is that the traversal code becomes more complicated, as the children still need to be traversed in front to back order, which depends on the camera orientation.

Currently my voxel renderer can output at 5 FPS reliably. This is nowhere near the 30 FPS that I would require for an interactive game. And changing the octree file format is not going to change that. Even worse, it might actually reduce the rendering speed. I might be able to further optimize my algorithm, but without conceptual changes, I would be able to reach 10 FPS tops (rumor has it that this is the frame rate that UD runs at). As I lack hope that I can get my renderer running at the required 30 FPS, changing the storage format does not seem worth the effort.

One conceptual change that might be an option is point caching. However, I don’t like it, as it seems to much of a dirty hack to me and I have little hope that I can get it working properly. The conceptual change that I’m currently considering is to port my algorithm to the GPU. Due to the amount of recursion, that is not a trivial task. However, I’ve done some GPU programming before (using GLSL) and I think I might be able to change my algorithm such that it can run on the GPU. OpenCL seemed like the most obvious language to program it in, however, it is bulky and I have no experience with it. Fortunately, there happen to be some OpenGL extensions that make OpenGL a viable choice, such as ‘compute shaders‘, which makes a lot of GPGPU functionality directly available in OpenGL; ‘indirect rendering‘, which eliminates costly round trips from GPU to CPU and back; and ‘atomic counters‘, which I can use to fill a vertex buffer with a variable amount of points or triangles.