A sorting based rendering method

Rasterization, ray tracing, ray casting and splatting are well-known rendering methods. Euclideon claims to have found a new one, which they call ‘searching’ or ‘sorting’. I think sorting is an apt name, and I believe the method works as described below.

Scene graph

First you need a scene that you want to render. We want to be able to break this scene down into smaller pieces indefinitely, hence the scene is represented as a tree of infinite size and height, without leaf nodes. It is stored in memory as a directed graph of finite size. Each node models a part of the scene by describing how it is made out of other nodes. As such, the graph effectively describes a fractal. One of the nodes in the graph is the root node and contains the entire scene. We will refer to this graph as the scene graph.
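
A minimal sketch of what such a node could look like in C++; the names and the choice of a uniform scale plus translation per edge are illustrative assumptions, not a prescribed layout:

    #include <cstdint>
    #include <vector>

    struct Node;

    // Each outgoing edge carries the transformation that maps the child's
    // coordinate frame into the parent's frame.
    struct Edge {
        Node* child;             // may point to an ancestor or to the node
                                 // itself; the graph is allowed to be cyclic
        float scale;             // uniform scale of the child copy
        float dx, dy, dz;        // translation of the child copy
    };

    // A node models one part of the scene as transformed copies of other
    // nodes, plus an average color for when it shrinks below a pixel.
    struct Node {
        std::vector<Edge> edges;
        uint32_t average_color;  // 0xRRGGBB
    };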

A few trivial examples of scene graphs:

The line segment is made of two scaled-down versions of itself. The cube is made similarly. The triangle graph consists of two nodes (marked a and b), as this eliminates the need for rotation. Based on the cube graph, we can convert an octree into a valid scene graph by letting its leaf nodes point to themselves. Sets of line segments and triangles can be stored in k-d trees or BSP trees.
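
The octree conversion mentioned above could be sketched as follows. OctreeNode and the helper are hypothetical names, and the code assumes the input is a genuine tree, so each node is visited exactly once:

    #include <cstdint>

    struct OctreeNode {
        OctreeNode* children[8];  // all null in a leaf
        uint32_t average_color;
    };

    // Turn an octree into a valid scene graph by letting each leaf node
    // point to itself, so that subdivision can continue indefinitely.
    void make_leaves_self_referencing(OctreeNode* node) {
        bool is_leaf = true;
        for (OctreeNode*& child : node->children) {
            if (child) {
                is_leaf = false;
                make_leaves_self_referencing(child);
            }
        }
        if (is_leaf) {
            // A leaf is now "made of" eight half-sized copies of itself.
            for (OctreeNode*& child : node->children) child = node;
        }
    }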

Culling

Culling is the process of removing the parts of the scene that are not visible. This is generally used to speed up rasterization and splatting algorithms, and to ensure that parts near the camera are not overdrawn by parts farther away. The sorting rendering method is a combination of frustum culling and occlusion culling.

With frustum culling, the parts that are outside the frustum of the camera are removed. This is usually done using bounding spheres and a dot product. We use a different approach. First we choose an origin and choose the x- and y-axes to be parallel to the sides of the viewing window. Then the bounds of the viewing window projected onto the xy-plane are computed. In the image below these are denoted x1 and x2 for the x-axis. Furthermore, we compute how these bounds change when the origin is moved. Those bounds and their changes are then used to determine whether the model (shown in orange) is within the frustum.
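
One way such a test could look for the x-axis (y is analogous); this is my own reading of the scheme, with illustrative names, and it omits how the bounds and their rates of change are obtained:

    // x1, x2: bounds of the view window projected onto the xy-plane;
    // dx1, dx2: how those bounds change per unit of depth.
    bool inside_frustum_x(float part_x1, float part_x2,  // part's x-extent
                          float part_z,                  // part's depth
                          float x1, float dx1,
                          float x2, float dx2) {
        float w1 = x1 + dx1 * part_z;  // window bounds at the part's depth
        float w2 = x2 + dx2 * part_z;
        return part_x2 >= w1 && part_x1 <= w2;  // interval overlap test
    }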

With occlusion culling, the parts that are fully obscured by other parts are removed. The best-known method for this is the z-buffer, which works at the pixel level. It can be adapted to work at the part level by computing a bounding rectangle and the lowest z-value of the part, and then checking for each pixel in the rectangle whether its z-buffer value is lower than that lowest z-value, in which case the part is fully obscured and can be culled. This is obviously very slow. It can be sped up by using a quadtree as the z-buffer, in which the leaf nodes form the original z-buffer and each internal node holds the maximum value of its children. This quadtree is known as a hierarchical z-buffer.
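
A sketch of such a hierarchical z-buffer, under the assumption that the quadtree is stored implicitly in an array with node i having children 4i+1 through 4i+4:

    #include <algorithm>
    #include <limits>
    #include <vector>

    struct HierarchicalZBuffer {
        // Per node: the maximum z-value of all pixels below it;
        // infinity means nothing has been rendered there yet.
        std::vector<float> max_z;

        explicit HierarchicalZBuffer(size_t node_count)
            : max_z(node_count, std::numeric_limits<float>::infinity()) {}

        // A part is fully obscured within `node` if even the farthest
        // z-buffer entry in that region is closer than the part's
        // nearest point.
        bool occluded(size_t node, float part_min_z) const {
            return max_z[node] < part_min_z;
        }

        // Write a pixel depth and propagate the new maxima to the root.
        void write(size_t leaf, float z) {
            max_z[leaf] = z;
            while (leaf > 0) {
                leaf = (leaf - 1) / 4;  // parent index
                float m = max_z[4 * leaf + 1];
                for (size_t c = 2; c <= 4; ++c)
                    m = std::max(m, max_z[4 * leaf + c]);
                max_z[leaf] = m;
            }
        }
    };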

Another way to speed up the occlusion culling is to sort the scene parts in front-to-back order, which maximizes the number of parts that are obscured and thus do not need to be rendered. If the parts are ordered perfectly, such that later parts can never obscure earlier parts, then instead of a z-buffer we can use a bitmap that describes whether each pixel has been rendered.
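
With perfect ordering the occlusion structure degenerates into something as simple as this sketch (the linear pixel indexing is an assumption):

    #include <vector>

    struct OcclusionBitmap {
        std::vector<bool> written;
        OcclusionBitmap(int w, int h) : written(size_t(w) * h, false) {}

        // Returns true if the pixel was still free. A later, and thus
        // farther, part can never overwrite an earlier one.
        bool try_write(size_t pixel) {
            if (written[pixel]) return false;
            written[pixel] = true;
            return true;
        }
    };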

Performing culling tests for each individual part of your scene can be almost as slow as the rendering itself. Therefore these tests are generally performed on groups of parts. This is where our scene graph comes into play: it defines how the entire scene is constructed out of groups, which are recursively constructed out of smaller groups.

The new rendering method

The rendering method can be described as searching (for the model parts that need rendering), sorting (putting those parts in buckets, known as pixels) and mass connected processing (doing this for many parts simultaneously). It is a combination of the frustum and occlusion culling described above, and it entirely eliminates the need for a conventional rendering step.

The rendering starts by throwing the root node of your scene graph into the root node of the occlusion quadtree. Then the following recursive process starts (sketched in code after the list):

  • If the scene part is larger than the quadtree node it is in, then it is broken into smaller parts (by traversing the part’s outgoing edges). The parts that pass the frustum and occlusion culling tests for the current quadtree node are reinserted in front-to-back order.
  • If the scene part is smaller than the quadtree node and the quadtree node is not a leaf node, then it is inserted in those quadtree child nodes for which the part passes the frustum and occlusion culling tests.
  • If the scene part is smaller than the quadtree node and the quadtree node is a leaf node, then the pixel corresponding to that leaf node is colored using the average color of the part.
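
In C++-flavored pseudocode, the recursion could look as follows. ScenePart, QuadNode and the helper functions are stand-ins for whatever concrete types an implementation uses; this is a sketch of the three cases above, not the actual implementation:

    void render(ScenePart part, QuadNode quad) {
        if (quad.fully_rendered()) return;              // occlusion test
        if (part.size() > quad.size()) {
            // Case 1: break the part up and reinsert the children that
            // pass the culling tests, in front-to-back order.
            for (ScenePart child : part.children_front_to_back())
                if (in_frustum(child, quad))
                    render(child, quad);
        } else if (!quad.is_leaf()) {
            // Case 2: push the part down into the quadtree children
            // it overlaps.
            for (QuadNode child : quad.children())
                if (in_frustum(part, child))
                    render(part, child);
        } else {
            // Case 3: a leaf is reached; color the corresponding pixel
            // with the part's average color.
            quad.set_pixel(part.average_color());
        }
    }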

The occlusion culling test is fast, because it only needs to check a single boolean (or, if the parts are not sorted perfectly, compare two integers). The frustum culling test is fast, as it needs only a small number of comparisons (and is usually easy to implement). Furthermore, during the recursion, the new bounds of the projected view window can be computed in constant time.

One nice thing about this rendering method is that it entirely removes the need for a perspective projection. You only need to implement the frustum culling test and how the projected view window bounds change during the recursion. For some scene graphs, including the octree, these operations can be performed with only additions, subtractions and bit shifts. (Incidentally, the method effectively performs a long division, which is what produces the perspective projection.)
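
To illustrate that remark: binary long division computes a quotient with only shifts, additions and subtractions, one bit per step, much like each subdivision step of the recursion fixes one more bit of the projected screen coordinate. A self-contained example (not taken from the renderer):

    #include <cstdint>

    // Computes floor(x / z * 2^n), i.e. x/z in n-bit fixed point,
    // assuming x < z and n <= 31 so the quotient fits.
    uint32_t divide_by_shifts(uint32_t x, uint32_t z, int n) {
        uint32_t q = 0;
        uint64_t r = x;
        for (int i = 0; i < n; i++) {
            r <<= 1;                          // bring down the next bit
            q <<= 1;
            if (r >= z) { r -= z; q |= 1; }   // compare and subtract
        }
        return q;
    }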

I suspect that the Atomontage Engine is also using this sorting approach.

(The images were made using ipe and pdftocairo.)


41 thoughts on “A sorting based rendering method”

  1. Hi!
    Thanks for this nice description of your approach.
    One question I had: did you do some profiling of your algorithm, to find where it spends most of its
    time? You seem to be concerned with little things like arithmetic operations. For large trees I
    expect the algorithm to be mainly memory bound, and it may be beneficial to do more computations
    and less traversal, if possible. For example, in step 2 you insert the same octree node into
    multiple quadtree nodes, which means traversing them several times.

    • I have implemented this algorithm such that it renders a cubemap, but only those parts of the cubemap that are actually in view (the source code is available on GitHub). The algorithm takes about 50ms per frame on average. I did some profiling, but did not store the results. What I remember is that each octree node is visited for 8 different quadtree nodes on average, for a worst-case frame.

      I have no idea how the arithmetic operations used affect the running time, but speed is not why I find them important. I mention them because Euclideon claims to have a rendering algorithm that uses only said arithmetic operations. So if my algorithm also uses only those operations, it could be similar to theirs.

      • I must say that I find the 50ms figure not entirely precise. You also have to speak about the resolution. While you render into a 1024×768 screen, you actually stop traversing the tree well before a node gets down to one pixel in size. It’s especially noticeable with far-away nodes, which are rendered as rather large squares, or maybe that’s just an impression. So in your program I can see these LOD switches clearly as I approach an object. I took a screenshot and counted pixels; I think your pixel is about 6×6, which is 36 times less than pixel-perfect resolution would require. Perhaps the 50ms should be multiplied by that factor, if you want to compare to UD.

        I tried to tweak the effective resolution. It seems to have something to do with the number 2 in the line “if (d <= 2*ONE) {”, and indeed if I make it higher, the resolution seems to increase with a proportional drop in performance. However, it introduces other problems, so maybe it’s not the right way to do it.

        In any case, I think this can be improved, hence my question. Maybe retraversing the tree 8 times can be avoided.

      • I’ve been stuck at the current performance for some time now. However, with the algorithm described above I can get rid of the cubemap, which I consider the most important ‘flaw’ in my current implementation.

        Also, the implementation is rather conservative, which causes most points to be rendered roughly twice as big as necessary. I think I should make the frustums half a pixel smaller on each side.

        Once I’ve removed the use of the cubemap and have tweaked the code a bit, I will make a new movie and do some benchmarking and profiling. Do you have any recommendations for profiling tests?

        I have looked into removing the 8-fold traversal. It might be possible; however, the quadtree is at the core of the algorithm and thus cannot be replaced or removed.

  2. Yes, the way you organize the traversal, you must revisit the nodes. I’m implementing an alternative version, which uses divisions but visits each node only once. Unfortunately it’s hard to compare performance, since I traverse the octree until the node is at most one pixel in size, to achieve maximum detail. So it’s naturally slower than your program, for the reason I mentioned above.

    For benchmarking and profiling I use the tower dataset, for lack of a better one. We should really get something similar to Dell’s island, with the points forming continuous surfaces and filling more space. That would be more relevant to gaming use cases.
    As for the profiler itself, I use AMD CodeXL. On my Intel CPU most of the advanced features are not available, though… And it’s pretty difficult to understand what’s going on. Currently I suspect that in my implementation the CPU is waiting for memory about 70% of the time, as almost every tree access is a complete cache miss.

    • Is your implementation open source?

      I’ve thought about using divisions, such that I can traverse each node only once, because I already implemented a Fenwick tree for occlusion.

      The performance comparison will have to wait until I’ve gotten rid of the cubemap.

      Once I’m satisfied with the performance on the models I currently have, I will start making one similar to the demo island.

      In my case I believe that branching was the biggest performance problem. Rendering a Menger sponge is approximately as fast as rendering a 4GB dataset.

  3. No, it’s not open source, at least not yet. If you are interested, I can send you the octree traversal code privately; just write me an email. There’s nothing revolutionary there, though, just front-to-back traversal with an occlusion buffer in a quadtree, which is very similar to yours. Each node is projected and splatted as a point. 1024×1024 takes about 350ms worst time on the tower dataset now.

    For the dataset, I thought to try to translate some standard benchmark datasets into point clouds and then SVOs, like these: http://graphics.cs.williams.edu/data/meshes.xml . Ideally our renderers should outperform modern GPUs on those, while having the same image quality and level of detail 🙂 That would be a convincing demonstration of the power of the approach.

  4. Have you considered using a simplex subdivision grid instead of an octree? This could reject more empty space per subdivision, and coupled with barycentric coordinates could reduce this to a binary search.

    • I only recently discovered that I could replace the octree with other data structures, so no, I have not considered a simplex subdivision grid. However, keep in mind that a balance needs to be found between speed, memory usage and complexity.

  5. how do you guys handle rotation? i mean, how do you put the octree into view/world space without applying calculations to every node? i’m only able to render when the tree is axis-aligned.

    @Sheshbazzar
    what you have there is very interesting, especially the link to the Meagher stuff. but your code is … unreadable, sry.

  6. batman,
    I apply calculations to every node. You only really need to consider jumping from a parent node to one of its children, and change the position accordingly. There are only 8 possible directions, so with some precalculation you can reduce the required computations to a few additions. To obtain the screen position, you have to do two divisions.
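
    A sketch of what this could look like; the names, the use of 64-bit integers and the focal-length parameter are illustrative assumptions:

    #include <cstdint>

    struct Vec3 { int64_t x, y, z; };

    // Child i of a node with half-size h is displaced by h along each
    // axis whose bit is set in i; a table lookup or three additions.
    Vec3 child_position(Vec3 parent, int64_t h, int i) {
        return { parent.x + ((i & 1) ? h : 0),
                 parent.y + ((i & 2) ? h : 0),
                 parent.z + ((i & 4) ? h : 0) };
    }

    // The two divisions: perspective projection to screen coordinates.
    void to_screen(Vec3 p, int64_t focal, int64_t& sx, int64_t& sy) {
        sx = p.x * focal / p.z;
        sy = p.y * focal / p.z;
    }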

    • thx ine! i already figured it out. i transform the root into view space and then traverse the rest of the tree with very few calculations, just like you said.

      • Would you be interested in a performance comparison? I’ve transformed the Sibenik dataset into fine sparse voxels, which fill the entire screen at the highest detail (1 node = 1 pixel) at 1024×1024 if you’re inside the building. This is more like what you would see in a game. I get about 250ms per frame in the worst stable case under these conditions. I can share the file with the points if you (or someone else) are interested; it’s quite large though.

      • Sounds cool. How many points is it? And what size? I’m interested in testing my algorithm with it. Any idea how you want to/can send it to me?

      • unfortunately i encountered a problem i somehow couldn’t solve, so i have nothing to show. but i have a new idea. i have to think it through and then i’ll implement it soon. once i have that, i will post my results.

        but i have something else for you, if you don’t know it already… here is one of the datasets shown in the atomontage and ud videos (at least i think it’s the same one). it’s over 8GB:

        http://kos.informatik.uni-osnabrueck.de/3Dscans/tower.xyz

        there are others to be found on: http://kos.informatik.uni-osnabrueck.de/3Dscans/

      • I have used that model frequently. However, I’ve never seen it being used by atomontage or ud. If you have links to those videos, please post them here.

      • ok, i’m probably wrong then. atomontage and ud show construction sites; i thought it was this one. sry for the confusion.

      • bcmpinc,
        I’ll try to upload it to Dropbox or Google Drive tomorrow, then I’ll tell you the details. I think in binary and compressed it should not exceed 1GB.

      • Here it is: https://www.dropbox.com/s/zdy53f7ywfs3mls/sibenik.7z
        It’s only 167MB, 75M points. Instructions included.
        Don’t mind the “dead pixels” in the screenshot; that’s a problem of my renderer. The dataset has no holes. The lame lighting is baked in. I suppose none of us is doing realtime shading yet? Otherwise I could provide the normals as well.
        Later I’ll probably make larger versions.


      • I’m trying to convert it into my octree format and noticed that the points are 10 bytes each. The first six bytes are the coordinates, the next three the color, and the last one is a padding byte.
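
        In other words, something like the following packed struct (the field order and signedness are guesses, for illustration only):

        #include <cstdint>
        #pragma pack(push, 1)
        struct Point {
            int16_t x, y, z;   // 6 bytes of coordinates
            uint8_t r, g, b;   // 3 bytes of color
            uint8_t padding;   // 1 unused byte
        };
        #pragma pack(pop)
        static_assert(sizeof(Point) == 10, "expected 10-byte points");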

  7. I have uploaded some images of the results of my directional hashing algorithm.

    This is a view of the regions created in screen space; think of them as wonky cones extending from the eye to infinity.

    52,000 / 4 million points

    Query:
    Start hashing points based on their normalized direction from the viewer. Continue until a threshold is reached (a certain number of pixels are filled). If the points are not sorted, the map storing hashed values must do a depth compare (Manhattan distance).
    Render:
    For each screen region, calculate the normalized direction hash value and intersect against possible intersectees.

    Once the hash is calculated, the view can be moved with 6DOF.
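
    A rough sketch of the bucketing as described above; the L1 normalization, the grid quantization and all names are illustrative assumptions:

    #include <cmath>
    #include <cstdint>
    #include <unordered_map>

    struct Pt { float x, y, z; uint32_t color; };

    // Manhattan distance doubles as the cheap depth compare mentioned above.
    float manhattan(Pt p, Pt eye) {
        return std::abs(p.x - eye.x) + std::abs(p.y - eye.y) +
               std::abs(p.z - eye.z);
    }

    // Quantize the view-relative direction of p into a res*res*res grid
    // cell; points in the same cell land in the same bucket. Assumes
    // p != eye.
    uint64_t dir_hash(Pt p, Pt eye, int res) {
        float len = manhattan(p, eye);
        uint64_t qx = uint64_t(((p.x - eye.x) / len + 1) * 0.5f * (res - 1));
        uint64_t qy = uint64_t(((p.y - eye.y) / len + 1) * 0.5f * (res - 1));
        uint64_t qz = uint64_t(((p.z - eye.z) / len + 1) * 0.5f * (res - 1));
        return (qx * res + qy) * res + qz;
    }

    // Keep only the nearest point per direction bucket.
    void insert(std::unordered_map<uint64_t, Pt>& buckets,
                Pt p, Pt eye, int res) {
        auto [it, fresh] = buckets.try_emplace(dir_hash(p, eye, res), p);
        if (!fresh && manhattan(p, eye) < manhattan(it->second, eye))
            it->second = p;
    }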

    Problems:
    Hashing speed; std::unordered_map is not ideal, as teardown is slow. Possibly a fixed-size hash map would provide better speed at the expense of memory.

    Pros:
    Multisampling is very cheap. View independent. Collapses 3D space to a 1D hash (streamable).
    Multithreadable

    Cons:
    Moving the entire universe around the ship, or at least subsets of it 😉

  8. Forgot to mention: Postgres supports patch-based point cloud data and raster data, but the takeaway from their presentation was:
    compress the points heavily, use region-relative addressing, store in LE format (makes radix searches faster), and store in reasonably sized (400×400) chunks (with non-overlapping AABBs).

    • what do you mean by “moved with 6DOF” (depth of field)? did you mean 6 LODs? how does your approach perform when more points are visible? let’s say pixel LOD for 800×600 = 480,000 points.

      • Hi, i mean you can change pitch, roll and yaw (6 degrees of freedom).
        I have a search engine, so how many points are displayed depends on:
        the density of the area of the cloud you are looking at, and
        the level of detail required, down to sub-pixel if so desired, but it runs at 6fps at max resolution ;(

        A brief description: from an unindexed point cloud, select regions that are in the viewer’s field of view. Project the selected points onto a unit sphere, checking depth as needed. Finally, draw only the pixels that are relevant to the screen.

        Screen resolution is not a problem, as each point is hashed only once from a given view position; the direction stays the same. It is a fixed cost per point hashed and a fixed cost per pixel, and with a bit of SIMD rejiggery it’ll be able to hash 4 points at a time.

        I have a bloom filter to determine which points have already been hashed from a given viewpoint, and a std::unordered_map for the points projected onto the unit sphere.

        I will make a video and post it so you can see it working. I am trying to make something that works with memory constraints + bandwidth constraints on unindexed, or live, data, so now I have to try to create a spatial data structure on the fly that can be synced between client and server, so only points that are relevant are transmitted (i.e. the difference between 2 positions). Any suggestions? I’m leaning towards HBVH, but am still searching for a memory-constrained spatial data structure.

  9. bauke, you said the frustum culling was easy and simple. can you explain how you implemented it? or can you tell me where to find it in your code? my implementation is not too complicated either, but it’s also not 100% correct.

  10. Can we agree on one test scenario for performance comparison?
    screen width and height, a certain point cloud, with/without optimizations (SIMD etc.)…

    i implemented something that works, but it’s kind of slow. i tested it on an i5-3337U.
    i get around 6fps with ~438,000 points visible. no manual optimization, just compiler settings set to optimize.

    • As a benchmark model we should use https://www.dropbox.com/s/zdy53f7ywfs3mls/sibenik.7z as it is available to everyone and can be losslessly converted into an octree. We can make a list of positions + rotation matrices and use these to measure render speed. After that, things get difficult.

      Everyone is using different resolutions. We can choose one or more fixed resolutions for the benchmarks, but that does not help much. The rendering methods also differ a lot in the image quality they produce. For example, my cubemap renderer did 25fps at 1024×768; my new renderer gets 6fps at that resolution, but the image quality is much higher.

      And then there are other details, such as caching. I preload the scene into the cache by rendering it once, and then I take the average of another 5 renderings of the same scene.

      The last issue is hardware; it differs for everyone.

      As for optimization, I use the highest level that I could think of, which is hardware-specific and link-time optimization for maximum speed. This is -Ofast -march=native -flto for gcc.

      • the translation/rotation matrices are a good idea. i think choosing one resolution would be good too.
        on the other hand, you are right about the rest: it’s hard to compare.
        but at least it could give a rough indication of performance.

        btw, i think atomontage uses the graphics card. on his homepage, he claims to get ~2000 fps @ 720p. what do you think about that?
        do you guys think real-time shading is possible on ud?

  11. Yeah, I saw that about atomontage. Sounds weird; probably that’s with nothing updating on the screen.
    Real-time shading is possible, of course. Just process your pixels with a pixel shader after they’ve been rendered by the other algorithm. Most likely you’ll need normals as well, which will increase the storage requirements. Dell’s UD doesn’t do this.

    • i found information on the atomontage site: http://www.atomontage.com/?id=tech_overv
      he states that the cpu transforms the atoms into another 3d format that the gpu can process. he says it’s 3D -> 3D on the cpu and then 3D -> 2D on the gpu.
      unfortunately i don’t know too much about gpu programming, and i don’t have an idea yet how to split my algorithm to hand something to the gpu, but i’m convinced that this is important. just think about it: ud gets 30fps at 720p or something on an i7. so you need a high-end cpu and a low resolution to use ud/software atom rendering, and even then at a low fps. that’s bad for a game or any kind of simulation, right?

  12. The main problem with the GPU for me is that you need to transfer the data there, which is rather slow. It’s best done in large batches and requires a lot of optimization. I think the Atomontage guy is not interested in these issues. He never claimed to work with huge datasets, and he probably stores the whole scene in memory at all times.
    Also, I have the impression that the detail level in Atomontage is much lower. I think he uses smart rendering of the leaf level to make smooth surfaces, maybe similar to the ESVO technique described by the NVIDIA people. This can be very powerful, I suppose, but it’s also more complicated.

  13. Hey bcmpinc,

    I was looking through your code; I’m trying to implement your divisionless perspective projection in my own. From your code, it looks like you’re projecting the 8 corners of the root onto the screen, and then for each corner you somehow calculate l,r,t,b values, which allows you to figure out how those 4 values change when the center moves along each axis.

    The first thing you do is calculate the quadtree bounds, which I’m not sure what they are used for, but just from factoring in the variables, they are calculated using the formula below:

    quadtree_bounds[0] = screenwidth  / 2
    quadtree_bounds[1] = screenwidth  / 2 - quadtree::size
    quadtree_bounds[2] = screenheight / 2 - quadtree::size
    quadtree_bounds[3] = screenheight / 2
    

    I’m assuming from the name that this is supposed to be the quadtree projected into world space, but the math seems to be doing something completely different, and I’m not sure what.

    After some setup, the next thing you do is shift the coords of every corner to the left by 26 (scene_depth). You then put this newly computed corner into camera space and calculate its l,r,t,b values using the formula below:

    left   = -(z * quadtree_bounds[0] - x)
    right  =   z * quadtree_bounds[1] - x
    top    = -(z * quadtree_bounds[2] - y)
    bottom =   z * quadtree_bounds[3] - y
    

    The calculation of dx, dy, dz makes sense given that the values found in bounds are l,r,t,b, but all of the above is confusing. I know that a perspective divide followed by scaling is what is usually used to achieve perspective, but in your case you multiply z by a constant and then subtract x/y, which doesn’t actually perform the divide at all. You also shift the corner values by scene_depth, which makes them extremely large. I tried remaking your code in my own Visual Studio project, and the dx,dy,dz values vary in numbers that are 7 digits long, which seems incorrect. Would you be able to explain what the above steps do to calculate the l,r,t,b at each point?

    • Your computation of quadtree_bounds is incorrect; the values for left (0) and bottom (3) should be within the range [0,1]. In any case, the values of quadtree_bounds should be interpreted as follows:
      – hold the quadtree in front of you at a distance of 1;
      – such that the left bottom corner of the quadtree coincides with the left bottom corner of the screen;
      – and one pixel corresponds to one leaf node of the quadtree;
      then the values at index 0,1,2,3 are the distances from the center of your view to the left, right, top and bottom border, respectively.

      In the next step I multiply by 1<<26 and convert the result to an int32_t. I’m doing fixed-point arithmetic here, where 1<<26 represents 1.0, so think of the result as a 32-bit binary real number like +000 1.001 0100 1001 0010 0010 0010 0001b (‘+’ is the sign bit).

      In the last step, for each corner C of the octree root node, I first project the quadtree onto the plane through C. I do this using a multiplication by z, as I’m projecting from screen space into world space, rather than the inverse (a division by z, which is used when projecting from world space into screen space). Then I translate the quadtree projection such that the origin coincides with C.

      • Ahh, I made a mistake when I wrote that post: all of quadtree_bounds should be divided by screen_height. The thing is, when you compute view_left, right, top and bottom, you divide all axes by near, which you set to screen_height. Is there a reason why you chose to set it to screen_height, since that’s really far from the camera? It does make view_top = 1/2 and view_bottom = -1/2.

        So if quadtree_bounds is just supposed to be the quadtree projected onto the parallel plane at a distance of 1, sharing its bottom left corner with the screen, shouldn’t it go from -1 to 1 on both axes? You keep the bottom left corner in the same reference frame as the actual screen, but the top and right are shifted differently when you multiply them by q::size/surf.width and surf.height. I’m assuming this is to convert the coordinates such that 1 pixel is 1 quadnode.

        So I believe that I now understand the general idea of your perspective projection. Basically, for every corner of the root, you project the quadtree onto it and then measure the l,r,t,b from each edge of the screen to the corner. You then figure out how the l,r,t,b change, since they change linearly as you move along each axis, and those rates are the dx,dy,dz values.

        To split an octree node into its 8 children, all you need to do is move the quadtree bounds to each child’s corner using dx,dy,dz, and then you have the l,r,t,b for that corner. To render the nodes, you simply shrink the quadtree down into its 4 quadrants and then shift the bounds of the current octree node based on which of the 4 quadnodes you are examining. You then check whether a corner is in each of the 4 nodes, and if it is, you recurse. Once you get to the bottom level of the quadtree, you just render directly.

        There is still some stuff that confuses me. First off, in your recursion function, in the first loop you shift the bound left by 1, which is the same as multiplying by 2. In that loop, I am assuming that you’re decomposing the octree node into its children, but instead of making each child half the size of the root, you make the original bound twice as large, and then you don’t divide dx,dy,dz by 2. Is there a reason for doing it this way?

        In your quadtree traversal, you divide the bound and dx,dy,dz by 2. Then, for each of the 4 child quadnodes, you use the blend to shift the octree bound and its deltas to keep everything relative to the current quadnode. All of that seems fine, but in that part of your code you don’t seem to be occlusion culling. You check whether the node is inside the frustum of each quadnode, but you don’t check whether that quadnode is already occupied by another node. Is this something you just never wanted to add due to performance, or have I missed the piece of code which does this? Also, you only check whether the bound value is visible in each quadnode, but the bound is just the farthest corner of the octree node. Wouldn’t it be possible that a quadnode doesn’t have the bound in its view, but does contain another corner of the octree node, which your algorithm would label as empty and not recurse into?

        Sorry for all the questions; I’m really interested in your approach. I’ve been going at this problem all summer, and I was looking at how different people were approaching it (epic point cloud, yellow mice demo, etc). I implemented a variant of epic point cloud, and you can find some depth renders below (http://i.imgur.com/XneJjoy.png, http://i.imgur.com/VD2mR0O.png, http://i.imgur.com/na0052T.png). Currently, I use a normal z-buffer instead of a quadtree to keep things simpler. I basically traverse front to back; at each node I render the AABB which encompasses the entire node and check for occlusion. If it’s visible, I traverse or render; otherwise I reject. I did this in such a way that I can scale and rotate my octree and it still renders correctly. The issue is that in the above images I get artifacts when I’m super close to the octree, and when I’m inside, the performance drop is drastic for my fractal (1s per frame sometimes). When I’m farther out, it varies with screen coverage. 40% of my runtime is taken up by computing the AABB of each octree node in screen space, so when I saw that you’re doing this without division, I was immediately interested. Thanks a lot for the help so far. I hope I can get my algorithm down to 100ms soon, which I don’t think is unreasonable if I speed up my splatting.

      • Setting the near plane to some high value doesn’t really matter, as I’m not doing near-plane clipping. If this is a problem, you can always divide all the frustum values by SCREEN_HEIGHT, such that the near plane is at 1.

        The reason I multiply the bounds by two, rather than dividing dx,dy,dz by two, is that this way the computations don’t depend on the depth within the octree. It also has the benefit of providing one bit of accuracy for each recursion step. And overflow doesn’t happen (afaik).

        Occlusion culling is a very important part of my rendering algorithm, though its implementation is so simple that it is easy to miss: if (mask&(1<<i)). So if a quadnode has been fully rendered, my algorithm won’t traverse into that quadnode.

        QuadNode.frustum contains information about how the bounds change when moving to the other corners of the octree node. This is included in the frustum check, such that the quadnode’s bounds are checked against the bounding box of the octree node (which is the bounding box of its eight corners).

        My algorithm now takes ~100ms per frame on average, and I’m kind of stuck there. I don’t think I can get it any faster unless I start doing multi-threading or port it to the GPU.

  14. Euclideon renders voxels, not points.

    As in, voxels are like pixels, just a list of values, only with 3 dimensions to index the array with instead of 2 (usually colours, or normals, or density, for instance), while ‘atoms’ usually refers to data that is based on pitch, yaw and depth instead of just a list of colours. Sometimes the terminology can be confusing, since people assume that because a voxel is rendered by the nearest point (point sampling?), all forms of ray casting or rasterization of a voxel count as an atomic renderer. Some people think that a voxel is only a voxel if you polygonise it first, which isn’t necessarily true. Also, it is not hard at all to animate voxels; you’ve all seen the NES, right? That had animation on it, either through sprite sheets or just having multiple sprites. I don’t know why people are spreading the false information that animation is difficult with voxels; there is no difference between voxels and pixels, only in the amount of memory you need to hold the large voxel data sets. Transformation with 3D primitives is actually much faster than with many 2D primitives, since you would need many triangles to make a model of the level of quality that a good, high-resolution voxel data set has.

    Also, I’m not sure, since they never say, but I think the Atomontage engine is a ‘slice based’ renderer. I think it uses trees like octrees to speed up the CPU renderer, which essentially just makes some flat shapes within the volume being rendered, gives the vertices of each flat shape voxel-based coordinates, and makes them face the screen. Then each flat shape is rasterized by the GPU, since a CPU would have major issues rasterizing multiple full-screen polygons (with transparency as well) at any decently high resolution. You can also notice, when they are adding holes to that white/silver car (the one with the open top), that the voxel detail level changes; it seems to jump to a lower quality or something. If I had to guess, I would say that the voxels are streamed to the GPU based on the view position, much like how Unreal streams in textures, and until the higher level-of-detail tree is streamed in after a change, a lower-resolution voxel texture is displayed instead.

    Also, don’t be afraid to aim for a low resolution when you are working on these software renderers; performance is more important than the quality level of the screen. I’m actually aiming for 320 by 200 for my voxel ray caster! That’s 64,000 ray casts per frame! I might even have to lower it further to a pathetic number like 64 by 64. I hope not, though.

    • I’ve been working on this for 2.5 years now, and am quite capable of understanding the meaning of Bruce’s words. His statements are bold and slightly exaggerated, but also mostly true. I know that Euclideon uses sparse voxel octrees to render voxels, though that is only because they were easy to implement and work with, not because it is a limitation of the algorithm that Bruce invented.

      The description I gave above is similar to what Euclideon describes in their patent. However, they have a different occlusion method, one that doesn’t use a quadtree. Also, where my algorithm uses a perspective projection and only integer addition, subtraction and bit shifts, Euclideon also uses multiplications and an orthogonal projection.

      With regard to tilesets, please take a look at this post: https://bcmpinc.wordpress.com/2014/11/24/cubesets/ . So yes, I know that cube-based animations are an option, though with an atom-based scene graph you can also do unimaginably cool rotation- and shifting-based animations.

      With regard to the resolution and performance of my renderer, I’m at an unstable 10fps at a resolution of 1024×768. However, I want to use the renderer in a game, so a resolution of 1024×768 and a stable frame rate of at least 30fps is an absolute minimum, though I’d obviously prefer 1920×1080 at 60fps with anti-aliasing and transparency. I hope to achieve that by porting the algorithm to the GPU. See https://bcmpinc.wordpress.com/2015/08/09/moving-towards-a-gpu-implementation/ , including the comments.
