Triangle back-face culling was a one-line change
My current triangle initialization code looks like this:
p_triangle init_triangle(float4 positions[3], float4 colors[3]) {
    p_triangle new_tri = {0};
    new_tri.positions[0] = positions[0];
    new_tri.positions[1] = positions[1];
    new_tri.positions[2] = positions[2];
    new_tri.colors[0] = colors[0];
    new_tri.colors[1] = colors[1];
    new_tri.colors[2] = colors[2];
    /* edges[i] is the edge opposite vertex i */
    new_tri.edges[0] = new_tri.positions[2] - new_tri.positions[1];
    new_tri.edges[1] = new_tri.positions[0] - new_tri.positions[2];
    new_tri.edges[2] = new_tri.positions[1] - new_tri.positions[0];
    /* the cross product of two edges is perpendicular to the triangle */
    float4 ortho_v = cross(new_tri.edges[0], new_tri.edges[1]);
    if (f32_eq(vec3_len(ortho_v), 0.f)) {
        /* degenerate (zero-area) triangle: fall back to +z */
        new_tri.normal = (float4){0.f, 0.f, 1.f, 0.f};
    } else {
        new_tri.normal = vec3_normalize(ortho_v);
    }
    return new_tri;
}
I thought I'd have to add a function to do a cross product of my edges just for culling, but the normal computed at initialization already is that cross product, so checking the sign of its z component did the trick:
{
    p_triangle t1 = init_triangle(vertices, colors);
    if (t1.normal.z < 0) {
        goto triangle_end;
    }
    bounds_t tri_bounds = triangle_bounds(t1);
    raster_triangle_scaled(actx, arena, t1, tri_bounds, bitmap, pitch, scale);
}
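The t1.normal.z < 0 check is the only thing I added, and I got back-face culling basically for free. It does rely on the positions already being in a space where the view direction runs along the z axis, and on front faces using a consistent winding order; flip either convention and the comparison flips too.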
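
To see why the sign of the normal's z is enough, here is a minimal standalone sketch. It assumes a plain vec3 struct in place of the post's float4, and the helper names (vec3_sub, normal_z, is_back_facing) are hypothetical, not part of the renderer above. The z component of the edge cross product is twice the signed area of the triangle projected onto the xy plane, so its sign flips when the winding order flips.

```c
/* Hypothetical stand-in for the post's float4; only x, y, z are used. */
typedef struct { float x, y, z; } vec3;

static vec3 vec3_sub(vec3 a, vec3 b) {
    return (vec3){a.x - b.x, a.y - b.y, a.z - b.z};
}

/* z component of cross(e0, e1), using the same edge definitions as
   init_triangle: e0 = p2 - p1 and e1 = p0 - p2. Normalizing the full
   cross product would not change the sign of this value. */
static float normal_z(vec3 p0, vec3 p1, vec3 p2) {
    vec3 e0 = vec3_sub(p2, p1);
    vec3 e1 = vec3_sub(p0, p2);
    return e0.x * e1.y - e0.y * e1.x;
}

/* Assumed convention, matching the check in the post:
   negative z means the triangle faces away from the viewer. */
static int is_back_facing(vec3 p0, vec3 p1, vec3 p2) {
    return normal_z(p0, p1, p2) < 0.0f;
}
```

With a = (0, 0, 0), b = (1, 0, 0) and c = (0, 1, 0), is_back_facing(a, b, c) returns 0 and is_back_facing(a, c, b) returns 1: the same three points, wound the other way, get culled.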

Also, I should mention that in code like this I prefer assigning to manually enumerated indexes over writing a for loop, since it avoids a dependency on an iterator variable. Compilers may well recognize a fixed three-iteration loop and optimize it away, but I'm not sure. If this were Zig I could use a compile-time loop, but it's C, which is more to my liking overall.
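
For comparison, here is a rough sketch of the edge setup written both ways. The vec3 and tri types and the function names are hypothetical simplifications, not the renderer's real p_triangle. Optimizing compilers such as GCC and Clang commonly unroll a loop with a small constant trip count like this one fully, but that depends on the compiler and flags, so the generated assembly is the thing to check.

```c
/* Hypothetical simplified types; the post's float4 and p_triangle
   definitions are not shown, so these are assumptions. */
typedef struct { float x, y, z; } vec3;
typedef struct { vec3 positions[3]; vec3 edges[3]; } tri;

static vec3 vec3_sub(vec3 a, vec3 b) {
    return (vec3){a.x - b.x, a.y - b.y, a.z - b.z};
}

/* Manually enumerated indexes, as in init_triangle. */
static void edges_unrolled(tri *t) {
    t->edges[0] = vec3_sub(t->positions[2], t->positions[1]);
    t->edges[1] = vec3_sub(t->positions[0], t->positions[2]);
    t->edges[2] = vec3_sub(t->positions[1], t->positions[0]);
}

/* Loop form: edge i runs from vertex (i + 1) % 3 to vertex (i + 2) % 3,
   which reproduces the three assignments above. */
static void edges_looped(tri *t) {
    for (int i = 0; i < 3; i++) {
        t->edges[i] = vec3_sub(t->positions[(i + 2) % 3],
                               t->positions[(i + 1) % 3]);
    }
}
```

The loop version needs the modular index arithmetic to express the pattern, which is arguably harder to read than the three explicit lines, so the unrolled form has a readability case regardless of what the optimizer does.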