The difference is that this is "normal" geometry, with fixed-size textures (64x64 or similar) mapped with hard edges: nearest-neighbour filtering, no bilinear/trilinear filtering or blur.
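To make the filtering difference concrete, here is a minimal sketch in plain Python, not taken from any particular engine. The function names, the greyscale list-of-lists texture, and the clamp-to-edge behaviour are all assumptions made for illustration. Nearest-neighbour returns exactly one texel, so texel borders stay hard; bilinear blends the four surrounding texels, which is where the blur comes from.

```python
import math


def sample_nearest(texture, u, v):
    """Return the single texel that contains (u, v), with u and v in [0, 1]."""
    height = len(texture)
    width = len(texture[0])
    x = min(int(u * width), width - 1)
    y = min(int(v * height), height - 1)
    return texture[y][x]


def sample_bilinear(texture, u, v):
    """Blend the four texels around (u, v), weighted by distance."""
    height = len(texture)
    width = len(texture[0])

    def texel(x, y):
        # Clamp to the texture edge (an assumption; engines also offer wrap modes).
        x = max(0, min(width - 1, x))
        y = max(0, min(height - 1, y))
        return texture[y][x]

    # Texel centres sit at half-integer positions, hence the 0.5 offset.
    fx = u * width - 0.5
    fy = v * height - 0.5
    x0 = math.floor(fx)
    y0 = math.floor(fy)
    tx = fx - x0
    ty = fy - y0

    top = texel(x0, y0) * (1 - tx) + texel(x0 + 1, y0) * tx
    bottom = texel(x0, y0 + 1) * (1 - tx) + texel(x0 + 1, y0 + 1) * tx
    return top * (1 - ty) + bottom * ty


# Hypothetical 2x2 greyscale checker texture: 0.0 is black, 1.0 is white.
CHECKER = [
    [0.0, 1.0],
    [1.0, 0.0],
]

if __name__ == "__main__":
    # Either side of the texel border: nearest jumps, bilinear blends.
    print(sample_nearest(CHECKER, 0.49, 0.25))
    print(sample_nearest(CHECKER, 0.51, 0.25))
    print(sample_bilinear(CHECKER, 0.50, 0.25))
```

With nearest-neighbour, a sample just left of the border and one just right of it return two different texels with nothing in between, which is the hard edge. The bilinear sample on the border comes out as a mix of both.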
Voxels are quite different: the world is built from cubes, either all one size or several sizes. Any texturing method can apply here, or none at all; some even use gradient shaders on the cubes.
So you get that pixel art vibe, mapped into the 3D world with smooth movement. The rendered view itself isn't stair-stepped.
In this case, it is done "properly". Some developers just use normal geometry and normal texturing with bilinear filtering, then slap a post-processing shader on the camera view. That makes the view "blocky" without the world actually being blocky, which is cheap and typically looks cheap.
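For comparison, the post-processing shortcut boils down to something like the sketch below. It is written in plain Python instead of a real shader language, and the `pixelate` name, the list-of-lists image, and the choice to sample the top-left pixel of each block are assumptions for illustration only. Every screen pixel is snapped to one sample per block, so the blockiness lives in screen space and has nothing to do with the textures in the world.

```python
def pixelate(image, block):
    """Snap every pixel to the top-left sample of its block-sized screen cell."""
    height = len(image)
    width = len(image[0])
    out = []
    for y in range(height):
        src_y = (y // block) * block
        row = [image[src_y][(x // block) * block] for x in range(width)]
        out.append(row)
    return out


if __name__ == "__main__":
    # Hypothetical 4x4 "rendered frame" with a distinct value per pixel.
    frame = [
        [0, 1, 2, 3],
        [4, 5, 6, 7],
        [8, 9, 10, 11],
        [12, 13, 14, 15],
    ]
    for row in pixelate(frame, 2):
        print(row)
```

Because the blocks are fixed to the screen and not to surfaces, they don't follow the geometry when the camera moves, which is a big part of why this approach tends to look cheap.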