It's wrong because z-buffering compares per-pixel data, not polygons. What's described above is closer to the kind of hidden surface removal usually called back-face culling, which compares each surface normal with the direction of view using simple trig functions (although angle-based lookup tables can speed things up), so that surfaces facing away from the viewer are never drawn into the z-buffer at all. One of the main advantages of z-buffering is that one does not need to sort the polygons in the scene, which is very useful when scene complexity is high. However, a degree of sorting is sometimes used at the object level when parts of the scene are transparent.
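As a rough illustration of the facing test (a sketch only: the function name and the vector convention are assumptions, not taken from any particular renderer), the sign of a dot product gives the same answer as comparing angles, because the dot product is proportional to the cosine of the angle between the two vectors:

```python
def is_back_facing(normal, view_dir):
    """Return True if a surface faces away from the viewer.

    normal   -- (x, y, z) surface normal; need not be unit length
    view_dir -- (x, y, z) direction from the eye towards the surface
    Both names are hypothetical, chosen for this sketch.
    """
    dot = sum(n * v for n, v in zip(normal, view_dir))
    # A non-negative dot product means the normal points the same way as
    # the view direction (or is edge-on), so the surface can be skipped.
    return dot >= 0.0
```

A surface for which this returns True would simply never be scan converted, so it never reaches the z-buffer.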
A simple description of z-buffering might be this: imagine, for each pixel on the screen, a ray being 'fired' into the scene. There are two possibilities: the ray will either hit the 'background' (e.g. sky) or it will hit an object in the scene. Whatever the case, one seeks the colour of whatever is struck, and that is the colour which should be displayed on the screen.
First, the polygons that make up the scene are scan converted, in no particular order, into lines and finally points. Then, each point is compared with the 'current' data held for its pixel location. If the point being tested is closer than the point referred to by the colour and position data currently in the z-buffer, then the new colour and position data replace the old, giving a new colour for that pixel. In this way, one eventually ends up with a colour that represents the point on an object which is 'closest' to one's viewpoint. In practice, it's not quite as simple as this. For example, colours are sometimes combined, not replaced. A typical example is transparency effects: if the point being tested is part of a transparent object, then the alpha value from the RGBA data is used to determine how the RGB colour value (e.g. perhaps that of a prized crystal orb) is combined with the colour currently in the z-buffer (e.g. a recessed dark brown wall cavity in Zelda).
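The exact way colours are combined isn't spelled out above; one very common choice is a straight weighted average driven by alpha. The following is only a sketch under that assumption (the function name and the 0.0 to 1.0 colour range are made up for the example):

```python
def blend(src_rgba, dst_rgb):
    """Combine a transparent point's colour with the colour already stored.

    src_rgba -- (r, g, b, a) of the incoming point, each in 0.0..1.0
    dst_rgb  -- (r, g, b) currently stored for the pixel
    Assumes simple alpha-weighted mixing; other blend rules exist.
    """
    r, g, b, a = src_rgba
    # alpha = 1.0 replaces the old colour, alpha = 0.0 leaves it untouched
    return tuple(a * s + (1.0 - a) * d for s, d in zip((r, g, b), dst_rgb))
```

For example, a half-transparent pure red point over a pure blue pixel, `blend((1.0, 0.0, 0.0, 0.5), (0.0, 0.0, 1.0))`, gives `(0.5, 0.0, 0.5)`.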
Here's a detailed description of how z-buffering works; this is slightly reworded information from Foley, van Dam, Feiner & Hughes:
Z-buffering uses a frame buffer (memory) with a colour value for each pixel and a z-buffer, with the same number of entries, in which a z-value is stored for each pixel. The z-buffer is initialised to zero, representing the z-value at the back clipping plane, and the frame buffer is initialised to the background colour. The largest value that can be stored in the z-buffer represents the z of the front clipping plane. Polygons are scan converted into the frame buffer in arbitrary order. During the scan-conversion process, if the polygon point being scan converted at (x,y) is no farther from the viewer than is the point whose colour and depth are currently in the buffers, then the new point's colour and depth replace the old values.
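The description above can be sketched in a few lines. This is a minimal illustration, not anyone's actual implementation: it follows the convention just given (zero means the back clipping plane, larger z means nearer), and the function names and list-of-lists buffers are assumptions made for the example.

```python
def make_buffers(width, height, background):
    """Create a frame buffer filled with the background colour and a
    z-buffer filled with zero (the back clipping plane)."""
    frame = [[background] * width for _ in range(height)]
    depth = [[0] * width for _ in range(height)]
    return frame, depth

def write_point(frame, depth, x, y, z, colour):
    """Store a scan-converted point if it is no farther than what is there.

    Returns True if the buffers were updated.
    """
    if z >= depth[y][x]:          # larger z = nearer the viewer
        depth[y][x] = z
        frame[y][x] = colour
        return True
    return False                  # hidden behind an earlier point
```

Points can be fed to `write_point` in any order; whichever point has the largest z at a given (x,y) is the one whose colour survives.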
Note that z-buffering is greatly helped when surfaces facing away from the viewer are also removed beforehand (this can speed up many tasks by about a factor of two, since far fewer points need to be scan converted and tested). It's also important to note that z-buffering does not require that objects be polygon-based. As long as a shade and z-value can be calculated for each projected point, then z-buffering can be used, no matter how the objects are actually represented (e.g. RGB volumes).
A common sneaky way to save memory is to scan convert the display in strips, so that z-buffer memory is needed only for the strip being processed, at the expense of making multiple passes through the objects.
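A sketch of the strip idea, under some simplifying assumptions: the scene is taken to be a list of already scan-converted points `(x, y, z, colour)`, larger z means nearer, and the function name and parameters are invented for the example. Note how the depth storage is only ever one strip tall, while the whole scene is walked once per strip.

```python
def render_in_strips(points, width, height, strip_height, background=(0, 0, 0)):
    """Render using a z-buffer that covers only strip_height rows at a time."""
    frame = [[background] * width for _ in range(height)]
    for top in range(0, height, strip_height):
        rows = min(strip_height, height - top)
        # Depth memory for this strip only, reset to the back plane (zero)
        depth = [[0] * width for _ in range(rows)]
        # One full pass over the scene for every strip
        for x, y, z, colour in points:
            if top <= y < top + rows and z >= depth[y - top][x]:
                depth[y - top][x] = z
                frame[y][x] = colour
    return frame
```

With a 480-row display and `strip_height=60`, for instance, the depth storage shrinks to an eighth of the full size, but the scene is traversed eight times.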
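Note: an example where a 16-bit z-buffer isn't good enough is a situation in which a 3D world is represented to millimetre accuracy, yet one has objects placed more than a kilometre apart. A 16-bit z-buffer can distinguish only 65,536 depth values, whereas a kilometre contains a million millimetre steps. For such problems, a deeper z-buffer, such as a 32-bit one, must be used.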
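The arithmetic behind that note can be checked quickly. This sketch assumes depth values are spread evenly across the range, which is a simplification (real z-buffers often distribute precision unevenly), and the function name is made up for the example:

```python
def depth_steps(range_mm, resolution_mm=1):
    """Number of distinct depth steps needed to cover range_mm
    at the given resolution, assuming evenly spaced depth values."""
    return range_mm // resolution_mm

steps = depth_steps(1_000_000)        # one kilometre in millimetres
fits_in_16_bits = steps <= 2 ** 16    # False: 65,536 values are too few
fits_in_32_bits = steps <= 2 ** 32    # True
bits_required = steps.bit_length()    # 20 under this even-spacing assumption
```

So, under these assumptions, 16 bits fall well short, while 32 bits leave plenty of headroom.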