That's correct, and there's often a comment at the beginning or end of the voice list stating that the actual number of notes that can be played at once depends on how many elements are used per voice, as well as whether 2 voices are layered together, etc.
The "maximum polyphony" represents how many individual tone generators the instrument has in total, and each tone generator can play 1 sound sample at a time. Each voice element is a separate sound sample, so playing a voice that uses 2 elements is just like layering 2 single-element voices together.
So if you were playing only voices that use 4 elements each, the maximum polyphony would effectively be reduced to 128/4 = 32 notes. Of course, most of the time you'll be playing voices that use different numbers of elements, and some voices will be used to play more notes than others (e.g., a clarinet voice playing a monophonic melody line versus a piano voice playing a polyphonic part), so the calculation isn't that simple.
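To make the arithmetic concrete, here is a rough sketch of the tone-generator budget in Python. It assumes a pool of 128 generators (the figure used above) and that each sounding note uses one generator per element of its voice. It ignores real-world details like release tails and note stealing. The function names and the element counts for the clarinet and piano are hypothetical, chosen only for illustration.

```python
# Rough sketch of the tone-generator budget described above.
# Assumptions (not from any particular instrument's spec): 128 generators in
# total, and every sounding note uses one generator per element of its voice.
# Release tails, note stealing, etc. are ignored.

TOTAL_GENERATORS = 128  # assumed maximum polyphony, as in the example


def max_notes(elements_per_voice: int, total: int = TOTAL_GENERATORS) -> int:
    """Notes playable at once if every note uses a voice with this many elements."""
    return total // elements_per_voice


def generators_used(parts: list[tuple[int, int]]) -> int:
    """Total generators used by a list of (elements_per_voice, notes_held) pairs."""
    return sum(elements * notes for elements, notes in parts)


if __name__ == "__main__":
    print(max_notes(4))  # 32, the 128/4 case
    # Hypothetical mix: a 2-element clarinet playing one melody note
    # plus a 4-element piano holding a 10-note chord.
    used = generators_used([(2, 1), (4, 10)])
    print(used, TOTAL_GENERATORS - used)  # 42 86
```

The same function also shows why layering and multi-element voices cost the same: 5 notes of a 2-element voice, `generators_used([(2, 5)])`, use as many generators as 5 notes each of two layered single-element voices, `generators_used([(1, 5), (1, 5)])`.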