Although I think the 386's BSR/BSF instructions might be useful here... They're complex CISC instructions though, so they might not even be faster than a version built from discrete simple instructions.
Anyway it's probably easier this way:
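        MOV  CX,16          ; bits to test (use ECX and 32 for a 32-bit value)
        MOV  COUNTER,0      ; number of 1 bits seen so far
BEGIN:
        SHR  AX,1           ; (EAX for 32 bits) low bit drops into CARRY; AX is destroyed
        ;this is what I changed:
        JNC  ZEROBIT        ; CARRY clear = bit was 0, skip the increment
        INC  COUNTER
ZEROBIT:
        LOOP BEGIN          ; decrement CX and repeat until it reaches 0
        TEST COUNTER,1      ; ZF set = even number of 1 bits, ZF clear = odd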
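Why? Because SHR/SHL automatically load the bit they shift out into the CARRY flag. So each time through the loop, SHR moves the next bit of AX into CARRY, and JNC checks it: if CARRY is set, the counter is incremented; if it's clear, the increment is skipped. After 16 (or 32) passes, COUNTER holds the number of 1 bits, and the final TEST looks at its low bit to tell you whether that count is odd or even.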
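If it helps to see the same logic outside of assembly, here is a rough C model of the loop. It's only a sketch: the function names count_set_bits16 and has_odd_parity16 are made up for illustration, and the "carry" variable just stands in for the CARRY flag, which C doesn't expose directly.

```c
#include <stdint.h>

/* Hypothetical helper (not from the post): models the SHR/JNC/INC loop. */
unsigned count_set_bits16(uint16_t ax)
{
    unsigned counter = 0;                 /* MOV COUNTER,0 */
    for (int cx = 16; cx != 0; cx--) {    /* MOV CX,16 ... LOOP BEGIN */
        unsigned carry = ax & 1u;         /* the bit SHR is about to shift out */
        ax >>= 1;                         /* SHR AX,1 */
        if (carry)                        /* JNC ZEROBIT skips this when carry is 0 */
            counter++;                    /* INC COUNTER */
    }
    return counter;
}

/* Hypothetical helper: models TEST COUNTER,1.
   Returns 1 when the number of 1 bits is odd, 0 when it is even. */
int has_odd_parity16(uint16_t ax)
{
    return (int)(count_set_bits16(ax) & 1u);
}
```

For example, 0x00B5 is 10110101 in binary, so count_set_bits16(0x00B5) gives 5 and has_odd_parity16(0x00B5) gives 1. For the 32-bit version you'd swap in uint32_t and a loop count of 32.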
"Information has a tendency to be free. Which means someone will always tell you something you don't want to know."