Code Readability FAIL
Fefe mentioned a change log entry of gzip 1.4:
gzip -d would fail with a CRC error for some valid inputs.
So far, the only valid input known to exhibit this failure was
compressed “from FAT filesystem (MS-DOS, OS/2, NT)”. In addition,
to trigger the failure, your memcpy implementation must copy in
the “reverse” order.
I was curious what kind of code leads to this kind of behavior. In gzip's Git repository, the fix can be found here. The problem was that memcpy was called for overlapping regions of memory. The behavior of memcpy is undefined when source and target overlap (e.g., the source address is greater than the target address and memcpy walks the buffer from its end to its beginning).
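To make the overlap problem concrete, here is a minimal sketch in C. The names backward_copy and demo_overlap are made up for illustration and do not come from gzip. backward_copy only imitates a memcpy that happens to copy from the end of the buffer to its beginning. memmove is the standard library function that is specified to handle overlapping regions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a memcpy that copies in "reverse" order:
   it walks both buffers from the last byte to the first. */
void backward_copy(char *dst, const char *src, size_t n)
{
    while (n-- > 0)
        dst[n] = src[n];
}

/* Move "cdefgh" to the front of "abcdefgh": source and target overlap,
   and the source address is greater than the target address. */
void demo_overlap(void)
{
    char broken[] = "abcdefgh";
    char safe[] = "abcdefgh";

    backward_copy(broken, broken + 2, 6); /* reads bytes it already overwrote */
    memmove(safe, safe + 2, 6);           /* defined for overlapping regions */

    assert(strcmp(broken, "ghghghgh") == 0);
    assert(strcmp(safe, "cdefghgh") == 0);
}
```

Calling demo_overlap() from a main function runs both copies. The backward copy corrupts the data because it overwrites source bytes before it has read them, while memmove produces the expected result.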
What caught my attention was not the bug (this kind of bug is hard to spot and I would not blame anyone for missing it) but the line right before the changed one:
n -= (e = (e = WSIZE - ((d &= WSIZE-1) > w ? d : w)) > n ? n : e);
There are four assignments in this expression, two of them using implicit operations (the compound operators -= and &=), plus two ternary expressions, and two of the assignments change the same variable, using it as temporary storage! None of the four variables (d, e, n, w) involved has a name that describes its purpose.
I disassembled the line:
d &= WSIZE-1;
tmp = WSIZE - (d > w ? d : w);
e = tmp > n ? n : tmp;
n -= e;
I assume that the WSIZE-1 in the first line is meant to turn a value like binary 00010000 (decimal 16) into a mask like binary 00001111. So WSIZE had better be a power of two (2^x).
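The masking assumption can be sketched in a few lines of C. The value of WSIZE below is an assumption chosen for this sketch (any power of two behaves the same way), and wrap_index, is_power_of_two and check_wrap are hypothetical helper names, not gzip functions.

```c
#include <assert.h>

/* Assumption for this sketch: a power-of-two window size. */
#define WSIZE 0x8000u

/* Hypothetical helper: wrap an index into [0, WSIZE) using the mask. */
unsigned wrap_index(unsigned d)
{
    return d & (WSIZE - 1);
}

/* Hypothetical helper: nonzero if v has exactly one bit set (v is 2^x). */
int is_power_of_two(unsigned v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* For a power-of-two WSIZE, masking and the modulo operator agree. */
void check_wrap(void)
{
    unsigned d;

    assert(is_power_of_two(WSIZE));
    for (d = 0; d < 4 * WSIZE; d += 1021)
        assert(wrap_index(d) == d % WSIZE);
}
```

If WSIZE were not a power of two, WSIZE-1 would not be a run of one bits and the mask would no longer match the modulo operation.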
Let’s further simplify the line by using the functions max() and min():
d &= WSIZE-1;
e = min(WSIZE - max(d, w), n);
n -= e;
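A quick way to gain some confidence that the rewrite behaves like the original is to run both versions side by side. The following is only a sketch under assumptions: the variables are taken to be unsigned, WSIZE is set to an assumed power-of-two value, w is assumed to stay below WSIZE, and the function names (original_step, simplified_step, check_equivalence, min_u, max_u) are invented for this example.

```c
#include <assert.h>

/* Assumption for this sketch: a power-of-two window size. */
#define WSIZE 0x8000u

unsigned min_u(unsigned a, unsigned b) { return a < b ? a : b; }
unsigned max_u(unsigned a, unsigned b) { return a > b ? a : b; }

/* The original one-liner, wrapped in a function; returns e. */
unsigned original_step(unsigned *d, unsigned *n, unsigned w)
{
    unsigned e;
    *n -= (e = (e = WSIZE - ((*d &= WSIZE-1) > w ? *d : w)) > *n ? *n : e);
    return e;
}

/* The simplified version using min and max; returns e. */
unsigned simplified_step(unsigned *d, unsigned *n, unsigned w)
{
    unsigned e;
    *d &= WSIZE - 1;
    e = min_u(WSIZE - max_u(*d, w), *n);
    *n -= e;
    return e;
}

/* Compare both versions over a sample of inputs (not an exhaustive proof). */
void check_equivalence(void)
{
    unsigned d0, n0, w;

    for (d0 = 0; d0 < 4 * WSIZE; d0 += 4093)
        for (n0 = 0; n0 < 600; n0 += 7)
            for (w = 0; w < WSIZE; w += 4099) {
                unsigned d1 = d0, n1 = n0, d2 = d0, n2 = n0;
                unsigned e1 = original_step(&d1, &n1, w);
                unsigned e2 = simplified_step(&d2, &n2, w);
                assert(e1 == e2 && d1 == d2 && n1 == n2);
            }
}
```

For example, with d = 32700, w = 10 and n = 500, both versions leave d unchanged, set e to 68 (the distance from d to the end of the window) and reduce n to 432.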
Changing the variable names to something meaningful is left as an exercise. (Extra exercise: git blame)