The generated code is correct.
One register (r3) is used as the loop counter and another (r0) as the write destination, i.e. it holds the address of p[i].
The instruction you marked as buggy uses so-called post-indexed addressing: the store writes to the address held in r0, and r0 is then automagically incremented by #1, exactly like *p++ = data in C.
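
To make the equivalence concrete, here is a minimal C sketch of the two forms. The function names (fill_indexed, fill_postinc, check_fills_match) and the buffer size are hypothetical, and the instruction quoted in the comment is only an assumed example of what a compiler might emit for a byte store, not your actual listing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Indexed form: the counter i and the base pointer p are kept separate. */
void fill_indexed(unsigned char *p, unsigned char data, size_t n)
{
    for (size_t i = 0; i < n; i++)
        p[i] = data;
}

/* Post-increment form: p plays the role of r0 and n the role of the
 * loop counter (r3). For *p++ = data a compiler may emit a post-indexed
 * byte store, e.g. something like
 *     strb r1, [r0], #1
 * (r1 as the data register is an assumption made for illustration). */
void fill_postinc(unsigned char *p, unsigned char data, size_t n)
{
    while (n--)
        *p++ = data;   /* store first, then advance the pointer by 1 */
}

/* Both forms must leave the buffer in the same state. */
void check_fills_match(void)
{
    unsigned char a[8] = {0};
    unsigned char b[8] = {0};

    fill_indexed(a, 0xAB, sizeof a);
    fill_postinc(b, 0xAB, sizeof b);
    assert(memcmp(a, b, sizeof a) == 0);
}
```

Both functions write the same bytes; the second one just folds the address update into the store, which is what the post-indexed instruction does in a single step.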