ASM & two pixels at a time

I'm still messing around with optimizing some code using assembly...
I've seen how you can blit 2 pixels at a time to increase the speed, but what if you're in a situation where you can't grab a pair of pixels at once?
If I did it two at a time, I would have something like this:
calc first pixel
grab 1 pixel
calc second pixel
grab 1 pixel
shift and OR the pixels together
put 2 pixels to screen
Ok, so as you can see, I would still have the same number of grabs and one less put, but I'd have to shift and OR the pixels together.
Would saving one write to memory be worth the effort once the extra shift and OR are considered? I guess it could also depend on the caching, which I don't fully understand.
I guess I could try it and benchmark it, but currently I'm using all the registers, so it's not as simple as adding a couple of lines.
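To make the steps concrete, here is a rough C sketch of the same idea rather than the actual assembly. Everything in it is an assumption for illustration: 16-bit pixels, a little-endian machine (so the first pixel goes in the low half of the 32-bit word), and a hypothetical `src_index` table standing in for the "calc pixel" step. The name `blit_pairs` is made up too.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: 16-bit pixels, little-endian target.
   src_index[i] stands in for "calc pixel": it says where in src
   the i-th output pixel comes from. */
static void blit_pairs(uint16_t *dst, const uint16_t *src,
                       const size_t *src_index, size_t count)
{
    size_t i;

    for (i = 0; i + 1 < count; i += 2) {
        uint32_t first  = src[src_index[i]];      /* calc + grab 1 pixel */
        uint32_t second = src[src_index[i + 1]];  /* calc + grab 1 pixel */
        uint32_t pair   = first | (second << 16); /* shift and OR together */

        /* put 2 pixels with one 32-bit store; memcpy avoids alignment
           and aliasing trouble, and compilers usually turn it into a
           single store */
        memcpy(&dst[i], &pair, sizeof pair);
    }

    /* odd pixel left over: plain single-pixel put */
    if (count & 1)
        dst[count - 1] = src[src_index[count - 1]];
}
```

Per pair this is still two reads, plus one shift and one OR, in exchange for one 32-bit write instead of two 16-bit writes. Whether that trade pays off is exactly the thing that would need benchmarking on the target machine.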
Thanks for anyone's opinions...