If you are modifying the whole surface and need very fast performance, then rather than calling Lock/Unlock you might want to try modifying the PF sources to add a function that returns a pointer to the back buffer.
Calling Lock copies the display to a temporary buffer, and calling Unlock copies it back again.
The memory copying alone is a performance crippler!
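To show where the cost comes from, here is a minimal sketch. It assumes Lock/Unlock behave as described above (a full copy out, then a full copy back) and that the surface is a plain 16-bit pixel array. `Surface`, `Lock`, `Unlock`, `GetBackBuffer` and `Fill` are hypothetical names for illustration only, not the actual PF API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 16-bit surface; names are illustrative, not PF's real classes.
class Surface {
public:
    Surface(int width, int height)
        : width_(width), height_(height),
          pixels_(static_cast<std::size_t>(width) * height, 0) {}

    // Lock-style access: copy every pixel out to a temporary buffer...
    uint16_t* Lock() {
        temp_ = pixels_;  // full copy #1
        bytesCopied_ += pixels_.size() * sizeof(uint16_t);
        locked_ = true;
        return temp_.data();
    }

    // ...and copy every pixel back when the caller unlocks.
    void Unlock() {
        assert(locked_ && "Unlock() called without Lock()");
        pixels_ = temp_;  // full copy #2
        bytesCopied_ += pixels_.size() * sizeof(uint16_t);
        locked_ = false;
    }

    // Direct access (the suggested addition): a pointer to the real buffer,
    // so writes go straight to the surface with no copying at all.
    uint16_t* GetBackBuffer() { return pixels_.data(); }

    int Width() const { return width_; }
    int Height() const { return height_; }
    std::size_t BytesCopied() const { return bytesCopied_; }

private:
    int width_, height_;
    std::vector<uint16_t> pixels_;  // the "display" memory
    std::vector<uint16_t> temp_;    // temporary buffer used by Lock/Unlock
    bool locked_ = false;
    std::size_t bytesCopied_ = 0;   // running total of bytes memcpy'd
};

// Write one color to every pixel through a raw pointer.
void Fill(uint16_t* p, int w, int h, uint16_t color) {
    for (int i = 0; i < w * h; ++i) p[i] = color;
}
```

On an assumed 240x320 16-bit screen, each Lock/Unlock pair in this sketch moves 2 x 153,600 = 307,200 bytes before you have drawn a single pixel, while filling through `GetBackBuffer()` moves none. The trade-off is that a raw pointer skips whatever bookkeeping Lock/Unlock does for you, so you take on responsibility for using it safely.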
Hope this helps.