I think this will need some big code changes. But the feature also seems very important, and it could be reused for a number of "game enhancing" things. I suggest starting to implement it. It may take a long time to be ready, but once it is finished it will be a great addition.
The change could be along these lines:
The server could keep a buffer holding a server-side "dem" (a server-side dem contains the state of all entities). The server could then work in two modes: realtime, the normal mode, where client updates are based on the current state of the server; and delayed (5 seconds), where the server uses the "dem" data to update the state of the clients.
Two clients could both be in delayed mode, and they would be watching the exact same timestamped frame. But their screens would be different, because their cameras would be in different positions.
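
To make the idea a bit more concrete, here is a rough sketch in Python. It is not the game's actual code, just one way the buffer could work. All the names (ServerDemoBuffer, Snapshot, record, frame_for, view_from) are made up for illustration, and I am assuming entity state can be reduced to a 2D position per entity id.

```python
from collections import deque
from dataclasses import dataclass

DELAY_SECONDS = 5.0  # the delay suggested above; assumed to be configurable


@dataclass(frozen=True)
class Snapshot:
    """One buffered frame of the server-side "dem" (hypothetical layout)."""
    timestamp: float
    entities: dict  # entity id -> (x, y) position; a simplifying assumption


class ServerDemoBuffer:
    """Keeps recent snapshots so clients can be served live or delayed."""

    def __init__(self, delay=DELAY_SECONDS):
        self.delay = delay
        self.frames = deque()  # oldest snapshot on the left

    def record(self, timestamp, entities):
        """Store the current server state and drop frames nobody can need."""
        self.frames.append(Snapshot(timestamp, dict(entities)))
        cutoff = timestamp - self.delay
        # Keep the newest frame that is at or before the cutoff, drop older ones.
        while len(self.frames) >= 2 and self.frames[1].timestamp <= cutoff:
            self.frames.popleft()

    def frame_for(self, now, delayed):
        """Return the snapshot a client should be updated from, or None."""
        if not self.frames:
            return None
        if not delayed:
            return self.frames[-1]  # realtime mode: current server state
        target = now - self.delay
        chosen = None
        for frame in self.frames:
            if frame.timestamp > target:
                break
            chosen = frame
        return chosen  # None until the buffer spans the full delay


def view_from(frame, camera):
    """Entity positions relative to one client's camera position."""
    cx, cy = camera
    return {eid: (x - cx, y - cy) for eid, (x, y) in frame.entities.items()}
```

With this layout, two delayed clients asking at the same moment get the same Snapshot back, and only the view_from step differs between them, which matches the "same frame, different screens" point above.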