Yes, this is possible.
As in any peer-to-peer setup with an authoritative host, each peer sends only its input state to the host, such as mouse position and button state.
The host, upon receiving the input state from any peer, can determine what to do with the object.
For example, upon receiving a "click" and "position" from a peer, the host will go through the following process:
Conditions:
Where did the peer click? Is it on an object?
Is the peer holding the mouse down?
Where is the peer's mouse position moving to?
Action:
Move object to new location.
Object syncing then propagates the object's new position from the host to all peers.
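The host-side process above could be sketched roughly as follows. This is only an illustration in plain Python, not tied to any engine or networking library: `InputState`, `Box`, `Host`, and the shape of the sync message are all hypothetical names made up for the example, and the actual transport (sending and broadcasting messages) is left out.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class InputState:
    """What a peer sends to the host (hypothetical message shape)."""
    peer_id: str
    x: float
    y: float
    button_down: bool


@dataclass
class Box:
    """A draggable object: axis-aligned rectangle with top-left at (x, y)."""
    x: float
    y: float
    w: float
    h: float

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px <= self.x + self.w and self.y <= py <= self.y + self.h


class Host:
    def __init__(self, objects: Dict[str, Box]) -> None:
        self.objects = objects
        # peer_id -> (object_id, grab offset x, grab offset y),
        # or None if the peer pressed the button over empty space.
        self.grabs: Dict[str, Optional[Tuple[str, float, float]]] = {}

    def hit_test(self, px: float, py: float) -> Optional[str]:
        """Where did the peer click? Is it on an object?"""
        for object_id, box in self.objects.items():
            if box.contains(px, py):
                return object_id
        return None

    def handle_input(self, state: InputState) -> Optional[dict]:
        """Apply one input state; return a sync message to broadcast, or None."""
        # Is the peer holding the mouse down? If not, end any drag.
        if not state.button_down:
            self.grabs.pop(state.peer_id, None)
            return None

        # First input of a new press: decide what (if anything) was grabbed.
        if state.peer_id not in self.grabs:
            object_id = self.hit_test(state.x, state.y)
            if object_id is None:
                self.grabs[state.peer_id] = None
            else:
                box = self.objects[object_id]
                self.grabs[state.peer_id] = (object_id, state.x - box.x, state.y - box.y)
            return None

        grab = self.grabs[state.peer_id]
        if grab is None:
            return None  # the press started on empty space, nothing to move

        # Where is the mouse moving to? Move the object to the new location.
        object_id, dx, dy = grab
        box = self.objects[object_id]
        box.x, box.y = state.x - dx, state.y - dy
        return {"type": "sync", "id": object_id, "x": box.x, "y": box.y}
```

The host would send whatever `handle_input` returns to every peer. Storing the grab offset keeps the object from jumping so that its corner sits under the cursor when the drag starts.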
There are some additional considerations. One is "ownership" of an object, so that multiple peers can't manipulate the same object simultaneously. Another is local input prediction: the peer applies the object's movement locally right away, instead of waiting for the host's update, to hide lag.
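A minimal sketch of ownership and local prediction might look like the following, assuming the host keeps a simple lock table and each peer keeps both a predicted and a last-confirmed position. `OwnershipTable` and `PredictedObject` are hypothetical names for illustration, not part of any particular framework.

```python
from typing import Dict, Tuple


class OwnershipTable:
    """Host side: at most one peer may manipulate an object at a time."""

    def __init__(self) -> None:
        self.owner_of: Dict[str, str] = {}

    def try_acquire(self, object_id: str, peer_id: str) -> bool:
        # setdefault stores peer_id only if the object has no owner yet.
        return self.owner_of.setdefault(object_id, peer_id) == peer_id

    def release(self, object_id: str, peer_id: str) -> None:
        # Only the current owner can release the object.
        if self.owner_of.get(object_id) == peer_id:
            del self.owner_of[object_id]


class PredictedObject:
    """Peer side: show local movement immediately, defer to the host afterwards."""

    def __init__(self, x: float, y: float) -> None:
        self.confirmed: Tuple[float, float] = (x, y)  # last position from the host
        self.position: Tuple[float, float] = (x, y)   # what is drawn locally
        self.dragging = False

    def predict_drag(self, x: float, y: float) -> None:
        # Move the object locally without waiting for the host.
        self.dragging = True
        self.position = (x, y)

    def end_drag(self) -> None:
        self.dragging = False

    def on_host_sync(self, x: float, y: float) -> None:
        # Always remember the authoritative position, but do not overwrite
        # the local prediction while this peer is still dragging.
        self.confirmed = (x, y)
        if not self.dragging:
            self.position = (x, y)

    def on_grab_denied(self) -> None:
        # Another peer owns the object: discard the prediction and snap back.
        self.dragging = False
        self.position = self.confirmed
```

In this sketch the host would call `try_acquire` when a peer starts a drag and `release` when the button goes up. A real implementation would likely also release ownership when a peer disconnects, and might smooth the snap-back rather than jumping.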