The problem with that (in my opinion) is the 'overlap' check against a family.
I am talking about the loops that run behind the scenes.
(I know you know all that, so excuse me for building the argument from scratch.)
If you check whether an object overlaps a bunch of other objects, you actually start a loop (similar to a 'for each') that evaluates each object in that bunch and adds that object (a member of a family, or an instance) to the SOL or removes it from the SOL.
So, first off, there is a loop. And with a family, that loop can get big. On top of that, if you check overlap between one instance/family and another instance/family, that loop is a nested loop. Evaluating 10 instances against 10 instances already gives you 100 iterations, just to check the overlaps.
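To make the 10 x 10 arithmetic concrete, here is a minimal sketch in Python. It is not the engine's real code, only the naive worst case I am describing; the function name `count_pair_checks` is made up for the illustration, and a real engine may well skip some pairs.

```python
def count_pair_checks(group_a, group_b):
    """Count how many pairwise tests a naive overlap check performs."""
    checks = 0
    for a in group_a:        # outer loop: every instance on one side
        for b in group_b:    # inner loop: every instance on the other side
            checks += 1      # one overlap test would run here
    return checks


print(count_pair_checks(range(10), range(10)))  # 100
```

Double both groups to 20 instances and you are at 400 tests, so the cost grows with the product of the two counts.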
Besides that, checking a single overlap is also no more than a loop. It runs through the boundaries of one object and checks them against the boundaries of the other.
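As a rough illustration of that inner loop, here is a sketch in Python that tests every edge of one polygon against every edge of another. This is my assumption of what such a test could look like, not how any particular engine does it; all function names are hypothetical, and the sketch only detects crossing edges (it ignores the case where one shape sits entirely inside the other).

```python
def orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, 0 or -1."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def segments_cross(a1, a2, b1, b2):
    """True if the two segments properly cross (touching cases are ignored)."""
    return (orient(a1, a2, b1) * orient(a1, a2, b2) < 0
            and orient(b1, b2, a1) * orient(b1, b2, a2) < 0)


def edges(poly):
    """List the boundary edges of a polygon given as a list of (x, y) points."""
    return [(poly[i], poly[(i + 1) % len(poly)]) for i in range(len(poly))]


def boundaries_cross(poly_a, poly_b):
    """Return (crossing_found, number_of_edge_tests_performed)."""
    tests = 0
    for a1, a2 in edges(poly_a):          # loop over the first boundary
        for b1, b2 in edges(poly_b):      # loop over the second boundary
            tests += 1
            if segments_cross(a1, a2, b1, b2):
                return True, tests
    return False, tests


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
shifted = [(1, 1), (3, 1), (3, 3), (1, 3)]
far_away = [(10, 10), (12, 10), (12, 12), (10, 12)]
print(boundaries_cross(square, shifted)[0])  # True
print(boundaries_cross(square, far_away))    # (False, 16)
```

Two plain squares that do not touch already cost 16 edge tests, and that sits inside the instance loop from before.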
As a result, checking overlaps (and also collisions) is time consuming, and in your case it can even be slower than just z-ordering the whole darn layer.
What I suggested is a bit different in some ways.
Picking a bunch of objects with a 'pick by comparison' based on distance also loops through the whole family, but without the 'overlap loop'. It is fast. The SOL is also filtered only once, and the z-ordering then loops over that same filtered SOL, probably in the same tick.
With only a few objects in the SOL, the z-ordering is awesomely fast. Even doing it every tick should not have a big impact.
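Roughly, the idea looks like this in Python. Treat it as a sketch under assumptions: the `Sprite` class, the function names and the "sort by Y" rule are all made up for the illustration, since I do not know what your z-order rule actually is.

```python
from dataclasses import dataclass


@dataclass
class Sprite:
    name: str
    x: float
    y: float


def pick_within(sprites, cx, cy, radius):
    """One pass over the family: keep only sprites within radius of (cx, cy)."""
    r2 = radius * radius  # compare squared distances, so no square root is needed
    return [s for s in sprites if (s.x - cx) ** 2 + (s.y - cy) ** 2 <= r2]


def z_order(sprites):
    """Order by Y so sprites lower on screen end up on top (assumed rule)."""
    return sorted(sprites, key=lambda s: s.y)


family = [
    Sprite("tree", 100, 120),
    Sprite("rock", 110, 90),
    Sprite("far_house", 900, 400),
]
nearby = pick_within(family, cx=105, cy=100, radius=50)
print([s.name for s in z_order(nearby)])  # ['rock', 'tree']
```

The distance test is one cheap comparison per family member, and the sort only ever sees the handful of objects that survived the filter.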
Hope I made my point, the general point at least. I have no idea if it is applicable in your project.