I can reproduce a performance difference in quadissueperf:
r148: 205k
r149: 205k
r150: 172k
This points to a change in r150, not r152. I filed an issue to investigate.
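For scale, here is the r149 to r150 difference as a relative change. This is just a quick Python sketch using the quadissueperf numbers above; the helper name `relative_change` is mine, not something from the test suite.

```python
def relative_change(before: float, after: float) -> float:
    """Return the fractional change going from `before` to `after`."""
    return (after - before) / before


# quadissueperf scores quoted above, in thousands (r149 -> r150)
print(f"r149 -> r150: {relative_change(205, 172):+.1%}")  # prints "r149 -> r150: -16.1%"
```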
I cannot explain any performance difference in fillrateperf at all. That test is completely different: it is bottlenecked solely on the GPU hardware's memory bandwidth, so it shouldn't change much no matter what we do to the engine. Even so, I measured an improvement in r150 with fillrateperf:
r149: 2480
r150: 2792
This still doesn't make any sense to me, so my best guess is that the test has fairly high variance and a single run should not be considered accurate. Maybe the result depends on the hardware's temperature, power state, or something similar.
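One way to test the variance guess would be to run fillrateperf several times per revision and compare the gap between the means with the run-to-run spread. Below is a rough Python sketch of that idea. The function names and the threshold `k` are my own assumptions, and this is a simple heuristic rather than a proper statistical test.

```python
import statistics


def summarize(samples):
    """Return (mean, sample standard deviation) for a list of scores."""
    return statistics.mean(samples), statistics.stdev(samples)


def exceeds_noise(baseline, candidate, k=2.0):
    """Rough heuristic: is the gap between the two means larger than
    k times the larger of the two standard deviations?

    `k` is an arbitrary threshold (an assumption), not a calibrated value.
    Each list needs at least two samples for stdev to be defined.
    """
    base_mean, base_sd = summarize(baseline)
    cand_mean, cand_sd = summarize(candidate)
    return abs(cand_mean - base_mean) > k * max(base_sd, cand_sd)
```

To use it, collect a handful of scores for each revision, ideally alternating between the two builds so that temperature and power state affect both equally, and pass the two lists to `exceeds_noise`. If it returns `False`, the difference is within the noise and probably not worth chasing.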