What the blog post describes is still the case.
It's not guaranteed to be faster to check for collision first, but I suspect it usually is. If you have thousands of spread-out instances and a series of costly conditions followed by an overlap check, it will likely be much faster to do the overlap check first, since the costly conditions then only run on the few instances that actually overlap. On the other hand, if you have thousands of instances all in the same place (hence in the same collision cell) and a cheap condition that can quickly filter them down to just a few, then it's probably a bit faster to do the collision check last. The first case can make a big difference while the second probably only makes a small one, hence the advice to prefer checking overlap first. As ever, the only way to get the correct answer on performance questions is to measure it.
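
To make the trade-off concrete, here is a rough Python sketch of the two orderings. It is not how the engine actually implements collision cells; the cell size, the `overlaps` box test and all the function names are made up for illustration. It just counts how many times the (supposedly costly) condition has to run in each ordering.

```python
from collections import defaultdict

CELL = 100  # hypothetical collision cell size, in pixels


def cell_of(x, y):
    """Map a position to the grid cell that contains it."""
    return (int(x // CELL), int(y // CELL))


def build_cells(instances):
    """Bucket instances by collision cell (a simplified spatial hash)."""
    cells = defaultdict(list)
    for inst in instances:
        cells[cell_of(inst["x"], inst["y"])].append(inst)
    return cells


def overlaps(a, b, size=10):
    """Toy overlap test: treat every instance as a size x size box."""
    return abs(a["x"] - b["x"]) < size and abs(a["y"] - b["y"]) < size


def overlap_first(cells, target, condition):
    """Overlap check first: only look in the cells around the target,
    then run the condition on the instances that overlap."""
    cx, cy = cell_of(target["x"], target["y"])
    picked, condition_calls = [], 0
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for inst in cells.get((cx + dx, cy + dy), ()):
                if overlaps(inst, target):
                    condition_calls += 1
                    if condition(inst):
                        picked.append(inst)
    return picked, condition_calls


def condition_first(instances, target, condition):
    """Condition first: it has to run on every instance before overlap."""
    picked, condition_calls = [], 0
    for inst in instances:
        condition_calls += 1
        if condition(inst) and overlaps(inst, target):
            picked.append(inst)
    return picked, condition_calls


# 1000 instances spread out along a line, 50 pixels apart.
instances = [{"id": i, "x": i * 50, "y": 0} for i in range(1000)]
target = {"x": 500, "y": 0}
is_even = lambda inst: inst["id"] % 2 == 0  # stand-in for a costly condition

picked_a, calls_a = overlap_first(build_cells(instances), target, is_even)
picked_b, calls_b = condition_first(instances, target, is_even)
print(calls_a, calls_b)
```

Both orderings pick the same instance, but with spread-out instances the overlap-first version only runs the condition on the handful near the target. If you put all 1000 instances at the same position instead, every one of them overlaps and the advantage disappears, which is the second case described above. The counts only show how much work each ordering does, so you'd still want to measure in your actual project.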