After sorting the step stack so that dependencies can be popped before
their dependants are popped, there is still a situation left to handle
correctly:
Example:
A depends on:
B
C
D depends on:
E
F
They will be ordered like this:
A B C D E F
If there are 6+ cores, then all of them will be evaluated at once,
incorrectly evaluating A and D before their dependencies.
Starting evaluation of F and then E is correct, but waiting until they
are done is not correct because it should start working on B and C as
well.
This commit solves the problem by computing dependants in the dependency
loop checking logic, and then having workers queue up their dependants
when they finish their own work.