Most of the online resources including Wikipedia on Dutch National Flag, which is a specific case of Quicksort algorithm for three categories of elements, could be could be slightly improved especially on presorted arrays by checking if the pointer to the top of lowest values does does not equal to the running pointer. There is a cost of the branch operation, however CPU pipelining might play to our advantage:
procedure three-way-partition(A : array of value, mid : value): i ← 0 j ← 0 n ← size of A - 1 while j ≤ n: if A[j] < mid: if j != i // Slight improvement. swap A[i] and A[j] i ← i + 1 j ← j + 1 else if A[j] > mid: swap A[j] and A[n] n ← n - 1 else: j ← j + 1
This is a good candidate for microbenchmarking, might get to it when I have more time.