While the previous approach of a static thread pool worked, it was sub-optimal in its resource usage. Many of the threads would never see a single task, and simply permanently sleep. This seems like a good idea, except that sleeping threads still end up in the scheduler, and thus waste a tiny amount of resources.
It is better to instead dynamically spawn threads when needed and only keeping the bare minimum around all the time. These dynamically spawned threads are also explicitly set to background priority which further reduces scheduling overhead. Finally optimizing the memory layout to prevent unwanted false sharing should also keep sporadic wake ups at a minimum.
This new model should be able to handle many more tasks than ever before, but is still not as optimal as it could be.
As the Alpha channel is completely ignored and possibly destroyed by denoising algorithms, we should restore the Alpha channel manually. Linear interpolation was chosen here as it will behave like Point if the size matches, and properly interpolate if the size doesn't match.
Fixes: #646