The high priority CUDA stream causes libOBS to be at a lower priority than the tracking, which is not what we want. Instead we want tracking to be incomplete in those cases, rather than slowing down encoding and other things.
Geometry updates are also now done once per frame instead of one per tracking update, which should improve the smoothness without affecting performance too much. Additionally all tracking info is now in the 0..1 range, which drastically simplifies some math - especially with texture coordinates.
To deal with tracking and updates being asynchronous, a very simple approximation of movement velocity has been added. This is mostly wrong, but it can bridge the gap where tracking updates are slower, as the values are all filtered anyway.
Adds a new CMake option "ENABLE_PROFILING" which enables all CPU and GPU performance profiling available in StreamFX for tracking what's actually causing things to be slow.
Asynchronous rendering allows the GPU to perform work while the CPU performs other work, and is significantly faster than lockstep immediate rendering. By reusing existing render targets we can see a performance improvement of up to 500%, while still doing the same things.
The new libOBS API allows us to directly access the underlying API instead of having to mess around in memory. By using it we can avoid crashing in case the compiler for it is different, or in case the actual back end structure changes.
Additionally the mostly unimplemented and unused options have also been removed, which streamlines the use of this class even further and reduces both shader and code complexity.
Finally by optimizing the use of the internal render target we can achieve a speed up of up to 3000% over the old way, allowing for many more mipmapped filters.
Q_INIT_RESOURCE and Q_CLEANUP_RESOURCE can't be called from within a namespace and instead have to be in outside of the namespace, so by moving them into small inline functions we can fulfill this restriction.
Related: #192#155
Fixes rendering at unexpected sizes by first rendering to a render target and then rendering the contents of that render target to the frame buffer instead. This also prevent rendering twice or more, which might cause severe FPS impact.
Implements support for various new UI features that weren't possible up until now, such as an 'About StreamFX' window with a thank you to everyone that supported the project up until now.
Through converting the code to a threaded asynchronous approach, the libOBS video renderer no longer has to wait on our tracking code to run, and we can enjoy a little bit of extra calculation time before we actually have to do anything.
However due to the remaining synchronization with the Direct3D11/OpenGL context, it is not entirely safe to spend a full frame tracking as libOBS will then start skipped/dropping frames. Even though the priority of the stream is now increased, this still means that we can't just sit around and have to quickly finish all work.
Related #150
Load additional functions from CUDA and add new enumerations to support them:
* cuDevicePrimaryCtxSetFlags allows us to sched scheduling mode for the GPU.
* cuCtxgetStreamPriorityRange allows us to check which priority levels are supported.
* cuStreamCreateWithPriority allows us to create streams with non-default priority.
The scheduler mode is now set to yield so that other threads can do work when we hit an eventual stalling problem. Streams can also now be created with higher priority and different flags, if necessary. In most cases this should allow CUDA resources to execute even while the GPU is under heavy load.
Previously sources had to manually implement migration code, which resulted in unresolvable regression issues due to the lack of version and commit tagging. With the new migration code, all sources automatically have this version and commit tagging at all times, and as such can now have a temporary regression fixed without the user needing to change any values manually.
As OBS Studio locks some mutexes in a different order depending on what actions are being done, using modified_properties for GPU work causes things to freeze in place. Instead have users manually click the refresh button when they changed files in order to prevent this freeze from happening.
Fixes: #118
With this, GCC 8 and above should now be able to compile the project both in obs-studio and as a standalone install. Some features are currently still not fully supported and require extra work, but the majority of things are supported and work out of the box. Exact feature parity can be looked up here on the wiki: https://github.com/Xaymar/obs-StreamFX/wiki/Platform-Feature-Parity
Related: #119#98#30