Moves the menu for StreamFX to the primary menu, so that it is not hidden under tools. This makes it clearer to the user if their installation of StreamFX is working correctly, in addition to reducing the steps necessary to use the menu.
Also the 'About StreamFX' dialog now actually shows up for every update, as expected.
Adds support for enumerations, a different way of selecting how something should behave in a shader. Enumerations rely on a contiguous list of values, and the number of values in the enumeration is detected automatically. Only non-vector types are supported as enumeration entries, and array/vector parameters can have each member set to a different enumeration value.
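A minimal sketch of how such automatic detection could work, assuming entries are stored under indexed keys. The "enum_<index>" key naming and the `count_enum_entries` helper are placeholders made up for this illustration and are not taken from StreamFX:

```cpp
#include <cstddef>
#include <map>
#include <string>

// Counts enumeration entries by probing "enum_0", "enum_1", ... until one is missing.
// The "enum_<index>" naming is a placeholder chosen for this sketch only.
inline std::size_t count_enum_entries(const std::map<std::string, float>& annotations)
{
    std::size_t count = 0;
    while (annotations.count("enum_" + std::to_string(count)) != 0)
        ++count;
    return count;
}
```

With this kind of probing, a gap in the indices ends the enumeration early, which is why the list of values has to be unbroken.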
Furthermore, suffixes are now properly assigned, and 'bool' no longer causes shaders to stop rendering. Additionally, by inlining some functions and using std::string_view we achieve slightly better performance than before.
Using std::string_view over std::string (and const std::string&) has the advantage that we skip potential temporary std::string objects that are constructed and immediately thrown away, which slows down the code. A std::string also converts implicitly to std::string_view, so existing code that passes std::string keeps working.
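As a rough illustration (C++17, not the actual StreamFX code), the sketch below uses a hypothetical `has_suffix` helper to show where the temporaries disappear. All function names and the "_percent" suffix are invented for this example:

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: checks whether `text` ends with `suffix`.
// Taking std::string_view by value means a string literal or a substring
// can be passed without constructing a temporary std::string.
inline bool has_suffix(std::string_view text, std::string_view suffix)
{
    return text.size() >= suffix.size()
        && text.substr(text.size() - suffix.size()) == suffix;
}

// Callers that already hold a std::string keep working unchanged,
// because std::string converts implicitly to std::string_view.
inline bool is_percent_parameter(const std::string& name)
{
    return has_suffix(name, "_percent");
}

// Turning a view back into an owning string requires an explicit
// construction, which keeps the copy visible in the code.
inline std::string copy_of(std::string_view view)
{
    return std::string(view);
}
```

Note that a view does not own its characters, so it must not outlive the string it was created from.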
The high priority CUDA stream causes libOBS to be at a lower priority than the tracking, which is not what we want. Instead we want tracking to be incomplete in those cases, rather than slowing down encoding and other things.
Geometry updates are also now done once per frame instead of once per tracking update, which should improve the smoothness without affecting performance too much. Additionally all tracking info is now in the 0..1 range, which drastically simplifies some math - especially with texture coordinates.
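To show why the 0..1 range helps, here is a small sketch that converts a pixel-space rectangle into normalized coordinates. The `rect_px` and `rect_uv` types and the `normalize` function are hypothetical and only serve this illustration:

```cpp
#include <cstdint>

// Hypothetical types for illustration only.
struct rect_px { float x, y, w, h; }; // rectangle in pixels
struct rect_uv { float x, y, w, h; }; // same rectangle in the 0..1 range

// Divides by the frame size once, so everything downstream is resolution independent.
inline rect_uv normalize(const rect_px& r, std::uint32_t width, std::uint32_t height)
{
    const float fw = static_cast<float>(width);
    const float fh = static_cast<float>(height);
    return rect_uv{r.x / fw, r.y / fh, r.w / fw, r.h / fh};
}
```

Once values are normalized like this, they can be used directly as texture coordinates, regardless of the resolution the tracker ran at.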
To deal with tracking and updates being asynchronous, a very simple approximation of movement velocity has been added. This is mostly wrong, but it can bridge the gap where tracking updates are slower, as the values are all filtered anyway.
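One simple way to approximate this is linear extrapolation from the last two tracking results. The sketch below assumes that approach; the `tracked_value` type and both function names are made up here, and the real code may differ:

```cpp
// Hypothetical state for one tracked value, e.g. a rectangle centre in the 0..1 range.
struct tracked_value {
    float position = 0.0f; // last value reported by the tracker
    float velocity = 0.0f; // estimated change per second
};

// Called whenever a new tracking result arrives.
// dt is the time in seconds since the previous result.
inline void on_tracking_update(tracked_value& v, float new_position, float dt)
{
    if (dt > 0.0f)
        v.velocity = (new_position - v.position) / dt;
    v.position = new_position;
}

// Called once per rendered frame; extrapolates linearly from the last known state.
inline float predict(const tracked_value& v, float seconds_since_update)
{
    return v.position + v.velocity * seconds_since_update;
}
```

The prediction overshoots whenever the subject changes direction, which is acceptable here only because the values are filtered afterwards.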
Adds a new CMake option "ENABLE_PROFILING" which enables all CPU and GPU performance profiling available in StreamFX for tracking what's actually causing things to be slow.
Asynchronous rendering allows the GPU to perform work while the CPU performs other work, and is significantly faster than lockstep immediate rendering. By reusing existing render targets we can see a performance improvement of up to 500%, while still doing the same things.
The new libOBS API allows us to directly access the underlying API instead of having to mess around in memory. By using it we avoid crashes when libOBS was built with a different compiler, or when the actual back end structure changes.
Additionally the mostly unimplemented and unused options have also been removed, which streamlines the use of this class even further and reduces both shader and code complexity.
Finally, by optimizing the use of the internal render target we can achieve a speedup of up to 3000% over the old way, allowing for many more mipmapped filters.
Q_INIT_RESOURCE and Q_CLEANUP_RESOURCE can't be called from within a namespace and instead have to be called from outside of it, so by moving them into small inline functions outside the namespace we can fulfill this restriction.
Related: #192, #155
Fixes rendering at unexpected sizes by first rendering to a render target and then rendering the contents of that render target to the frame buffer instead. This also prevents rendering twice or more, which might cause severe FPS impact.
Implements support for various new UI features that weren't possible up until now, such as an 'About StreamFX' window with a thank you to everyone that supported the project up until now.
Through converting the code to a threaded asynchronous approach, the libOBS video renderer no longer has to wait on our tracking code to run, and we can enjoy a little bit of extra calculation time before we actually have to do anything.
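A minimal sketch of the general idea, using `std::async` as a stand-in for whatever threading the actual code uses. The `async_tracker` class, `tracking_result`, and `run_tracking` are hypothetical names for this illustration:

```cpp
#include <chrono>
#include <future>

// Hypothetical tracking result; real data would be rectangles, landmarks, etc.
struct tracking_result { int frame = -1; };

// Stand-in for the expensive tracking work.
inline tracking_result run_tracking(int frame) { return tracking_result{frame}; }

class async_tracker {
    std::future<tracking_result> _pending;
    tracking_result              _latest;

public:
    // Called from the render path once per frame; never waits for tracking to finish.
    const tracking_result& tick(int frame)
    {
        // Pick up a finished result, if there is one.
        if (_pending.valid()
            && _pending.wait_for(std::chrono::seconds(0)) == std::future_status::ready) {
            _latest = _pending.get();
        }
        // Start new work only when nothing is in flight.
        if (!_pending.valid()) {
            _pending = std::async(std::launch::async, run_tracking, frame);
        }
        return _latest;
    }

    // Blocks until in-flight work is done; only meant for shutdown or tests.
    void flush()
    {
        if (_pending.valid())
            _latest = _pending.get();
    }

    const tracking_result& latest() const { return _latest; }
};
```

The render path always gets the most recent completed result, so a slow tracking pass shows up as stale data rather than as a stalled frame.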
However, due to the remaining synchronization with the Direct3D11/OpenGL context, it is not entirely safe to spend a full frame tracking, as libOBS will then start skipping/dropping frames. Even though the priority of the stream is now increased, this still means that we can't just sit around and instead have to finish all work quickly.
Related: #150
Load additional functions from CUDA and add new enumerations to support them:
* cuDevicePrimaryCtxSetFlags allows us to set the scheduling mode for the GPU.
* cuCtxGetStreamPriorityRange allows us to check which priority levels are supported.
* cuStreamCreateWithPriority allows us to create streams with non-default priority.
The scheduler mode is now set to yield so that other threads can do work when we hit an eventual stalling problem. Streams can also now be created with higher priority and different flags, if necessary. In most cases this should allow CUDA resources to execute even while the GPU is under heavy load.