Asynchronous rendering lets the GPU work while the CPU does other work, and is significantly faster than lockstep immediate rendering. By reusing existing render targets we can see a performance improvement of up to 500% while still doing the same work.
The new libOBS API allows us to directly access the underlying API instead of having to mess around in memory. By using it we avoid crashes when a different compiler was used to build it, or when the actual back end structure changes.
Additionally, the mostly unimplemented and unused options have been removed, which streamlines the use of this class even further and reduces both shader and code complexity.
Finally, by optimizing the use of the internal render target we can achieve a speedup of up to 3000% over the old way, allowing for many more mipmapped filters.
Q_INIT_RESOURCE and Q_CLEANUP_RESOURCE can't be called from within a namespace and instead have to be used outside of any namespace, so by moving them into small inline functions we can fulfill this restriction.
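The restriction can be illustrated without Qt. The sketch below is not the actual StreamFX code: DEMO_INIT_RESOURCE, demo_init_resources_example and the streamfx_demo namespace are hypothetical stand-ins. It assumes the macro works like Qt's and declares the generated initializer with 'extern' at the point of use, so the declaration only finds the real function when it is made at global scope.

```cpp
#include <cassert>

// Counts calls so the effect is observable; exists only for this demo.
static int demo_init_calls = 0;

// Stand-in for the initializer a resource compiler would emit at global scope.
int demo_init_resources_example()
{
    ++demo_init_calls;
    return 1;
}

// Stand-in for Q_INIT_RESOURCE (assumption: it declares the generated
// function with 'extern' where it is used, then calls it).
#define DEMO_INIT_RESOURCE(name)                 \
    do {                                         \
        extern int demo_init_resources_##name(); \
        demo_init_resources_##name();            \
    } while (false)

// The wrapper lives at global scope, so the block-scope 'extern' declaration
// inside the macro refers to ::demo_init_resources_example.
inline void demo_initialize_resources()
{
    DEMO_INIT_RESOURCE(example);
}

namespace streamfx_demo {
    inline void load()
    {
        // Expanding DEMO_INIT_RESOURCE(example) here instead would declare
        // streamfx_demo::demo_init_resources_example, which is never defined,
        // and linking would fail.
        ::demo_initialize_resources();
    }
} // namespace streamfx_demo
```

Calling streamfx_demo::load() runs the global initializer once through the wrapper, and the namespaced code never has to expand the macro itself.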
Related: #192, #155
* #153, #167, #178 Update localization.
* #157 Fix a few example shaders and add some new ones.
* #160 Fix Look Ahead setting.
* #162 Increase Direct3D11 texture eviction priority.
* #163 Asynchronous Nvidia Face Tracking.
* #164 Add clang-tidy support to CMake.
* #165 Fix some clang-10 errors.
* #168 Updated StreamFX logo.
* #169 Redesigned version string generation.
* #170 Fix compiler error with C++17 when using libobs.
* #171 Add global configuration handler.
* #172 Move Windows exclusive library handler into its own file.
* #174 Refactor everything into the streamfx:: namespace.
* #176 Implement a new UI/UX experience for StreamFX users.
* #177 Link to stdc++fs on GNU and Clang.
* #179 Fix incorrect render sizes and performance with gfx::shader.
* #181, #183 Fix locale strings missing info or no longer being used.
* #182 Add a default path for gfx::shader based on the selected file or the examples directory.
Crowdin commits each language in its own commit, instead of merging multiple languages into one commit. This results in very spammy builds, to the point of several hundred being spawned in the same second.
Fixes rendering at unexpected sizes by first rendering to a render target and then rendering the contents of that render target to the frame buffer instead. This also prevents rendering twice or more, which might cause a severe FPS impact.
Implements support for various new UI features that weren't possible up until now, such as an 'About StreamFX' window with a thank you to everyone that supported the project up until now.
Adds support for running clang-tidy from within CMake, if the Clang toolset was found. This feature is experimental, but should work on many compilers, as it relies on the generated compile_commands.json files, which are fully generated by the clang subproject. Using clang-tidy we can find hidden bugs that other static analyzers do not detect, or that compilers don't even bother throwing an error for.
Through converting the code to a threaded asynchronous approach, the libOBS video renderer no longer has to wait on our tracking code to run, and we can enjoy a little bit of extra calculation time before we actually have to do anything.
However, due to the remaining synchronization with the Direct3D11/OpenGL context, it is not entirely safe to spend a full frame tracking, as libOBS will then start skipping/dropping frames. Even though the priority of the stream is now increased, this still means that we can't just sit around and have to quickly finish all work.
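As a rough illustration of the non-blocking pattern described above, here is a minimal sketch using only the C++17 standard library. It is not the actual tracking code: demo_rect, demo_track and demo_async_tracker are hypothetical names, and the synchronization with the graphics context mentioned above is left out entirely.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <optional>

// Hypothetical tracking result.
struct demo_rect {
    int x, y, w, h;
};

// Hypothetical stand-in for the expensive tracking call.
inline demo_rect demo_track(int frame)
{
    return demo_rect{frame, frame, 64, 64};
}

class demo_async_tracker {
    std::future<demo_rect>   _pending; // Work currently in flight, if any.
    std::optional<demo_rect> _last;    // Most recent completed result.

public:
    // Called once per rendered frame. Never waits for the tracking work:
    // it collects a finished result if there is one, starts new work if
    // nothing is in flight, and returns the latest known result.
    std::optional<demo_rect> tick(int frame)
    {
        using namespace std::chrono_literals;
        if (_pending.valid() && _pending.wait_for(0s) == std::future_status::ready) {
            _last = _pending.get(); // get() leaves _pending invalid again.
        }
        if (!_pending.valid()) {
            _pending = std::async(std::launch::async, demo_track, frame);
        }
        return _last;
    }

    // Blocks until in-flight work has finished; meant for shutdown or tests.
    std::optional<demo_rect> flush()
    {
        if (_pending.valid()) {
            _last = _pending.get();
        }
        return _last;
    }
};
```

The renderer calls tick() every frame and keeps drawing with a slightly stale result while new tracking work is still running.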
Related: #150
Load additional functions from CUDA and add new enumerations to support them:
* cuDevicePrimaryCtxSetFlags allows us to set the scheduling mode for the GPU.
* cuCtxGetStreamPriorityRange allows us to check which priority levels are supported.
* cuStreamCreateWithPriority allows us to create streams with non-default priority.
The scheduler mode is now set to yield so that other threads can do work when we hit an eventual stalling problem. Streams can also now be created with higher priority and different flags, if necessary. In most cases this should allow CUDA resources to execute even while the GPU is under heavy load.
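One detail that is easy to get wrong with stream priorities: cuCtxGetStreamPriorityRange reports a least and a greatest priority, and numerically lower values mean higher priority. The helper below is a hypothetical sketch, not part of the commit. It makes no CUDA calls and only maps a relative priority between 0.0 (lowest) and 1.0 (highest) onto the reported range, assuming the result is then passed to cuStreamCreateWithPriority.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: maps a relative priority onto the numeric range
// reported by cuCtxGetStreamPriorityRange.
//   least    - numeric value of the lowest priority (e.g. 0)
//   greatest - numeric value of the highest priority (e.g. -1)
//   relative - 0.0 selects the lowest, 1.0 the highest priority
inline int demo_stream_priority(int least, int greatest, double relative)
{
    if (relative < 0.0)
        relative = 0.0;
    if (relative > 1.0)
        relative = 1.0;
    // 'greatest' is numerically less than or equal to 'least', so this
    // moves towards smaller numbers as 'relative' grows.
    double value = least + (greatest - least) * relative;
    return static_cast<int>(std::lround(value));
}
```

If the device does not support stream priorities, both ends of the range are reported as 0 and the helper returns 0 for any input.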
* Fixed 'Pixelator's color transition point being off-center and uncontrollable.
* Fixed 'Drunk' filter not working at all.
* Added an inverted mode to 'Luma Burn'.
* Added exponential Luma to 'Luma Burn'.
* Fixed odd color behavior in the 'Color Shift' transition by switching out HSL with HSV.
* Added a new 'Sliding Bars' transition shader, for an example of it see this clip: https://clips.twitch.tv/RacyEndearingHorseradishAMPTropPunch .