FFmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2024-10-19 21:13:25 +00:00

Author	SHA1	Message	Date
Lynne	3e3d46309b	lavu/vulkan: remove unused field from the execution pool structure	2023-07-21 20:04:21 +02:00
Lynne	97890c2b55	lavu/vulkan: remove threadsafe buffer index load and fix a signed overflow It's not needed anymore.	2023-07-21 20:04:20 +02:00
Jan Beich	e6bd8b1323	hwcontext_vulkan: hide Linux-only header after `571756bf2f` major/minor are in <sys/types.h> on BSDs and <sys/mkdev.h> on Solaris-like. libavutil/hwcontext_vulkan.c:55:10: fatal error: 'sys/sysmacros.h' file not found #include <sys/sysmacros.h> ^~~~~~~~~~~~~~~~~	2023-07-21 20:04:10 +02:00
Rémi Denis-Courmont	29b9d616c2	lavu/float_dsp: rework RISC-V V scalar product 1) Take the reductive sum out of the loop, leaving a regular vector addition in the loop. 2) Merge the addition and the multiplication. 3) Unroll. Before: scalarproduct_float_rvv_f32: 832.5 After: scalarproduct_float_rvv_f32: 275.2	2023-07-20 22:54:34 +03:00
Rémi Denis-Courmont	b710f881ce	lavu/float_dsp: unroll RISC-V V loops butterflies_float_c: 1057.0 butterflies_float_rvv_f32: 351.0 (before) butterflies_float_rvv_f32: 329.5 (after) vector_dmac_scalar_c: 819.0 vector_dmac_scalar_rvv_f64: 670.5 (before) vector_dmac_scalar_rvv_f64: 431.0 (after) vector_dmul_c: 800.2 vector_dmul_rvv_f64: 541.5 (before) vector_dmul_rvv_f64: 426.0 (after) vector_dmul_scalar_c: 545.7 vector_dmul_scalar_rvv_f64: 670.7 (before) vector_dmul_scalar_rvv_f64: 324.7 (after) vector_fmac_scalar_c: 804.5 vector_fmac_scalar_rvv_f32: 412.7 (before) vector_fmac_scalar_rvv_f32: 214.5 (after) vector_fmul_c: 811.2 vector_fmul_rvv_f32: 285.7 (before) vector_fmul_rvv_f32: 214.2 (after) vector_fmul_add_c: 1313.0 vector_fmul_add_rvv_f32: 349.0 (before) vector_fmul_add_rvv_f32: 290.2 (after) vector_fmul_reverse_c: 815.7 vector_fmul_reverse_rvv_f32: 529.2 (before) vector_fmul_reverse_rvv_f32: 515.7 (after) vector_fmul_scalar_c: 546.0 vector_fmul_scalar_rvv_f32: 350.2 (before) vector_fmul_scalar_rvv_f32: 169.5 (after)	2023-07-20 22:54:34 +03:00
Rémi Denis-Courmont	b6585eb04c	lavu: add/use flag for RISC-V Zba extension The code was blindly assuming that Zbb or V implied Zba. While the former is practically always true, the latter broke some QEMU setups, as V was introduced earlier than Zba.	2023-07-19 19:29:35 +03:00
Zhao Zhili	3af0dc6d05	avutil/bfin: remove dead code The code is unused for a decade since `bf6c84d7eb`. Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2023-07-19 22:42:08 +08:00
Rémi Denis-Courmont	3d79afbe70	lavu/fixed_dsp: unroll RISC-V V loop Before: butterflies_fixed_c: 804.7 butterflies_fixed_rvv_i32: 348.2 After: butterflies_fixed_rvv_i32: 308.7	2023-07-17 18:48:42 +03:00
Marton Balint	9a7f060c32	avutil/random_seed: turn off buffering when reading from random Signed-off-by: Marton Balint <cus@passwd.hu>	2023-07-16 11:48:31 +02:00
Rémi Denis-Courmont	f032234953	aarch64: remove VFP feature check This is not actually used for anything. The configure check causes the CPU feature flag to be set, but nothing consumes it at all. While AArch64 does have VFP, it is only used for the scalar C code. Conversely, it is still possible to disable VFP, by changing the C compiler flags as before (though that only makes sense for a hypothetical non-standard Armv8 platform without VFP). Note that this retains the "vfp" option flag, for backward compatibility and on the very remote but theoretically possible chance that FFmpeg actually makes use of it in the future. AV_CPU_FLAG_VFP is retained as it is actually used by AArch32.	2023-07-15 22:56:30 +03:00
Nicolas George	ca9ec4e7ed	lavu/avassert: include config.h Fix setting the assert level.	2023-07-12 15:35:37 +02:00
Anton Khirnov	551a9af5a1	lavu/tests/cpu: stop processing the thread count Just print it as it is. Needed by the following commit.	2023-07-11 19:15:04 +02:00
Pavel Koshevoy	0056d9f176	avutil: fix build failure on osx 10.4 libavutil/random_seed.c calls arc4random_buf which is not available on OSX 10.4 Tiger, but the configuration script tests for arc4random which is available. Fix the configuration test to match the actual API used. Co-authored-by: James Almer <jamrial@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2023-07-08 14:51:15 -03:00
James Almer	b7f4d5fa7e	avutil/random_seed: add support for gcrypt and OpenSSL as source of randomness Reviewed-by: Anton Khirnov <anton@khirnov.net> Signed-off-by: James Almer <jamrial@gmail.com>	2023-07-06 15:30:27 -03:00
Philip Langdale	0e7fa8b3ca	avutil/random_seed: include stddef.h The new function uses size_t, which has to be defined.	2023-07-05 10:25:12 -07:00
James Almer	d694c25b44	avutil/random_seed: add av_random_bytes() Uses the existing code for av_get_random_seed() to return a buffer with cryptographically secure random data, or an error if none could be generated. Signed-off-by: James Almer <jamrial@gmail.com>	2023-07-05 10:06:05 -03:00
James Almer	7a1128ca07	avutil/random_seed: use fread() in read_random() This ensures the requested amount of bytes is read. Also remove /dev/random as it's no longer necessary. Signed-off-by: James Almer <jamrial@gmail.com>	2023-07-05 08:58:50 -03:00
Tong Wu	d51b0580e4	lavu/hwcontext_qsv: fix memory leak for d3d9 impl Signed-off-by: Tong Wu <tong1.wu@intel.com>	2023-06-25 10:01:51 +08:00
Tong Wu	8ea31f694a	lavu/hwcontext_qsv: fix memory leak for d3d11va impl Signed-off-by: Tong Wu <tong1.wu@intel.com>	2023-06-25 10:01:51 +08:00
Tong Wu	28ed898ac6	avutil/hwcontext_qsv: register free function for device_derive When qsv device is created by device_derive, the ctx->free function is not registered, causing potential memory leak because of not properly closing the MFX session. Signed-off-by: Tong Wu <tong1.wu@intel.com> Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>	2023-06-25 10:01:51 +08:00
Michael Niedermayer	4aa1a42a91	avutil/softfloat: Basic documentation for av_sincos_sf() Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-06-23 02:06:46 +02:00
Michael Niedermayer	d84677abd8	avutil/softfloat: fix av_sincos_sf() Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-06-23 02:06:46 +02:00
Lynne	d0f1d937fe	hwcontext_vulkan: free temporary array once unneeded Fixes a small memory leak. This also prevents leaks on malloc/mutex init errors.	2023-06-15 22:00:41 +02:00
Lynne	b4d5baa8b0	hwcontext_vulkan: call ff_vk_uninit() on device uninit This fixes three memory leaks from ff_vk_load_props().	2023-06-15 22:00:41 +02:00
Philip Langdale	41be6a5593	lavu/hwcontext_cuda: declare support for rgb32/bgr32 nvenc declares support for these formats, but if hwcontext_cuda doesn't do that as well, then it's not possible to hwupload them for use in a possible cuda pipeline before encoding.	2023-06-15 12:29:52 -07:00
Martin Storsjö	d78bffbf3d	libavutil: Add version bump for new aarch64 cpu flags This was missed in `397cb623c8`. Signed-off-by: Martin Storsjö <martin@martin.st>	2023-06-10 00:21:58 +03:00
Lynne	eff565dc19	hwcontext_vulkan: tune execution pools Having less in-flight resources is better in this case.	2023-06-07 23:59:17 +02:00
Lynne	5f1be341c2	vulkan: discard dependencies when explicitly waiting for execution This reduces memory needed dramatically, as unneeded resources can be immediately returned to the pool. Although waitforfences is threadsafe, we add a mutex wait around it, as the mutex fence in combination with waitforfences assures us that no other thread will reset the fence in the meantime while the mutex is locked. This allows us to call ff_vk_exec_discard_deps.	2023-06-07 23:59:16 +02:00
Lynne	975cd48bb3	vulkan: synchronize access to execution pool fences vkResetFences is specified as being user-synchronized (yet vkWaitFences is not).	2023-06-07 23:59:16 +02:00
Martin Storsjö	c76643021e	aarch64: Add Windows runtime detection of the dotprod instructions For Windows, there's no publicly defined constant for checking for the i8mm extension yet. Signed-off-by: Martin Storsjö <martin@martin.st>	2023-06-06 12:50:15 +03:00
Martin Storsjö	9b0052200a	aarch64: Add Apple runtime detection of dotprod and i8mm using sysctl For now, there's not much value in this since Clang doesn't support enabling the dotprod or i8mm features with either .arch_extension or .arch (it has to be enabled by the base arch flags passed to the compiler). But it may be supported in the future. Signed-off-by: Martin Storsjö <martin@martin.st>	2023-06-06 12:41:20 +03:00
Martin Storsjö	493fcde50a	aarch64: Add Linux runtime cpu feature detection using HWCAP_CPUID Based partially on code by Janne Grunau. Signed-off-by: Martin Storsjö <martin@martin.st>	2023-06-06 12:40:57 +03:00
Martin Storsjö	397cb623c8	aarch64: Add cpu flags for the dotprod and i8mm extensions Set these available if they are available unconditionally for the compiler. Signed-off-by: Martin Storsjö <martin@martin.st>	2023-06-06 12:40:42 +03:00
Martin Storsjö	fb1b88af77	configure: aarch64: Support assembling the dotprod and i8mm arch extensions These are available since ARMv8.4-a and ARMv8.6-a respectively, but can also be available optionally since ARMv8.2-a. Check if ".arch armv8.2-a" and ".arch_extension {dotprod,i8mm}" are supported, and check if the instructions can be assembled. Current clang versions fail to support the dotprod and i8mm features in the .arch_extension directive, but do support them if enabled with -march=armv8.4-a on the command line. (Curiously, lowering the arch level with ".arch armv8.2-a" doesn't make the extensions unavailable if they were enabled with -march; if that changes, Clang should also learn to support these extensions via .arch_extension for them to remain usable here.) Signed-off-by: Martin Storsjö <martin@martin.st>	2023-06-06 12:40:26 +03:00
Philip Langdale	378fb40282	avutil/hwcontext_vulkan: disable multiplane when deriving from cuda Today, cuda is not able to import multiplane images, and cuda requires images to be imported whether you are trying to import to cuda or export from cuda (in the latter case, the image is imported and then copied on the cuda side). So any interop between cuda and vulkan requires that multiplane be disabled. The existing option for this is not sufficient, because when deriving devices it is not possible to specify any options. And, it is necessary to derive the Vulkan device, because any pipeline that involves uploading from cuda to vulkan and then back to cuda must use the same cuda context on both sides, and the only way to propagate the cuda context all the way through is to derive the device at each stage. ie: -vf hwupload=derive_device=vulkan,<filters>,hwupload=derive_device=cuda	2023-06-03 16:29:38 -07:00
Lynne	58f82fc26a	vulkan: replace usage of %lu with %"SIZE_SPECIFIER"	2023-05-29 03:22:58 +02:00
Michael Niedermayer	75918016ab	Move bessel_i0() from swresample/resample to avutil/mathematics The 0th order modified Bessel function of the first kind is used in multiple places; let's avoid having 3+ different implementations. I picked this one as it's accurate and quite fast; it can be replaced if a better one is found. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-05-29 00:45:28 +02:00
Lynne	db1d022781	APIchanges: add hwcontext_vulkan changes and bump lavu minor	2023-05-29 00:42:02 +02:00
Lynne	bef86ba86c	APIchanges: add new pixel formats supported and bump lavu minor	2023-05-29 00:42:02 +02:00
Lynne	160a415e22	lavfi: add nlmeans_vulkan filter	2023-05-29 00:42:01 +02:00
Lynne	dfff3877b7	vulkan: add support for the atomic float ops extension	2023-05-29 00:42:01 +02:00
Lynne	77478f6793	av1dec: add Vulkan hwaccel	2023-05-29 00:42:00 +02:00
Niklas Haas	9675e54b02	avutil/hwcontext_vulkan: add libplacebo required features For compatibility with vf_libplacebo	2023-05-29 00:41:55 +02:00
Lynne	05ce6473ac	lavfi: add lavfi-only Vulkan infrastructure	2023-05-29 00:41:51 +02:00
Lynne	51b7fe81be	hwcontext_vulkan: enable additional device properties	2023-05-29 00:41:51 +02:00
Lynne	33fc919bb7	hwcontext_vulkan: remove duplicate code, port to use generic vulkan utils The temporary AVFrame on stack enables us to use the common dependency/dispatch code in prepare_frame(). The prepare_frame() function is used for both frame initialization and frame import/export queue family transfer operations. In the former case, no AVFrame exists yet, so, as this is purely libavutil code, we create a temporary frame on stack. Otherwise, we'd need to allocate multiple frames somewhere, one for each possible command buffer dispatch.	2023-05-29 00:41:51 +02:00
Lynne	94e17a63a4	hwcontext_vulkan: don't change properties if prepare_frame fails	2023-05-29 00:41:50 +02:00
Lynne	32fc36ee61	hwcontext_vulkan: remove linear+host_visible "fast" path The idea was that it's faster to map linear images and copy them via regular memcpy. This is a very niche use, plus very inconsistently useful, as it would only really be faster on a few Intel GPUs. Even then, using the non-cached memcpy would've been better. Instead, scrap this code. Drivers are better at figuring out what copy to use, and if we're host-mapping, it should actually be just as fast, if not faster.	2023-05-29 00:41:50 +02:00
Lynne	48f85de0e7	hwcontext_vulkan: rewrite to support multiplane surfaces This commit adds proper handling of multiplane images throughout all of the hwcontext code. To avoid breakage of individual components, the change is performed as a single commit.	2023-05-29 00:41:49 +02:00
Lynne	a4d63b46d9	vulkan: make GLSL macro functions semicolon-safe	2023-05-29 00:41:49 +02:00


5905 Commits