Commit Graph

6380 Commits

Author SHA1 Message Date
Niklas Haas
2a2e0aced2 avutil/dovi_meta: document static vs dynamic ext blocks 2024-08-16 11:48:02 +02:00
Lynne
604dfdb44c
hwcontext_vulkan: align host mapping size to minImportedHostPointerAlignment
This was left out of the recent rewrite of the system.
2024-08-16 01:22:16 +02:00
Lynne
18d964fc2c
vulkan: enable encoding of images if video_maintenance1 is enabled
Vulkan encoding was designed in a very... consolidated way.
You had to know the exact codec and profile that the image was going to
eventually be encoded as at... image creation time. Unfortunately, as good
as our code is, glimpsing into the exact future isn't what its capable of.

video_maintenance1 removed that requirement, which only then made encoding
images practically possible.
2024-08-16 01:22:16 +02:00
Lynne
46c13834b6
hwcontext_vulkan: enable VK_KHR_video_maintenance1
We require it for encoding.
2024-08-16 01:22:15 +02:00
Lynne
97e947a2a7
hwcontext_vulkan: setup extensions before features
The issue is that enabling features requires that the device
extension is supported. The extensions bitfield was set later,
so it was always 0, leading to no features being added.
2024-08-16 01:22:15 +02:00
Lynne
c3cbaf39bb
hwcontext_vulkan: don't enable deprecated VK_KHR_sampler_ycbcr_conversion extension
It was added to Vulkan 1.1 a long time ago.
Validation layer will warn if this is enabled.
2024-08-16 01:22:15 +02:00
Lynne
3f65d24075
hwcontext_vulkan: fix user layers, add support for different debug modes
The validation layer option only supported GPU-assisted validation.
This is mutually exclusive with shader debug printfs, so we need to
differentiate between the two.

This also fixes issues with user-given layers, and leaks in case of
errors.
2024-08-16 01:22:14 +02:00
gnattu
a1976e963f avutil/hwcontext_videotoolbox: silence warning for RGB
Hardware frames with RGB colorspace will not have a YCbCrMatrixKey.
Currently, it will spam the console with warning if rgb frame is
uploaded.

Signed-off-by: Gnattu OC <gnattuoc@me.com>
Reviewed-by: Marvin Scholz <epirat07@gmail.com>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-08-15 20:10:33 +08:00
Lynne
d138d7a595
vulkan: make sure descriptor buffers are always DEVICE_LOCAL
Implementations are required to list memory heaps in the most optimal
order. But its better to be explicit for this particular allocation.
2024-08-13 19:05:20 +02:00
Lynne
e25667f9f1
hwcontext_vulkan: ignore false positive validation errors
Issue ref:
https://github.com/KhronosGroup/Vulkan-ValidationLayers/issues/6627
2024-08-11 05:13:18 +02:00
Lynne
ef11a6456d
hwcontext_vulkan: do not chain structs of unsupported extensions in vkCreateDevice
Fixes:

vkCreateDevice(): pCreateInfo->pNext<VkPhysicalDeviceOpticalFlowFeaturesNV> includes a
pointer to a VkPhysicalDeviceOpticalFlowFeaturesNV, but when creating VkDevice, the
parent extension (VK_NV_optical_flow) was not included in ppEnabledExtensionNames.
The Vulkan spec states: Each pNext member of any structure (including this one) in
the pNext chain must be either NULL or a pointer to a valid struct for extending
VkDeviceCreateInfo.
2024-08-11 05:13:17 +02:00
Lynne
d6c08a41cb
vulkan: load queue families upon loading properties
Avoids the need to call ff_vk_qf_init if manually filling in
a queue family structure.
2024-08-11 05:13:16 +02:00
Lynne
0b25f0bc1d
hwcontext_vulkan: correct comment in header 2024-08-11 05:13:16 +02:00
Lynne
5f0f1f7b7a
libavutil: deprecate the old Vulkan queue API, add doc/APIchanges entries 2024-08-11 05:13:15 +02:00
Lynne
83cd77563f
vulkan: add support for encode feedback queries 2024-08-11 05:13:15 +02:00
Lynne
2ce0e51503
hwcontext_vulkan: add support for Vulkan encoding 2024-08-11 05:13:14 +02:00
Lynne
8eac11105b
vulkan: use allocator callback for buffer creation
This would've let to a segfault if custom allocators were used.
2024-08-11 05:13:13 +02:00
Lynne
55adcb4fc5
hwcontext_vulkan: add support for VK_EXT_shader_object
We'd like to use it eventually, and its already covered by
the minimum version of the headers we require.
2024-08-11 05:13:13 +02:00
Lynne
c19af16f8d
hwcontext_vulkan: enable storageBuffer16BitAccess if available 2024-08-11 05:13:12 +02:00
Lynne
957d34784a
hwcontext_vulkan: constify validation layer features table
The struct data seem to get corrupted otherwise.
Possibly a validation layer or libvulkan issue.
2024-08-11 05:13:11 +02:00
Lynne
9e606b33a8
hwcontext_vulkan: add HOST_CACHED flag to transfer buffer
Significantly speeds up downloads on devices without host mapping.
2024-08-11 05:13:11 +02:00
Lynne
aea4d4b423
hwcontext_vulkan: rewrite upload/download
This commit was long overdue. The old transfer dubiously tried to
merge as much code as possible, and had very little in the way
of optimizations, apart from basic host-mapping.

The new code uses buffer pools for any temporary bufflers, and
handles falling back to buffer-based uploads if host-mapping fails.

Roundtrip performance difference:
ffmpeg -init_hw_device "vulkan=vk:0,debug=0,disable_multiplane=1" -f lavfi \
-i color=red:s=3840x2160 -vf hwupload,hwdownload,format=yuv420p -f null -

7900XTX:
Before: 224fps
After: 502fps

Ada, with proprietary drivers:
Before: 29fps
After: 54fps

Alder Lake:
Before: 85fps
After: 108fps

With the host-mapping codepath disabled:
Before: 32fps
After: 51fps
2024-08-11 05:13:11 +02:00
Lynne
81c5d4ea0e
hwcontext_vulkan: remove unused struct 2024-08-11 05:13:10 +02:00
Lynne
a30b7c0158
hwcontext_vulkan: initialize optical flow queues if available
Lets us implement FPS conversion.
2024-08-11 05:13:10 +02:00
Lynne
8790a30882
hwcontext_vulkan: rewrite queue picking system for the new API
This allows us to support different video ops on different queues,
as well as any other arbitrary queues we need.
2024-08-11 05:13:09 +02:00
Lynne
bedfabc437
vulkan: use the new queue family mechanism 2024-08-11 05:13:09 +02:00
Lynne
13489c8a21
hwcontext_vulkan: add a new mechanism to expose used queue families
The issue with the old mechanism is that we had to introduce new
API each time we needed a new queue family, and all the queue families
were functionally fixed to a given purpose.

Nvidia's GPUs are able to handle video encoding and compute on the
same queue, which results in a speedup when pre-processing is required.

Also, this enables us to expose optical flow queues for frame interpolation.
2024-08-11 05:13:03 +02:00
Fei Wang
eab4a9e9f8 lavu/hwcontext_qsv: Use vendor id to create device
New kernel driver "xe" will be supported from Lunar Lake instead of
"i915".

"xe" kernel driver:
https://github.com/torvalds/linux/tree/master/drivers/gpu/drm/xe

Signed-off-by: Fei Wang <fei.w.wang@intel.com>
2024-08-09 13:40:26 +08:00
Fei Wang
dbd74ba3c8 lavu/hwcontext_vaapi: Add option to allow to specify vendor id when init hw device
Vendor id will help to select desired device in case of kernel driver is
unknow or unsupported, for vendor may support different kernel driver on
different platforms.

Signed-off-by: Fei Wang <fei.w.wang@intel.com>
2024-08-09 13:40:24 +08:00
James Almer
210740b4ed avutil/frame: use the maximum compile time supported alignment for strides
This puts lavu frame buffer allocator helpers in sync with lavc's decoder frame
buffer allocator's STRIDE_ALIGN define.

Remove the comment about av_cpu_max_align() while at it as using it is not
ideal when CPU flags can be changed mid process.

Should fix ticket #11116.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-08-07 00:16:21 -03:00
Rémi Denis-Courmont
e0f9f4d491 lavu/cpu: deprecate RISC-V F, D and zba CPU flags 2024-08-05 21:16:26 +03:00
Rémi Denis-Courmont
d1326b6347 lavu/riscv: drop probing for zba CPU capability 2024-08-05 21:16:26 +03:00
Rémi Denis-Courmont
cb31f17ca8 lavu/riscv: depend on RVB and simplify accordingly 2024-08-05 21:16:26 +03:00
Nathan E. Egge
ba88e8174a lavu: Set default FF_TIMER_UNITS to "ns"
Signed-off-by: Nathan E. Egge <unlord@xiph.org>
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2024-08-05 21:16:26 +03:00
Gnattu OC
d50f9701b6 avutil/hwcontext_videotoolbox: Correctly set trc
The color trc key was assigned a color primaries value which causes
the resulting colorspace is always SDR.

Fixes #10884.

Signed-off-by: Gnattu OC <gnattuoc@me.com>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-08-02 10:24:09 +08:00
Rémi Denis-Courmont
1b2a925e94 lavc/riscv: drop probing for F & D extensions
F and D extensions are included in all RISC-V application profiles ever
made (so starting from RV64GC a.k.a. RVA20). Realistically they need to be
selected at compilation time.

Currently, there are no consumers for these two flags. If there is ever a
need to reintroduce F- or D-specific optimisations, we can always use
__riscv_f or __riscv_d compiler predefined macros respectively.
2024-08-01 22:56:50 +03:00
Rémi Denis-Courmont
54b1970c60 lavu/riscv: fix return type 2024-08-01 18:44:01 +03:00
James Almer
6f8e365a2a avutil/hwcontext_vaapi: use the correct type for VASurfaceAttribExternalBuffers.buffers
Should fix ticket #11115.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-08-01 12:13:53 -03:00
Marvin Scholz
ca7fcf5089 avutil/hwcontext_videotoolbox: Fix build with older SDKs
The previous fix was not sufficient.
To make things easier to reason about, split the function and
add the guards there instead of complicating the call site more.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-08-01 20:58:27 +08:00
Nathan E. Egge
8280ec7a32 lavu/riscv: Revert d808070, removing AV_READ_TIME
The implementation of ff_read_time() for RISC-V uses rdtime which has
 precision on existing hardware too low (!) for benchmarking purposes.
Deleting this implementation falls back on clock_gettime() which was
 added as the default ff_read_time() implementation in 33e4cc9.
Below are metrics gathered on SpacemiT K1, before and after this commit:

Before:

$ tests/checkasm/checkasm --bench
benchmarking with native FFmpeg timers
nop: 0.0
checkasm: using random seed 3473665261
checkasm: bench runs 1024 (1 << 10)
RVI:
 - pixblockdsp.get_pixels                [OK]
 - vc1dsp.mspel_pixels                   [OK]
RVF:
 - audiodsp.audiodsp                     [OK]
checkasm: all 4 tests passed
audiodsp.vector_clipf_c: 1388.7
audiodsp.vector_clipf_rvf: 261.5
get_pixels_c: 2.0
get_pixels_rvi: 1.5
vc1dsp.put_vc1_mspel_pixels_tab[0][0]_c: 8.0
vc1dsp.put_vc1_mspel_pixels_tab[0][0]_rvi: 1.0
vc1dsp.put_vc1_mspel_pixels_tab[1][0]_c: 2.0
vc1dsp.put_vc1_mspel_pixels_tab[1][0]_rvi: 0.5

After:

$ tests/checkasm/checkasm --bench
benchmarking with native FFmpeg timers
nop: 56.4
checkasm: using random seed 1021411603
checkasm: bench runs 1024 (1 << 10)
RVI:
 - pixblockdsp.get_pixels                [OK]
 - vc1dsp.mspel_pixels                   [OK]
RVF:
 - audiodsp.audiodsp                     [OK]
checkasm: all 4 tests passed
audiodsp.vector_clipf_c: 23236.4
audiodsp.vector_clipf_rvf: 11038.4
get_pixels_c: 79.6
get_pixels_rvi: 48.4
vc1dsp.put_vc1_mspel_pixels_tab[0][0]_c: 329.6
vc1dsp.put_vc1_mspel_pixels_tab[0][0]_rvi: 38.1
vc1dsp.put_vc1_mspel_pixels_tab[1][0]_c: 89.9
vc1dsp.put_vc1_mspel_pixels_tab[1][0]_rvi: 17.1

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2024-07-31 17:48:50 +03:00
Rémi Denis-Courmont
bd0c3edb13 lavu/riscv: count bytes rather than words for bswap32
This removes the dependency on Zba at essentially zero cost.
2024-07-30 18:41:51 +03:00
Fei Wang
79b4869959 lavu/hwcontext_qsv: Derive bind flag from frame type if no valid surface
Fix cmd:
ffmpeg.exe -init_hw_device d3d11va=d3d -init_hw_device qsv=qsv@d3d \
-filter_hw_device d3d -hwaccel qsv -hwaccel_output_format qsv      \
-i in.h264 -vf "hwmap,format=d3d11,hwdownload,format=nv12" -y out.yuv

Signed-off-by: Fei Wang <fei.w.wang@intel.com>
Tested-by: Tong Wu <wutong1208@outlook.com>
2024-07-30 13:41:15 +08:00
James Almer
9e7a93c6fd x86/intreadwrite: add SSE2 optimized AV_COPY128U
Signed-off-by: James Almer <jamrial@gmail.com>
2024-07-29 23:17:52 -03:00
James Almer
753f2aeed7 avutil/intreadwrite: add missing aligned read/write macros
Signed-off-by: James Almer <jamrial@gmail.com>
2024-07-29 21:33:31 -03:00
Rémi Denis-Courmont
39ced529b0 lavu/riscv: implement floating point clips
Unlike x86, fmin/fmax are single instructions, not function calls. They
are much much faster than doing a comparison, then branching based on its
results. With this, audiodsp.vector_clipf gets almost twice as fast, and
a properly unrollled version of it gets 4-5x faster, on SiFive-U74.
This is only the low-hanging fruit: FFMIN and FFMAX are presumably
affected as well.

This likely applies to other instruction sets with native IEEE floats,
especially those lacking a conditional select instruction.
2024-07-28 21:24:58 +03:00
Niklas Haas
cbea92c84d avutil/dovi_meta: add dv_md_compression to cfg record
This field is used to signal the compression method in use.
2024-07-28 12:20:07 +02:00
Rémi Denis-Courmont
a14d21a446 lavu/riscv: add forward-edge CFI landing pads 2024-07-25 23:10:14 +03:00
Rémi Denis-Courmont
6319601343 lavu/riscv: assembly for zicfilp LPAD
This instruction, if aligned on a 4-byte boundary, defines a valid target
("landing pad") for an indirect call or jump. Since this instruction is a
HINT, it is safe to assemble even if not included in the target
instruction set architecture.

The necessary alignment is already provided by the `func` macro. However
this still lacks the ELF attribute to indicate that the zicfilp is supported
in simple mode. This is left for future work as the ELF specification is not
ratified as of yet.

This will also nonobviously require the assembler to support zicfilp,
insofar as the `tail` pseudo-instruction shall clobber T2 (instead of T1) as
its temporary register.
2024-07-25 23:10:14 +03:00
Rémi Denis-Courmont
982376660c lavu/riscv: align functions to 4 bytes
Currently the start of the byte range for each function is aligned to
4 bytes. But this can lead to situations whence the function is preceded
by a 2-byte C.NOP at the aligned 4-byte boundary. Then the first actual
instruction and the function symbol are only aligned on 2 bytes.

This forcefully disables compression for the alignment and the symbol,
thus ensuring that there is no padding before the function.
2024-07-25 23:10:14 +03:00
Rémi Denis-Courmont
45d7078a21 lavu/riscv: add CPU flag for B bit manipulations
The B extension was finally ratified in May 2024, encompassing:
- Zba (addresses),
- Zbb (basics) and
- Zbs (single bits).
It does not include Zbc (base-2 polynomials).
2024-07-25 23:09:58 +03:00