6341 Commits

Author SHA1 Message Date
Nathan E. Egge
8280ec7a32 lavu/riscv: Revert d808070, removing AV_READ_TIME
The implementation of ff_read_time() for RISC-V uses rdtime, whose
 precision on existing hardware is too low (!) for benchmarking purposes.
Deleting this implementation falls back to clock_gettime(), which was
 added as the default ff_read_time() implementation in 33e4cc9.
Below are metrics gathered on SpacemiT K1, before and after this commit:

Before:

$ tests/checkasm/checkasm --bench
benchmarking with native FFmpeg timers
nop: 0.0
checkasm: using random seed 3473665261
checkasm: bench runs 1024 (1 << 10)
RVI:
 - pixblockdsp.get_pixels                [OK]
 - vc1dsp.mspel_pixels                   [OK]
RVF:
 - audiodsp.audiodsp                     [OK]
checkasm: all 4 tests passed
audiodsp.vector_clipf_c: 1388.7
audiodsp.vector_clipf_rvf: 261.5
get_pixels_c: 2.0
get_pixels_rvi: 1.5
vc1dsp.put_vc1_mspel_pixels_tab[0][0]_c: 8.0
vc1dsp.put_vc1_mspel_pixels_tab[0][0]_rvi: 1.0
vc1dsp.put_vc1_mspel_pixels_tab[1][0]_c: 2.0
vc1dsp.put_vc1_mspel_pixels_tab[1][0]_rvi: 0.5

After:

$ tests/checkasm/checkasm --bench
benchmarking with native FFmpeg timers
nop: 56.4
checkasm: using random seed 1021411603
checkasm: bench runs 1024 (1 << 10)
RVI:
 - pixblockdsp.get_pixels                [OK]
 - vc1dsp.mspel_pixels                   [OK]
RVF:
 - audiodsp.audiodsp                     [OK]
checkasm: all 4 tests passed
audiodsp.vector_clipf_c: 23236.4
audiodsp.vector_clipf_rvf: 11038.4
get_pixels_c: 79.6
get_pixels_rvi: 48.4
vc1dsp.put_vc1_mspel_pixels_tab[0][0]_c: 329.6
vc1dsp.put_vc1_mspel_pixels_tab[0][0]_rvi: 38.1
vc1dsp.put_vc1_mspel_pixels_tab[1][0]_c: 89.9
vc1dsp.put_vc1_mspel_pixels_tab[1][0]_rvi: 17.1

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2024-07-31 17:48:50 +03:00
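As a rough illustration of the fallback described in the commit above, a clock_gettime()-based timer could look like the sketch below. The helper name read_time_ns and the choice of CLOCK_MONOTONIC are assumptions made for this example; this is not FFmpeg's actual ff_read_time().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() under strict -std= modes */
#endif
#include <stdint.h>
#include <time.h>

/* Hypothetical helper (not FFmpeg's ff_read_time()): returns a monotonic
 * timestamp in nanoseconds, or 0 if the clock cannot be read. */
static uint64_t read_time_ns(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return 0;
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
}
```

Timing a function then amounts to taking the difference of two read_time_ns() calls around it. With a counter that is too coarse, that difference is often zero for short functions, which is consistent with the "nop: 0.0" line in the "Before" output.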
Rémi Denis-Courmont
bd0c3edb13 lavu/riscv: count bytes rather than words for bswap32
This removes the dependency on Zba at essentially zero cost.
2024-07-30 18:41:51 +03:00
Fei Wang
79b4869959 lavu/hwcontext_qsv: Derive bind flag from frame type if no valid surface
Fix cmd:
ffmpeg.exe -init_hw_device d3d11va=d3d -init_hw_device qsv=qsv@d3d \
-filter_hw_device d3d -hwaccel qsv -hwaccel_output_format qsv      \
-i in.h264 -vf "hwmap,format=d3d11,hwdownload,format=nv12" -y out.yuv

Signed-off-by: Fei Wang <fei.w.wang@intel.com>
Tested-by: Tong Wu <wutong1208@outlook.com>
2024-07-30 13:41:15 +08:00
James Almer
9e7a93c6fd x86/intreadwrite: add SSE2 optimized AV_COPY128U
Signed-off-by: James Almer <jamrial@gmail.com>
2024-07-29 23:17:52 -03:00
James Almer
753f2aeed7 avutil/intreadwrite: add missing aligned read/write macros
Signed-off-by: James Almer <jamrial@gmail.com>
2024-07-29 21:33:31 -03:00
Rémi Denis-Courmont
39ced529b0 lavu/riscv: implement floating point clips
Unlike x86, fmin/fmax are single instructions, not function calls. They
are much much faster than doing a comparison, then branching based on its
results. With this, audiodsp.vector_clipf gets almost twice as fast, and
a properly unrolled version of it gets 4-5x faster, on SiFive-U74.
This is only the low-hanging fruit: FFMIN and FFMAX are presumably
affected as well.

This likely applies to other instruction sets with native IEEE floats,
especially those lacking a conditional select instruction.
2024-07-28 21:24:58 +03:00
Niklas Haas
cbea92c84d avutil/dovi_meta: add dv_md_compression to cfg record
This field is used to signal the compression method in use.
2024-07-28 12:20:07 +02:00
Rémi Denis-Courmont
a14d21a446 lavu/riscv: add forward-edge CFI landing pads
2024-07-25 23:10:14 +03:00
Rémi Denis-Courmont
6319601343 lavu/riscv: assembly for zicfilp LPAD
This instruction, if aligned on a 4-byte boundary, defines a valid target
("landing pad") for an indirect call or jump. Since this instruction is a
HINT, it is safe to assemble even if not included in the target
instruction set architecture.

The necessary alignment is already provided by the `func` macro. However
this still lacks the ELF attribute to indicate that zicfilp is supported
in simple mode. This is left for future work as the ELF specification is not
yet ratified.

This will also nonobviously require the assembler to support zicfilp,
insofar as the `tail` pseudo-instruction shall clobber T2 (instead of T1) as
its temporary register.
2024-07-25 23:10:14 +03:00
Rémi Denis-Courmont
982376660c lavu/riscv: align functions to 4 bytes
Currently the start of the byte range for each function is aligned to
4 bytes. But this can lead to situations where the function is preceded
by a 2-byte C.NOP at the aligned 4-byte boundary. Then the first actual
instruction and the function symbol are only aligned on 2 bytes.

This forcefully disables compression for the alignment and the symbol,
thus ensuring that there is no padding before the function.
2024-07-25 23:10:14 +03:00
Rémi Denis-Courmont
45d7078a21 lavu/riscv: add CPU flag for B bit manipulations
The B extension was finally ratified in May 2024, encompassing:
- Zba (addresses),
- Zbb (basics) and
- Zbs (single bits).
It does not include Zbc (base-2 polynomials).
2024-07-25 23:09:58 +03:00
Rémi Denis-Courmont
529d423012 lavu/riscv: remove bespoke SH{1,2,3}ADD assembler
configure checks that the assembler supports the B extension (or rather
its constituents) anyway. These macros were dodging sanity checks for
unsupported instructions and nothing else.
2024-07-25 18:55:48 +03:00
Rémi Denis-Courmont
5f10173fa1 lavu/riscv: require B or zba explicitly
2024-07-25 18:55:48 +03:00
Rémi Denis-Courmont
7f97344bfb lavu/riscv: grok B as an extension
The RISC-V B bit manipulation extension was ratified only two months ago.
But it is strictly equivalent to the union of the zba, zbb and zbs
extensions, which were defined almost 3 years earlier. Rather than require
a new assembler, we can just match the extension name manually and translate
it into its constituent parts.
2024-07-25 18:55:48 +03:00
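The name translation described in the commit above can be pictured with a small C sketch. The flag names and the ext_flags() function are hypothetical and exist only for illustration; the C below just shows the mapping of "b" onto its three constituents, not how the commit implements it.

```c
#include <string.h>

/* Hypothetical flag bits, one per constituent extension. */
enum {
    EXT_ZBA = 1 << 0, /* address generation */
    EXT_ZBB = 1 << 1, /* basic bit manipulation */
    EXT_ZBS = 1 << 2, /* single-bit operations */
};

/* Hypothetical helper: translate an extension name into flag bits.
 * "b" expands to the union of zba, zbb and zbs. Unknown names give 0. */
static unsigned ext_flags(const char *name)
{
    if (!strcmp(name, "b"))
        return EXT_ZBA | EXT_ZBB | EXT_ZBS;
    if (!strcmp(name, "zba"))
        return EXT_ZBA;
    if (!strcmp(name, "zbb"))
        return EXT_ZBB;
    if (!strcmp(name, "zbs"))
        return EXT_ZBS;
    return 0;
}
```

Note that "zbc" is deliberately absent from the expansion, matching the statement elsewhere in this log that B does not include Zbc.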
Rémi Denis-Courmont
1e7ab200ee lavu/riscv: allow any number of extensions
This reworks the func/endfunc macros to support any number of ISA extensions
as parameters.
2024-07-25 18:55:48 +03:00
Rémi Denis-Courmont
0e32192548 lavu/riscv: do not fall back to AT_HWCAP auxiliary vector
If __riscv_hwprobe() fails, then the kernel version is presumably too
old. There is not much point falling back to the auxiliary vector.

- The Linux kernel requires I, so the flag is always set on Linux, and
  run-time detection is unnecessary. Our RISC-V assembler does not
  support targets without I anyway.

- Linux can compile with or without F and D, but it cannot perform
  run-time detection for them (a kernel with F support will not boot a
  processor without F). The run-time detection is thus useless in that
  case. Besides, the F and D extensions are used throughout the C code, so
  their run-time detection would not be practical.

- Support for V was added in a later kernel version than riscv_hwprobe(),
  so the system call will always be available if the kernel supports V.
  The only exception would be vendor kernel forks, but those are known to
  haphazardly pretend to support V on systems without actual V support, or
  with only a pre-ratification, binary-incompatible version. Furthermore, a
  large chunk of our optimisations require Zba and/or Zbb which cannot be
  detected with HWCAP in those kernels.

For what it is worth, OpenJDK already took a similar action. Note that this
keeps AT_HWCAP usage for platforms with neither C run-time <sys/hwprobe.h>
nor kernel <asm/hwprobe.h>, notably kernels other than Linux.
2024-07-22 19:43:51 +03:00
Michael Niedermayer
23851c9ee0
avutil/slicethread: Check pthread_*_init() for failure
Fixes: CID1604383 Unchecked return value
Fixes: CID1604439 Unchecked return value

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-07-21 17:02:13 +02:00
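The kind of fix named in the commit above (checking pthread_*_init() return values) might be sketched as follows. The WorkerSync struct and both functions are hypothetical and are not the slicethread code; the sketch only shows the general pattern of checking each init call and undoing earlier ones on failure.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical synchronisation state for a worker. */
typedef struct WorkerSync {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
} WorkerSync;

/* Returns 0 on success, or the error number reported by pthreads.
 * On failure, nothing is left initialised. */
static int worker_sync_init(WorkerSync *s)
{
    int ret = pthread_mutex_init(&s->lock, NULL);
    if (ret)
        return ret;

    ret = pthread_cond_init(&s->cond, NULL);
    if (ret) {
        pthread_mutex_destroy(&s->lock); /* undo the first init */
        return ret;
    }
    return 0;
}

/* Must only be called after a successful worker_sync_init(). */
static void worker_sync_uninit(WorkerSync *s)
{
    pthread_cond_destroy(&s->cond);
    pthread_mutex_destroy(&s->lock);
}
```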
Michael Niedermayer
15540b3d28
avutil/frame: Check log2_crop_align
Fixes: CID1604586 Overflowed constant

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-07-21 17:02:13 +02:00
Michael Niedermayer
82f5b20ff5
avutil/buffer: Check ff_mutex_init() for failure
Fixes: CID1604487 Unchecked return value
Fixes: CID1604494 Unchecked return value

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-07-21 17:02:13 +02:00
Michael Niedermayer
064bcda142
avutil/avsscanf: Remove dead code
Fixes: CID1604498 Structurally dead code

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-07-21 17:02:12 +02:00
Michael Niedermayer
d5ca373d7e
avutil/timecode: Use a 64bit framenum internally
Fixes: negation of -2147483648 cannot be represented in type 'int'; cast to an unsigned type to negate this value to itself
Fixes: 68550/clusterfuzz-testcase-minimized-ffmpeg_dem_MXF_fuzzer-6424065930756096

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-07-21 15:29:25 +02:00
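The sanitizer message quoted above comes from negating the smallest int, whose magnitude does not fit in int. A minimal sketch of the widening approach, with a hypothetical helper name and assuming a 32-bit int (this is not the timecode code itself):

```c
#include <stdint.h>

/* Hypothetical helper: magnitude of a frame number.
 * Widening to 64 bits before negating makes -INT_MIN representable. */
static int64_t frame_magnitude(int framenum)
{
    int64_t n = framenum; /* widen first, then negate */

    return n < 0 ? -n : n;
}
```

Writing `-framenum` directly on the int would be undefined behaviour for INT_MIN, whereas the 64-bit negation above is well defined.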
Zhao Zhili
e713a2d85d avutil/file_open: Fix build error with wasi
Don't assume tempnam is available when !HAVE_MKSTEMP. Check tempnam
explicitly in configure.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-07-16 22:50:21 +08:00
Marvin Scholz
2fc37c4239 avutil/hwcontext_videotoolbox: Fix build with older SDKs
I accidentally used an API not available on the checked version.
Additionally check for the SDK to be new enough to even have the
CVImageBufferCreateColorSpaceFromAttachments API to not fail
the build.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-07-16 19:51:45 +08:00
Lynne
80ddc72717
vulkan: rename read_only to singular
There's nothing stopping users from writing to such buffers.
It's more accurate to say they are singular, i.e. not duplicated
between multiple submissions.

This can be helpful for global statistics, or error propagation
purposes.
2024-07-14 18:33:56 +02:00
Lynne
e11087b162
vulkan: set VkDescriptorAddressInfoEXT.sType
This was not done, resulting in validation issues, and potential
driver issues.
2024-07-14 18:31:44 +02:00
Michael Niedermayer
ba63e32957
avutil/imgutils: av_image_check_size2() ensure width and height fit in 32bit
Width and height > 32 bits are not supported, and it's easier to check this in a central place.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-07-12 22:16:05 +02:00
James Almer
70c6b904be x86/intreadwrite: add missing casts to pointer arguments
Should make strict compilers happy.

Also, make AV_COPY128 use integer operations while at it. Removing the
inclusion of immintrin.h ensures a lot less intrinsic related headers are
included as well, which fixes a clash of defines with some Clang versions.

Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: James Almer <jamrial@gmail.com>
2024-07-11 18:24:26 -03:00
Zhao Zhili
906b883e7b avutil/executor: Fix stack overflow due to recursive call
av_executor_execute() runs the task directly when threads are disabled.
The task can schedule a new task by calling av_executor_execute(), which
forms an implicit recursive call. This patch removes the recursive
call.
2024-07-11 20:26:23 +08:00
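One common way to remove this kind of implicit recursion is to queue the task and let a single outer loop drain the queue. The sketch below illustrates that general technique with hypothetical types and names; it is not the actual executor code and may differ from how the patch does it.

```c
#include <stddef.h>

typedef struct Executor Executor;
typedef struct Task Task;

struct Task {
    void (*run)(Task *t, Executor *e);
    Task *next;
};

struct Executor {
    Task *head, *tail; /* pending tasks, FIFO */
    int   running;     /* nonzero while the drain loop is active */
};

/* Queue the task. If a drain loop is already active (we were called from
 * inside a task), return and let that loop pick the task up. Otherwise
 * drain the queue here, so the stack depth stays constant. */
static void executor_execute(Executor *e, Task *t)
{
    t->next = NULL;
    if (e->tail)
        e->tail->next = t;
    else
        e->head = t;
    e->tail = t;

    if (e->running)
        return;

    e->running = 1;
    while (e->head) {
        Task *cur = e->head;
        e->head = cur->next;
        if (!e->head)
            e->tail = NULL;
        cur->run(cur, e);
    }
    e->running = 0;
}

/* Demo task that reschedules itself a fixed number of times. */
typedef struct ChainTask {
    Task task;      /* first member, so a Task * can be cast back */
    int  remaining; /* reschedules left */
    int  runs;      /* how many times the task has run */
} ChainTask;

static void chain_run(Task *t, Executor *e)
{
    ChainTask *c = (ChainTask *)t;

    c->runs++;
    if (c->remaining-- > 0)
        executor_execute(e, t); /* nested call only enqueues */
}
```

With a direct call instead of the queue, a ChainTask with a large `remaining` would nest one stack frame per reschedule. Here every run happens from the same loop iteration depth.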
Zhao Zhili
54f9469fa1 avutil/executor: Fix missing check before using mutex
2024-07-11 20:24:11 +08:00
James Almer
1a86a7a48d x86/intreadwrite: fix include of config.h
Should fix make checkheaders.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-07-10 13:52:52 -03:00
James Almer
15056dd650 x86/intreadwrite.h: add missing preprocessor checks
Removed by accident in the previous commits. This makes the code only run when
compiled with GCC and Clang like before. Support for other compilers like msvc
can be added later.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-07-10 13:49:21 -03:00
James Almer
bd1bcb07e0 x86/intreadwrite: use intrinsics instead of inline asm for AV_COPY128
This has the benefit of removing any SSE -> AVX penalty that may happen when
the compiler emits VEX encoded instructions.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-07-10 13:25:44 -03:00
James Almer
4a04cca69a x86/intreadwrite: use intrinsics instead of inline asm for AV_ZERO128
When called inside a loop, the inline asm version results in one pxor
unnecessarily emitted per iteration, as the contents of the __asm__() block are
opaque to the compiler's instruction scheduler.
This is not the case with intrinsics, where pxor will be emitted once with any
half decent compiler.

This also has the benefit of removing any SSE -> AVX penalty that may happen
when the compiler emits VEX encoded instructions.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-07-10 13:25:44 -03:00
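For readers unfamiliar with the intrinsics approach in the two commits above, here is a hedged sketch of 128-bit zero and copy helpers built on SSE2 intrinsics. The function names are hypothetical, unaligned loads and stores are used for safety, and a plain memset()/memcpy() fallback is included for non-SSE2 targets; the real AV_ZERO128 and AV_COPY128 macros may differ.

```c
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Hypothetical helper: zero 16 bytes at dst. The compiler can see through
 * the intrinsic and hoist the zero register out of a loop, which it cannot
 * do with an opaque inline asm block. */
static inline void zero128(void *dst)
{
#if defined(__SSE2__)
    _mm_storeu_si128((__m128i *)dst, _mm_setzero_si128());
#else
    memset(dst, 0, 16);
#endif
}

/* Hypothetical helper: copy 16 bytes from src to dst. */
static inline void copy128(void *dst, const void *src)
{
#if defined(__SSE2__)
    _mm_storeu_si128((__m128i *)dst, _mm_loadu_si128((const __m128i *)src));
#else
    memcpy(dst, src, 16);
#endif
}
```

Because the compiler chooses the instruction encoding for intrinsics, it can emit VEX-encoded forms when building for AVX, which is the SSE -> AVX penalty the commit messages refer to avoiding.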
Michael Niedermayer
e9e8bea2e7
avutil/wchar_filename: Correct sizeof
Fixes: CID1591930 Wrong sizeof argument

Sponsored-by: Sovereign Tech Fund
Reviewed-by: Steve Lhomme <robux4@ycbcr.xyz>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-07-10 18:10:10 +02:00
Michael Niedermayer
628ba061c8
avutil/hwcontext_d3d11va: correct sizeof IDirect3DSurface9
Fixes: CID1591944 Wrong sizeof argument

Sponsored-by: Sovereign Tech Fund
Reviewed-by: Steve Lhomme <robux4@ycbcr.xyz>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-07-10 18:10:09 +02:00
Michael Niedermayer
cf22f944d5
avutil/hwcontext_d3d11va: Free AVD3D11FrameDescriptor on error
Fixes: CID1598558 Resource leak

Sponsored-by: Sovereign Tech Fund
Reviewed-by: Steve Lhomme <robux4@ycbcr.xyz>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-07-10 18:10:09 +02:00
Michael Niedermayer
698ed0d5a5
avutil/hwcontext_d3d11va: correct sizeof AVD3D11FrameDescriptor
Fixes: CID1591909 Wrong sizeof argument

Sponsored-by: Sovereign Tech Fund
Reviewed-by: Steve Lhomme <robux4@ycbcr.xyz>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-07-10 18:10:09 +02:00
Zhao Zhili
85706f5136 avutil/hwcontext_videotoolbox: Fix version check
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-07-09 21:39:09 +08:00
Marvin Scholz
cd9ceaef22 avutil/hwcontext_videotoolbox: Set CVBuffer CGColorSpace
In addition to the other properties, try to obtain the right
CGColorSpace and set it as well, else it could lead to a CVBuffer
tagged as BT.2020 but with a CGColorSpace indicating BT.709.

Therefore it is essential for consistency to set a colorspace
according to the other values, or if none can be obtained (for example
because the other values are all unspecified) unset it as well.

Fix #10884

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-07-05 19:13:43 +08:00
Marvin Scholz
b4f9fcc63c avutil/hwcontext_videotoolbox: Update documentation
The documentation was not clear at all about what specifically the
function does, so it was left unspecified whether it will unset or
not touch attachments it could not map from the AVFrame.

The documentation of the return value was wrong as well.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-07-05 19:13:43 +08:00
Marvin Scholz
1fa7554bd6 avutil/hwcontext_videotoolbox: Unset undefined values
When mapping AVFrame properties to the CVBuffer attachments, it is
necessary to properly delete undefined attachments, else we can
leave incorrect values in there guessed from VideoToolbox for
example, leading to inconsistent results where the AVFrame and
CVBuffer differ in metadata.

Ref #10884

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-07-05 19:13:43 +08:00
Tong Wu
d822146f4f avutil/hwcontext_d3d12va: add Flags for resource creation
The Flags field is added to support different kinds of resource creation.

Signed-off-by: Tong Wu <tong1.wu@intel.com>
2024-07-02 14:15:12 +02:00
Marton Balint
0d5e3f5a40 avutil/timestamp: avoid possible FPE when 0 is passed to av_ts_make_time_string2()
Signed-off-by: Marton Balint <cus@passwd.hu>
2024-06-30 09:11:44 +02:00
Rémi Denis-Courmont
d5e603ddc0 lavu/lls: remove useless VSETVL
This changes neither VL nor VTYPE, so it can safely be removed.
2024-06-29 21:03:44 +03:00
James Almer
8af0919cc6 avutil/stereo3d: add a Stereo3D view to signal that the view is unspecified
Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-28 13:16:57 -03:00
James Almer
1c8b32e19f avutil/stereo3d: add a Stereo3D type to signal that the packing is unspecified
Given that a video stream/frame may have only one view or both views coded with
the packing information being unavailable, this commit adds a new type value
AV_STEREO3D_UNSPEC for this purpose.
The most common case for this is container level signaling of Stereo3D video
where the specifics are defined at the bitstream level.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-28 13:16:57 -03:00
Zhao Zhili
baf3123c1c avutil/executor: Allow thread_count to be zero
Before this patch, disabling thread support at configure/build time
was the only way to force zero threads in the executor. However,
it's common practice for libavcodec to run on the caller's thread when
the user sets the thread count to one. And in a WASM environment, whether
threads are supported needs to be detected at runtime. So the executor
should support zero threads at runtime.

A single-thread executor can be useful, e.g., to handle a network
protocol. So we can't treat thread_count one as zero threads, which
would disable a valid use case.

Other libraries take -threads 0 to mean auto. The executor, as a
low-level utility, doesn't do CPU detection. So take thread_count zero as
zero threads, literally.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-06-27 20:54:42 +08:00
J. Dekker
e61fed8280 avutil/riscv/cpu: fix __riscv_v_min_vlen typo
Signed-off-by: J. Dekker <jdek@itanimul.li>
2024-06-26 12:50:02 +02:00
Brad Smith
41190da9e1 aarch64: Add OpenBSD runtime detection of dotprod and i8mm using sysctl
Signed-off-by: Brad Smith <brad@comstyle.com>
2024-06-26 02:06:53 -04:00
James Almer
e6baf4f384 avutil/stereo3d: add a new allocator function that returns a size
av_stereo3d_alloc() is not useful in scenarios where you need to know the
runtime size of AVStereo3D.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-25 00:01:05 -03:00
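The allocator pattern described in the commit above, returning the structure's runtime size alongside the allocation, can be sketched as follows. The struct and function below are hypothetical stand-ins for illustration and are not the libavutil definitions.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a struct whose size is not part of the ABI. */
typedef struct Stereo3DInfo {
    int type;
    int flags;
} Stereo3DInfo;

/* Allocate a zero-initialised Stereo3DInfo. If size is non-NULL, the
 * runtime size of the structure is written to *size on success.
 * Returns NULL on allocation failure. */
static Stereo3DInfo *stereo3d_info_alloc_size(size_t *size)
{
    Stereo3DInfo *s = calloc(1, sizeof(*s));

    if (!s)
        return NULL;
    if (size)
        *size = sizeof(*s);
    return s;
}
```

A caller that needs to wrap the allocation in a sized buffer or side-data container can use the reported size instead of relying on a compile-time sizeof, which may not match the library it is linked against.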