Commit Graph

115606 Commits

Author SHA1 Message Date
Rémi Denis-Courmont
a535ce2ac0 lavc/flacdsp: R-V Zvl256b lpc33
flac_lpc_33_13_c: 499.7
flac_lpc_33_13_rvv_i64: 197.7
flac_lpc_33_16_c: 601.5
flac_lpc_33_16_rvv_i64: 195.2
flac_lpc_33_29_c: 1011.5
flac_lpc_33_29_rvv_i64: 300.7
flac_lpc_33_32_c: 1099.0
flac_lpc_33_32_rvv_i64: 296.7
2024-05-27 22:07:29 +03:00
Rémi Denis-Courmont
5ebb071d79 lavc/vp8dsp: disable EPEL HV on RV128
RV128 is mostly scifi at this point, so we can just disable it here
(the EPEL HV prologue/epilogue do not save 128-bit registers).
2024-05-27 22:07:29 +03:00
Diego Felix de Souza
aead61451c avcodec/nvenc_av1: Correct CQ range for AV1
The Constant Quality (CQ) range for the AV1 codec is actually 0 to
63, contrary to what is stated in the header and documentation.

Signed-off-by: Diego Felix de Souza <ddesouza@nvidia.com>
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
2024-05-27 19:20:18 +02:00
Andreas Rheinhardt
41e1322845 tests/fate/source-check: Relax BSD licence check
Several files already had standard license header (namely
2-clause BSD files), yet due to the 80 char line length limit,
they were not treated as such by source-check.sh (which
fate-source uses). Therefore relax the BSD check.

Reviewed-by: Rémi Denis-Courmont <remi@remlab.net>
Reviewed-by: Pierre-Anthony Lemieux <pal@sandflow.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-05-27 19:04:09 +02:00
Andreas Rheinhardt
f1337e5dd9 doc/mips: Update list of files with MIPS copyright notice
E.g. the AAC stuff has been removed in
03cf101645.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-05-27 19:04:08 +02:00
Frank Plowman
49c3918c1a lavc/vvc: Validate temporal MVP references
Per VVCv3 p. 157, the collocated reference picture used in temporal
motion vector prediction must have RprConstraintsActiveFlag equal to
zero and the same CTU size as the current picture.  Add these checks,
fixing crashes decoding some fuzzed bitstreams.

Additionally, only set up the collocated reference picture if it is
actually going to be used (i.e. if ph_temporal_mvp_enabled_flag is 1),
else legal RPR bitstreams will fail the new checks.

Co-authored-by: Nuo Mi <nuomi2021@gmail.com>
Signed-off-by: Frank Plowman <post@frankplowman.com>
2024-05-27 20:24:21 +08:00
llyyr
2b11a8b95b lavc/vp9: reset segmentation fields when segmentation isn't enabled
Fields under the segmentation switch are never reset on a new frame, and
retain the value from the previous frame. This bugs out a bunch of
hwaccel drivers when segmentation is disabled but update_map isn't
reset because they don't ignore values behind switches. This commit also
resets the temporal field, though it may not be required.

We also do this for vp8 [1] so this commit is just mirroring the vp8
logic.

This fixes an issue with certain samples [2] that causes blocky
artifacts with vaapi, d3d11va and cuda (and possibly others).
Mesa worked around [3] this by ignoring these fields if
segmentation.enabled is 0, but d3d11va still displays blocky artifacts.

[1] 2e877090f9:/libavcodec/vp8.c#l797
[2] https://github.com/mpv-player/mpv/issues/13533
[3] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27816

Signed-off-by: llyyr <llyyr.public@gmail.com>
2024-05-27 12:23:40 +02:00
Fei Wang
01c7f68f7a lavc/qsvdec: Use coded_w/h for frame resolution when use system memory
Fix output mismatch when decoding a clip with crop info (conf_win_*offset
in the syntax) using system memory:

$ ffmpeg -c:v hevc_qsv -i conf_win_offet.bit -y out.yuv

Signed-off-by: Fei Wang <fei.w.wang@intel.com>
2024-05-27 09:38:46 +08:00
Fei Wang
1c56263704 lavc/qsvdec: Allow decoders to export crop information
Signed-off-by: Fei Wang <fei.w.wang@intel.com>
2024-05-27 09:38:46 +08:00
Haihao Xiang
a72e9aeabc lavc/qsvenc_av1: accept HDR metadata if have
The sdk av1 encoder can accept HDR metadata via mfxEncodeCtrl::ExtParam.

Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2024-05-27 09:38:46 +08:00
Haihao Xiang
473e84ad62 lavc/qsvdec: update HDR side data on output AVFrame for AV1 decoding
The SDK may provide HDR metadata for HDR streams via mfxExtBuffer
attached to the output mfxFrameSurface1.

Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2024-05-27 09:38:46 +08:00
Brad Smith
43b1a95678 configure: enable ffnvcodec, nvenc, nvdec for FreeBSD
Signed-off-by: Brad Smith <brad@comstyle.com>
2024-05-26 19:25:51 -04:00
Rémi Denis-Courmont
25a33665a0 lavc/vp8dsp: remove unused macro parameter
2024-05-26 19:20:48 +03:00
Rémi Denis-Courmont
728a1dd3b6 lavc/rv34dsp: remove stray load immediate
2024-05-26 19:20:45 +03:00
sunyuechi
63697d3350 lavc/vp8dsp: R-V V put_epel hv
C908:
vp8_put_epel4_h4v4_c: 20.0
vp8_put_epel4_h4v4_rvv_i32: 11.0
vp8_put_epel4_h4v6_c: 25.2
vp8_put_epel4_h4v6_rvv_i32: 13.5
vp8_put_epel4_h6v4_c: 22.2
vp8_put_epel4_h6v4_rvv_i32: 14.5
vp8_put_epel4_h6v6_c: 29.0
vp8_put_epel4_h6v6_rvv_i32: 15.7
vp8_put_epel8_h4v4_c: 73.0
vp8_put_epel8_h4v4_rvv_i32: 22.2
vp8_put_epel8_h4v6_c: 90.5
vp8_put_epel8_h4v6_rvv_i32: 26.7
vp8_put_epel8_h6v4_c: 85.0
vp8_put_epel8_h6v4_rvv_i32: 27.2
vp8_put_epel8_h6v6_c: 104.7
vp8_put_epel8_h6v6_rvv_i32: 29.5
vp8_put_epel16_h4v4_c: 145.5
vp8_put_epel16_h4v4_rvv_i32: 26.5
vp8_put_epel16_h4v6_c: 190.7
vp8_put_epel16_h4v6_rvv_i32: 47.5
vp8_put_epel16_h6v4_c: 173.7
vp8_put_epel16_h6v4_rvv_i32: 33.2
vp8_put_epel16_h6v6_c: 222.2
vp8_put_epel16_h6v6_rvv_i32: 35.5

Amended to disable unsupported RV128.

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2024-05-26 15:15:28 +03:00
Rémi Denis-Courmont
0b2316e37f lavc/sbrdsp: fix inverted boundary check
128-bit is the maximum, not the minimum here. Larger vector sizes can
result in reads past the end of the noise value table.

This partially reverts commit cdcb4b98b7.
2024-05-25 22:03:37 +03:00
Rémi Denis-Courmont
e6b38c944f lavc/sbrdsp: fix potential overflow in noise table
Since the SBR noise application optimisations are currently restricted
to hardware with 128-bit vectors, and use a quadruple multiplier, they
can load up to 16 32-bit elements. But the "loads" are of 2 segments,
or 16 pairs of single precision float.

Thus we need to expand the duplicated section of the noise table from
2x8 to 2x16 to avoid overflows.
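
The padding idea can be illustrated with a minimal sketch. It is not the FFmpeg table or code: the helper names and the table size of 32 are hypothetical, and plain integers stand in for the pairs of floats. It only shows why a run of 16 entries starting near the end of a wrapping table needs 16 duplicated entries rather than 8.

```python
def pad_circular(table, pad):
    """Append a copy of the first `pad` entries so reads need not wrap."""
    return table + table[:pad]


def read_run(padded, start, count):
    """Read `count` consecutive entries, as a vector load would (no wrapping)."""
    return padded[start:start + count]


if __name__ == "__main__":
    base = list(range(32))             # hypothetical table size
    short = pad_circular(base, 8)      # old amount of duplication
    wide = pad_circular(base, 16)      # new amount of duplication
    # Starting at the last real entry, a 16-entry run fits only in `wide`.
    print(len(read_run(short, 31, 16)), len(read_run(wide, 31, 16)))
```

In Python the short table simply yields a truncated slice; in assembly the same access would read past the end of the table.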
2024-05-25 22:00:18 +03:00
Andreas Rheinhardt
e9197db4f7 tests/checkasm/vvc_alf: Don't use declare_func_emms
VVC does not have MMX code at all, so one can use the stricter
declare_func to also check that the MMX state has not been clobbered
(which would be an ABI violation).

Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-05-25 14:21:54 +02:00
Andreas Rheinhardt
8e27bd025f avformat/async,cache: Use more unique context names
Otherwise Doxygen thinks any text like "Context for foo"
is a link to the async protocol's struct called "Context".

Reported-by: Andrew Sayers <ffmpeg-devel@pileofstuff.org>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-05-25 13:52:19 +02:00
Andreas Rheinhardt
edc235e076 avformat/riffenc: Fix outdated comment
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-05-25 13:52:05 +02:00
Andreas Rheinhardt
50c25d1f0a avformat/matroskaenc: Check ff_put_wav_header() failure
Fixes Coverity issue #1506706.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-05-25 13:51:58 +02:00
Andreas Rheinhardt
65763bffb6 avformat/mpegts: Don't use uninitialized value in av_log()
It is undefined behaviour in (at least) C11 (see C11 6.3.2.1 (2)).
Fixes Coverity issue #1500314.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-05-25 13:51:27 +02:00
Andreas Rheinhardt
d8cad01805 avformat/dhav: Check amount read
Prevents potential use of uninitialized data in the following
memcmp().

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-05-25 13:51:27 +02:00
Andreas Rheinhardt
cf6d07522a avformat/dhav: Check ffio_ensure_seekback()
Fixes Coverity issue #1492324.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-05-25 13:51:27 +02:00
Andreas Rheinhardt
95faf45af1 avformat/qoadec: Check ffio_ensure_seekback()
Fixes Coverity issue #1598406.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-05-25 13:51:27 +02:00
Andreas Rheinhardt
6dc8d4eea8 avformat/westwood_vqa: Check ffio_ensure_seekback()
Fixes Coverity issue #1598405.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-05-25 13:51:27 +02:00
Andreas Rheinhardt
590fffe6ad avformat/gifdec: Check ffio_ensure_seekback()
Fixes Coverity issue #1598400.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-05-25 13:51:27 +02:00
Andreas Rheinhardt
b47116be45 avformat/oggdec: Check ffio_ensure_seekback()
Fixes Coverity issue #1492327.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-05-25 13:51:27 +02:00
Rémi Denis-Courmont
f883746587 lavc/flacdsp: do not assume maximum R-V VL
This loop correctly assumes that VLMAX=16 (4x128-bit vectors
with 32-bit elements) and 32 >= pred_order > 16. We need to alternate
between VL=16 and VL=t2=pred_order-16 elements to add up to pred_order.

The current code requests AVL=a2=pred_order elements. In QEMU and on
the K230 hardware, this sets VL=16 as we need. But the specification
merely guarantees that we get: ceil(AVL / 2) <= VL <= VLMAX. For
instance, if pred_order equals 27, we could end up with VL=14 or VL=15
instead of VL=16. So instead, request literally VLMAX=16.
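
As a rough illustration of the constraint quoted above, the following sketch lists the VL values an implementation may return for a given AVL and VLMAX. The function name is hypothetical; the three cases follow the vsetvli rules in the RISC-V vector specification as summarised in this message.

```python
def permitted_vl(avl: int, vlmax: int) -> range:
    """Return the VL values an implementation may choose for this AVL."""
    if avl <= vlmax:
        return range(avl, avl + 1)                # VL must equal AVL
    if avl < 2 * vlmax:
        return range((avl + 1) // 2, vlmax + 1)   # ceil(AVL / 2) <= VL <= VLMAX
    return range(vlmax, vlmax + 1)                # VL must equal VLMAX


if __name__ == "__main__":
    # pred_order = 27 with VLMAX = 16, as in the example above
    print(list(permitted_vl(27, 16)))
    # requesting exactly VLMAX leaves no freedom
    print(list(permitted_vl(16, 16)))
```

Requesting AVL=16 leaves a single permitted value, which is why asking for VLMAX literally is the safe choice.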
2024-05-25 10:31:50 +03:00
Andreas Rheinhardt
aff24c1658 avcodec/flacdec: Remove unused variable
Forgotten in 0380a03f1f.

Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-05-24 19:05:57 +02:00
Rémi Denis-Courmont
ba38d0e328 lavc/pixblockdsp: add scalar get_pixels_unaligned
The code is already there, we just need to use it.

get_pixels_unaligned_c: 2.2
get_pixels_unaligned_misaligned: 1.7
2024-05-24 17:53:43 +03:00
Rémi Denis-Courmont
d03cdfa2b6 checkasm/riscv: test misaligned before V
Otherwise V functions mask scalar misaligned ones.
2024-05-24 17:53:43 +03:00
James Almer
0920f506a7 checkasm/flacdsp: add a test for lpc33
Signed-off-by: James Almer <jamrial@gmail.com>
2024-05-24 09:23:00 -03:00
James Almer
0380a03f1f avcodec/flacdsp: split off lpc33 into a dsp function
Signed-off-by: James Almer <jamrial@gmail.com>
2024-05-24 09:23:00 -03:00
James Almer
62397bcf6a avformat/movenc: add support for writing SA3D boxes
Signed-off-by: James Almer <jamrial@gmail.com>
2024-05-23 19:06:46 -03:00
James Almer
8c97449482 avutil/channel_layout: add a helper function to get the ambisonic order of a layout
Signed-off-by: James Almer <jamrial@gmail.com>
2024-05-23 12:07:19 -03:00
Haihao Xiang
8155808ce6 libavcodec/x86/vvc/vvc_sad: fix assembler error
X86ASM    libavcodec/x86/vvc/vvc_sad.o
libavcodec/x86/vvc/vvc_sad.asm:85: error: invalid number of operands
libavcodec/x86/vvc/vvc_sad.asm:87: error: invalid number of operands

Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2024-05-23 09:12:50 -03:00
Andreas Rheinhardt
ece95dc3dc avfilter/af_atempo: Fix indentation
Forgotten after b8f74ee57a.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-05-23 10:45:55 +02:00
Andreas Rheinhardt
42e0e05834 avfilter/af_atempo: Simplify resetting
The earlier code distinguished between a partial reset
(yae_clear()) and a complete reset (yae_release_buffers()
which also releases the buffers); this separation existed
to avoid allocations, as buffers were reallocated on reconfigs.

Yet it is pointless since a5704659e3,
so simply use yae_release_buffers() everywhere.

Reviewed-by: Pavel Koshevoy <pkoshevoy@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-05-23 10:45:25 +02:00
Andreas Rheinhardt
35e7fa0a2e avfilter/af_atempo: Properly check av_tx_init()
Fixes Coverity issue #1516804.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-05-23 10:45:16 +02:00
Stone Chen
2e877090f9 tests/checkasm: Add check_vvc_sad to vvc_mc.c
Adds checkasm for DMVR SAD AVX2 implementation.

Benchmarks ( AMD 7940HS )
vvc_sad_8x8_c: 50.3
vvc_sad_8x8_avx2: 0.3
vvc_sad_16x16_c: 250.3
vvc_sad_16x16_avx2: 10.3
vvc_sad_32x32_c: 1020.3
vvc_sad_32x32_avx2: 60.3
vvc_sad_64x64_c: 3850.3
vvc_sad_64x64_avx2: 220.3
vvc_sad_128x128_c: 14100.3
vvc_sad_128x128_avx2: 840.3

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2024-05-22 20:36:46 -03:00
Stone Chen
0e52a4e434 libavcodec/x86/vvc: Add AVX2 DMVR SAD functions for VVC
Implements AVX2 DMVR (decoder-side motion vector refinement) SAD functions. DMVR SAD is only calculated if w >= 8, h >= 8, and w * h > 128. To reduce complexity, SAD is only calculated on even rows. This is calculated for all video bitdepths, but the values passed to the function are always 16bit (even if the original video bitdepth is 8). The AVX2 implementation uses min/max/sub.
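
A scalar sketch of the computation described above, under simplifying assumptions: the function names are hypothetical, the blocks are plain lists of rows, and the dx/dy offsets and strides of the real function are left out. The absolute difference is written as max minus min to mirror the min/max/sub approach.

```python
def dmvr_sad_applies(w: int, h: int) -> bool:
    """Size condition stated in the commit message."""
    return w >= 8 and h >= 8 and w * h > 128


def sad_even_rows(a, b, w: int, h: int) -> int:
    """Sum of absolute differences over even rows only."""
    total = 0
    for y in range(0, h, 2):          # rows 0, 2, 4, ...
        for x in range(w):
            p, q = a[y][x], b[y][x]
            total += max(p, q) - min(p, q)   # |p - q| without abs()
    return total


if __name__ == "__main__":
    w = h = 16
    a = [[10] * w for _ in range(h)]
    b = [[13] * w for _ in range(h)]
    if dmvr_sad_applies(w, h):
        print(sad_even_rows(a, b, w, h))
```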

Additionally this changes parameters dx and dy from int to intptr_t. This allows dx & dy to be used as pointer offsets without needing to use movsxd.

Benchmarks ( AMD 7940HS )
Before:
BQTerrace_1920x1080_60_10_420_22_RA.vvc | 106.0 |
Chimera_8bit_1080P_1000_frames.vvc | 204.3 |
NovosobornayaSquare_1920x1080.bin | 197.3 |
RitualDance_1920x1080_60_10_420_37_RA.266 | 174.0 |

After:
BQTerrace_1920x1080_60_10_420_22_RA.vvc | 109.3 |
Chimera_8bit_1080P_1000_frames.vvc | 216.0 |
NovosobornayaSquare_1920x1080.bin | 204.0 |
RitualDance_1920x1080_60_10_420_37_RA.266 | 181.7 |

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2024-05-22 20:36:21 -03:00
James Almer
3146b77a7d avformat/mov: store sample_sizes as unsigned ints
As defined in Section 8.7.3.2.1 of ISO 14496-12.
Any unsupported value will be rejected in mov_build_index() without outright
aborting demuxing.
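
To show why the signedness matters, here is a small sketch that decodes the same four big-endian bytes both ways. The helper names are hypothetical and this is not the demuxer's code; it only shows how a large size turns negative when read as a signed 32-bit value.

```python
import struct


def read_sample_size(raw: bytes) -> int:
    """Decode one 32-bit big-endian sample size as an unsigned integer."""
    (size,) = struct.unpack(">I", raw)
    return size


def read_sample_size_signed(raw: bytes) -> int:
    """Same bytes decoded as a signed integer, for comparison."""
    (size,) = struct.unpack(">i", raw)
    return size


if __name__ == "__main__":
    raw = bytes([0x80, 0x00, 0x00, 0x01])
    print(read_sample_size(raw), read_sample_size_signed(raw))
```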

Fixes ticket #11005.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-05-22 17:46:49 -03:00
James Almer
2d84ee3745 avformat/vvc: fix parsing sps_subpic_id
The length of the sps_subpic_id[i] syntax element is sps_subpic_id_len_minus1 + 1 bits.
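
A minimal sketch of the corrected field width, assuming a simple MSB-first bit reader. The `BitReader` class and `read_subpic_ids` helper are hypothetical illustrations, not FFmpeg's bitstream API.

```python
class BitReader:
    """Tiny MSB-first bit reader over a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # position in bits

    def read(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def read_subpic_ids(reader: BitReader, num_subpics: int,
                    sps_subpic_id_len_minus1: int) -> list:
    """Each ID occupies sps_subpic_id_len_minus1 + 1 bits."""
    width = sps_subpic_id_len_minus1 + 1
    return [reader.read(width) for _ in range(num_subpics)]


if __name__ == "__main__":
    reader = BitReader(bytes([0b10101100]))
    # len_minus1 = 2 means 3 bits per ID: 101, then 011
    print(read_subpic_ids(reader, 2, 2))
```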

Signed-off-by: James Almer <jamrial@gmail.com>
2024-05-22 17:46:49 -03:00
James Almer
3bd7e3a336 avformat/vvc: initialize some ptl flags
Signed-off-by: James Almer <jamrial@gmail.com>
2024-05-22 17:46:49 -03:00
Rémi Denis-Courmont
910d281b21 lavc/h263dsp: R-V V {h,v}_loop_filter
Since the horizontal and vertical filters are identical except for a
transposition, this uses a common subprocedure with an ad-hoc ABI.
To preserve return-address stack prediction, a link register has to be
used (cf. the "Control Transfer Instructions" from the
RISC-V ISA Manual). The alternate/temporary link register T0 is used
here, so that the normal RA is preserved (something Arm cannot do!).

To load the strength value based on `qscale`, the shortest possible
and PIC-compatible sequence is used: AUIPC; ADD; LBU. The classic
LLA; ADD; LBU sequence would add one more instruction since LLA is a
convenience alias for AUIPC; ADDI. To ensure that this trick works,
relocation relaxation is disabled.

To implement the two signed divisions by a power of two toward zero:
 (x / (1 << SHIFT))
the code relies on the small range of integers involved, computing:
 (x + (x >> (16 - SHIFT))) >> SHIFT
rather than the more general:
 (x + ((x >> (16 - 1)) & ((1 << SHIFT) - 1))) >> SHIFT
Thus one ANDI instruction is avoided.
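
The two expressions can be compared with a short sketch. It assumes 16-bit elements, that the inner shift of the short form is a logical shift of the 16-bit pattern, and that all other shifts are arithmetic; the message does not spell this out, so treat it as one reading. The function names are hypothetical.

```python
def trunc_div_pow2(x: int, shift: int) -> int:
    """Reference: divide by 2**shift, rounding toward zero."""
    q = abs(x) >> shift
    return -q if x < 0 else q


def general_form(x: int, shift: int) -> int:
    # x >> 15 is -1 for negative 16-bit values and 0 otherwise, so the
    # AND yields a bias of 2**shift - 1 for negative inputs only.
    bias = (x >> 15) & ((1 << shift) - 1)
    return (x + bias) >> shift


def short_form(x: int, shift: int) -> int:
    # Assumption: logical shift of the 16-bit pattern, which keeps the
    # top `shift` bits. They are all ones for small negative x and all
    # zeros for small non-negative x, so no AND is needed.
    bias = (x & 0xFFFF) >> (16 - shift)
    return (x + bias) >> shift


if __name__ == "__main__":
    shift = 2
    limit = 1 << (16 - shift)
    ok = all(short_form(x, shift) == trunc_div_pow2(x, shift)
             for x in range(-limit, limit))
    print(ok)
```

Under these assumptions the short form matches the reference for -2^(16 - SHIFT) <= x < 2^(16 - SHIFT) and can differ outside that range, which is why it depends on the small range of the inputs.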

T-Head C908:
h263dsp.h_loop_filter_c:       228.2
h263dsp.h_loop_filter_rvv_i32: 144.0
h263dsp.v_loop_filter_c:       242.7
h263dsp.v_loop_filter_rvv_i32: 114.0
(C is probably worse in real use due to less predictable branches.)
2024-05-22 19:15:39 +03:00
James Almer
3d1597d3e2 x86/vvc_alf: use the x86inc instruction macros
Let its magic figure out the correct mnemonic based on target instruction set.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-05-22 20:51:30 +08:00
llyyr
d1b96c3808 avformat/mov: avoid seeking back to 0 on HEVC open GOP files
ab77b878f1 attempted to fix the issue of broken packets being sent to
the decoder by implementing logic that kept attempting to PTS-step
backwards until it reached a valid point, however applying this
heuristic meant that in files that had no valid points (such as HEVC
videos shot on iPhones), we'd seek back to sample 0 on every seek
attempt. This meant that files that were previously seekable, albeit
with some skipped frames, were not seekable at all now.

Relax this heuristic a bit by giving up on seeking to a valid point if
we've tried a different sample and we still don't have a valid point to
seek to. This may cause some frames to be skipped on seeking but it's better
than not being able to seek at all in such files.

Fixes: ab77b878f1 ("avformat/mov: fix seeking with HEVC open GOP files")
Fixes: #10585
Signed-off-by: Philip Langdale <philipl@overt.org>
2024-05-21 18:57:44 -07:00
sunyuechi
0c1304ae11 lavc/vp9dsp: R-V V mc avg
C908:
vp9_avg4_8bpp_c: 1.2
vp9_avg4_8bpp_rvv_i64: 1.0
vp9_avg8_8bpp_c: 3.7
vp9_avg8_8bpp_rvv_i64: 1.5
vp9_avg16_8bpp_c: 14.7
vp9_avg16_8bpp_rvv_i64: 3.5
vp9_avg32_8bpp_c: 57.7
vp9_avg32_8bpp_rvv_i64: 10.0
vp9_avg64_8bpp_c: 229.0
vp9_avg64_8bpp_rvv_i64: 31.7

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2024-05-21 21:28:14 +03:00
Rémi Denis-Courmont
7591eb4055 Revert "lavc/sbrdsp: R-V V neg_odd_64"
While this function can easily be written with vectors, it just fails to
get any performance improvement.

For reference, this is a simpler loop-free implementation that does get
better performance than the current one depending on hardware, but still
more or less the same metrics as the C code:

 func ff_sbr_neg_odd_64_rvv, zve64x
         li      a1, 32
         addi    a0, a0, 7
         li      t0, 8
         vsetvli zero, a1, e8, m2, ta, ma
         li      t1, 0x80
         vlse8.v v8, (a0), t0
         vxor.vx v8, v8, t1
         vsse8.v v8, (a0), t0
         ret
 endfunc
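
For readers less familiar with the vector instructions, the listing above can be modelled as follows. This sketch assumes the buffer holds 64 little-endian single-precision floats, so the byte at offset 7 of every 8-byte pair carries the sign bit of the odd-indexed float. The Python function name is hypothetical.

```python
import struct


def neg_odd_64(buf: bytearray) -> None:
    """Flip the sign bit of every odd-indexed float in a 64-float buffer."""
    # 32 bytes, starting at offset 7, one every 8 bytes: this mirrors the
    # strided byte load, XOR with 0x80, and strided byte store.
    for offset in range(7, 256, 8):
        buf[offset] ^= 0x80


if __name__ == "__main__":
    values = [float(i) for i in range(64)]
    buf = bytearray(struct.pack("<64f", *values))
    neg_odd_64(buf)
    print(struct.unpack("<64f", buf)[:4])
```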

This reverts commit d06fd18f8f.
2024-05-21 21:26:39 +03:00