Commit Graph

108615 Commits

Author SHA1 Message Date
Andreas Rheinhardt
37ee36f689 checkasm/idctdsp: Use declare_func_emms only when needed
There is no MMX code for (add|put|put_signed)_pixels_clamped
since commit bfb28b5ce8, so use
declare_func instead of declare_func_emms() to also test that
we are not in MMX mode after return.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-11 14:18:54 +02:00
Andreas Rheinhardt
5102b98b7a checkasm/llviddspenc: Use declare_func_emms only when needed
There is no MMX code for diff_bytes since commit
230ea38de1, so use declare_func
instead of declare_func_emms() to also test that we are not
in MMX mode after return.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-11 14:18:54 +02:00
Andreas Rheinhardt
e814569c8d checkasm/huffyuvdsp: Use declare_func_emms only when needed
There is no MMX code for add_int16 since commit
4b6ffc2880, so use declare_func
instead of declare_func_emms() to also test that we are not
in MMX mode after return.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-11 14:18:54 +02:00
Andreas Rheinhardt
cd8a33bcce checkasm/llviddsp: Be strict about MMX
There is no MMX code for llviddsp after commit
fed07efcde, so use declare_func
instead of declare_func_emms() to also test that we are not
in MMX mode after return.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-11 14:18:54 +02:00
Andreas Rheinhardt
b4e2d67636 checkasm/pixblockdsp: Be strict about MMX
There is no MMX code for pixblockdsp after commit
92b5800277, so use declare_func
instead of declare_func_emms() to also test that we are not
in MMX mode after return.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-11 14:18:54 +02:00
Andreas Rheinhardt
42921190cb checkasm/audiodsp: Be strict about MMX
There is no MMX code for audiodsp after commit
3d716d38ab, so use declare_func
instead of declare_func_emms() to also test that we are not
in MMX mode after return.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-11 14:18:54 +02:00
Andreas Rheinhardt
18afaa20f1 checkasm/blockdsp: Be strict about MMX
There is no MMX code for blockdsp after commit
ee551a21dd, so use declare_func
instead of declare_func_emms() to also test that we are not
in MMX mode after return.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-11 14:18:54 +02:00
Andreas Rheinhardt
f224c195e0 checkasm/vc1dsp: Use declare_func_emms only when needed
There is no MMX code for vc1_inv_trans_8x8 or
vc1_unescape_buffer, so use declare_func instead of
declare_func_emms() to also test that we are not in MMX
mode after return.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-11 14:18:54 +02:00
Anton Khirnov
adb927fa7a lavc/encode: combine setting no-delay pts for video/audio 2022-10-11 11:59:11 +02:00
Anton Khirnov
8789720d28 lavc/encode: generalize a check for setting dts=pts
DTS may be different from PTS only if both of these are true:
- the codec supports reordering
- the encoder has delay
2022-10-11 11:57:52 +02:00
Reimar Döffinger
38cd829dce
aarch64: Implement stack spilling in a consistent way.
Currently it is done in several different ways, which
might cause needless dependencies or in case of
tx_float_neon.S is incorrect.

Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
2022-10-11 09:12:02 +02:00
Andreas Rheinhardt
e10e27a2ea avcodec/opustab: Avoid indirection to access ff_celt_window
Currently, it is accessed via a pointer (ff_celt_window)
exported from opustab.h which points inside a static array
(ff_celt_window_padded) in opustab.h. Instead export
ff_celt_window_padded directly and make opustab.h
a static const pointer pointing inside ff_celt_window_padded.
Also mark all the declarations in opustab.h as hidden,
so that the compiler knows that ff_celt_window has a fixed
offset from the code even when compiling position-independent
code.

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-10 14:10:49 +02:00
Andreas Rheinhardt
a60befce40 avutil/attributes_internal: Add visibility pragma
GCC 4.0 not only added a visibility attribute, but also
a pragma to set it for a whole region of code.*
This commit exposes this via macros.

*: See https://gcc.gnu.org/gcc-4.0/changes.html

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-10 13:43:59 +02:00
Haihao Xiang
f3b5277057 lavc/qsvenc_hevc: use open GOP by default
HEVC spec has CRA frame which allows random access with open GOP, hence
it can achieve higher compression efficiency.

Removing the entry was suggested by Andreas

Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2022-10-10 11:10:13 +08:00
Fei Wang
56a52af12b lavc/qsv: add support for decoding & encoding 12bit content
AV_PIX_FMT_P012, AV_PIX_FMT_Y212 and AV_PIX_FMT_XV36 are used in
FFmpeg and MFX_FOURCC_P016, MFX_FOURCC_Y216, and MFX_FOURCC_Y416 are used
in the SDK

Signed-off-by: Fei Wang <fei.w.wang@intel.com>
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2022-10-10 09:31:34 +08:00
Fei Wang
201cb35061 lavu/hwcontext_qsv: add support for 12bit content on Linux
P012, Y212 and XV36 are used for 12bit content in FFmpeg VAAPI, so
these formats should be used in FFmpeg QSV too, however the SDK only
declares support for P016, Y216 and Y416. So this commit fudged mappings
between AV_PIX_FMT_P012 and MFX_FOURCC_P016, AV_PIX_FMT_Y212 and
MFX_FOURCC_Y216, AV_PIX_FMT_XV36 and MFX_FOURCC_Y416.

Signed-off-by: Fei Wang <fei.w.wang@intel.com>
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2022-10-10 09:31:34 +08:00
Haihao Xiang
1898dbddd5 lavc/qsv: add support for decoding & encoding 10bit 4:4:4 content
AV_PIX_FMT_XV30 is used in FFmpeg and MFX_FOURCC_Y410 is used in the
SDK.

Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2022-10-10 09:31:34 +08:00
Haihao Xiang
aba25b391c lavu/hwcontext_qsv: add support for 10bit 4:4:4 content on Linux
XV30 is used for 10bit 4:4:4 content in FFmpeg VAAPI, so XV30 should be
used for 10bit 4:4:4 content in FFmpeg QSV too because QSV is based on
VAAPI on Linux. However the SDK only declares support for Y410 but does
nothing with the alpha in Y410, so this commit fudged a mapping between
AV_PIX_FMT_XV30 and MFX_FOURCC_Y410.

Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2022-10-10 09:31:34 +08:00
Haihao Xiang
3f28116ea2 lavc/qsv: specify Shift for each format too
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2022-10-10 09:31:34 +08:00
Haihao Xiang
1496e7c173 lavu/hwcontext_qsv: specify Shift for each format
We can't get Shift from bit depth for some formats in the SDK. For
example, bit depth is 10, however Shift is 0 for Y410 (XV30 in FFmpeg).
In order to support these formats in the next commits, this patch
specified Shift for each format

Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2022-10-10 09:31:34 +08:00
Rémi Denis-Courmont
c962c78901 checkasm: RISC-V 64-bit assembler test harness 2022-10-10 02:23:18 +02:00
Rémi Denis-Courmont
105921251a lavc/aacpsdsp: fix clobber on RISC-V LP64D/ILP32D
Although the DSP function only uses single precision from RISC-V F, the
caller may leave double precision values in the spilled registers if the
calling convention supports double precision hardware floats. Then, we
need to save and restore FS registers as double precision.

Conversely, we do not need to save anything at all if an integer calling
convention is in use. However we can assume that single precision floats
are supported, since the Zve32f extension implies the F extension.
So for the sake of simplicity, we always save at least single precision
values.

In theory, we should even save quadruple precision values if the LP64Q
ABI is in use. I have yet to see a compiler that supports it though.
2022-10-10 02:23:18 +02:00
Rémi Denis-Courmont
bfc69297c5 lavc/opusdsp: RISC-V V (512-bit) postfilter
This adds a variant of the postfilter for use with 512-bit vectors.
Half a vector is enough to perform the scalar product. Normally a whole
vector would be used anyhow. Indeed fractional multiplers are no faster
than the unit multipler.

But in this particular function, a full vector makes up 16 samples,
which would be loaded at each iteration of the outer loop. The minimum
guaranteed CELT postfilter period is only 15. Accounting for the edges,
we can only safely preload up to 13 samples.

The fractional multipler is thus used to cap the selected vector length
to a safe value of 8 elements or 256 bits.

Likewise, we have the 1024-bit variant with the quarter multipler. In
theory, a 2048-bit one would be possible with the eigth multipler, but
that length is not even defined in the specifications as of yet, nor is
it supported by any emulator - forget actual hardware.
2022-10-10 02:23:17 +02:00
Rémi Denis-Courmont
97d34befea lavc/opusdsp: RISC-V V (256-bit) postfilter
This adds a variant of the postfilter for use with 256-bit vectors.
As a single vector is then large enough to perform the scalar product,
the group multipler is reduced to just one at run-time.

The different vector type is passed via register. Unfortunately,
there is no VSETIVL instruction, so the constant vector size (5) also
needs to be passed via a register.
2022-10-10 02:22:39 +02:00
Rémi Denis-Courmont
f59a767ccd lavu/riscv: helper macro for VTYPE encoding
On most cases, the vector type (VTYPE) for the RISC-V Vector extension
is supplied as an immediate value, with either of the VSETVLI or
VSETIVLI instructions. There is however a third instruction VSETVL
which takes the vector type from a general purpose register. That is so
the type can be selected at run-time.

This introduces a macro to load a (valid) vector type into a register.
The syntax follows that of VSETVLI and VSETIVLI, with element size,
group multiplier, then tail and mask policies.
2022-10-10 02:22:12 +02:00
Rémi Denis-Courmont
8009581912 lavc/opusdsp: RISC-V V (128-bit) postfilter
This is implemented for a vector size of 128-bit. Since the scalar
product in the inner loop covers 5 samples or 160 bits, we need a group
multipler of 2.

To avoid reconfiguring the vector type, the outer loop, which loads
multiple input samples sticks to the same multipler. Consequently, the
outer loop loads 8 samples per iteration. This is safe since the minimum
period of the CELT codec is 15 samples.

The same code would also work, albeit needlessly inefficiently with a
vector length of 256 bits. A proper implementation will follow instead.
2022-10-10 02:22:10 +02:00
Carl Eugen Hoyos
82479ef6bd lavfi/rotate: Avoid undefined behaviour.
Fixes the following integer overflows:
libavfilter/vf_rotate.c:273:13: runtime error: signed integer overflow: 92951468 + 2058533568 cannot be represented in type 'int'
libavfilter/vf_rotate.c:273:37: runtime error: signed integer overflow: 39684 * 54149 cannot be represented in type 'int'
libavfilter/vf_rotate.c:272:13: runtime error: signed integer overflow: 247587320 + 1900985032 cannot be represented in type 'int'
libavfilter/vf_rotate.c:272:37: runtime error: signed integer overflow: 42584 * 50430 cannot be represented in type 'int'
libavfilter/vf_rotate.c:272:50: runtime error: signed integer overflow: 65083 * 52912 cannot be represented in type 'int'
libavfilter/vf_rotate.c:273:50: runtime error: signed integer overflow: 65286 * 38044 cannot be represented in type 'int'

Fixes ticket #9799, different output with different compilers.
2022-10-10 02:58:39 +02:00
Carl Eugen Hoyos
60e87faf7f lavc/x86/simple_idct: Fix linking shared libavcodec with MS link.exe
link.exe hangs on empty simple_idct.o

Fixes ticket #9909.
2022-10-10 02:42:44 +02:00
Andreas Rheinhardt
8320e236c1 avcodec/opus: Rename opus.c->opus_celt.c, opus_celt.c->opusdec_celt.c
Since commit 4fc2531fff opus.c
contains only the celt stuff shared between decoder and encoder.
meanwhile, opus_celt.c is decoder-only. So the new names
reflect the actual content better than the current ones.

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-09 19:45:06 +02:00
Andreas Rheinhardt
4486ff9242 avcodec/mjpegenc_common: Don't flush unnecessarily
The PutBitContext has already been flushed a few lines above
and nothing has been written to it in the meantime.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-09 19:31:47 +02:00
Andreas Rheinhardt
33a96b600b avcodec/speedhqenc: Remove unnecessary headers
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-09 19:31:47 +02:00
Andreas Rheinhardt
d2dc6440e6 avcodec/vc2enc: Don't use bitcount when byte-aligned
(There is a small issue that is now being treated differently:
The earlier code would record a position in a buffer that
is being written to via put_bits(), then write data,
then overwrite the byte at the position recorded earlier
and only then flush the PutBitContext. In case there was
no writeout in the meantime, said flush would overwrite
what one has just written. This never happened in my tests,
but maybe it can happen. In this case this commit fixes
this issue by flushing before overwriting the old data.)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-09 19:31:47 +02:00
Andreas Rheinhardt
b9133bce04 avcodec/me_cmp: Mark ff_square_tab as hidden
ff_square_tab is always used with an offset; if this table
is marked as hidden, the compiler can infer that it and
therefore also ff_square_tab + 256 have a fixed offset
from the code. This allows to avoid performing "+ 256"
at runtime by baking it into the offset from the code to
the table.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-09 19:31:47 +02:00
Andreas Rheinhardt
ebcaa24274 avcodec/asvdec: Remove unnecessary emms_c()
This codec uses BswapDSP, BlockDSP and IDCTDSP.
The former never used MMX, the latter does not use it
for idct_put since bfb28b5ce8
and BlockDSP does not use it since commit
ee551a21dd.
Therefore this emms_c() is can be removed.

(It was actually always redundant, because its caller
(decode_simple_internal()) calls emms_c() itself afterwards.)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-09 19:28:11 +02:00
Andreas Rheinhardt
af94ae7dc7 avcodec/ljpegenc: Remove unnecessary emms_c()
This encoder does not use any DSP function at all.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-09 19:28:11 +02:00
Andreas Rheinhardt
5bd55b488f avcodec/ljpegenc: Remove unused IDCTDSPContext
It is basically write-only.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-09 19:28:11 +02:00
Andreas Rheinhardt
77adbe28ab avcodec/mjpegenc_common: Don't check luma/chroma matrices unnecessarily
These matrices are only used for MJPEG, not for LJPEG.
So only check them for the former. This is in preparation
for removing said matrices from LJPEG altogether
(i.e. sending NULL matrices).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-09 19:28:11 +02:00
Andreas Rheinhardt
6bf99f8c93 avcodec/huffyuv: Update outdated link
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-09 09:15:40 +02:00
Andreas Rheinhardt
cad1593330 avcodec/huffyuv: Speed up generating Huffman codes
The codes here have the property that the long codes
are to the left of the tree (each zero bit child node
is by definition to the left of its one bit sibling);
they also have the property that among codes of the same length,
the symbol is ascending from left to right.

These properties can be used to create the codes from
the lengths in only two passes over the array of lengths
(the current code uses one pass for each length, i.e. 32):
First one counts how many nodes of each length there are.
Then one calculates the range of codes of each length
(possible because the codes are ordered by length in the tree).
This enables one to calculate the actual codes with only
one further traversal of the length array.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-09 09:15:40 +02:00
Andreas Rheinhardt
566280c3f4 avcodec/huffyuv: Split HYuvContext into decoder and encoder context
While the share of elements used by both is quite big, the amount
of code shared between the decoders and encoders is negligible.
Therefore one can easily split the context if one wants to.
The reasons for doing so are that the non-shared elements
are non-negligible: The stats array which is only used by
the encoder takes 524288B of 868904B (on x64); similarly,
pix_bgr_map which is only used by the decoder takes 16KiB.
Furthermore, using a shared context also entails inclusions
of unneeded headers like put_bits.h for the decoder and get_bits.h
for the encoder (and all of these and much more for huffyuv.c).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-09 09:15:40 +02:00
Andreas Rheinhardt
83a8b9fac7 avcodec/huffyuv: Inline ff_huffyuv_common_init() in its callers
This is in preparation for splitting HYuvContext.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-09 09:15:40 +02:00
Andreas Rheinhardt
2415f5158b avcodec/huffyuv: Use AVCodecContext.(width|height) directly
These parameters are easily accessible whereever they
are accessed, so using copies from HYuvContext is
unnecessary.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-09 09:15:40 +02:00
Andreas Rheinhardt
bfdf3470f7 avcodec/huffyuvenc: Avoid unnecessary function call
av_pix_fmt_get_chroma_sub_sample() is superfluous if one
already has an AVPixFmtDescriptor.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-09 09:15:40 +02:00
Andreas Rheinhardt
f9be667452 avcodec/huffyuvenc: Improve code locality
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-09 09:15:40 +02:00
Andreas Rheinhardt
59535346b1 avocdec/huffyuvdec: Don't use HYuvContext.avctx
It is nearly unused anyway, so stop use the field altogether.
This is in preparation for splitting HYuvContext into
decoder and encoder contexts.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-09 09:15:39 +02:00
Andreas Rheinhardt
1741adb1c7 avcodec/huffyuvencdsp: Pass pix_fmt directly when initing dsp
It is the only thing that is actually used.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-09 09:15:39 +02:00
Andreas Rheinhardt
9ec50660ad avcodec/huffyuvenc: Don't second-guess error code
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-09 09:15:39 +02:00
Andreas Rheinhardt
75842c35e7 avcodec/huffyuvenc: Remove redundant call
All codecs here have the FF_CODEC_CAP_INIT_CLEANUP set,
so ff_huffyuv_common_end() will be called automatically
in encode_end() on error.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-09 09:15:39 +02:00
Andreas Rheinhardt
e766378619 avcodec/huffyuvenc: Remove always-false check
The ffvhuff encoder has AVCodec.pix_fmts set and therefore
encode_preinit_video() checks that the used pixel format
is permissible.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-09 09:15:39 +02:00
Andreas Rheinhardt
be65f24ad6 avcodec/huffyuvenc: Avoid pointless indirections
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-09 09:15:39 +02:00