Commit Graph

5657 Commits

Author SHA1 Message Date
Rémi Denis-Courmont
c177108ae1 lavu/riscv: add <intmath.h> optimisations
This provides some micro-optimisations for signed integer clipping, and
support for bit weight with the Zbb extension.
2022-09-13 16:50:43 -03:00
Rémi Denis-Courmont
df2057041b lavu/riscv: byte-swap operations
If the target supports the Basic bit-manipulation (Zbb) extension, then
the REV8 instruction is available to reverse byte order.

Note that this instruction only exists at the "XLEN" register size,
so we need to right shift the result down to the data width.

If Zbb is not supported, then this patchset does nothing. Support for
run-time detection is left for the future. Currently, there are no
bits in auxv/ELF HWCAP for Z-extensions, so there are no clean ways to
do this.
2022-09-13 16:50:43 -03:00
Rémi Denis-Courmont
d808070547 lavu/riscv: AV_READ_TIME cycle counter
This uses the architected RISC-V 64-bit cycle counter from the
RISC-V unprivileged instruction set.

In 64-bit and 128-bit, this is a straightforward CSR read.
In 32-bit mode, the 64-bit value is exposed as two CSRs, which
cannot be read atomically, so a loop is necessary to detect and fix up
the race condition where the bottom half wraps exactly between the two
reads.
2022-09-13 16:50:43 -03:00
James Almer
bda3a9faf4 x86/float_dsp: use three operand form for some instructions
Fixes compilation with old yasm

Signed-off-by: James Almer <jamrial@gmail.com>
2022-09-13 13:50:09 -03:00
Paul B Mahol
72acff9f59 avutil/x86/float_dsp: add fma3 for scalarproduct 2022-09-13 17:43:15 +02:00
Andreas Rheinhardt
29c4c0886d avutil/x86/intreadwrite: Add ability to detect whether MMX code is used
It can be used to call emms_c() only when needed.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-11 21:08:04 +02:00
Lynne
f1b35fc8f0
lavu/tx: remove av_cold from table definitions
How did this get here?
2022-09-11 03:18:40 +02:00
Lynne
c92edd969a
lavu/tx: rotate 3 & 15-point exptabs
This just inverts their signs. Simplifies SIMD.
2022-09-10 02:37:17 +02:00
Lynne
51172223fd
lavu/tx: generalize MDCTs
The same code can perform any-length MDCTs with minimal changes.
2022-09-10 02:37:16 +02:00
Lynne
645a1f4422
lavu/tx: add the inplace flag to PFA FFTs
They support in-place, because they have to use a temporary buffer.
2022-09-10 02:37:14 +02:00
Lynne
8c283e8fe6
lavu/tx: propagate the codelet flags into the context
The field is documented as a combination of both.
2022-09-10 02:37:11 +02:00
Haihao Xiang
b7dbffe698 lavu/hwcontext_qsv: add support for AV_PIX_FMT_VUYX on Linux
AV_PIX_FMT_VUYX is used for 8bit 4:4:4 content in FFmpeg VAAPI, so
AV_PIX_FMT_VUYX should be used for 8bit 4:4:4 content in FFmpeg QSV too
because QSV is based on VAAPI on Linux. However the SDK only declares
support for AYUV and does nothing with the alpha, so this commit fudged
a mapping between AV_PIX_FMT_VUYX and MFX_FOURCC_AYUV.

Reviewed-by: Philip Langdale <philipl@overt.org>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2022-09-07 14:04:12 +08:00
James Almer
f4097e4c1f x86/tx_float: add missing check for AVX2
Fixes compilation with old yasm.

Signed-off-by: James Almer <jamrial@gmail.com>
2022-09-06 14:06:33 -03:00
James Almer
74f5fb6db8 x86/tx_float: set all operands for shufps
Fixes compilation with AVX2 enabled yasm.

Signed-off-by: James Almer <jamrial@gmail.com>
2022-09-06 14:06:03 -03:00
Martin Storsjö
da5f7799a0 slicethread: Limit the automatic number of threads to 16
This matches a similar cap on the number of automatic threads
in libavcodec/pthread_slice.c.

On systems with lots of cores, this fixes a couple fate failures
in 32 bit mode on such machines (where spawning a huge number of
threads runs out of address space).

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-06 18:46:44 +03:00
Martin Storsjö
e4759fa951 x86/tx_float: Fix building for platforms with a symbol prefix
This fixes building for x86 macOS (both i386 and x86_64) and
i386 windows.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-06 18:46:39 +03:00
Lynne
a89025f74d
aarch64/tx_float: fix compilation
Forgot to add the new function arguments.
2022-09-06 05:42:32 +02:00
Lynne
4537d9554d
x86/tx_float: implement inverse MDCT AVX2 assembly
This commit implements an iMDCT in pure assembly.

This is capable of processing any mod-8 transforms, rather than just
power of two, but since power of two is all we have assembly for
currently, that's what's supported.
It would really benefit if we could somehow use the C code to decide
which function to jump into, but exposing function labels from assebly
into C is anything but easy.
The post-transform loop could probably be improved.

This was somewhat annoying to write, as we must support arbitrary
strides during runtime. There's a fast branch for stride == 4 bytes
and a slower one which uses vgatherdps.

Zen 3 benchmarks for stride == 4 for old (av_imdct_half) vs new (av_tx):

128pt:
   2811 decicycles in         av_tx (imdct),16775916 runs,   1300 skips
   3082 decicycles in         av_imdct_half,16776751 runs,    465 skips

256pt:
   4920 decicycles in         av_tx (imdct),16775820 runs,   1396 skips
   5378 decicycles in         av_imdct_half,16776411 runs,    805 skips

512pt:
   9668 decicycles in         av_tx (imdct),16775774 runs,   1442 skips
  10626 decicycles in         av_imdct_half,16775647 runs,   1569 skips

1024pt:
  19812 decicycles in         av_tx (imdct),16777144 runs,     72 skips
  23036 decicycles in         av_imdct_half,16777167 runs,     49 skips
2022-09-06 04:21:46 +02:00
Lynne
2425d5cd7e
x86/tx_float: add support for calling assembly functions from assembly
Needed for the next patch.
We get this for the extremely small cost of a branch on _ns functions,
which wouldn't be used anyway with assembly.
2022-09-06 04:21:41 +02:00
Anton Khirnov
ea5b375e0e lavu/fifo: clarify interaction of AV_FIFO_FLAG_AUTO_GROW with av_fifo_write() 2022-09-05 08:59:36 +02:00
Anton Khirnov
8728808b3e lavu/fifo: clarify interaction of AV_FIFO_FLAG_AUTO_GROW with av_fifo_can_write() 2022-09-05 08:59:14 +02:00
Anton Khirnov
c9b6fd27bf lavu/fifo: add the header to its own doxy group
Also, drop mentions of it being a circular buffer, as this is an
internal implementation detail that should be invisible to the caller.
2022-09-05 08:58:51 +02:00
Andreas Rheinhardt
f89949afed avutil/tests/.gitignore: Add channel_layout testtool
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-05 02:00:08 +02:00
Philip Langdale
2f9b8bbd1f lavu/hwcontext_vulkan: support mapping VUYX, P012, and XV36
If we want to be able to map between VAAPI and Vulkan (to do Vulkan
filtering), we need to have matching formats on each side.

The mappings here are not exact. In the same way that P010 is still
mapped to full 16 bit formats, P012 has to be mapped that way as well.

Similarly, VUYX has to be mapped to an alpha-equipped format, and XV36
has to be mapped to a fully 16bit alpha-equipped format.

While Vulkan seems to fundamentally lack formats with an undefined,
but physically present, alpha channel, it has have 10X6 and 12X4
formats that you could imagine using for P010, P012 and XV36, but these
formats don't support the STORAGE usage flag. Today, hwcontext_vulkan
requires all formats to be storable because it wants to be able to use
them to create writable images. Until that changes, which might happen,
we have to restrict the set of formats we use.

Finally, when mapping a Vulkan image back to vaapi, I observed that
the VK_FORMAT_R16G16B16A16_UNORM format we have to use for XV36 going
to Vulkan is mapped to Y416 when going to vaapi (which makes sense as
it's the exact matching format) so I had to add an entry for it even
though we don't use it directly.
2022-09-03 16:19:40 -07:00
Philip Langdale
b982dd0d83 lavc/vaapi: Add support for remaining 10/12bit profiles
With the necessary pixel formats defined, we can now expose support for
the remaining 10/12bit combinations that VAAPI can handle.

Specifically, we are adding support for:

* HEVC
** 12bit 420
** 10bit 422
** 12bit 422
** 10bit 444
** 12bit 444

* VP9
** 10bit 444
** 12bit 444

These obviously require actual hardware support to be usable, but where
that exists, it is now enabled.

Note that unlike YUVA/YUVX, the Intel driver does not formally expose
support for the alphaless formats XV30 and XV360, and so we are
implicitly discarding the alpha from the decoder and passing undefined
values for the alpha to the encoder. If a future encoder iteration was
to actually do something with the alpha bits, we would need to use a
formal alpha capable format or the encoder would need to explicitly
accept the alphaless format.
2022-09-03 16:19:40 -07:00
Philip Langdale
d75c4693fe lavu/pixfmt: Add P012, Y212, XV30, and XV36 formats
These are the formats we want/need to use when dealing with the Intel
VAAPI decoder for 12bit 4:2:0, 12bit 4:2:2, 10bit 4:4:4 and 12bit 4:4:4
respectively.

As with the already supported Y210 and YUVX (XVUY) formats, they are
based on formats Microsoft picked as their preferred 4:2:2 and 4:4:4
video formats, and Intel ran with it.

P12 and Y212 are simply an extension of 10 bit formats to say 12 bits
will be used, with 4 unused bits instead of 6.

XV30, and XV36, as exotic as they sound, are variants of Y410 and Y412
where the alpha channel is left formally undefined. We prefer these
over the alpha versions because the hardware cannot actually do
anything with the alpha channel and respecting it is just overhead.

Y412/XV46 is a normal looking packed 4 channel format where each
channel is 16bits wide but only the 12msb are used (like P012).

Y410/XV30 packs three 10bit channels in 32bits with 2bits of alpha,
like A/X2RGB10 style formats. This annoying layout forced me to define
the BE version as a bitstream format. It seems like our pixdesc
infrastructure can handle the LE version being byte-defined, but not
when it's reversed. If there's a better way to handle this, please
let me know. Our existing X2 formats all have the 2 bits at the MSB
end, but this format places them at the LSB end and that seems to be
the root of the problem.
2022-09-03 16:19:40 -07:00
Rémi Denis-Courmont
620e6e1487 arm: relax byte-swap assembler constraints
There are no particular reasons to force the compiler to use the same
register as output and input operand. This forces an extra MOV
instruction if the input value needs to be reused after the swap.

In most cases, this makes no differences, as the compiler will seleect
the same register for both operands either way.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-03 23:54:05 +03:00
Rémi Denis-Courmont
164021423a aarch64: relax byte-swap assembler constraints
There are no particular reasons to force the compiler to use the same
register as output and input operand. This forces an extra MOV
instruction if the input value needs to be reused after the swap.

In most cases, this makes no differences, as the compiler will seleect
the same register for both operands either way.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-03 23:54:05 +03:00
Andreas Rheinhardt
73fada029c avcodec/codec_internal: Add macros for update_thread_context(_for_user)
It reduces typing: Before this patch, there were 11 callbacks
that exceeded the 80 char line length limit; now there are zero.
It also allows to remove ONLY_IF_THREADS_ENABLED() in
libavutil/internal.h.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-03 15:42:57 +02:00
Andreas Rheinhardt
48286d4d98 avcodec/codec_internal: Add macro to set AVCodec.long_name
It reduces typing: Before this patch, there were 105 codecs
whose long_name-definition exceeded the 80 char line length
limit. Now there are only nine of them.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-03 15:42:57 +02:00
Andreas Rheinhardt
dea9744560 avutil/file: Properly deprecate av_tempfile()
It has been deprecated in b4f59beeb4,
but the attribute_deprecated was not set and there was no entry
in APIchanges. This commit adds these and schedules it for removal.
Given that the reason behind the deprecation is exactly the same
as in av_fopen_utf8(), reuse its FF_API_AV_FOPEN_UTF8.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-03 15:42:40 +02:00
Andreas Rheinhardt
72c601e0f7 avutil/internal: Move avpriv-file API to a header of its own
It is not used by the large majority of files that include
lavu/internal.h.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-03 15:41:44 +02:00
Andreas Rheinhardt
04b7217872 avutil/dict: Move avpriv_dict_set_timestamp() to a header of its own
It is used almost nowhere, so it needn't be auto-included
almost everywhere.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-03 15:41:44 +02:00
Andreas Rheinhardt
26325cceb0 avutil/internal: Remove unused FF_SYMVER
They are unused since d63443b968.
Furthermore, they were always in the wrong header:
libavutil/internal.h is auto-included almost everywhere, but
FF_SYMVER would only ever be used at a few places, so a proper
header of its own would be appropriate for it.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-03 15:41:44 +02:00
Andreas Rheinhardt
5b0856d2b9 avutil/internal: Remove unused ff_rint64_clip()
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-03 15:41:44 +02:00
Martin Storsjö
e4ac156b7c libavcodec: Set hidden visibility on global symbols accessed from AArch64 assembly
The AArch64 assembly accesses those symbols directly, without
indirection via e.g. the GOT on ELF. In order for this not to
require text relocations, those symbols need to be resolved fully
at link time, i.e. those symbols can't be interposable.

Normally, so far, this is achieved when linking shared libraries
in two ways; we have a version script (libavcodec/libavcodec.v) which
marks all symbols that don't start with av* as local. Additionally,
we try to add -Wl,-Bsymbolic to the linker options if supported,
making sure that such symbol references are resolved fully at link
time, instead of making them interposable.

When the libavcodec static library is linked into another shared
library, there's no guarantee that it uses similar options (even though
that would be favourable), which would end up requiring text relocations
in the AArch64 assembly.

Explicitly mark the symbols that are accessed from AArch64 assembly
as hidden, so that they are resolved fully at link time even without
the version script and -Wl,-Bsymbolic.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-02 23:13:29 +03:00
Martin Storsjö
0dd8fe6f4b arm: Check the build time constants in av_clip_*intp2
This fixes building for arm targets with optimizations disabled.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-02 23:12:26 +03:00
Philip Langdale
caf26a8a12 lavc/vaapi: Switch preferred 8bit 444 format to VUYX
As vaapi doesn't actually do anything useful with the alpha channel,
and we have an alphaless format available, let's use that instead.

The changes here are mostly 1:1 switching, but do note the explicit
change in the number of declared channels from 4 to 3 to reflect that
the alpha is being ignored.
2022-08-25 19:04:10 -07:00
Philip Langdale
cc5a5c9860 lavu/pixfmt: Introduce VUYX format
This is the alphaless version of VUYA that I introduced recently. After
further discussion and noting that the Intel vaapi driver explicitly
lists XYUV as a support format for encoding and decoding 8bit 444
content, we decided to switch our usage and avoid the overhead of
having a declared alpha channel around.

Note that I am not removing VUYA, as this turned out to have another
use, which was to replace the need for v408enc/dec when dealing with
the format.

The vaapi switching will happen in the next change
2022-08-25 19:02:49 -07:00
Lynne
f932b89ea3
lavu/tx: implement aarch64 NEON SIMD FFT
The fastest fast Fourier transform in not just the west, but the world,
now for the most popular toy ISA.

On a high level, it follows the design of the AVX2 version closely,
with the exception that the input is slightly less permuted as we don't have
to do lane switching with the input on double 4pt and 8pt.

On a low level, the lack of subadd/addsub instructions REALLY penalizes
any attempt at writing an FFT. That single register matters a lot,
and reloading it simply takes unacceptably long.
In x86 land, vendors would've noticed developers need this.
In ARM land, you get a badly designed complex multiplication instruction
we cannot use, that's not present on 95% of devices. Because only
compilers matter, right?

Future optimization options are very few, perhaps better register
management to use more ld1/st1s.

All timings below are in cycles:
A53:
Length | C           | New (lavu)  | Old (lavc)  | FFTW
------ |-------------|-------------|-------------|-----
4      |         842 | 420         | 1210        | 1460
8      |        1538 | 1020        | 1850        | 2520
16     |        3717 | 1900        | 3700        | 3990
32     |        9156 | 4070        | 8289        | 8860
64     |       21160 | 9931        | 18600       | 19625
128    |       49180 | 23278       | 41922       | 41922
256    |      112073 | 53876       | 93202       | 101092
512    |      252864 | 122884      | 205897      | 207868
1024   |      560512 | 278322      | 458071      | 453053
2048   |     1295402 | 775835      | 1038205     | 1020265
4096   |     3281263 | 2021221     | 2409718     | 2577554
8192   |     8577845 | 4780526     | 5673041     | 6802722

Apple M1
New  - Total for len 512 reps 2097152 = 1.459141 s
Old  - Total for len 512 reps 2097152 = 2.251344 s
FFTW - Total for len 512 reps 2097152 = 1.868429 s

New  - Total for len 1024 reps 4194304 = 6.490080 s
Old  - Total for len 1024 reps 4194304 = 9.604949 s
FFTW - Total for len 1024 reps 4194304 = 7.889281 s

New  - Total for len 16384 reps 262144 = 10.374001 s
Old  - Total for len 16384 reps 262144 = 15.266713 s
FFTW - Total for len 16384 reps 262144 = 12.341745 s

New  - Total for len 65536 reps 8192 = 1.769812 s
Old  - Total for len 65536 reps 8192 = 4.209413 s
FFTW - Total for len 65536 reps 8192 = 3.012365 s

New  - Total for len 131072 reps 4096 = 1.942836 s
Old  - Segfaults
FFTW - Total for len 131072 reps 4096 = 3.713713 s

Thanks to wbs for some simplifications, assembler fixes and a review
and to jannau for giving it a look.
2022-08-25 17:40:28 +02:00
Andreas Rheinhardt
0bb0c26799 avutil/mem_internal: Fix headers
Including avassert.h is unnecessary since commit
786be70e28.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-24 03:43:52 +02:00
Timo Rothenpieler
ef2c2a2220 avutil/half2float: use native _Float16 if available
_Float16 support was available on arm/aarch64 for a while, and with gcc
12 was enabled on x86 as long as SSE2 is supported.

If the target arch supports f16c, gcc emits fairly efficient assembly,
taking advantage of it. This is the case on x86-64-v3 or higher.
Same goes on arm, which has native float16 support.
On x86, without f16c, it emulates it in software using sse2 instructions.

This has shown to perform rather poorly:

_Float16 full SSE2 emulation:
frame=50074 fps=848 q=-0.0 size=N/A time=00:33:22.96 bitrate=N/A speed=33.9x

_Float16 f16c accelerated (Zen2, --cpu=znver2):
frame=50636 fps=1965 q=-0.0 Lsize=N/A time=00:33:45.40 bitrate=N/A speed=78.6x

classic half2float full software implementation:
frame=49926 fps=1605 q=-0.0 Lsize=N/A time=00:33:17.00 bitrate=N/A speed=64.2x

Hence an additional check was introduced, that only enables use of
_Float16 on x86 if f16c is being utilized.

On aarch64, a similar uplift in performance is seen:

RPi4 half2float full software implementation:
frame= 6088 fps=126 q=-0.0 Lsize=N/A time=00:04:03.48 bitrate=N/A speed=5.06x

RPi4 _Float16:
frame= 6103 fps=158 q=-0.0 Lsize=N/A time=00:04:04.08 bitrate=N/A speed=6.32x

Since arm/aarch64 always natively support 16 bit floats, it can always
be considered fast there.

I'm not aware of any additional platforms that currently support
_Float16. And if there are, they should be considered non-fast until
proven fast.
2022-08-19 22:09:36 +02:00
Timo Rothenpieler
6dc79f1d04 avutil/half2float: move non-inline init code out of header 2022-08-19 22:09:36 +02:00
Timo Rothenpieler
f3fb528cd5 avutil/half2float: move tables to header-internal structs
Having to put the knowledge of the size of those arrays into a multitude
of places is rather smelly.
2022-08-19 22:09:36 +02:00
Timo Rothenpieler
cb8ad005bb avutil/half2float: adjust conversion of NaN
IEEE-754 differentiates two different kind of NaNs.
Quiet and Signaling ones. They are differentiated by the MSB of the
mantissa.

For whatever reason, actual hardware conversion of half to single always
sets the signaling bit to 1 if the mantissa is != 0, and to 0 if it's 0.
So our code has to follow suite or fate-testing hardware float16 will be
impossible.
2022-08-19 22:09:36 +02:00
Timo Rothenpieler
b42925264a avutil: move half-precision float helper to avutil 2022-08-19 22:09:36 +02:00
Lynne
ae66a9db7b
lavu/tx: optimize and simplify inverse MDCTs
Convert the input from a scatter to a gather instead,
which is faster and better for SIMD.
Also, add a pre-shuffled exptab version to avoid
gathering there at all. This doubles the exptab size,
but the speedup makes it worth it. In SIMD, the
exptab will likely be purged to a higher cache
anyway because of the FFT in the middle, and
the amount of loads stays identical.

For a 960-point inverse MDCT, the speedup is 10%.

This makes it possible to write sane and fast SIMD
versions of inverse MDCTs.
2022-08-16 01:22:38 +02:00
Timo Rothenpieler
dd94a03468 avutil/hwcontext_d3d11va: add support for rgbaf16 pixel format 2022-08-13 15:21:59 +02:00
Timo Rothenpieler
e95b08a7dd lavu/pixfmt: add packed RGBA float16 format
This is the default format of the Windows compositor and what DXGI
Desktop Duplication will give you for any kind of HDR output.
2022-08-13 15:21:46 +02:00
Timo Rothenpieler
b77fff47d0 configure: always enable gnu_windres if available
Use the appropiate Makefile variable to ensure the resource file is
only built into shared libraries instead.
2022-08-13 14:42:36 +02:00
Haihao Xiang
05bd88dca2 lavu/hwcontext_qsv: make qsv hwdevice works with oneVPL
In oneVPL, MFXLoad() and MFXCreateSession() are required to create a
workable mfx session[1]

Add config filters for D3D9/D3D11 session (galinart)

The default device is changed to d3d11va for oneVPL when both d3d11va
and dxva2 are enabled on Microsoft Windows

This is in preparation for oneVPL support

[1] https://spec.oneapi.io/versions/latest/elements/oneVPL/source/programming_guide/VPL_prg_session.html#onevpl-dispatcher

Co-authored-by: galinart <artem.galin@intel.com>
Signed-off-by: galinart <artem.galin@intel.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2022-08-12 10:43:39 +08:00
Haihao Xiang
e0bbdbe0a6 lavu/hwcontext_qsv: add loader field to AVQSVDeviceContext
In oneVPL, a valid mfxLoader handle is needed when creating mfx session
for decoding, encoding and processing[1], so add loader field to
AVQSVDeviceContext. User should fill this field before calling
av_hwdevice_ctx_init() if using oneVPL

This is in preparation for oneVPL support

[1]https://spec.oneapi.io/versions/latest/elements/oneVPL/source/programming_guide/VPL_prg_session.html#onevpl-dispatcher
2022-08-12 10:43:39 +08:00
Haihao Xiang
c77149bc37 qsv: restrict OPAQUE memory to MFX_VERSION < 2.0
OPAQUE memory isn't supported for MFX_VERSION >= 2.0[1][2]. This is in
preparation for oneVPL support

[1] https://spec.oneapi.io/versions/latest/elements/oneVPL/source/VPL_intel_media_sdk.html#msdk-full-name-feature-removals
[2] https://github.com/oneapi-src/oneVPL
2022-08-12 10:43:39 +08:00
Haihao Xiang
3e61b7dd7f qsv: remove mfx/ prefix from mfx headers
The following Cflags has been added to libmfx.pc, so mfx/ prefix is no
longer needed when including mfx headers in FFmpeg.
   Cflags: -I${includedir} -I${includedir}/mfx

Some old versions of libmfx have the following Cflags in libmfx.pc
   Cflags: -I${includedir}

We may add -I${includedir}/mfx to CFLAGS when running 'configure
--enable-libmfx' for old versions of libmfx, if so, mfx headers without
mfx/ prefix can be included too.

If libmfx comes without pkg-config support, we may do a small change to
the settings of the environment(e.g. set -I/opt/intel/mediasdk/include/mfx
instead of -I/opt/intel/mediasdk/include to CFLAGS), then the build can
find the mfx headers without mfx/ prefix

After applying this change, we won't need to change #include for mfx
headers when mfx headers are installed under a new directory.

This is in preparation for oneVPL support (mfx headers in oneVPL are
installed under vpl directory)
2022-08-12 10:43:39 +08:00
Andreas Rheinhardt
d576b37fa7 avutil/buffer: Never poison returned buffers
Poisoning returned buffers is based around the implicit assumption
that the contents of said buffers are transient. Yet this is not true
for the buffer pools used by the various hardware contexts which store
important state in there that needs to be preserved.
Furthermore, the current code is also based on the assumption
that the complete buffer pointed to by AVBuffer->data coincides with
AVBufferRef->data; yet an implementation might store some data of its
own before the actual user-visible data (accessible via AVBufferRef)
which would be broken by the current code.

(This is of course yet more proof that the AVBuffer API is not the right
tool for the hardware contexts.)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-10 18:49:35 +02:00
Lynne
98b32ef462
x86/tx_float: save a branch during coefficient deinterleaving
Directly branch into the special 64-point deinterleave
subroutine rather than going through the general deinterleave.

64-point transform timings on Zen 3:
Before:
   1974 decicycles in           av_tx (fft),16776864 runs,    352 skips
After:
   1956 decicycles in           av_tx (fft),16775378 runs,   1838 skips
2022-08-09 03:35:12 +02:00
Zhao Zhili
fc13803323 avutil/hwcontext_videotoolbox: add missing include for AVFrame
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2022-08-08 11:08:55 +08:00
James Almer
85c59bd6de avutil/test/pixfmt_best: test the VUYA pixel format
Signed-off-by: James Almer <jamrial@gmail.com>
2022-08-07 09:33:16 -03:00
Andreas Rheinhardt
2c8dc7e953 avcodec/loongarch/h264chroma, vc1dsp_lasx: Add wrapper for __lasx_xvldx
__lasx_xvldx does not accept a pointer to const (in fact,
no function in lasxintrin.h does so), although it is not allowed
to modify the pointed-to buffer. Therefore this commit adds a wrapper
for it in order to constify the H264Chroma API in a later commit.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-05 02:59:58 +02:00
Andreas Rheinhardt
6c9a60ada4 avcodec/loongarch: Add wrapper for __lsx_vldx
__lsx_vldx does not accept a pointer to const (in fact,
no function in lsxintrin.h does so), although it is not allowed
to modify the pointed-to buffer. Therefore this commit adds a wrapper
for it in order to constify the HEVC DSP functions in a later commit.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-05 02:53:35 +02:00
Philip Langdale
2b720676e0 lavu/hwcontext_vaapi: Map the AYUV format
This is the format used by Intel VAAPI for 8bit 4:4:4 content.
2022-08-03 14:10:12 -07:00
Philip Langdale
6ab8a9d375 lavu/pixfmt: Add packed 4:4:4 format
The "AYUV" format is defined by Microsoft as their preferred format for
4:4:4 content, and so it is the format used by Intel VAAPI and QSV.

As Microsoft like to define their byte ordering in little-endian
fashion, the memory order is reversed, and so our pix_fmt, which
follows memory order, has a reversed name (VUYA).
2022-08-03 14:09:46 -07:00
Andreas Rheinhardt
8d7d52721a avutil/opt: Combine multiple av_log statements
Reviewed-by: Thilo Borgmann <thilo.borgmann@mail.de>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-03 21:09:24 +02:00
Anton Khirnov
eede1d2927 lavu/frame: allow calling av_frame_make_writable() on non-refcounted frames
This is an easy way to make a refcounted frame from a non-refcounted
one.
2022-08-02 10:44:37 +02:00
Anton Khirnov
4397f9a5a0 lavu/frame: add a duration field to AVFrame
The only duration field currently present in AVFrame is pkt_duration,
which is semantically restricted to those frames that are output by
decoders.

Add a new field that stores the frame's duration without regard for how
that frame was produced. Deprecate pkt_duration.
2022-07-19 12:27:17 +02:00
Timo Rothenpieler
63ce42019c avutil/hwcontext_d3d11va: add BGRA/RGBA10 formats support
Desktop duplication outputs those
2022-07-18 00:32:14 +02:00
Timo Rothenpieler
6cbb7d673d avutil/hwcontext_d3d11va: update hwctx flags from input texture
At least QSV relies on those being set correctly when deriving a hwctx.
2022-07-18 00:32:14 +02:00
Timo Rothenpieler
30bbc0a624 avutil/hwcontext_d3d11va: fix texture_infos writes on non-fixed-size pools 2022-07-18 00:32:14 +02:00
Timo Rothenpieler
e18c575474 avutil/hwcontext_d3d11va: fix mixed declaration and code 2022-07-18 00:32:14 +02:00
Michael Niedermayer
fd26b07e8b Bump versions after 5.1 branch
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-07-13 00:29:05 +02:00
Michael Niedermayer
6f1b144358 Bump Versions for 5.1 branch
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-07-13 00:27:37 +02:00
Paul B Mahol
6ed9eaf664 avfilter: add remap opencl filter 2022-07-07 17:52:32 +02:00
Andreas Rheinhardt
aca09ed7d4 avutil/mem: Handle fast allocations near UINT_MAX properly
av_fast_realloc and av_fast_mallocz? store the size of
the objects they allocate in an unsigned. Yet they overallocate
and currently they can allocate more than UINT_MAX bytes
in case a user has requested a size of about UINT_MAX * 16 / 17
or more if SIZE_MAX > UINT_MAX (and if the user increased
max_alloc_size via av_max_alloc). In this case it is impossible
to store the true size of the buffer via the unsigned*;
future requests are likely to use the (re)allocation codepath
even if the buffer is actually large enough because of
the incorrect size.

Fix this by ensuring that the actually allocated size
always fits into an unsigned. (This entails erroring out
in case the user requested more than UINT_MAX.)

Reviewed-by: Tomas Härdin <tjoppen@acc.umu.se>
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-07-06 22:53:15 +02:00
Lynne
f9dd8fcf9b
)hwcontext: add a stub implementation for Vulkan functions 2022-07-05 15:20:08 +02:00
James Almer
0afdc95767 Revert "avutil/channel_layout: av_channel_layout_describe_bprint: Check for buffer end"
The doxy for av_channel_layout_describe() states that the user should look
at the return value to check if the string was truncated. Returning an error
code in this scenario goes against this and is an API break.

A proper fix for the timeout was applied to the Matroska demuxer in 94901a9518.

This reverts commit 8154cb7c2f.
2022-07-04 14:04:54 -03:00
Michael Niedermayer
8154cb7c2f avutil/channel_layout: av_channel_layout_describe_bprint: Check for buffer end
Fixes: Timeout printing a billion channels
Fixes: 48099/clusterfuzz-testcase-minimized-ffmpeg_dem_MATROSKA_fuzzer-6754782204788736

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-07-02 19:22:36 +02:00
Andreas Rheinhardt
4454142782 avutil/wchar_filename: Make the header C++ compatible
When compiling decklink, this header is included from
a C++ file (albeit inside 'extern "C"') and this
causes compilation failures because of an implicit
void* -> char* conversion. So add an explicit cast.
Fixes ticket #9819.

Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-06-28 10:59:31 +02:00
Brad Smith
beaf172d75 avutil/ppc/cpu: Use proper header for OpenBSD PPC CPU detection
Use the proper header for PPC CPU detection code. sys/param.h includes
sys/types, but sys/types.h is the more appropriate header to be used
here.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-06-25 12:16:51 +02:00
Andreas Rheinhardt
2718a3be1f avutil/x86/float_dsp: Remove obsolete 3dnowext function
x64 always has MMX, MMXEXT, SSE and SSE2 and this means
that some functions for MMX, MMXEXT, SSE and 3dnow are always
overridden by other functions (unless one e.g. explicitly
disables SSE2). So given that the only systems which benefit
from ff_vector_fmul_window_3dnowext are truely ancient 32bit
AMD x86s it is removed.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-06-22 13:37:22 +02:00
Andreas Rheinhardt
ea043cc53e avutil/x86/pixelutils: Remove obsolete MMX(EXT) functions
x64 always has MMX, MMXEXT, SSE and SSE2 and this means
that some functions for MMX, MMXEXT, SSE and 3dnow are always
overridden by other functions (unless one e.g. explicitly
disables SSE2). So given that the only systems which benefit
from the 8x8 MMX (overridden by MMXEXT) or the 16x16 MMXEXT
(overridden by SSE2) are truely ancient 32bit x86s they are removed.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-06-22 13:36:44 +02:00
Nil Admirari
cc5844da98 libavutil: Add wchartoutf8(), wchartoansi(), utf8toansi(), getenv_utf8(), freeenv_utf8() and getenv_dup()
wchartoutf8() converts strings returned by WinAPI into UTF-8,
which is FFmpeg's preffered encoding.

Some external dependencies, such as AviSynth, are still
not Unicode-enabled. utf8toansi() converts UTF-8 strings
into ANSI in two steps: UTF-8 -> wchar_t -> ANSI.
wchartoansi() is responsible for the second step of the conversion.
Conversion in just one step is not supported by WinAPI.

Since these character converting functions allocate the buffer
of necessary size, they also facilitate the removal of MAX_PATH limit
in places where fixed-size ANSI/WCHAR strings were used
as filename buffers.

On Windows, getenv_utf8() wraps _wgetenv() converting its input from
and its output to UTF-8. Strings returned by getenv_utf8()
must be freed by freeenv_utf8().

On all other platforms getenv_utf8() is a wrapper around getenv(),
and freeenv_utf8() is a no-op.

The value returned by plain getenv() cannot be modified;
av_strdup() is usually used when modifications are required.
However, on Windows, av_strdup() after getenv_utf8() leads to
unnecessary allocation. getenv_dup() is introduced to avoid
such an allocation. Value returned by getenv_dup() must be freed
by av_free().

Because of cleanup complexities, in places that only test the existence
of an environment variable or compare its value with a string
consisting entirely of ASCII characters, the use of plain getenv()
is still preferred. (libavutil/log.c check_color_terminal()
is an example of such a place.)

Plain getenv() is also preffered in UNIX-only code,
such as bktr.c, fbdev_common.c, oss.c in libavdevice
or af_ladspa.c in libavfilter.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-06-21 13:27:46 +03:00
Andreas Rheinhardt
ac322ec214 avutil/cpu_internal: Fix check for SSE2SLOW
For SSE2 and SSE3, there are four states that the two flags
involved (AV_CPU_FLAG_SSE[23] and AV_CPU_FLAG_SSE[23]SLOW) can convey.
When ordered from worst to best they are:
1. both flags unset (SSE[23] unavailable)
2. the slow flag set, the ordinary flag unset (this is designed
for cases where SSE2 is available, but so slow that MMX(EXT)/SSE
code is usually faster)
3. both flags set (SSE2 is available, but there might be scenarios
where MMX(EXT)/SSE code is faster)
4. the ordinary flag set, the slow flag unset (this is the normal case)

The ordinary macros for checking cpuflags return true
in the latter two cases; the fast macros only return true for
the latter case. Yet the macros to check for slow currently
only return true in case three.

This seems unintended. In fact, the only uses of the slow macros
are all of the form
if (EXTERNAL_SSE2(cpu_flags) || EXTERNAL_SSE2_SLOW(cpu_flags))
where the check for EXTERNAL_SSE2_SLOW is completely redundant.
Even more importantly, it is not what was intended. Before
6369ba3c9c, the checks passed
in cases 2 to 4. Said commit changed this to something that
only passes for the third case. Commits
7fb758cd8e and
c1913064e3 restored the old behaviour,
yet merging 4efab89332 (in commit
ac774cfa57) broke this again
by changing it to what it is now.*

This commit changes the macros to make the slow macros check
whether a specific instruction is supported, even if slow.
This restores the intended meaning to all uses of the SLOW macros
and is generally more natural.

*: Libav only checks for EXTERNAL_SSE2_SLOW, i.e. for the third case
only.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-06-18 19:25:03 +02:00
Andreas Rheinhardt
40e6575aa3 all: Replace if (ARCH_FOO) checks by #if ARCH_FOO
This is more spec-compliant because it does not rely
on dead-code elimination by the compiler. Especially
MSVC has problems with this, as can be seen in
https://ffmpeg.org/pipermail/ffmpeg-devel/2022-May/296373.html
or
https://ffmpeg.org/pipermail/ffmpeg-devel/2022-May/297022.html

This commit does not eliminate every instance where we rely
on dead code elimination: It only tackles branching to
the initialization of arch-specific dsp code, not e.g. all
uses of CONFIG_ and HAVE_ checks. But maybe it is already
enough to compile FFmpeg with MSVC with whole-programm-optimizations
enabled (if one does not disable too many components).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-06-15 04:56:37 +02:00
Zane van Iperen
ff59ecc4de avutil: bump version after UUID changes
Forgot to bump after 76e95daa08.

Signed-off-by: Zane van Iperen <zane@zanevaniperen.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2022-06-13 22:01:28 -03:00
Pierre-Anthony Lemieux
7c2f029ede
avutil/tests/uuid: add uuid tests 2022-06-12 18:34:37 +10:00
Zane van Iperen
76e95daa08
avutil/uuid: add utility library for manipulating UUIDs as specified in RFC 4122
Co-authored-by: Pierre-Anthony Lemieux <pal@palemieux.com>
Signed-off-by: Zane van Iperen <zane@zanevaniperen.com>
2022-06-12 18:34:28 +10:00
softworkz
c5aba39a04 avutil/wchar_filename,file_open: Support long file names on Windows
Signed-off-by: softworkz <softworkz@hotmail.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-06-09 13:03:47 +03:00
softworkz
22ab2a375d libavutil/tests/md5: Remove 'volatile workaround' to avoid warnings
Those are always showing up on Patchwork when FATE tests are failing,
covering some possibly more useful information.

The volatile keyword was used as a workaround for an eight year old
clang version.

Signed-off-by: softworkz <softworkz@hotmail.com>
Signed-off-by: Marton Balint <cus@passwd.hu>
2022-06-08 22:24:31 +02:00
James Almer
5929ea6d4b avutil/avframe: fix channel layout checks in av_frame_copy()
Normally, both the source and dest frame would have only the old API fields
set, only the new API fields set, or both set. But in some cases, like when
calling av_frame_ref() using a non reference counted source frame where only
the old channel layout API fields were populated, the result would be the dst
frame having both the new and old fields populated.

This commit takes this into account and fixes the checks by calling
av_channel_layout_compare() only if the source frame has the new API fields
set, and doing sanity checks for the source frame old API fields if the new
ones are not set.

Signed-off-by: James Almer <jamrial@gmail.com>
2022-06-05 09:09:07 -03:00
Leo Izen
d42b410e05 avutil/csp: create public API for colorspace structs
This commit moves some of the functionality from avfilter/colorspace
into avutil/csp and exposes it as a public API so it can be used by
libavcodec and/or libavformat. It also converts those structs from
double values to AVRational to make regression testing easier and
more consistent.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2022-06-01 13:52:38 -04:00
Zhao Zhili
5d8d3c1ac2 avutil/mem: fix doc for reallocs
The doc says those function are like av_free if size or nmemb is
zero. It doesn't match the code. av_realloc() realloc one byte if
size is zero, which was added by 91ff05f6ac ten years ago.
realloc() itself in C is implementation-dependent. Make the doc
match the longstanding behaviour.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2022-05-26 17:18:23 +08:00
Haihao Xiang
478e1a98a2 qsv: add requirement for the mininal version of libmfx
libmfx 1.28 was released 3 years ago, it is easy to get a greater
version than 1.28. We may remove lots of compile-time checks if adding
the requirement for the minimal version in the configure script.

Reviewed-by: softworkz <softworkz@hotmail.com>
Signed-off-by: Jean-Baptiste Kempf <jb@videolan.org>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2022-05-25 15:17:35 +08:00
Martin Storsjö
4cdc14aa95 libavutil: Deprecate av_fopen_utf8, provide an avpriv version
Since every DLL can use an individual CRT on Windows, having
an exported function that opens a FILE* won't work if that
FILE* is going to be used from a different DLL (or from user
application code).

Internally within the libraries, the issue can be worked around
by duplicating the function in all libraries (this already happened
implicitly because the function resided in file_open.c) and renaming
the function to ff_fopen_utf8 (so that it doesn't end up exported from
the DLLs) and duplicating it in all libraries that use it.

This makes the avpriv_fopen_utf8 / ff_fopen_utf8 function work in
the exact same way as the existing avpriv_open / ff_open, with the
same setup as introduced in e743e7ae6e.

That mechanism doesn't work for external users, thus deprecate the
existing function.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-05-23 13:52:26 +03:00
Tong Wu
1f9b5fa581 avutil/hwcontext_qsv: fix mapping issue between QSV frames and D3D11VA frames
Fixes:
$ ffmpeg.exe -init_hw_device d3d11va=d3d11 -init_hw_device \
qsv=qsv@d3d11 -s:v WxH -pix_fmt nv12 -i input.yuv -vf \
"hwupload=extra_hw_frames=16,hwmap=derive_device=d3d11va,format=d3d11,\
hwmap=derive_device=qsv,format=qsv" -f null -

Reviewed-by: Soft Works <softworkz@hotmail.com>
Signed-off-by: Tong Wu <tong1.wu@intel.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2022-05-23 15:10:05 +08:00
Tong Wu
20807a9d61 avutil/hwcontext_d3d11va: pass the format value from outside for staging texture
In d3d11va_create_staging_texture(), during the hwmap process, the
ctx->internal->priv is not initialized, resulting in the
texDesc.Format not initialized. Now pass the format value from
d3d11va_transfer_data() to fix it.

$ ffmpeg.exe -y -hwaccel qsv -init_hw_device d3d11va=d3d11 \
-init_hw_device qsv=qsv@d3d11 -c:v h264_qsv \
-i input.h264 -vf "hwmap=derive_device=d3d11va,format=d3d11,hwdownload,format=nv12" \
-f null -

Reviewed-by: Soft Works <softworkz@hotmail.com>
Signed-off-by: Tong Wu <tong1.wu@intel.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2022-05-23 15:10:05 +08:00
Tong Wu
632db3c36d avutil/hwcontext_qsv: derive QSV frames to D3D11VA frames
Fixes:
$ ffmpeg.exe -y -hwaccel qsv -init_hw_device d3d11va=d3d11 \
-init_hw_device qsv=qsv@d3d11 -c:v h264_qsv -i input.h264 \
-vf "hwmap=derive_device=d3d11va,format=d3d11" -f null -

Reviewed-by: Soft Works <softworkz@hotmail.com>
Signed-off-by: Tong Wu <tong1.wu@intel.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2022-05-23 15:10:05 +08:00
Lynne
27cffd16aa
x86/tx_float: replace fft_sr_avx with fft_sr_fma3
When the SLOW_GATHER flag was added to the AVX2 version, this
made FMA3-features not enabled on Zen CPUs.
As FMA3 adds 6-7% across all platforms that support it, in
the interest of saving space, this commit removes the AVX
version and replaces it with an FMA3 version.
The only CPUs affected are Sandy Bridge and Bulldozer, which
have AVX support, but no FMA3 support.
In the future, if there's a demand for it, a version of the
function duplicated for AVX can be added.
2022-05-21 02:11:50 +02:00
Lynne
0938ff9701
x86/tx_float: improve temporary register allocation for loads
On Zen 3:

Before:
1484285 decicycles in           av_tx (fft),  131072 runs,      0 skips

After:
1415243 decicycles in           av_tx (fft),  131072 runs,      0 skips
2022-05-21 02:11:45 +02:00
Lynne
805e8d1921
lavu/tx: make slow ISA extension penalties smarter
Instead of having a fixed -64 prio penalty, make the penalties
more granular.
As the prio is based on the register size in bits, decrementing
it by 129 makes AVX SLOW functions be avoided in favor of any
SSE versions.
2022-05-21 02:10:14 +02:00
Lynne
19c0bb2aa9
x86/tx_float: add AV_CPU_FLAG_AVXSLOW/SLOW_GATHER flags where appropriate 2022-05-21 02:10:09 +02:00