Commit Graph

480 Commits

Author SHA1 Message Date
Hao Chen
38cacce22a
swscale/la: Optimize hscale functions with lasx.
ffmpeg -i 1_h264_1080p_30fps_3Mbps.mp4 -f rawvideo -s 640x480 -y /dev/null -an
before: 101fps
after:  138fps

Signed-off-by: Hao Chen <chenhao@loongson.cn>
Reviewed-by: yinshiyou-hf@loongson.cn
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-09-10 22:56:38 +02:00
Philip Langdale
09a8e5debb swscale/output: add support for Y210LE and Y212LE 2022-09-10 12:29:12 -07:00
Philip Langdale
68181623e9 swscale/output: add support for XV30LE 2022-09-10 12:29:12 -07:00
Philip Langdale
366f073c62 swscale/output: add support for XV36LE 2022-09-10 12:29:12 -07:00
Philip Langdale
caf8d4d256 swscale/output: add support for P012
This generalises the existing P010 support.
2022-09-10 12:29:12 -07:00
Philip Langdale
4a59eba227 swscale/input: add support for Y212LE 2022-09-06 12:49:10 -07:00
Philip Langdale
198b5b90d5 swscale/input: add support for XV30LE 2022-09-06 12:49:10 -07:00
Philip Langdale
5bdd726115 swscale/input: add support for P012
As we now have three of these formats, I added macros to generate the
conversion functions.
2022-09-06 12:49:10 -07:00
Philip Langdale
8d9462844a swscale/input: add support for XV36LE 2022-09-06 12:49:10 -07:00
Philip Langdale
45726aa117 libswscale: add support for VUYX format
As we already have support for VUYA, I figured I should do the small
amount of work to support VUYX as well. That means a little refactoring
to share code.
2022-08-25 19:03:49 -07:00
Timo Rothenpieler
aca569aad2 swscale/input: add rgbaf16 input support
This is by no means perfect, since at least ddagrab will return scRGB
data with values outside of 0.0f to 1.0f for HDR values.
Its primary purpose is to be able to work with the format at all.
2022-08-19 22:09:36 +02:00
Alan Kelly
a38293e444 libswscale: Enable hscale_avx2 for all input sizes.
ff_shuffle_filter_coefficients shuffles the tail as required.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2022-08-18 16:24:48 +02:00
James Almer
1974813261 swscale/output: add VUYA output support
Signed-off-by: James Almer <jamrial@gmail.com>
2022-08-07 09:33:16 -03:00
James Almer
f0abd07996 swscale/input: add VUYA input support
Reviewed-by: Philip Langdale <philipl@overt.org>
Signed-off-by: James Almer <jamrial@gmail.com>
2022-08-05 09:39:21 -03:00
Matthieu Bouron
0a6bb7da55 swscale: add NV16 input/output
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2022-07-19 12:20:16 +02:00
Andreas Rheinhardt
40e6575aa3 all: Replace if (ARCH_FOO) checks by #if ARCH_FOO
This is more spec-compliant because it does not rely
on dead-code elimination by the compiler. Especially
MSVC has problems with this, as can be seen in
https://ffmpeg.org/pipermail/ffmpeg-devel/2022-May/296373.html
or
https://ffmpeg.org/pipermail/ffmpeg-devel/2022-May/297022.html

This commit does not eliminate every instance where we rely
on dead code elimination: It only tackles branching to
the initialization of arch-specific dsp code, not e.g. all
uses of CONFIG_ and HAVE_ checks. But maybe it is already
enough to compile FFmpeg with MSVC with whole-programm-optimizations
enabled (if one does not disable too many components).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-06-15 04:56:37 +02:00
Swinney, Jonathan
0ea61725b1 swscale/aarch64: add hscale specializations
This patch adds code to support specializations of the hscale function
and adds a specialization for filterSize == 4.

ff_hscale8to15_4_neon is a complete rewrite. Since the main bottleneck
here is loading the data from src, this data is loaded a whole block
ahead and stored back to the stack to be loaded again with ld4. This
arranges the data for most efficient use of the vector instructions and
removes the need for completion adds at the end. The number of
iterations of the C per iteration of the assembly is increased from 4 to
8, but because of the prefetching, there must be a special section
without prefetching when dstW < 16.

This improves speed on Graviton 2 (Neoverse N1) dramatically in the case
where previously fs=8 would have been required.

before: hscale_8_to_15__fs_8_dstW_512_neon: 1962.8
after : hscale_8_to_15__fs_4_dstW_512_neon: 1220.9

Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-05-28 01:09:05 +03:00
Andreas Rheinhardt
f2b79c5b85 lib*/version: Move library version functions into files of their own
This avoids having to rebuild big files every time FFMPEG_VERSION
changes (which it does with every commit).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-05-10 06:49:32 +02:00
Martin Storsjö
6cd2ac388d libswscale: Split version.h
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-03-16 14:05:26 +02:00
Martin Storsjö
c523724c69 swscale: Take the destination range into account for yuv->rgb->yuv conversions
The range parameters need to be set up before calling
sws_init_context (which selects which fastpaths can be used;
this gets called by sws_getContext); solely passing them via
sws_setColorspaceDetails isn't enough.

This fixes producing full range YUV range output when doing
YUV->YUV conversions between different YUV color spaces.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-02-25 11:01:17 +02:00
Alan Kelly
e534d98af3 libswscale: Re-factor ff_shuffle_filter_coefficients.
Make the code more readable and follow the style guide.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-02-17 17:17:22 +01:00
Alan Kelly
f1a5414c97 libswscale: Check and propagate memory allocation errors from ff_shuffle_filter_coefficients.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-02-17 17:17:07 +01:00
rcombs
88d804b7ff swscale: add P210/P410/P216/P416 output 2021-12-22 18:38:40 -06:00
Alan Kelly
eebe406c80 libswscale: Test AV_CPU_FLAG_SLOW_GATHER for hscale functions.
This is instead of EXTERNAL_AVX2_FAST so that the avx2 hscale functions
are only used where they are faster.
2021-12-21 17:44:53 -03:00
Alan Kelly
f900a19fa9 libswscale: Adds ff_hscale8to15_4_avx2 and ff_hscale8to15_X4_avx2 for all filter sizes.
Fixes so that fate under 64 bit Windows passes.

These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available.

Signed-off-by: James Almer <jamrial@gmail.com>
2021-12-15 20:04:59 -03:00
rcombs
f0204de47d swscale: add P210/P410/P216/P416 input 2021-11-28 16:40:43 -06:00
Michael Niedermayer
5f3a160b42 swscale/utils: Improve return codes of sws_setColorspaceDetails()
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-10-24 16:54:36 +02:00
Michael Niedermayer
c7699f95bb swscale/utils: Set all threads to the same colorspace even on failure
Fixes: ./ffplay dav.y4m -vf "scale=hd1080:threads=4"
Found-by: Paul
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-10-24 16:54:36 +02:00
Manuel Stoeckl
32329397e2 swscale: add input/output support for X2BGR10LE
Signed-off-by: Manuel Stoeckl <code@mstoeckl.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-09-26 16:26:10 +02:00
Andreas Rheinhardt
1ea3650823 Replace all occurences of av_mallocz_array() by av_calloc()
They do the same.

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-09-20 01:03:52 +02:00
Andreas Rheinhardt
f440c422b7 swscale/swscale: Fix races when using unaligned strides/data
In this case the current code tries to warn once; to do so, it uses
ordinary static ints to store whether the warning has already been
emitted. This is both a data race (and therefore undefined behaviour)
as well as a race condition, because it is really possible for multiple
threads to be the one thread to emit the warning. This is actually
common since the introduction of the new multithreaded scaling API.

This commit fixes this by using atomic integers for the state;
furthermore, these are not static anymore, but rather contained
in the user-facing SwsContext (i.e. the parent SwsContext in case
of slice-threading).

Given that these atomic variables are not intended for synchronization
at all (but only for atomicity, i.e. only to output the warning once),
the atomic operations use memory_order_relaxed.

This affected the nv12, nv21, yuv420, yuv420p10, yuv422, yuv422p10 and
yuv444 filter-overlay FATE-tests.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-09-19 23:52:37 +02:00
Andreas Rheinhardt
a1255a350d libswscale/options: Add parent_log_context_offset to AVClass
This allows to associate log messages from slice contexts to
the user-visible SwsContext.

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-09-19 23:52:37 +02:00
Anton Khirnov
d6fdc78e91 sws: implement slice threading 2021-09-06 09:17:53 +02:00
Anton Khirnov
42cd64c182 sws: add a new scaling API 2021-09-06 09:16:52 +02:00
Michael Niedermayer
fa1e158ef6 swscale/utils: Use full chroma interpolation for rgb4/8 and dither none
Dither none is only implemented in full chroma interpolation for these rgb formats
Its also a obscure choice (producing less nice images) that implementing it in the
other code-paths makes no sense

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-07-09 12:29:03 +02:00
Anton Khirnov
1f80789bf7 sws: rename SwsContext.swscale to convert_unscaled
That function pointer is now used only for unscaled conversion.
2021-07-03 15:57:53 +02:00
Anton Khirnov
fe490ec165 sws: separate the calls to scaled vs unscaled conversion
Call the scaler function directly rather than through a function
pointer. Drop the now-unused return value from ff_getSwsFunc() and
rename the function to reflect its new role.

This will be useful in the following commits, where it will become
important that the amount of output is different for scaled vs unscaled
case.
2021-07-03 15:57:13 +02:00
Anton Khirnov
0f8e0957d2 sws: do not reallocate scratch buffers for each slice 2021-07-03 15:56:16 +02:00
Peter Lundblad
da0abbbb01 libswscale: Make sws_init_context thread safe.
Call ff_sws_rgb2rgb_init via ff_thread_once instead of checking one of the
variables it updates.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-07-01 23:49:41 +02:00
Andreas Rheinhardt
ea2d9b7a2e libswscale: Remove unused deprecated functions, make used ones static
Deprecated in 3b905b9fe6.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2021-04-27 10:43:11 -03:00
Jan Ekström
7ea4bcff7b swscale/utils: override forced-zero formats back to full range
Fixes vf_scale outputting RGB AVFrames with limited range flagged
in case either input or output specifically sets the range.

This is the reverse of the logic utilized for RGB and PAL8 content
in sws_setColorspaceDetails.
2020-10-11 12:58:13 +03:00
Jan Ekström
3fe24fe232 swscale/utils: split range override check into its own function 2020-10-11 12:58:13 +03:00
Paul B Mahol
9d58cdb4ba swscale: do not drop half of bits from 16bit bayer formats 2020-08-08 12:03:42 +02:00
Limin Wang
67a07dc778 swscale/utils: return better error code from initFilter()
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
2020-06-14 21:54:40 +08:00
Limin Wang
8efecc9063 swscale/utils: reindent
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
2020-06-14 21:54:40 +08:00
Limin Wang
a408d03ee6 swscale/utils: remove FF_ALLOC_ARRAY_OR_GOTO macros
Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
2020-06-13 06:59:19 +08:00
Fei Wang
c721b45014 swscale: Add swscale input/output support for X2RGB10LE
Signed-off-by: Fei Wang <fei.w.wang@intel.com>
2020-06-12 17:56:15 +01:00
Mark Reid
fabeef22d9 libswscale: fix for floating point formats, require full chroma
upon more floating point testing, looks like I missed adding this bit.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-05-12 01:00:28 +02:00
Mark Reid
b4967fc71c libswscale: add output support for AV_PIX_FMT_GBRAPF32
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-05-05 20:06:58 +02:00
Mark Reid
ba5d0515a6 libswscale: add input support AV_PIX_FMT_GBRAPF32
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-05-05 20:06:58 +02:00