Commit Graph

987 Commits

Author SHA1 Message Date
Anton Khirnov
1d6c76e11f audiodsp/x86: fix ff_vector_clip_int32_sse2
This version, which is the only one doing two processing cycles per loop
iteration, computes the load/store indices incorrectly for the second
cycle.

CC: libav-stable@libav.org
2016-09-19 19:18:07 +02:00
Diego Biurrun
de452e5037 pixblockdsp: Change type of stride parameters to ptrdiff_t
This avoids SIMD-optimized functions having to sign-extend their
line size argument manually to be able to do pointer arithmetic.

Also adjust parameter names to be "stride" everywhere.
2016-09-14 14:12:36 +02:00
Diego Biurrun
721d57e608 vp56: Separate VP5 and VP6 dsp initialization
VP5 has no arch-specific optimizations (nor will it get some in the
future), so it makes no sense to try to share dsp init code with VP6.
2016-08-26 11:50:22 +02:00
Diego Biurrun
3fd22538bc prores: Change type of stride parameters to ptrdiff_t
This avoids SIMD-optimized functions having to sign-extend their
line size argument manually to be able to do pointer arithmetic.

Also adjust parameter names to be "linesize" everywhere.
2016-08-26 11:50:21 +02:00
Diego Biurrun
f81be06cf6 cavs: Change type of stride parameters to ptrdiff_t
ptrdiff_t is the correct type for array strides and similar.
2016-08-26 11:48:15 +02:00
Diego Biurrun
802727b538 vp8: Update some assembly comments left unchanged in bd66f073fe 2016-08-26 11:36:53 +02:00
Diego Biurrun
d9d26a3674 vp56: Change type of stride parameters to ptrdiff_t
This avoids SIMD-optimized functions having to sign-extend their
line size argument manually to be able to do pointer arithmetic.
2016-08-26 11:36:26 +02:00
Diego Biurrun
6892df9294 vp3: Change type of stride parameters to ptrdiff_t
This avoids SIMD-optimized functions having to sign-extend their
stride argument manually to be able to do pointer arithmetic.

Also adjust parameter names to be "stride" everywhere.
2016-08-26 11:36:26 +02:00
Diego Biurrun
e2b9993558 simple_idct: x86: Drop disabled IDCT implementation
This gem has been disabled since 2001.
2016-08-17 12:21:54 +02:00
Ronald S. Bultje
9790b44a89 vp9mc/x86: sse2 MC assembly.
Also a slight change to the ssse3 code, which prevents a theoretical
overflow in the sharp filter.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-08-03 11:04:51 +02:00
James Almer
67922b4ee4 vp9mc/x86: add AVX and AVX2 MC
Roughly 25% faster MC than ssse3 for blocksizes 32 and 64.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-08-03 11:00:08 +02:00
Clément Bœsch
3cda179f18 vp9mc/x86: rename ff_* to ff_vp9_*
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-08-03 10:57:55 +02:00
James Almer
8be8444d01 vp9mc/x86: rename ff_avg[48]_sse to ff_avg[48]_mmxext
pavgb is an sse integer instruction, so the mmxext flag is enough

Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-08-03 10:57:55 +02:00
Clément Bœsch
6ab642d69d vp9mc/x86: simplify a few inits.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-08-03 10:57:55 +02:00
Ronald S. Bultje
3a09494939 vp9mc/x86: add 16px functions (64bit only).
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-08-03 10:57:55 +02:00
Anton Khirnov
89466de4ae vp9/x86: rename vp9dsp to vp9mc
It only contains the MC SIMD, other SIMD will go into different files.
2016-08-03 10:57:50 +02:00
Christophe Gisquet
3c504bc359 x86: deduplicate some constants
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-08-03 10:56:52 +02:00
Diego Biurrun
d06dfaa5cb x86: huffyuv: Use EXTERNAL_SSSE3_FAST convenience macro where appropriate 2016-07-20 18:43:28 +02:00
Diego Biurrun
4efab89332 x86: Use *_FAST/*_SLOW CPU feature detection macros where appropriate 2016-07-20 18:43:28 +02:00
Diego Biurrun
0a39c9ac0b x86: hpeldsp: Don't check for bitexact flag when initializing VP3-specific code
That code is only ever initialized with that flag set.
2016-07-20 18:37:45 +02:00
Diego Biurrun
95c1df929b x86: hpeldsp: Drop unused function parameters 2016-07-20 18:33:26 +02:00
Diego Biurrun
c3e83ad3b7 x86: hpeldsp: Use EXTERNAL_SSE2_FAST where appropriate 2016-07-20 18:33:26 +02:00
Diego Biurrun
1dfc3cf89d x86: hpeldsp: Split off VP3-specific bits into a separate file 2016-07-20 18:33:25 +02:00
James Almer
fca3c3b619 hevc: Add AVX2 DC IDCT
Originally written by Pierre Edouard Lepere <pierre-edouard.lepere@insa-rennes.fr>.
Integrated to Libav by Josh de Kock <josh@itanimul.li>.

Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>
2016-07-18 15:27:13 +02:00
Clément Bœsch
4a081f224e libavcodec: fix constness in clobber test avcodec_open2() wrappers
Signed-off-by: Martin Storsjö <martin@martin.st>
2016-06-26 21:34:04 +03:00
Anton Khirnov
9df889a5f1 h264: rename h264.[ch] to h264dec.[ch]
This is more consistent with the naming of other decoders.
2016-06-21 11:11:26 +02:00
Martin Storsjö
f1a9eee41c x86: Add missing movsxd for the int stride parameter
Signed-off-by: Martin Storsjö <martin@martin.st>
2016-06-17 00:11:21 +03:00
Diego Biurrun
1e9c5bf4c1 asm: FF_-prefix internal macros used in inline assembly
These warnings conflict with system macros on Solaris, producing
truckloads of warnings about macro redefinition.
2016-05-28 19:18:26 +02:00
Diego Biurrun
dc40a70c57 Drop unnecessary libavutil/x86/asm.h #includes 2016-05-28 19:18:26 +02:00
Diego Biurrun
a6a750c7ef tests: Move all test programs to a subdirectory 2016-05-13 14:55:56 +02:00
Vittorio Giovara
41ed7ab45f cosmetics: Fix spelling mistakes
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2016-05-04 18:16:21 +02:00
Diego Biurrun
01621202aa build: miscellaneous cosmetics
Restore alphabetical order in lists, break overly long lines, do some
prettyprinting, add some explanatory section comments, group parts
together that belong together logically.
2016-04-07 15:26:08 +02:00
Diego Biurrun
1a094af638 fft: Split MDCT bits off from FFT 2016-03-01 10:18:28 +01:00
Diego Biurrun
73ff983e8d fft: x86: cosmetics: Drop silly comments, add comment, whitespace 2016-02-26 14:34:58 +01:00
Diego Biurrun
257b30af8e x86: hevc: Fix linking with both yasm and optimizations disabled
Some optimized functions reference optimized symbols, so the functions
must be explicitly disabled when those symbols are unavailable.
2016-02-23 11:47:54 +01:00
Diego Biurrun
15a24614ae build: Add vc1dsp component for more fine-grained dependencies 2016-02-19 20:38:18 +01:00
Luca Barbato
e280fe1329 v210: Use separate sample_factors
The 10bit and the 8bit functions can now be implemented to process
a different amount of samples.

And while at it simplify a little the code.
2016-02-01 13:40:07 +01:00
James Darnley
15ec7aa417 v210: Add avx2 version of the 10-bit line encoder
Around 25% faster than the ssse3 version.

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2016-02-01 13:40:07 +01:00
James Darnley
d29237e557 v210: Add avx2 version of the 8-bit line encoder
Around 35% faster than the avx version.

Signed-off-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2016-02-01 13:40:07 +01:00
Luca Barbato
eafb05fcf3 v210: x86: Add the correct guards around the asm code
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2016-01-26 23:31:57 +01:00
Geza Lore
cc602061ee x86inc: Add debug symbols indicating sizes of compiled functions
Some debuggers/profilers use this metadata to determine which function a
given instruction is in; without it they get can confused by local labels
(if you haven't stripped those). On the other hand, some tools are still
confused even with this metadata. e.g. this fixes `gdb`, but not `perf`.

Currently only implemented for ELF.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-01-23 20:46:28 +01:00
Diego Biurrun
03ef89faf2 x86: build: Group all encoder objects together 2016-01-18 14:47:58 +01:00
Diego Biurrun
4f22b13888 x86: ac3dsp: Drop forward declaration for nonexisting function 2016-01-18 11:55:38 +01:00
Janne Grunau
8563f98871 x86: use emms after ff_int32_to_float_fmul_scalar_sse
Intel's Instruction Set Reference (as of September 2015) clearly states
that cvtpi2ps switches to MMX state. Actual CPUs do not switch if the
source is a memory location. The Instruction Set Reference from 1999
(Order Number 243191) describes this behaviour but all later versions
I've seen have make no distinction whether MMX registers or memory is
used as source.
The documentation for the matching SSE2 instruction to convert to double
(cvtpi2pd) was fixed (see the valgrind bug
https://bugs.kde.org/show_bug.cgi?id=210264).

It will take time to get a clarification and fixes in place. In the
meantime it makes sense to change ff_int32_to_float_fmul_scalar_sse to
be correct according to the documentation. The vast majority of users
will have SSE2 so a change to the SSE version has little effect.

Fixes fate-checkasm on x86 valgrind targets.

Valgrind 'bug' reported as https://bugs.kde.org/show_bug.cgi?id=357059
2015-12-30 13:37:57 +01:00
Janne Grunau
f4f27e4cf1 x86: zero extend the 32-bit length in int32_to_float_fmul_scalar implicitly
This reverts commit 5dfe4edad6.
2015-12-29 11:42:51 +01:00
Alexandra Hájková
2008f76054 dca: remove unused decode_hf function and quant_d tables
They were superseded with their integer equivalents. Rename integer
decode_hf to decode_hf.
2015-12-24 13:58:18 +01:00
Janne Grunau
5dfe4edad6 x86_64: int32_to_float_fmul_scalar sign extend integer length 2015-12-14 16:42:35 +01:00
Dave Yeo
b0b133b8c0 hevcdsp: use a macro for .rodata section
fixes assembling on OS/2

Signed-off-by: Dave Yeo <dave.r.yeo@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2015-12-11 16:19:30 +01:00
Anton Khirnov
e7078e842d hevcdsp: add x86 SIMD for MC 2015-12-05 21:11:52 +01:00
Vittorio Giovara
5d14cf1999 mpegvideo: Make sure mpegutils.h is included where needed 2015-09-13 17:34:45 +02:00