Commit Graph

85 Commits

Author SHA1 Message Date
Ronald S. Bultje
8db00081a3 x86: hpeldsp: Move half-pel assembly from dsputil to hpeldsp
Signed-off-by: Martin Storsjö <martin@martin.st>
2013-04-19 23:18:53 +03:00
Ronald S. Bultje
b93b27edb0 dsputil: Make dsputil selectable
Signed-off-by: Martin Storsjö <martin@martin.st>
2013-04-10 11:04:05 +03:00
Ronald S. Bultje
610b18e2e3 x86: qpel: Move fullpel and l2 functions to a separate file
This way, they can be shared between mpeg4qpel and h264qpel without
requiring either one to be compiled unconditionally.

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-04-08 12:38:33 +03:00
Daniel Kang
9acd23d655 x86: dsputil: Fix h263 loop filter link error in some configurations
This was caused by unconditionally referencing a conditionally compiled
table. Now the code is also compiled conditionally.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2013-02-18 17:09:00 +01:00
Martin Storsjö
a846dccb29 h264chroma: x86: Fix building with yasm disabled
Signed-off-by: Martin Storsjö <martin@martin.st>
2013-02-06 17:05:33 +02:00
Diego Biurrun
79dad2a932 dsputil: Separate h264chroma 2013-02-06 11:30:53 +01:00
Daniel Kang
71155d7b41 dsputil: x86: Convert mpeg4 qpel and dsputil avg to yasm
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2013-01-27 06:45:31 +01:00
Mans Rullgard
e9d817351b dsputil: Separate h264 qpel
The sh4 optimizations are removed, because the code is
100% identical to the C code, so it is unlikely to
provide any real practical benefit.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2013-01-24 10:44:43 +01:00
Ronald S. Bultje
2e4bb99f4d vorbisdsp: convert x86 simd functions from inline asm to yasm. 2013-01-22 18:02:24 -08:00
Ronald S. Bultje
fef906c77c Move vorbis_inverse_coupling from dsputil to vorbisdspcontext.
Conveniently (together with Justin's earlier patches), this makes
our vorbis decoder entirely independent of dsputil.
2013-01-19 22:21:10 -08:00
Diego Biurrun
a0c5917f86 Drop Snow codec
Snow is a toy codec with no real-world use and horrible code.
2013-01-06 16:30:02 +01:00
Ronald S. Bultje
8c53d39e7f lavc: introduce VideoDSPContext
Move some functions from dsputil. The idea is that videodsp contains
functions that are useful for a large and varied set of video decoders.
Currently, it contains emulated_edge_mc() and prefetch().

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2012-12-20 13:40:45 +01:00
Daniel Kang
610e00b359 x86: h264: Convert 8-bit QPEL inline assembly to YASM
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2012-11-25 20:38:35 +01:00
Janne Grunau
7e522859fc x86: vc1: call ff_vc1dsp_init_x86() under if (ARCH_X86) 2012-10-08 11:54:05 +02:00
Janne Grunau
cb36febcbc x86: cavs: call ff_cavsdsp_init_x86() under if (ARCH_X86) 2012-10-08 11:54:05 +02:00
Janne Grunau
f101eab1be x86: call most of the x86 dsp init functions under if (ARCH_X86)
Rename the called dsp init functions to *_init_x86.
2012-10-08 11:54:05 +02:00
Diego Biurrun
a84edbacaf x86: dsputil: Only compile motion_est code when encoders are enabled 2012-09-10 08:31:47 +02:00
Diego Biurrun
2e6f93a284 x86: Always compile files with functions that are called unconditionally 2012-08-29 00:27:06 +02:00
Diego Biurrun
bcc45d6348 x86: avcodec: Drop silly "_mmx" suffixes from filenames 2012-08-28 18:37:34 +02:00
Diego Biurrun
efbd04c332 x86: avcodec: Drop silly "_sse" suffixes from filenames 2012-08-28 18:37:33 +02:00
Diego Biurrun
3f02c533f3 build: fft: x86: Drop unused YASM-OBJS-FFT- variable 2012-08-27 03:10:58 +02:00
Diego Biurrun
dc40285427 x86: mpegvideo: more sensible names for optimization file and init function 2012-08-24 02:23:16 +02:00
Diego Biurrun
d211547ddd x86: mpegvideoenc: Split optimizations off into a separate file 2012-08-24 02:23:16 +02:00
Diego Biurrun
26ce9aec03 dnxhdenc: x86: more sensible names for optimization file and init function 2012-08-24 02:23:15 +02:00
Diego Biurrun
6fa488678f build: x86: Only compile mpegvideo optimizations when necessary 2012-08-22 01:06:33 +02:00
Diego Biurrun
6961bdface x86: avcodec: Consistently name all init files 2012-08-16 11:05:38 +02:00
Diego Biurrun
29cfdd3767 x86: avcodec: Appropriately name files containing only init functions 2012-08-15 03:24:08 +02:00
Diego Biurrun
3b9e832e17 x86: Drop silly "_yasm" suffixes from filenames 2012-08-12 17:13:05 +02:00
Mans Rullgard
ec7c501ed5 x86: remove libmpeg2 mmx(ext) idct functions
These functions are not faster than other mmx implementations on
any hardware I have been able to test on, and they are horribly
inaccurate.  There is thus no reason to ever use them.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-02 12:14:52 +01:00
Ronald S. Bultje
b6a3849adb fft: port FFT/IMDCT 3dnow functions to yasm, and disable on x86-64.
64-bit CPUs always have SSE available, thus there is no need to compile
in the 3dnow functions. This results in smaller binaries.
2012-07-31 21:20:47 -07:00
Mans Rullgard
28f9ab7029 vp3: move idct and loop filter pointers to new vp3dsp context
This moves all VP3-specific function pointers from dsputil to a
new vp3dsp context.  There is no reason to ever use the VP3 IDCT
where an MPEG2 IDCT is expected or vice versa.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-07-18 10:32:19 +01:00
Mans Rullgard
ab9f987661 build: add CONFIG_VP3DSP, reduce repetition in OBJS lists
Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-07-18 10:32:18 +01:00
Mans Rullgard
8299260470 x86: fft: convert sse inline asm to yasm 2012-06-25 13:31:00 +01:00
Diego Biurrun
7bb3a302fe build: Consistently handle conditional compilation for all optimization OBJS. 2012-04-12 09:00:49 +02:00
Diego Biurrun
ad0e31f134 build: prettyprinting cosmetics 2012-03-26 13:00:10 +02:00
Diego Biurrun
915a2a0a65 x86: conditionally compile H.264 QPEL optimizations 2012-03-25 11:50:45 +02:00
Christophe GISQUET
34454c761f SBR DSP x86: implement SSE sbr_sum_square_sse
The 32bits targets have been compiled with -mfpmath=sse for proper reference.
sbr_sum_square C  /32bits: 82c (unrolled)/102c
               C  /64bits: 69c (unrolled)/82c
               SSE/32bits: 42c
               SSE/64bits: 31c

Use of SSE4.1 dpps to perform the final sum is slower.
Not unrolling to perform 8 operations in a loop yields 10 more cycles.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-02-23 15:50:06 -08:00
Ronald S. Bultje
7e4d9d5d45 win64: add a XMM clobber test configure option.
This will be useful to test more aggressively for failures to mark XMM
registers as clobbered in Win64 builds, and prevent regressions thereof.

Based on a patch by Ramiro Polla <ramiro.polla@gmail.com>
2012-02-02 12:00:48 -08:00
Christophe Gisquet
e5c9de2ab7 rv40: x86 SIMD for biweight
Provide MMX, SSE2 and SSSE3 versions, with a fast-path when the weights are
multiples of 512 (which is often the case when the values round up nicely).

*_TIMER report for the 16x16 and 8x8 cases:
C:
9015 decicycles in 16, 524257 runs, 31 skips
2656 decicycles in 8, 524271 runs, 17 skips
MMX:
4156 decicycles in 16, 262090 runs, 54 skips
1206 decicycles in 8, 262131 runs, 13 skips
MMX on fast-path:
2760 decicycles in 16, 524222 runs, 66 skips
995 decicycles in 8, 524252 runs, 36 skips
SSE2:
2163 decicycles in 16, 262131 runs, 13 skips
832 decicycles in 8, 262137 runs, 7 skips
SSE2 with fast path:
1783 decicycles in 16, 524276 runs, 12 skips
711 decicycles in 8, 524283 runs, 5 skips
SSSE3:
2117 decicycles in 16, 262136 runs, 8 skips
814 decicycles in 8, 262143 runs, 1 skips
SSSE3 with fast path:
1315 decicycles in 16, 524285 runs, 3 skips
578 decicycles in 8, 524286 runs, 2 skips

This means around a 4% speedup for some sequences.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2012-01-30 23:58:25 +01:00
Diego Biurrun
91bafb52ae x86: Give RV40 init file a more suitable name. 2012-01-30 23:58:24 +01:00
Ronald S. Bultje
59f474b49d png: convert DSP functions to yasm. 2012-01-29 18:47:50 -08:00
Ronald S. Bultje
e92003514d png: move DSP functions to their own DSP context. 2012-01-29 08:11:18 -08:00
Christophe GISQUET
3faa303a47 rv34: DC-only inverse transform
When decoding coefficients, detect whether the block is DC-only, and take
advantage of this knowledge to perform DC-only inverse transform.

This is achieved by:
- first, changing the 108x4 element modulo_three_table into a 108 element
  table (kind of base4), and accessing each value using mask and shifts.
- then, checking low bits for 0 (as they represent the presence of higher
  frequency coefficients)

Also provide x86 SIMD code for the DC-only inverse transform.

Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>
2012-01-12 09:52:33 +01:00
Vitor Sessak
39df0c434c mpegaudiodec: optimized iMDCT transform
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-01-08 17:40:55 -08:00
Diego Biurrun
30bbd5cbc0 x86: conditionally compile dnxhd encoder optimizations 2011-12-19 13:54:10 +01:00
Diego Biurrun
88b9735753 build: conditionally compile x86 H.264 chroma optimizations 2011-12-14 11:58:45 +01:00
Ronald S. Bultje
e3f530feca prores: idct sse2/sse4 optimizations.
~3.0-3.5x as fast as original C version, 1.6x as fast overall.
2011-10-11 07:50:48 -07:00
Kostya Shishkov
d241f51e0f Move RV3/4-specific DSP functions into their own context
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-08-11 16:07:15 -07:00
Daniel Kang
9bfa5363da H.264: Add x86 assembly for 10-bit H.264 qpel functions.
Mainly ported from 8-bit H.264 qpel.

Some code ported from x264. LGPL ok by author.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-07-03 07:43:38 -07:00
Daniel Kang
84e70ef004 h264: Add x86 assembly for 10-bit weight/biweight H.264 functions.
Mainly ported from 8-bit H.264 weight/biweight.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2011-06-21 15:24:13 +02:00