FFmpeg/libavcodec/x86
Christophe Gisquet 9630b3fc06 x86: lossless audio: SSE4 madd 32bits
The only user so far is wmalossless 24-bit. The few samples tested show an
order of 8, so more unrolling or an AVX2 version does not make sense.

Timings: 68 -> 49 cycles

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-05-07 23:28:48 +02:00
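
For readers unfamiliar with the "madd" operation named in the commit title, here is a minimal scalar sketch in C. It assumes the routine computes a dot product of two vectors while updating the first vector in place with a scaled third vector. The function name `madd32_sketch` and its parameter list are hypothetical and chosen for illustration; this is not the code from lossless_audiodsp.asm.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar sketch of a 32-bit "scalar product and multiply-add".
 * Assumption: v1 is read for the dot product, then adapted in place. */
static int32_t madd32_sketch(int16_t *v1, const int32_t *v2,
                             const int16_t *v3, int order, int mul)
{
    int32_t res = 0;

    assert(order > 0);
    for (int i = 0; i < order; i++) {
        res   += v1[i] * v2[i];   /* accumulate using the old v1[i] */
        v1[i] += mul * v3[i];     /* then update v1[i] in place */
    }
    return res;
}
```

With an order of 8, as reported in the commit message, the 32-bit operands fill only two 128-bit vectors, which fits the remark that further unrolling or wider registers would bring little.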
aacpsdsp_init.c
aacpsdsp.asm
ac3dsp_init.c Merge commit '4f22b138886e29f7fffa8c715673951e51be9f32' 2016-01-27 18:23:31 +00:00
ac3dsp.asm
alacdsp_init.c x86/alacdsp: add simd optimized functions 2015-10-06 20:22:00 -03:00
alacdsp.asm x86/alacdsp: add simd optimized functions 2015-10-06 20:22:00 -03:00
audiodsp_init.c
audiodsp.asm
blockdsp_init.c blockdsp: reindent after parameter removal 2015-10-03 23:34:56 +02:00
blockdsp.asm
bswapdsp_init.c
bswapdsp.asm
cabac.h
cavsdsp.c
constants.c avcodec/v210: add avx2 version of the 10-bit line encoder 2016-01-17 16:03:43 +01:00
constants.h avcodec/v210: add avx2 version of the 10-bit line encoder 2016-01-17 16:03:43 +01:00
dcadsp_init.c x86/dcadec: add ff_lfe_fir1_float_{sse3,avx} 2016-02-22 21:21:34 -03:00
dcadsp.asm x86/dcadec: add ff_lfe_fir1_float_{sse3,avx} 2016-02-22 21:21:34 -03:00
dct32.asm
dct_init.c
dct-test.c x86: dct-test: add more idcts 2015-10-13 16:03:04 +02:00
dirac_dwt_init.c dirac_dwt: Make x86 files/functions names consistent 2016-02-05 19:30:23 -08:00
dirac_dwt.asm dirac_dwt: Make x86 files/functions names consistent 2016-02-05 19:30:23 -08:00
diracdsp_init.c diracdsp: Make x86 files/functions names consistent 2016-02-05 19:29:43 -08:00
diracdsp.asm diracdsp: Make x86 files/functions names consistent 2016-02-05 19:29:43 -08:00
dnxhdenc_init.c
dnxhdenc.asm
fdct.c
fdct.h
fdctdsp_init.c
fft_init.c Merge commit '73ff983e8dd22ccee166403d0bbbc9c1cd543622' 2016-04-12 15:42:21 +01:00
fft.asm avcodec: Extend fft to size 2^17 2016-03-04 13:51:42 +01:00
fft.h
flac_dsp_gpl.asm
flacdsp_init.c
flacdsp.asm
fmtconvert_init.c
fmtconvert.asm avcodec/x86/fmtconvert: Add emms to int32_to_float_fmul_array8_sse() 2016-01-15 17:08:37 +01:00
fpel.asm
fpel.h x86: fpel: Remove erroneous ff_put_pixels8_mmxext prototype 2015-10-19 16:52:37 -07:00
g722dsp_init.c
g722dsp.asm
h263_loopfilter.asm
h263dsp_init.c
h264_chromamc_10bit.asm
h264_chromamc.asm
h264_deblock_10bit.asm
h264_deblock.asm avcodec/h264: Fix segfault in 4:2:2 chroma deblock with 32-bit msvc 2016-02-05 22:01:38 +01:00
h264_i386.h
h264_idct_10bit.asm vp9: 16bpp tm/dc/h/v intra pred simd (mostly sse2) functions. 2015-10-03 14:42:39 -04:00
h264_idct.asm
h264_intrapred_10bit.asm vp9: 16bpp tm/dc/h/v intra pred simd (mostly sse2) functions. 2015-10-03 14:42:39 -04:00
h264_intrapred_init.c
h264_intrapred.asm
h264_qpel_8bit.asm
h264_qpel_10bit.asm vp9: 10/12bpp SIMD (sse2/ssse3/avx) for directional intra prediction. 2015-10-03 14:42:39 -04:00
h264_qpel.c x86: fpel: Move prototypes for 4-px block functions 2015-10-19 16:52:33 -07:00
h264_weight_10bit.asm
h264_weight.asm
h264chroma_init.c
h264dsp_init.c avcodec/h264: mmxext 4:2:2 chroma deblock/loop filter 2016-02-05 17:26:04 +01:00
hevc_deblock.asm
hevc_idct.asm
hevc_mc.asm hevcdsp: use a macro for .rodata section 2015-12-11 16:19:30 +01:00
hevc_res_add.asm
hevc_sao_10bit.asm x86/hevc_sao: add ff_hevc_sao_edge_filter_{8,16}_{10,12} 2015-12-20 17:01:15 -03:00
hevc_sao.asm
hevcdsp_init.c x86: hevc: Fix linking with both yasm and optimizations disabled 2016-02-23 11:47:54 +01:00
hevcdsp.h
hpeldsp_init.c
hpeldsp_rnd_template.c
hpeldsp.asm
hpeldsp.h
huffyuvdsp_init.c
huffyuvdsp.asm
huffyuvencdsp_mmx.c x86: use the new helper macros where useful 2016-02-14 20:00:21 -03:00
huffyuvencdsp.asm huffyuvencdsp: Undefine "i" macro after each use 2016-02-07 09:19:17 -08:00
idctdsp_init.c x86: simple_idct: 12bits versions 2015-10-13 15:34:32 +02:00
idctdsp.asm
idctdsp.h
imdct36.asm x86/imdct36: use extractps inside the STORE macro 2016-01-28 13:35:15 -03:00
inline_asm.h
jpeg2000dsp_init.c x86: use the new helper macros where useful 2016-02-14 20:00:21 -03:00
jpeg2000dsp.asm
lossless_audiodsp_init.c x86: lossless audio: SSE4 madd 32bits 2016-05-07 23:28:48 +02:00
lossless_audiodsp.asm x86: lossless audio: SSE4 madd 32bits 2016-05-07 23:28:48 +02:00
lossless_videodsp_init.c
lossless_videodsp.asm
lpc.c
Makefile x86/vc1dsp: Split the file into MC and loopfilter 2016-02-29 08:46:53 -08:00
mathops.h
me_cmp_init.c
me_cmp.asm
mlpdsp_init.c x86: use the new helper macros where useful 2016-02-14 20:00:21 -03:00
mlpdsp.asm
mpegaudiodsp.c
mpegvideo.c
mpegvideodsp.c
mpegvideoenc_qns_template.c
mpegvideoenc_template.c
mpegvideoenc.c
mpegvideoencdsp_init.c
mpegvideoencdsp.asm
pixblockdsp_init.c
pixblockdsp.asm pixblockdsp: x86: Condense diff_pixels_* to a shared macro 2015-11-07 14:31:34 -08:00
pngdsp_init.c
pngdsp.asm
proresdsp_init.c
proresdsp.asm x86inc: Add debug symbols indicating sizes of compiled functions 2016-01-23 20:46:28 +01:00
qpel.asm
qpeldsp_init.c
qpeldsp.asm
rnd_template.c
rv34dsp_init.c
rv34dsp.asm
rv40dsp_init.c all: fix -Wextra-semi reported on clang 2015-10-24 17:58:17 -04:00
rv40dsp.asm
sbrdsp_init.c
sbrdsp.asm
simple_idct10_template.asm x86: simple_idct10_template: use const 2015-10-13 22:52:33 +02:00
simple_idct10.asm x86inc: Add debug symbols indicating sizes of compiled functions 2016-01-21 23:19:46 +01:00
simple_idct.c
simple_idct.h x86: simple_idct: 12bits versions 2015-10-13 15:34:32 +02:00
snowdsp.c
svq1enc_init.c
svq1enc.asm
synth_filter_init.c x86: use the new helper macros where useful 2016-02-14 20:00:21 -03:00
synth_filter.asm avcodec/synth_filter: split off remaining code from dcadec files 2016-01-25 14:57:38 -03:00
takdsp_init.c avcodec/takdec: add x86 SIMD for rest of decorrelation modes 2015-10-09 21:38:15 +02:00
takdsp.asm x86/takdsp: use arithmetic shift instructions 2015-10-09 23:52:39 -03:00
ttadsp_init.c
ttadsp.asm
v210-init.c
v210.asm
v210enc_init.c Merge commit 'e280fe13291e9c712a5f4aa13b5263f3e8afed45' 2016-02-16 17:23:32 +00:00
v210enc.asm Merge commit 'eafb05fcf37cd19a910ca3b17824384f9006bc0a' 2016-02-16 17:02:56 +00:00
vc1dsp_init.c x86: vc1dsp: Convert vc1_inv_trans_*_dc to NASM format 2016-02-01 17:01:11 -08:00
vc1dsp_loopfilter.asm x86/vc1dsp: Split the file into MC and loopfilter 2016-02-29 08:46:53 -08:00
vc1dsp_mc.asm x86/vc1dsp: Split the file into MC and loopfilter 2016-02-29 08:46:53 -08:00
vc1dsp_mmx.c x86/vc1dsp: Port vc1_*_hor_16b_shift2 to NASM format 2016-02-14 11:11:02 -08:00
vc1dsp.h
videodsp_init.c
videodsp.asm videodsp: fix 1-byte overread in top/bottom READ_NUM_BYTES iterations. 2016-01-18 11:12:47 -05:00
vorbisdsp_init.c
vorbisdsp.asm
vp3dsp_init.c
vp3dsp.asm
vp6dsp_init.c
vp6dsp.asm
vp8dsp_init.c
vp8dsp_loopfilter.asm
vp8dsp.asm
vp9dsp_init_10bpp.c
vp9dsp_init_12bpp.c
vp9dsp_init_16bpp_template.c x86: use the new helper macros where useful 2016-02-14 20:00:21 -03:00
vp9dsp_init_16bpp.c x86: use the new helper macros where useful 2016-02-14 20:00:21 -03:00
vp9dsp_init.c x86: use the new helper macros where useful 2016-02-14 20:00:21 -03:00
vp9dsp_init.h all: fix -Wextra-semi reported on clang 2015-10-24 17:58:17 -04:00
vp9intrapred_16bpp.asm vp9: don't keep a stack pointer if we don't need it. 2015-10-07 08:55:19 -04:00
vp9intrapred.asm
vp9itxfm_16bpp.asm x86/vp9itxfm: fix register clobbering in ff_vp9_idct_idct_4x4_add_12_sse2 2015-10-13 20:21:33 -03:00
vp9itxfm_template.asm vp9: add x86 simd (sse2/ssse3) for iadst4 10bpp functions. 2015-10-13 11:05:58 -04:00
vp9itxfm.asm vp9: refactor itx coefficients and share between 8 and 10/12bpp. 2015-10-13 11:06:01 -04:00
vp9lpf_16bpp.asm vp9: sse2/ssse3/avx 16bpp loopfilter x86 simd. 2015-10-03 14:42:39 -04:00
vp9lpf.asm
vp9mc_16bpp.asm vp9: sse2/ssse3/avx 16bpp loopfilter x86 simd. 2015-10-03 14:42:39 -04:00
vp9mc.asm
vp56_arith.h
w64xmmtest.c
xvididct_init.c
xvididct.asm
xvididct.h