FFmpeg/libavcodec/arm
Martin Storsjö 600f4c9b03 arm: vp9itxfm: Avoid reloading the idct32 coefficients
The idct32x32 function actually pushed q4-q7 onto the stack even
though it didn't clobber them; there are plenty of registers that
can be used to allow keeping all the idct coefficients in registers
without having to reload different subsets of them at different
stages in the transform.

Since the idct16 core transform avoids clobbering q4-q7 (but clobbers
q2-q3 instead, to avoid needing to back up and restore q4-q7 at all
in the idct16 function), and the lanewise vmul needs a register in
the q0-q3 range, we move the stored coefficients from q2-q3 into q4-q5
while doing idct16.

Even with these coefficients kept in registers, we can still skip pushing
q7.

Before:                              Cortex A7       A8       A9      A53
vp9_inv_dct_dct_32x32_sub32_add_neon:  18553.8  17182.7  14303.3  12089.7
After:
vp9_inv_dct_dct_32x32_sub32_add_neon:  18470.3  16717.7  14173.6  11860.8
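
As a rough illustration of how the figures above compare, the relative change per core can be computed as in the sketch below. It is not part of the original commit. It assumes that lower numbers are better (the message does not state the unit), and the names `cores`, `before`, `after` and `relative_savings` are made up for this example.

```python
# Benchmark figures copied from the commit message above.
# Assumption: lower is better; the message does not state the unit.
cores = ["Cortex A7", "A8", "A9", "A53"]
before = [18553.8, 17182.7, 14303.3, 12089.7]
after = [18470.3, 16717.7, 14173.6, 11860.8]


def relative_savings(before, after):
    """Return the percentage reduction for each before/after pair."""
    return [100.0 * (b - a) / b for b, a in zip(before, after)]


savings = relative_savings(before, after)
for core, pct in zip(cores, savings):
    print(f"{core}: {pct:.2f}% lower")
```

On these figures the reduction is under 3% on every core and is largest on the Cortex A8.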

This is cherry-picked from libav commit 402546a172.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-03-11 13:14:51 +02:00
aac.h
aacpsdsp_init_arm.c
aacpsdsp_neon.S
ac3dsp_arm.S
ac3dsp_armv6.S
ac3dsp_init_arm.c
ac3dsp_neon.S
asm-offsets.h
audiodsp_arm.h
audiodsp_init_arm.c
audiodsp_init_neon.c
audiodsp_neon.S
blockdsp_arm.h blockdsp: remove high bitdepth parameter 2015-10-02 04:38:40 +02:00
blockdsp_init_arm.c blockdsp: remove high bitdepth parameter 2015-10-02 04:38:40 +02:00
blockdsp_init_neon.c blockdsp: reindent after parameter removal 2015-10-03 23:34:56 +02:00
blockdsp_neon.S
cabac.h
dca.h avcodec/dca: remove old decoder 2016-01-31 17:09:38 +01:00
fft_fixed_init_arm.c Merge commit '97aec6e75ef36ed0402653519daa8e1fc8ddb555' 2016-04-12 15:43:09 +01:00
fft_fixed_neon.S
fft_init_arm.c Merge commit '4c297249ac0f513a610a62691ce96d6b62f65b94' 2016-04-12 15:43:34 +01:00
fft_neon.S
fft_vfp.S
flacdsp_arm.S
flacdsp_init_arm.c
fmtconvert_init_arm.c Merge commit '90b1b9350c0a97c4065ae9054b83e57f48a0de1f' 2016-01-02 11:21:36 +01:00
fmtconvert_neon.S Merge commit '90b1b9350c0a97c4065ae9054b83e57f48a0de1f' 2016-01-02 11:21:36 +01:00
fmtconvert_vfp.S
g722dsp_init_arm.c
g722dsp_neon.S
h264chroma_init_arm.c
h264cmc_neon.S avcodec: fix vc1dsp dependencies 2016-09-25 13:11:45 +02:00
h264dsp_init_arm.c
h264dsp_neon.S
h264idct_neon.S
h264pred_init_arm.c Merge commit '256ef19844892c6cf8e0386e3287bae970ec6320' 2015-07-18 02:13:22 +02:00
h264pred_neon.S
h264qpel_init_arm.c
h264qpel_neon.S
hevcdsp_arm.h
hevcdsp_deblock_neon.S
hevcdsp_idct_neon.S Merge commit '1bd890ad173d79e7906c5e1d06bf0a06cca4519d' 2017-01-31 15:31:34 +01:00
hevcdsp_init_arm.c
hevcdsp_init_neon.c Merge commit '1bd890ad173d79e7906c5e1d06bf0a06cca4519d' 2017-01-31 15:31:34 +01:00
hevcdsp_qpel_neon.S
hpeldsp_arm.h
hpeldsp_arm.S
hpeldsp_armv6.S
hpeldsp_init_arm.c
hpeldsp_init_armv6.c
hpeldsp_init_neon.c
hpeldsp_neon.S
idct.h
idctdsp_arm.h
idctdsp_arm.S
idctdsp_armv6.S
idctdsp_init_arm.c Merge commit '7c6eb0a1b7bf1aac7f033a7ec6d8cacc3b5c2615' 2015-07-27 22:10:35 +02:00
idctdsp_init_armv5te.c
idctdsp_init_armv6.c Merge commit '7c6eb0a1b7bf1aac7f033a7ec6d8cacc3b5c2615' 2015-07-27 22:10:35 +02:00
idctdsp_init_neon.c
idctdsp_neon.S
int_neon.S
jrevdct_arm.S
lossless_audiodsp_init_arm.c
lossless_audiodsp_neon.S
Makefile arm: Add NEON optimizations for 10 and 12 bit vp9 loop filter 2017-01-24 22:35:59 +02:00
mathops.h
mdct_fixed_neon.S
mdct_neon.S
mdct_vfp.S
me_cmp_armv6.S
me_cmp_init_arm.c
mlpdsp_armv5te.S
mlpdsp_armv6.S Merge commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb' 2016-06-21 21:55:34 +02:00
mlpdsp_init_arm.c
mpegaudiodsp_fixed_armv6.S
mpegaudiodsp_init_arm.c
mpegvideo_arm.c
mpegvideo_arm.h
mpegvideo_armv5te_s.S
mpegvideo_armv5te.c Merge commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb' 2016-06-21 21:55:34 +02:00
mpegvideo_neon.S
mpegvideoencdsp_armv6.S
mpegvideoencdsp_init_arm.c
neon.S
neontest.c avcodec: fix arguments on xmm/neon clobber test wrappers 2016-10-02 02:15:47 -03:00
pixblockdsp_armv6.S
pixblockdsp_init_arm.c
rdft_init_arm.c arm/rdft_init: fix license header 2016-04-12 15:01:19 -03:00
rdft_neon.S
rv34dsp_init_arm.c
rv34dsp_neon.S
rv40dsp_init_arm.c
rv40dsp_neon.S
sbrdsp_init_arm.c
sbrdsp_neon.S
simple_idct_arm.S Merge commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb' 2016-06-21 21:55:34 +02:00
simple_idct_armv5te.S
simple_idct_armv6.S
simple_idct_neon.S
startcode_armv6.S
startcode.h
synth_filter_init_arm.c avcodec/synth_filter: split off remaining code from dcadec files 2016-01-25 14:57:38 -03:00
synth_filter_neon.S
synth_filter_vfp.S
vc1dsp_init_arm.c
vc1dsp_init_neon.c
vc1dsp_neon.S
vc1dsp.h
videodsp_arm.h
videodsp_armv5te.S arm: use a local label instead of the function symbol in ff_prefetch_arm 2015-07-20 23:10:29 +02:00
videodsp_init_arm.c
videodsp_init_armv5te.c
vorbisdsp_init_arm.c
vorbisdsp_neon.S
vp3dsp_init_arm.c
vp3dsp_neon.S
vp6dsp_init_arm.c
vp6dsp_neon.S
vp8_armv6.S
vp8.h
vp8dsp_armv6.S Merge commit '5f74bd31a9bd1ac7655103b11743c12d38e0419f' 2016-11-17 15:05:07 +01:00
vp8dsp_init_arm.c
vp8dsp_init_armv6.c
vp8dsp_init_neon.c
vp8dsp_neon.S Merge commit 'e8b96a77010dd62624c3c65c357d7ae3b397ceaa' 2016-11-14 15:21:49 +01:00
vp8dsp.h
vp9dsp_init_10bpp_arm.c arm: Add NEON optimizations for 10 and 12 bit vp9 MC 2017-01-24 22:35:50 +02:00
vp9dsp_init_12bpp_arm.c arm: Add NEON optimizations for 10 and 12 bit vp9 MC 2017-01-24 22:35:50 +02:00
vp9dsp_init_16bpp_arm_template.c arm: Add NEON optimizations for 10 and 12 bit vp9 loop filter 2017-01-24 22:35:59 +02:00
vp9dsp_init_arm.c arm: vp9lpf: Implement the mix2_44 function with one single filter pass 2017-03-11 13:14:51 +02:00
vp9dsp_init.h arm: Add NEON optimizations for 10 and 12 bit vp9 MC 2017-01-24 22:35:50 +02:00
vp9itxfm_16bpp_neon.S arm: Add NEON optimizations for 10 and 12 bit vp9 itxfm 2017-01-24 22:35:56 +02:00
vp9itxfm_neon.S arm: vp9itxfm: Avoid reloading the idct32 coefficients 2017-03-11 13:14:51 +02:00
vp9lpf_16bpp_neon.S arm: Add NEON optimizations for 10 and 12 bit vp9 loop filter 2017-01-24 22:35:59 +02:00
vp9lpf_neon.S arm: vp9lpf: Implement the mix2_44 function with one single filter pass 2017-03-11 13:14:51 +02:00
vp9mc_16bpp_neon.S arm: Add NEON optimizations for 10 and 12 bit vp9 MC 2017-01-24 22:35:50 +02:00
vp9mc_neon.S arm: vp9mc: Calculate less unused data in the 4 pixel wide horizontal filter 2017-03-11 13:14:47 +02:00
vp56_arith.h