FFmpeg/libavcodec/arm
Martin Storsjö eabc5abf94 arm: vp9itxfm16: Do a simpler half/quarter idct16/idct32 when possible
This work is sponsored by, and copyright, Google.

This avoids loading and calculating coefficients that we know will
be zero, and avoids filling the temp buffer with zeros in places
where we know the second pass won't read.
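
As an illustration only, the idea above can be sketched in C. This is not the NEON code from the commit: the helper names (`basis`, `itx_1d`, `itx_2d`, `pruned_matches_full`) are hypothetical, and a +1/-1 Walsh-Hadamard basis stands in for the real idct16 butterflies, since the pruning argument holds for any separable linear transform. If only the top-left `nz` x `nz` coefficients can be nonzero, the first pass needs to transform just `nz` rows, and the second pass never reads the rest of the temp buffer.

```c
#include <string.h>

#define N 16

/* Stand-in basis: a +1/-1 Walsh-Hadamard entry, NOT the VP9 idct16 basis. */
static int basis(int k, int n)
{
    int bits = k & n, sign = 1;
    while (bits) {
        sign = -sign;
        bits &= bits - 1;   /* clear the lowest set bit */
    }
    return sign;
}

/* 1-D inverse transform that reads only the first nz input coefficients;
 * the remaining N - nz are assumed to be zero and are never loaded. */
static void itx_1d(const int *in, int in_stride,
                   int *out, int out_stride, int nz)
{
    for (int n = 0; n < N; n++) {
        int sum = 0;
        for (int k = 0; k < nz; k++)
            sum += in[k * in_stride] * basis(k, n);
        out[n * out_stride] = sum;
    }
}

/* 2-D inverse transform, assuming only the top-left nz x nz
 * coefficients may be nonzero (nz == N gives the full transform). */
static void itx_2d(const int coef[N * N], int out[N * N], int nz)
{
    int tmp[N * N];

    /* Pass 1: transform only the first nz rows. Later rows of tmp are
     * left unwritten instead of being filled with zeros. */
    for (int r = 0; r < nz; r++)
        itx_1d(coef + r * N, 1, tmp + r * N, 1, nz);

    /* Pass 2: transform every column, reading only the first nz rows
     * of tmp, so the unwritten part is never touched. */
    for (int c = 0; c < N; c++)
        itx_1d(tmp + c, N, out + c, N, nz);
}

/* Returns 1 if the pruned transform equals the full one for a block
 * whose nonzero coefficients all lie in the top-left nz x nz corner. */
int pruned_matches_full(int nz)
{
    int coef[N * N] = { 0 }, full[N * N], pruned[N * N];

    for (int r = 0; r < nz; r++)
        for (int c = 0; c < nz; c++)
            coef[r * N + c] = 3 * r - 5 * c + 7;   /* arbitrary test data */

    itx_2d(coef, full, N);
    itx_2d(coef, pruned, nz);
    return memcmp(full, pruned, sizeof(full)) == 0;
}
```

Calling `pruned_matches_full(4)` or `pruned_matches_full(8)` corresponds loosely to the quarter and half cases for a 16-point transform; the inner loops shrink with `nz`, which is where the saving comes from.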

This gives a substantial speedup for the smaller subpartitions; for example,
16x16 sub2 on Cortex A7 drops from 3840.2 to 2336.2 in the numbers below.

The code size increases from 14516 bytes to 22484 bytes.

The idct16/32_end macros are moved above the individual functions; the
instructions themselves are unchanged, but since new functions are added
at the same place where the code is moved from, the diff looks rather
messy.

Before:                                 Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub1_add_10_neon:     454.0    270.7    418.5    295.4
vp9_inv_dct_dct_16x16_sub2_add_10_neon:    3840.2   3244.8   3700.1   2337.9
vp9_inv_dct_dct_16x16_sub4_add_10_neon:    4212.5   3575.4   3996.9   2571.6
vp9_inv_dct_dct_16x16_sub8_add_10_neon:    5174.4   4270.5   4615.5   3031.9
vp9_inv_dct_dct_16x16_sub12_add_10_neon:   5676.0   4908.5   5226.5   3491.3
vp9_inv_dct_dct_16x16_sub16_add_10_neon:   6403.9   5589.0   5839.8   3948.5
vp9_inv_dct_dct_32x32_sub1_add_10_neon:    1710.7    944.7   1582.1   1045.4
vp9_inv_dct_dct_32x32_sub2_add_10_neon:   21040.7  16706.1  18687.7  13193.1
vp9_inv_dct_dct_32x32_sub4_add_10_neon:   22197.7  18282.7  19577.5  13918.6
vp9_inv_dct_dct_32x32_sub8_add_10_neon:   24511.5  20911.5  21472.5  15367.5
vp9_inv_dct_dct_32x32_sub12_add_10_neon:  26939.5  24264.3  23239.1  16830.3
vp9_inv_dct_dct_32x32_sub16_add_10_neon:  29419.5  26845.1  25020.6  18259.9
vp9_inv_dct_dct_32x32_sub20_add_10_neon:  31146.4  29633.5  26803.3  19721.7
vp9_inv_dct_dct_32x32_sub24_add_10_neon:  33376.3  32507.8  28642.4  21174.2
vp9_inv_dct_dct_32x32_sub28_add_10_neon:  35629.4  35439.6  30416.5  22625.7
vp9_inv_dct_dct_32x32_sub32_add_10_neon:  37269.9  37914.9  32271.9  24078.9

After:                                  Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub1_add_10_neon:     454.0    276.0    418.5    295.1
vp9_inv_dct_dct_16x16_sub2_add_10_neon:    2336.2   1886.0   2251.0   1458.6
vp9_inv_dct_dct_16x16_sub4_add_10_neon:    2531.0   2054.7   2402.8   1591.1
vp9_inv_dct_dct_16x16_sub8_add_10_neon:    3848.6   3491.1   3845.7   2554.8
vp9_inv_dct_dct_16x16_sub12_add_10_neon:   5703.8   4831.6   5230.8   3493.4
vp9_inv_dct_dct_16x16_sub16_add_10_neon:   6399.5   5567.0   5832.4   3951.5
vp9_inv_dct_dct_32x32_sub1_add_10_neon:    1722.1    938.5   1577.3   1044.5
vp9_inv_dct_dct_32x32_sub2_add_10_neon:   15003.5  11576.8  13105.8   9602.2
vp9_inv_dct_dct_32x32_sub4_add_10_neon:   15768.5  12677.2  13726.0  10138.1
vp9_inv_dct_dct_32x32_sub8_add_10_neon:   17278.8  14825.4  14907.5  11185.7
vp9_inv_dct_dct_32x32_sub12_add_10_neon:  22335.7  21544.5  20379.5  15019.8
vp9_inv_dct_dct_32x32_sub16_add_10_neon:  24165.6  23881.7  21938.6  16308.2
vp9_inv_dct_dct_32x32_sub20_add_10_neon:  31082.2  30860.9  26835.3  19711.3
vp9_inv_dct_dct_32x32_sub24_add_10_neon:  33102.6  31922.8  28638.3  21161.0
vp9_inv_dct_dct_32x32_sub28_add_10_neon:  35104.9  34867.5  30411.7  22621.2
vp9_inv_dct_dct_32x32_sub32_add_10_neon:  37438.1  39103.4  32217.8  24067.6
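
To read the two tables side by side, it can help to turn a pair of timings into a ratio. The helpers below are a small sketch for that arithmetic; the function names are made up for illustration and are not part of FFmpeg or its benchmarking tools.

```c
/* Speedup factor: how many times faster the new timing is than the old one. */
double speedup(double before, double after)
{
    return before / after;
}

/* Share of the original time that the change removes, in percent. */
double percent_saved(double before, double after)
{
    return 100.0 * (before - after) / before;
}
```

For instance, `speedup(3840.2, 2336.2)` uses the 16x16 sub2 figures for Cortex A7 from the tables above and comes out at roughly 1.64, while rows such as sub16 or sub32 stay close to 1.0 because they still need the full transform.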

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-03-19 22:54:33 +02:00
aac.h
aacpsdsp_init_arm.c Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
aacpsdsp_neon.S Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
ac3dsp_arm.S Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
ac3dsp_armv6.S Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
ac3dsp_init_arm.c Merge commit '4958f35a2ebc307049ff2104ffb944f5f457feb3' 2013-12-09 04:12:40 +01:00
ac3dsp_neon.S Merge commit '4958f35a2ebc307049ff2104ffb944f5f457feb3' 2013-12-09 04:12:40 +01:00
asm-offsets.h Merge commit '6a13505c069890cb0e2a07e29fd819a0cf2e73c1' 2014-04-30 00:23:01 +02:00
audiodsp_arm.h Merge commit '9a9e2f1c8aa4539a261625145e5c1f46a8106ac2' 2014-06-22 17:58:28 +02:00
audiodsp_init_arm.c Merge commit '9a9e2f1c8aa4539a261625145e5c1f46a8106ac2' 2014-06-22 17:58:28 +02:00
audiodsp_init_neon.c Merge commit '9a9e2f1c8aa4539a261625145e5c1f46a8106ac2' 2014-06-22 17:58:28 +02:00
audiodsp_neon.S Merge commit '9a9e2f1c8aa4539a261625145e5c1f46a8106ac2' 2014-06-22 17:58:28 +02:00
blockdsp_arm.h blockdsp: remove high bitdepth parameter 2015-10-02 04:38:40 +02:00
blockdsp_init_arm.c blockdsp: remove high bitdepth parameter 2015-10-02 04:38:40 +02:00
blockdsp_init_neon.c blockdsp: reindent after parameter removal 2015-10-03 23:34:56 +02:00
blockdsp_neon.S Merge commit 'e74433a8e6fc00c8dbde293c97a3e45384c2c1d9' 2014-06-19 04:54:38 +02:00
cabac.h avcodec/arm/cabac: fix inline cabac reader with the UNCHECKED bitstream reader 2014-03-15 01:08:45 +01:00
dca.h avcodec/dca: remove old decoder 2016-01-31 17:09:38 +01:00
fft_fixed_init_arm.c Merge commit '97aec6e75ef36ed0402653519daa8e1fc8ddb555' 2016-04-12 15:43:09 +01:00
fft_fixed_neon.S Merge commit 'f963f80399deb1a2b44c1bac3af7123e8a0c9e46' 2014-12-09 11:58:13 +01:00
fft_init_arm.c Merge commit '4c297249ac0f513a610a62691ce96d6b62f65b94' 2016-04-12 15:43:34 +01:00
fft_neon.S Merge commit 'f963f80399deb1a2b44c1bac3af7123e8a0c9e46' 2014-12-09 11:58:13 +01:00
fft_vfp.S Merge commit 'f963f80399deb1a2b44c1bac3af7123e8a0c9e46' 2014-12-09 11:58:13 +01:00
flacdsp_arm.S
flacdsp_init_arm.c lavc/flac: Fix encoding and decoding with high lpc. 2015-05-17 02:08:58 +02:00
fmtconvert_init_arm.c Merge commit '90b1b9350c0a97c4065ae9054b83e57f48a0de1f' 2016-01-02 11:21:36 +01:00
fmtconvert_neon.S Merge commit '90b1b9350c0a97c4065ae9054b83e57f48a0de1f' 2016-01-02 11:21:36 +01:00
fmtconvert_vfp.S Merge commit 'f0389eb777b1ab4291329d4f709098cdfa7384dc' 2013-08-29 16:10:39 +02:00
g722dsp_init_arm.c Merge commit '702458538d4e52809bcef460d39baabf061b16b5' 2015-02-16 02:16:29 +01:00
g722dsp_neon.S Merge commit '702458538d4e52809bcef460d39baabf061b16b5' 2015-02-16 02:16:29 +01:00
h264chroma_init_arm.c
h264cmc_neon.S avcodec: fix vc1dsp dependencies 2016-09-25 13:11:45 +02:00
h264dsp_init_arm.c lavc/arm: Use the neon vertical chroma loop filter also for H.264 4:2:2. 2015-01-31 10:05:24 +01:00
h264dsp_neon.S
h264idct_neon.S Merge commit '5bcbb516f2ff45290ef7995b081762e668693672' 2014-02-08 00:48:26 +01:00
h264pred_init_arm.c Merge commit '256ef19844892c6cf8e0386e3287bae970ec6320' 2015-07-18 02:13:22 +02:00
h264pred_neon.S
h264qpel_init_arm.c Merge commit '7fb993d338d88f2f62e0a358b6c9f3eb9a3a08ac' 2014-07-25 13:05:08 +02:00
h264qpel_neon.S
hevcdsp_arm.h hevcdsp: fix compilation for arm and aarch64 2015-03-12 20:01:01 +01:00
hevcdsp_deblock_neon.S hevcdsp: HEVC deblocking ARM NEON register clobber fix 2015-02-16 13:27:41 +01:00
hevcdsp_idct_neon.S Merge commit '1bd890ad173d79e7906c5e1d06bf0a06cca4519d' 2017-01-31 15:31:34 +01:00
hevcdsp_init_arm.c hevcdsp: fix compilation for arm and aarch64 2015-03-12 20:01:01 +01:00
hevcdsp_init_neon.c Merge commit '1bd890ad173d79e7906c5e1d06bf0a06cca4519d' 2017-01-31 15:31:34 +01:00
hevcdsp_qpel_neon.S avcodec/hevcdsp: ARM NEON optimized qpel functions 2015-02-25 18:39:51 +01:00
hpeldsp_arm.h Merge commit '7151c5d04aed3b496c21f713dcb603e2cbdb9c49' 2014-01-14 14:38:10 +01:00
hpeldsp_arm.S Merge commit '831a1180785a786272cdcefb71566a770bfb879e' 2014-03-13 23:59:56 +01:00
hpeldsp_armv6.S Merge commit '61985ad72c47bbb668f2d3923bf5c9df83e79323' 2014-03-09 01:16:21 +01:00
hpeldsp_init_arm.c Merge commit '322a1dda973e802db7b57f2007fad3efcd5bab81' 2014-03-22 22:53:33 +01:00
hpeldsp_init_armv6.c Merge commit '7384b7a71338d960e421d6dc3d77da09b0a442cb' 2013-04-20 14:19:08 +02:00
hpeldsp_init_neon.c Merge commit '7384b7a71338d960e421d6dc3d77da09b0a442cb' 2013-04-20 14:19:08 +02:00
hpeldsp_neon.S arm: hpeldsp: Move half-pel assembly from dsputil to hpeldsp 2013-04-19 23:19:08 +03:00
idct.h Merge commit '4de8b60684ce13dff3e3d372dae4f49b9e53f755' 2014-07-21 01:56:22 +02:00
idctdsp_arm.h Merge commit 'e3fcb14347466095839c2a3c47ebecff02da891e' 2014-07-01 15:22:11 +02:00
idctdsp_arm.S avcodec/idctdsp: change {put,add}_pixels_clamped to ptrdiff_t line_size 2014-09-24 21:43:19 -03:00
idctdsp_armv6.S Merge commit 'e3fcb14347466095839c2a3c47ebecff02da891e' 2014-07-01 15:22:11 +02:00
idctdsp_init_arm.c Merge commit '7c6eb0a1b7bf1aac7f033a7ec6d8cacc3b5c2615' 2015-07-27 22:10:35 +02:00
idctdsp_init_armv5te.c Merge commit '4de8b60684ce13dff3e3d372dae4f49b9e53f755' 2014-07-21 01:56:22 +02:00
idctdsp_init_armv6.c Merge commit '7c6eb0a1b7bf1aac7f033a7ec6d8cacc3b5c2615' 2015-07-27 22:10:35 +02:00
idctdsp_init_neon.c avcodec/idctdsp: change {put,add}_pixels_clamped to ptrdiff_t line_size 2014-09-24 21:43:19 -03:00
idctdsp_neon.S Merge commit 'e3fcb14347466095839c2a3c47ebecff02da891e' 2014-07-01 15:22:11 +02:00
int_neon.S Merge commit '054013a0fc6f2b52c60cee3e051be8cc7f82cef3' 2014-05-30 00:59:15 +02:00
jrevdct_arm.S
lossless_audiodsp_init_arm.c apedsp: move to llauddsp 2014-06-05 20:31:59 +02:00
lossless_audiodsp_neon.S apedsp: move to llauddsp 2014-06-05 20:31:59 +02:00
Makefile arm: Add NEON optimizations for 10 and 12 bit vp9 loop filter 2017-01-24 22:35:59 +02:00
mathops.h
mdct_fixed_neon.S Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
mdct_neon.S Merge commit '5bcbb516f2ff45290ef7995b081762e668693672' 2014-02-08 00:48:26 +01:00
mdct_vfp.S armv6: Accelerate ff_imdct_half for general case (mdct_bits != 6) 2014-07-18 01:34:08 +03:00
me_cmp_armv6.S Merge commit '2d60444331fca1910510038dd3817bea885c2367' 2014-07-17 23:27:40 +02:00
me_cmp_init_arm.c Merge commit '9c12c6ff9539e926df0b2a2299e915ae71872600' 2014-11-24 12:13:00 +01:00
mlpdsp_armv5te.S Merge commit '4c81613df499ba81d64ea102b38d0c6686cc304c' 2014-12-10 00:51:26 +01:00
mlpdsp_armv6.S Merge commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb' 2016-06-21 21:55:34 +02:00
mlpdsp_init_arm.c Merge remote-tracking branch 'qatar/master' 2014-03-26 21:23:09 +01:00
mpegaudiodsp_fixed_armv6.S Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
mpegaudiodsp_init_arm.c Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
mpegvideo_arm.c Merge commit '835f798c7d20bca89eb4f3593846251ad0d84e4b' 2014-08-15 20:11:56 +02:00
mpegvideo_arm.h Merge commit '835f798c7d20bca89eb4f3593846251ad0d84e4b' 2014-08-15 20:11:56 +02:00
mpegvideo_armv5te_s.S
mpegvideo_armv5te.c Merge commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb' 2016-06-21 21:55:34 +02:00
mpegvideo_neon.S Merge commit '5bcbb516f2ff45290ef7995b081762e668693672' 2014-02-08 00:48:26 +01:00
mpegvideoencdsp_armv6.S Merge commit 'c166148409fe8f0dbccef2fe684286a40ba1e37d' 2014-07-07 15:36:58 +02:00
mpegvideoencdsp_init_arm.c Merge commit 'c166148409fe8f0dbccef2fe684286a40ba1e37d' 2014-07-07 15:36:58 +02:00
neon.S Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
neontest.c avcodec: fix arguments on xmm/neon clobber test wrappers 2016-10-02 02:15:47 -03:00
pixblockdsp_armv6.S Merge commit 'f46bb608d9d76c543e4929dc8cffe36b84bd789e' 2014-07-10 01:22:14 +02:00
pixblockdsp_init_arm.c avcodec: Change get_pixels() to ptrdiff_t linesize 2014-08-06 15:50:54 +02:00
rdft_init_arm.c arm/rdft_init: fix license header 2016-04-12 15:01:19 -03:00
rdft_neon.S
rv34dsp_init_arm.c
rv34dsp_neon.S Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
rv40dsp_init_arm.c Merge commit '7fb993d338d88f2f62e0a358b6c9f3eb9a3a08ac' 2014-07-25 13:05:08 +02:00
rv40dsp_neon.S Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
sbrdsp_init_arm.c Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
sbrdsp_neon.S Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
simple_idct_arm.S Merge commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb' 2016-06-21 21:55:34 +02:00
simple_idct_armv5te.S Merge commit '014852e932dab6e9cf2a53e7a17ce8321f3e922c' 2017-03-19 16:12:07 +01:00
simple_idct_armv6.S
simple_idct_neon.S
startcode_armv6.S h264: Move start code search functions into separate source files. 2014-08-04 22:22:54 +02:00
startcode.h Merge commit 'db7f1c7c5a1d37e7f4da64a79a97bea1c4b6e9f8' 2014-08-05 12:46:10 +02:00
synth_filter_init_arm.c avcodec/synth_filter: split off remaining code from dcadec files 2016-01-25 14:57:38 -03:00
synth_filter_neon.S
synth_filter_vfp.S Merge commit '7e18a727d2c2a19f22fcf68875d1b05fd2eafcef' 2014-07-18 13:17:29 +02:00
vc1dsp_init_arm.c Fix compile error on arm4/arm5 platform 2014-09-23 21:11:05 +02:00
vc1dsp_init_neon.c Merge commit '896a5bff64264f4d01ed98eacc97a67260c1e17e' 2014-06-03 18:19:21 +02:00
vc1dsp_neon.S Merge commit '896a5bff64264f4d01ed98eacc97a67260c1e17e' 2014-06-03 18:19:21 +02:00
vc1dsp.h Merge commit '832e19063209a5f355af733d1a45f5051f49ce33' 2013-12-20 23:12:16 +01:00
videodsp_arm.h
videodsp_armv5te.S arm: use a local label instead of the function symbol in ff_prefetch_arm 2015-07-20 23:10:29 +02:00
videodsp_init_arm.c
videodsp_init_armv5te.c
vorbisdsp_init_arm.c
vorbisdsp_neon.S
vp3dsp_init_arm.c Merge commit '6892df9294d93322d43255ada299507465bc93c8' 2017-03-19 18:41:26 +01:00
vp3dsp_neon.S Merge remote-tracking branch 'qatar/master' 2014-01-08 05:44:56 +01:00
vp6dsp_init_arm.c Merge commit '721d57e608dc4fd6c86f27c5ae76ef559d646220' 2017-03-19 17:15:24 -03:00
vp6dsp_neon.S Merge commit '8506ff97c9ea4a1f52983497ecf8d4ef193403a9' 2013-08-24 11:04:11 +02:00
vp8_armv6.S
vp8.h arm: asm decode_block_coeffs_internal is vp8 specific 2014-04-04 10:39:29 +02:00
vp8dsp_armv6.S Merge commit '802727b538b484e3f9d1345bfcc4ab24cfea8898' 2017-03-19 15:18:31 -03:00
vp8dsp_init_arm.c Merge commit 'ac4b32df71bd932838043a4838b86d11e169707f' 2014-04-04 14:46:10 +02:00
vp8dsp_init_armv6.c Merge commit 'ac4b32df71bd932838043a4838b86d11e169707f' 2014-04-04 14:46:10 +02:00
vp8dsp_init_neon.c Merge commit 'ac4b32df71bd932838043a4838b86d11e169707f' 2014-04-04 14:46:10 +02:00
vp8dsp_neon.S Merge commit 'e8b96a77010dd62624c3c65c357d7ae3b397ceaa' 2016-11-14 15:21:49 +01:00
vp8dsp.h Merge commit 'ac4b32df71bd932838043a4838b86d11e169707f' 2014-04-04 14:46:10 +02:00
vp9dsp_init_10bpp_arm.c arm: Add NEON optimizations for 10 and 12 bit vp9 MC 2017-01-24 22:35:50 +02:00
vp9dsp_init_12bpp_arm.c arm: Add NEON optimizations for 10 and 12 bit vp9 MC 2017-01-24 22:35:50 +02:00
vp9dsp_init_16bpp_arm_template.c arm: Add NEON optimizations for 10 and 12 bit vp9 loop filter 2017-01-24 22:35:59 +02:00
vp9dsp_init_arm.c arm: vp9lpf: Implement the mix2_44 function with one single filter pass 2017-03-11 13:14:51 +02:00
vp9dsp_init.h arm: Add NEON optimizations for 10 and 12 bit vp9 MC 2017-01-24 22:35:50 +02:00
vp9itxfm_16bpp_neon.S arm: vp9itxfm16: Do a simpler half/quarter idct16/idct32 when possible 2017-03-19 22:54:33 +02:00
vp9itxfm_neon.S arm/aarch64: vp9: Fix vertical alignment 2017-03-19 22:53:32 +02:00
vp9lpf_16bpp_neon.S arm: Add NEON optimizations for 10 and 12 bit vp9 loop filter 2017-01-24 22:35:59 +02:00
vp9lpf_neon.S arm/aarch64: vp9: Fix vertical alignment 2017-03-19 22:53:32 +02:00
vp9mc_16bpp_neon.S arm: Add NEON optimizations for 10 and 12 bit vp9 MC 2017-01-24 22:35:50 +02:00
vp9mc_neon.S arm: vp9mc: Calculate less unused data in the 4 pixel wide horizontal filter 2017-03-11 13:14:47 +02:00
vp56_arith.h