FFmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2024-09-20 21:36:48 +00:00


Martin Storsjö 5eb5aec475 arm: vp9itxfm: Do a simpler half/quarter idct16/idct32 when possible	2017-02-09 12:32:00 +02:00

This work is sponsored by, and copyright, Google.

This avoids loading and calculating coefficients that we know will be zero, and avoids filling the temp buffer with zeros in places where we know the second pass won't read.

This gives a pretty substantial speedup for the smaller subpartitions.

The code size increases from 12388 bytes to 19784 bytes.

The idct16/32_end macros are moved above the individual functions; the instructions themselves are unchanged, but since new functions are added at the same place where the code is moved from, the diff looks rather messy.

Before:                              Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub1_add_neon:     273.0    189.5    212.0    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:    2102.1   1521.7   1736.2   1265.8
vp9_inv_dct_dct_16x16_sub4_add_neon:    2104.5   1533.0   1736.6   1265.5
vp9_inv_dct_dct_16x16_sub8_add_neon:    2484.8   1828.7   2014.4   1506.5
vp9_inv_dct_dct_16x16_sub12_add_neon:   2851.2   2117.8   2294.8   1753.2
vp9_inv_dct_dct_16x16_sub16_add_neon:   3239.4   2408.3   2543.5   1994.9
vp9_inv_dct_dct_32x32_sub1_add_neon:     758.3    456.7    864.5    553.9
vp9_inv_dct_dct_32x32_sub2_add_neon:   10776.7   7949.8   8567.7   6819.7
vp9_inv_dct_dct_32x32_sub4_add_neon:   10865.6   8131.5   8589.6   6816.3
vp9_inv_dct_dct_32x32_sub8_add_neon:   12053.9   9271.3   9387.7   7564.0
vp9_inv_dct_dct_32x32_sub12_add_neon:  13328.3  10463.2  10217.0   8321.3
vp9_inv_dct_dct_32x32_sub16_add_neon:  14176.4  11509.5  11018.7   9062.3
vp9_inv_dct_dct_32x32_sub20_add_neon:  15301.5  12999.9  11855.1   9828.2
vp9_inv_dct_dct_32x32_sub24_add_neon:  16482.7  14931.5  12650.1  10575.0
vp9_inv_dct_dct_32x32_sub28_add_neon:  17589.5  15811.9  13482.8  11333.4
vp9_inv_dct_dct_32x32_sub32_add_neon:  18696.2  17049.2  14355.6  12089.7

After:
vp9_inv_dct_dct_16x16_sub1_add_neon:     273.0    189.5    211.7    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:    1203.5    998.2   1035.3    763.0
vp9_inv_dct_dct_16x16_sub4_add_neon:    1203.5    998.1   1035.5    760.8
vp9_inv_dct_dct_16x16_sub8_add_neon:    1926.1   1610.6   1722.1   1271.7
vp9_inv_dct_dct_16x16_sub12_add_neon:   2873.2   2129.7   2285.1   1757.3
vp9_inv_dct_dct_16x16_sub16_add_neon:   3221.4   2520.3   2557.6   2002.1
vp9_inv_dct_dct_32x32_sub1_add_neon:     753.0    457.5    866.6    554.6
vp9_inv_dct_dct_32x32_sub2_add_neon:    7554.6   5652.4   6048.4   4920.2
vp9_inv_dct_dct_32x32_sub4_add_neon:    7549.9   5685.0   6046.9   4925.7
vp9_inv_dct_dct_32x32_sub8_add_neon:    8336.9   6704.5   6604.0   5478.0
vp9_inv_dct_dct_32x32_sub12_add_neon:  10914.0   9777.2   9240.4   7416.9
vp9_inv_dct_dct_32x32_sub16_add_neon:  11859.2  11223.3   9966.3   8095.1
vp9_inv_dct_dct_32x32_sub20_add_neon:  15237.1  13029.4  11838.3   9829.4
vp9_inv_dct_dct_32x32_sub24_add_neon:  16293.2  14379.8  12644.9  10572.0
vp9_inv_dct_dct_32x32_sub28_add_neon:  17424.3  15734.7  13473.0  11326.9
vp9_inv_dct_dct_32x32_sub32_add_neon:  18531.3  17457.0  14298.6  12080.0

Signed-off-by: Martin Storsjö <martin@martin.st>
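
To put the "pretty substantial speedup" from the commit message above into numbers, the following is a minimal sketch that divides a few of the Before figures by the matching After figures. It is not part of the commit. The values are copied from the Cortex A7 column only, and the helper name `speedup` and the shortened row labels are assumptions made for illustration.

```python
# Benchmark figures copied from the commit message (Cortex A7 column only).
# Keys are shortened forms of the vp9_inv_dct_dct_*_add_neon function names.
before_a7 = {
    "16x16_sub2": 2102.1,
    "16x16_sub16": 3239.4,
    "32x32_sub2": 10776.7,
    "32x32_sub32": 18696.2,
}
after_a7 = {
    "16x16_sub2": 1203.5,
    "16x16_sub16": 3221.4,
    "32x32_sub2": 7554.6,
    "32x32_sub32": 18531.3,
}


def speedup(before, after):
    """Return before/after ratios per row (hypothetical helper, not from FFmpeg)."""
    return {name: before[name] / after[name] for name in before}


ratios = speedup(before_a7, after_a7)
for name, ratio in ratios.items():
    # A ratio above 1.0 means the new code needs fewer cycles.
    print(f"{name}: {ratio:.2f}x")
```

On these rows the small subpartitions (sub2) improve by roughly 1.4x to 1.7x, while the full-size cases (sub16 for 16x16, sub32 for 32x32) stay within about 1% of their previous cost, which matches the commit's description.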
aac.h
aacpsdsp_init_arm.c
aacpsdsp_neon.S
ac3dsp_arm.S
ac3dsp_armv6.S
ac3dsp_init_arm.c
ac3dsp_neon.S
apedsp_init_arm.c	dsputil: Move APE-specific bits into apedsp	2014-05-29 06:41:15 -07:00
apedsp_neon.S	dsputil: Move APE-specific bits into apedsp	2014-05-29 06:41:15 -07:00
asm-offsets.h	mpegvideo: move the MpegEncContext fields used from arm asm to the beginning	2014-04-29 14:49:42 +02:00
audiodsp_arm.h	dsputil: Split audio operations off into a separate context	2014-06-22 06:20:15 -07:00
audiodsp_init_arm.c	dsputil: Split audio operations off into a separate context	2014-06-22 06:20:15 -07:00
audiodsp_init_neon.c	audiodsp: reorder arguments for vector_clipf	2016-09-22 09:47:52 +02:00
audiodsp_neon.S	audiodsp: reorder arguments for vector_clipf	2016-09-22 09:47:52 +02:00
blockdsp_arm.h	blockdsp: drop the high_bit_depth parameter	2016-09-22 09:47:52 +02:00
blockdsp_init_arm.c	blockdsp: drop the high_bit_depth parameter	2016-09-22 09:47:52 +02:00
blockdsp_init_neon.c	blockdsp: drop the high_bit_depth parameter	2016-09-22 09:47:52 +02:00
blockdsp_neon.S	dsputil: Split clear_block/fill_block off into a separate context	2014-06-18 14:07:23 -07:00
cabac.h
dca.h
dcadsp_init_arm.c	dca: remove unused decode_hf function and quant_d tables	2015-12-24 13:58:18 +01:00
dcadsp_neon.S	dca: remove unused decode_hf function and quant_d tables	2015-12-24 13:58:18 +01:00
dcadsp_vfp.S
fft_fixed_init_arm.c	fft: Split MDCT bits off from FFT	2016-03-01 10:18:28 +01:00
fft_fixed_neon.S	arm: Use .data.rel.ro for const data with relocations	2014-12-09 11:43:25 +02:00
fft_init_arm.c	fft: Split MDCT bits off from FFT	2016-03-01 10:18:28 +01:00
fft_neon.S	arm: Use .data.rel.ro for const data with relocations	2014-12-09 11:43:25 +02:00
fft_vfp.S	arm: Use .data.rel.ro for const data with relocations	2014-12-09 11:43:25 +02:00
flacdsp_arm.S
flacdsp_init_arm.c
fmtconvert_init_arm.c	arm: add ff_int32_to_float_fmul_array8_neon	2015-12-14 16:45:02 +01:00
fmtconvert_neon.S	arm: add ff_int32_to_float_fmul_array8_neon	2015-12-14 16:45:02 +01:00
fmtconvert_vfp.S
g722dsp_init_arm.c	g722: Add ARM NEON implementation for g722_apply_qmf()	2015-02-15 22:47:21 +02:00
g722dsp_neon.S	g722: Add ARM NEON implementation for g722_apply_qmf()	2015-02-15 22:47:21 +02:00
h264chroma_init_arm.c	h264chroma: Change type of stride parameters to ptrdiff_t	2016-09-29 14:48:04 +02:00
h264cmc_neon.S	h264chroma: Change type of stride parameters to ptrdiff_t	2016-09-29 14:48:04 +02:00
h264dsp_init_arm.c	h264: Move start code search functions into separate source files.	2014-08-04 22:22:54 +02:00
h264dsp_neon.S
h264idct_neon.S
h264pred_init_arm.c	h264: arm: use intra pred8x8 functions only for chroma_format_idc <= 1	2015-07-18 00:28:49 +02:00
h264pred_neon.S
h264qpel_init_arm.c	qpeldsp: Mark source pointer in qpel_mc_func function pointer const	2014-07-25 02:52:54 -07:00
h264qpel_neon.S
hpeldsp_arm.h
hpeldsp_arm.S	hpeldsp: arm: Update comments left behind in `25841dfe80`	2016-09-29 14:48:03 +02:00
hpeldsp_armv6.S
hpeldsp_init_arm.c
hpeldsp_init_armv6.c
hpeldsp_init_neon.c
hpeldsp_neon.S
idct.h	idct: Change type of array stride parameters to ptrdiff_t	2016-09-29 14:48:03 +02:00
idctdsp_arm.h	dsputil: Split off IDCT bits into their own context	2014-06-30 07:58:46 -07:00
idctdsp_arm.S	idct: Change type of array stride parameters to ptrdiff_t	2016-09-29 14:48:03 +02:00
idctdsp_armv6.S	dsputil: Split off IDCT bits into their own context	2014-06-30 07:58:46 -07:00
idctdsp_init_arm.c	idct: Change type of array stride parameters to ptrdiff_t	2016-09-29 14:48:03 +02:00
idctdsp_init_armv5te.c	idct: Move arm-specific declarations to a header in the arm directory	2014-07-20 13:02:17 -07:00
idctdsp_init_armv6.c	idct: Change type of array stride parameters to ptrdiff_t	2016-09-29 14:48:03 +02:00
idctdsp_init_neon.c	idct: Move arm-specific declarations to a header in the arm directory	2014-07-20 13:02:17 -07:00
idctdsp_neon.S	dsputil: Split off IDCT bits into their own context	2014-06-30 07:58:46 -07:00
int_neon.S	dsputil: Move APE-specific bits into apedsp	2014-05-29 06:41:15 -07:00
jrevdct_arm.S
Makefile	arm: vp9: Add NEON loop filters	2016-11-11 14:16:42 +02:00
mathops.h
mdct_fixed_init_arm.c	fft: Split MDCT bits off from FFT	2016-03-01 10:18:28 +01:00
mdct_fixed_neon.S
mdct_init_arm.c	fft: Split MDCT bits off from FFT	2016-03-01 10:18:28 +01:00
mdct_neon.S
mdct_vfp.S	armv6: Accelerate ff_imdct_half for general case (mdct_bits != 6)	2014-07-18 01:34:08 +03:00
me_cmp_armv6.S	dsputil: Split motion estimation compare bits off into their own context	2014-07-17 09:07:10 -07:00
me_cmp_init_arm.c	motion_est: convert stride to ptrdiff_t	2014-11-24 01:30:10 +00:00
mlpdsp_armv5te.S	arm: mlpdsp: handle pic offset calculation in a macro	2014-12-09 22:00:08 +01:00
mlpdsp_armv6.S	cosmetics: Fix spelling mistakes	2016-05-04 18:16:21 +02:00
mlpdsp_init_arm.c	truehd: add hand-scheduled ARM asm version of ff_mlp_pack_output.	2014-03-26 19:54:32 +02:00
mpegaudiodsp_fixed_armv6.S
mpegaudiodsp_init_arm.c	Add av_cold attributes to arch-specific init functions	2013-02-05 17:01:05 +01:00
mpegvideo_arm.c	mpegvideo: cosmetics: Lowercase ugly uppercase MPV_ function name prefixes	2014-08-15 01:26:33 -07:00
mpegvideo_arm.h	mpegvideo: cosmetics: Lowercase ugly uppercase MPV_ function name prefixes	2014-08-15 01:26:33 -07:00
mpegvideo_armv5te_s.S
mpegvideo_armv5te.c	cosmetics: Fix spelling mistakes	2016-05-04 18:16:21 +02:00
mpegvideo_neon.S
mpegvideoencdsp_armv6.S	dsputil: Move pix_sum, pix_norm1, shrink function pointers to mpegvideoenc	2014-07-06 14:26:53 -07:00
mpegvideoencdsp_init_arm.c	dsputil: Move pix_sum, pix_norm1, shrink function pointers to mpegvideoenc	2014-07-06 14:26:53 -07:00
neon.S
neontest.c	lavc: add clobber tests for the new encoding/decoding API	2016-09-28 10:01:52 +02:00
pixblockdsp_armv6.S	dsputil: Split off pixel block routines into their own context	2014-07-09 08:05:26 -07:00
pixblockdsp_init_arm.c	pixblockdsp: Change type of stride parameters to ptrdiff_t	2016-09-14 14:12:36 +02:00
rdft_init_arm.c	rdft: arm: Split RDFT initialization into a separate file	2016-02-26 14:34:58 +01:00
rdft_neon.S	ARM: set Tag_ABI_align_preserved in all asm files	2012-10-02 19:47:56 +01:00
rv34dsp_init_arm.c
rv34dsp_neon.S	Drop DCTELEM typedef	2013-01-22 18:32:56 -08:00
rv40dsp_init_arm.c	qpeldsp: Mark source pointer in qpel_mc_func function pointer const	2014-07-25 02:52:54 -07:00
rv40dsp_neon.S
sbrdsp_init_arm.c
sbrdsp_neon.S
simple_idct_arm.S	cosmetics: Fix spelling mistakes	2016-05-04 18:16:21 +02:00
simple_idct_armv5te.S	simple_idct: arm: Drop disabled code variant	2016-08-17 12:21:54 +02:00
simple_idct_armv6.S	idct: Change type of array stride parameters to ptrdiff_t	2016-09-29 14:48:03 +02:00
simple_idct_neon.S	idct: Change type of array stride parameters to ptrdiff_t	2016-09-29 14:48:03 +02:00
startcode_armv6.S	h264: Move start code search functions into separate source files.	2014-08-04 22:22:54 +02:00
startcode.h	h264: Move start code search functions into separate source files.	2014-08-04 22:22:54 +02:00
synth_filter_neon.S
synth_filter_vfp.S	arm: cosmetics: Consistently use lowercase for shift operators	2014-07-18 11:17:40 +03:00
vc1dsp_init_arm.c	vc-1: Add platform-specific start code search routine to VC1DSPContext.	2014-08-04 22:22:54 +02:00
vc1dsp_init_neon.c	h264chroma: Change type of stride parameters to ptrdiff_t	2016-09-29 14:48:04 +02:00
vc1dsp_neon.S	idct: Change type of array stride parameters to ptrdiff_t	2016-09-29 14:48:03 +02:00
vc1dsp.h
videodsp_arm.h
videodsp_armv5te.S	arm: use a local label instead of the function symbol in ff_prefetch_arm	2015-07-20 23:10:29 +02:00
videodsp_init_arm.c
videodsp_init_armv5te.c
vorbisdsp_init_arm.c
vorbisdsp_neon.S
vp3dsp_init_arm.c	vp3: Change type of stride parameters to ptrdiff_t	2016-08-26 11:36:26 +02:00
vp3dsp_neon.S
vp6dsp_init_arm.c	vp56: Separate VP5 and VP6 dsp initialization	2016-08-26 11:50:22 +02:00
vp6dsp_neon.S
vp8_armv6.S
vp8.h	arm: asm decode_block_coeffs_internal is vp8 specific	2014-04-04 10:39:29 +02:00
vp8dsp_armv6.S	vp8: Update some assembly comments left unchanged in `bd66f073fe`	2016-08-26 11:36:53 +02:00
vp8dsp_init_arm.c	On2 VP7 decoder	2014-04-04 04:00:11 +02:00
vp8dsp_init_armv6.c	On2 VP7 decoder	2014-04-04 04:00:11 +02:00
vp8dsp_init_neon.c	On2 VP7 decoder	2014-04-04 04:00:11 +02:00
vp8dsp_neon.S	arm: Fix a typo in a comment	2016-07-06 22:58:51 +03:00
vp8dsp.h	On2 VP7 decoder	2014-04-04 04:00:11 +02:00
vp9dsp_init_arm.c	arm: vp9: Add NEON loop filters	2016-11-11 14:16:42 +02:00
vp9itxfm_neon.S	arm: vp9itxfm: Do a simpler half/quarter idct16/idct32 when possible	2017-02-09 12:32:00 +02:00
vp9lpf_neon.S	arm: vp9: Add NEON loop filters	2016-11-11 14:16:42 +02:00
vp9mc_neon.S	arm: vp9mc: Fix vertical alignment of operands	2017-01-03 14:15:45 +02:00
vp56_arith.h