FFmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2024-09-20 21:36:48 +00:00

Martin Storsjö dd299a2d6d arm: vp9: Add NEON loop filters	2016-11-11 14:16:42 +02:00

This work is sponsored by, and copyright, Google.

The implementation tries to have smart handling of cases where no pixels need the full filtering for the 8/16 width filters, skipping both calculation and writeback of the unmodified pixels in those cases. The actual effect of this is hard to test with checkasm though, since it tests the full filtering, and the benefit depends on how many filtered blocks use the shortcut.

Examples of relative speedup compared to the C version, from checkasm:

                              Cortex     A7     A8     A9    A53
vp9_loop_filter_h_4_8_neon:            2.72   2.68   1.78   3.15
vp9_loop_filter_h_8_8_neon:            2.36   2.38   1.70   2.91
vp9_loop_filter_h_16_8_neon:           1.80   1.89   1.45   2.01
vp9_loop_filter_h_16_16_neon:          2.81   2.78   2.18   3.16
vp9_loop_filter_mix2_h_44_16_neon:     2.65   2.67   1.93   3.05
vp9_loop_filter_mix2_h_48_16_neon:     2.46   2.38   1.81   2.85
vp9_loop_filter_mix2_h_84_16_neon:     2.50   2.41   1.73   2.85
vp9_loop_filter_mix2_h_88_16_neon:     2.77   2.66   1.96   3.23
vp9_loop_filter_mix2_v_44_16_neon:     4.28   4.46   3.22   5.70
vp9_loop_filter_mix2_v_48_16_neon:     3.92   4.00   3.03   5.19
vp9_loop_filter_mix2_v_84_16_neon:     3.97   4.31   2.98   5.33
vp9_loop_filter_mix2_v_88_16_neon:     3.91   4.19   3.06   5.18
vp9_loop_filter_v_4_8_neon:            4.53   4.47   3.31   6.05
vp9_loop_filter_v_8_8_neon:            3.58   3.99   2.92   5.17
vp9_loop_filter_v_16_8_neon:           3.40   3.50   2.81   4.68
vp9_loop_filter_v_16_16_neon:          4.66   4.41   3.74   6.02

The speedup vs C code is around 2-6x. The numbers are quite inconclusive though, since the checkasm test runs multiple filterings on top of each other, so later rounds might end up with different codepaths (different decisions on which filter to apply, based on input pixel differences). Disabling the early-exit in the asm doesn't give a fair comparison either though, since the C code only does the necessary calculations for each row.

Based on START_TIMER/STOP_TIMER wrapping around a few individual functions, the speedup vs C code is around 4-9x.

This is pretty similar in runtime to the corresponding routines in libvpx. (This is comparing vpx_lpf_vertical_16_neon, vpx_lpf_horizontal_edge_8_neon and vpx_lpf_horizontal_edge_16_neon to vp9_loop_filter_h_16_8_neon, vp9_loop_filter_v_16_8_neon and vp9_loop_filter_v_16_16_neon - note that the naming of horizontal and vertical is flipped between the libraries.)

In order to have stable, comparable numbers, the early exits in both asm versions were disabled, forcing the full filtering codepath.

                                  Cortex      A7      A8      A9     A53
vp9_loop_filter_h_16_8_neon:               597.2   472.0   482.4   415.0
libvpx vpx_lpf_vertical_16_neon:           626.0   464.5   470.7   445.0
vp9_loop_filter_v_16_8_neon:               500.2   422.5   429.7   295.0
libvpx vpx_lpf_horizontal_edge_8_neon:     586.5   414.5   415.6   383.2
vp9_loop_filter_v_16_16_neon:              905.0   784.7   791.5   546.0
libvpx vpx_lpf_horizontal_edge_16_neon:   1060.2   751.7   743.5   685.2

Our version is consistently faster on A7 and A53, marginally slower on A8, and sometimes faster, sometimes slower on A9 (marginally slower in all three tests in this particular test run).

Signed-off-by: Martin Storsjö <martin@martin.st>
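The early-exit idea described in the commit message can be illustrated with a small scalar sketch in C. This is not the code from vp9lpf_neon.S and not the real VP9 filter: the function name smooth_edge_8, the threshold test and the 3:1 blend are all hypothetical stand-ins, chosen only to show the control flow. The sketch mimics a SIMD routine that first builds a per-row mask, returns before any filter arithmetic or writeback when no row needs filtering, and otherwise computes all lanes and uses the mask to select which results are stored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum { ROWS = 8 };

/* Hypothetical, simplified smoother for a vertical edge; NOT the VP9 filter.
 * dst points at the first pixel to the right of the edge in row 0, so
 * dst[i * stride - 1] is p0 and dst[i * stride] is q0 of row i.
 * Returns 1 if the filter ran, 0 if the early exit was taken. */
int smooth_edge_8(uint8_t *dst, ptrdiff_t stride, int thresh)
{
    uint8_t mask[ROWS];
    int any = 0;

    assert(stride >= 2);

    /* Pass 1: decide per row whether the step across the edge is small
     * enough to count as a blocking artifact (assumed criterion). */
    for (int i = 0; i < ROWS; i++) {
        int p0 = dst[i * stride - 1];
        int q0 = dst[i * stride];
        mask[i] = abs(p0 - q0) <= thresh;
        any |= mask[i];
    }

    /* Early exit: no row needs filtering, so skip both the calculation
     * and the writeback of the unmodified pixels. */
    if (!any)
        return 0;

    /* Pass 2: compute every lane, as a SIMD routine would, and let the
     * mask select between the filtered and the original value. */
    for (int i = 0; i < ROWS; i++) {
        int p0 = dst[i * stride - 1];
        int q0 = dst[i * stride];
        int fp = (3 * p0 + q0 + 2) >> 2;   /* illustrative 3:1 blend */
        int fq = (p0 + 3 * q0 + 2) >> 2;
        dst[i * stride - 1] = (uint8_t)(mask[i] ? fp : p0);
        dst[i * stride]     = (uint8_t)(mask[i] ? fq : q0);
    }
    return 1;
}
```

Because the second pass processes every lane regardless of the mask, the block-level exit is the only place where work is saved. A plain per-row C loop has no such cost, which is consistent with the commit message's remark that disabling the early exit does not give a fair comparison against the C code.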
..
aac.h
aacpsdsp_init_arm.c
aacpsdsp_neon.S
ac3dsp_arm.S
ac3dsp_armv6.S
ac3dsp_init_arm.c
ac3dsp_neon.S
apedsp_init_arm.c	dsputil: Move APE-specific bits into apedsp	2014-05-29 06:41:15 -07:00
apedsp_neon.S	dsputil: Move APE-specific bits into apedsp	2014-05-29 06:41:15 -07:00
asm-offsets.h	mpegvideo: move the MpegEncContext fields used from arm asm to the beginning	2014-04-29 14:49:42 +02:00
audiodsp_arm.h	dsputil: Split audio operations off into a separate context	2014-06-22 06:20:15 -07:00
audiodsp_init_arm.c	dsputil: Split audio operations off into a separate context	2014-06-22 06:20:15 -07:00
audiodsp_init_neon.c	audiodsp: reorder arguments for vector_clipf	2016-09-22 09:47:52 +02:00
audiodsp_neon.S	audiodsp: reorder arguments for vector_clipf	2016-09-22 09:47:52 +02:00
blockdsp_arm.h	blockdsp: drop the high_bit_depth parameter	2016-09-22 09:47:52 +02:00
blockdsp_init_arm.c	blockdsp: drop the high_bit_depth parameter	2016-09-22 09:47:52 +02:00
blockdsp_init_neon.c	blockdsp: drop the high_bit_depth parameter	2016-09-22 09:47:52 +02:00
blockdsp_neon.S	dsputil: Split clear_block/fill_block off into a separate context	2014-06-18 14:07:23 -07:00
cabac.h	arm: get_cabac inline asm	2014-03-09 00:45:34 +01:00
dca.h	dcadec: simplify decoding of VQ high frequencies	2014-02-28 13:03:22 +01:00
dcadsp_init_arm.c	dca: remove unused decode_hf function and quant_d tables	2015-12-24 13:58:18 +01:00
dcadsp_neon.S	dca: remove unused decode_hf function and quant_d tables	2015-12-24 13:58:18 +01:00
dcadsp_vfp.S	dcadec: remove scaling in lfe_interpolation_fir	2014-02-28 13:00:47 +01:00
fft_fixed_init_arm.c	fft: Split MDCT bits off from FFT	2016-03-01 10:18:28 +01:00
fft_fixed_neon.S	arm: Use .data.rel.ro for const data with relocations	2014-12-09 11:43:25 +02:00
fft_init_arm.c	fft: Split MDCT bits off from FFT	2016-03-01 10:18:28 +01:00
fft_neon.S	arm: Use .data.rel.ro for const data with relocations	2014-12-09 11:43:25 +02:00
fft_vfp.S	arm: Use .data.rel.ro for const data with relocations	2014-12-09 11:43:25 +02:00
flacdsp_arm.S
flacdsp_init_arm.c
fmtconvert_init_arm.c	arm: add ff_int32_to_float_fmul_array8_neon	2015-12-14 16:45:02 +01:00
fmtconvert_neon.S	arm: add ff_int32_to_float_fmul_array8_neon	2015-12-14 16:45:02 +01:00
fmtconvert_vfp.S
g722dsp_init_arm.c	g722: Add ARM NEON implementation for g722_apply_qmf()	2015-02-15 22:47:21 +02:00
g722dsp_neon.S	g722: Add ARM NEON implementation for g722_apply_qmf()	2015-02-15 22:47:21 +02:00
h264chroma_init_arm.c	h264chroma: Change type of stride parameters to ptrdiff_t	2016-09-29 14:48:04 +02:00
h264cmc_neon.S	h264chroma: Change type of stride parameters to ptrdiff_t	2016-09-29 14:48:04 +02:00
h264dsp_init_arm.c	h264: Move start code search functions into separate source files.	2014-08-04 22:22:54 +02:00
h264dsp_neon.S
h264idct_neon.S	arm: Add X() around all references to extern symbols	2014-02-07 15:13:58 +02:00
h264pred_init_arm.c	h264: arm: use intra pred8x8 functions only for chroma_format_idc <= 1	2015-07-18 00:28:49 +02:00
h264pred_neon.S
h264qpel_init_arm.c	qpeldsp: Mark source pointer in qpel_mc_func function pointer const	2014-07-25 02:52:54 -07:00
h264qpel_neon.S
hpeldsp_arm.h	arm: Use full filenames as multiple inclusion guards	2014-01-14 00:04:52 +01:00
hpeldsp_arm.S	hpeldsp: arm: Update comments left behind in `25841dfe80`	2016-09-29 14:48:03 +02:00
hpeldsp_armv6.S	arm: hpeldsp: fix put_pixels8_y2_{,no_rnd_}armv6	2014-03-08 18:31:57 +01:00
hpeldsp_init_arm.c	dsputil: Refactor duplicated CALL_2X_PIXELS / PIXELS16 macros	2014-03-22 06:17:29 -07:00
hpeldsp_init_armv6.c
hpeldsp_init_neon.c
hpeldsp_neon.S
idct.h	idct: Change type of array stride parameters to ptrdiff_t	2016-09-29 14:48:03 +02:00
idctdsp_arm.h	dsputil: Split off IDCT bits into their own context	2014-06-30 07:58:46 -07:00
idctdsp_arm.S	idct: Change type of array stride parameters to ptrdiff_t	2016-09-29 14:48:03 +02:00
idctdsp_armv6.S	dsputil: Split off IDCT bits into their own context	2014-06-30 07:58:46 -07:00
idctdsp_init_arm.c	idct: Change type of array stride parameters to ptrdiff_t	2016-09-29 14:48:03 +02:00
idctdsp_init_armv5te.c	idct: Move arm-specific declarations to a header in the arm directory	2014-07-20 13:02:17 -07:00
idctdsp_init_armv6.c	idct: Change type of array stride parameters to ptrdiff_t	2016-09-29 14:48:03 +02:00
idctdsp_init_neon.c	idct: Move arm-specific declarations to a header in the arm directory	2014-07-20 13:02:17 -07:00
idctdsp_neon.S	dsputil: Split off IDCT bits into their own context	2014-06-30 07:58:46 -07:00
int_neon.S	dsputil: Move APE-specific bits into apedsp	2014-05-29 06:41:15 -07:00
jrevdct_arm.S
Makefile	arm: vp9: Add NEON loop filters	2016-11-11 14:16:42 +02:00
mathops.h
mdct_fixed_init_arm.c	fft: Split MDCT bits off from FFT	2016-03-01 10:18:28 +01:00
mdct_fixed_neon.S
mdct_init_arm.c	fft: Split MDCT bits off from FFT	2016-03-01 10:18:28 +01:00
mdct_neon.S	arm: Add X() around all references to extern symbols	2014-02-07 15:13:58 +02:00
mdct_vfp.S	armv6: Accelerate ff_imdct_half for general case (mdct_bits != 6)	2014-07-18 01:34:08 +03:00
me_cmp_armv6.S	dsputil: Split motion estimation compare bits off into their own context	2014-07-17 09:07:10 -07:00
me_cmp_init_arm.c	motion_est: convert stride to ptrdiff_t	2014-11-24 01:30:10 +00:00
mlpdsp_armv5te.S	arm: mlpdsp: handle pic offset calculation in a macro	2014-12-09 22:00:08 +01:00
mlpdsp_armv6.S	cosmetics: Fix spelling mistakes	2016-05-04 18:16:21 +02:00
mlpdsp_init_arm.c	truehd: add hand-scheduled ARM asm version of ff_mlp_pack_output.	2014-03-26 19:54:32 +02:00
mpegaudiodsp_fixed_armv6.S
mpegaudiodsp_init_arm.c
mpegvideo_arm.c	mpegvideo: cosmetics: Lowercase ugly uppercase MPV_ function name prefixes	2014-08-15 01:26:33 -07:00
mpegvideo_arm.h	mpegvideo: cosmetics: Lowercase ugly uppercase MPV_ function name prefixes	2014-08-15 01:26:33 -07:00
mpegvideo_armv5te_s.S
mpegvideo_armv5te.c	cosmetics: Fix spelling mistakes	2016-05-04 18:16:21 +02:00
mpegvideo_neon.S	arm: Add X() around all references to extern symbols	2014-02-07 15:13:58 +02:00
mpegvideoencdsp_armv6.S	dsputil: Move pix_sum, pix_norm1, shrink function pointers to mpegvideoenc	2014-07-06 14:26:53 -07:00
mpegvideoencdsp_init_arm.c	dsputil: Move pix_sum, pix_norm1, shrink function pointers to mpegvideoenc	2014-07-06 14:26:53 -07:00
neon.S
neontest.c	lavc: add clobber tests for the new encoding/decoding API	2016-09-28 10:01:52 +02:00
pixblockdsp_armv6.S	dsputil: Split off pixel block routines into their own context	2014-07-09 08:05:26 -07:00
pixblockdsp_init_arm.c	pixblockdsp: Change type of stride parameters to ptrdiff_t	2016-09-14 14:12:36 +02:00
rdft_init_arm.c	rdft: arm: Split RDFT initialization into a separate file	2016-02-26 14:34:58 +01:00
rdft_neon.S
rv34dsp_init_arm.c
rv34dsp_neon.S
rv40dsp_init_arm.c	qpeldsp: Mark source pointer in qpel_mc_func function pointer const	2014-07-25 02:52:54 -07:00
rv40dsp_neon.S
sbrdsp_init_arm.c
sbrdsp_neon.S
simple_idct_arm.S	cosmetics: Fix spelling mistakes	2016-05-04 18:16:21 +02:00
simple_idct_armv5te.S	simple_idct: arm: Drop disabled code variant	2016-08-17 12:21:54 +02:00
simple_idct_armv6.S	idct: Change type of array stride parameters to ptrdiff_t	2016-09-29 14:48:03 +02:00
simple_idct_neon.S	idct: Change type of array stride parameters to ptrdiff_t	2016-09-29 14:48:03 +02:00
startcode_armv6.S	h264: Move start code search functions into separate source files.	2014-08-04 22:22:54 +02:00
startcode.h	h264: Move start code search functions into separate source files.	2014-08-04 22:22:54 +02:00
synth_filter_neon.S
synth_filter_vfp.S	arm: cosmetics: Consistently use lowercase for shift operators	2014-07-18 11:17:40 +03:00
vc1dsp_init_arm.c	vc-1: Add platform-specific start code search routine to VC1DSPContext.	2014-08-04 22:22:54 +02:00
vc1dsp_init_neon.c	h264chroma: Change type of stride parameters to ptrdiff_t	2016-09-29 14:48:04 +02:00
vc1dsp_neon.S	idct: Change type of array stride parameters to ptrdiff_t	2016-09-29 14:48:03 +02:00
vc1dsp.h
videodsp_arm.h
videodsp_armv5te.S	arm: use a local label instead of the function symbol in ff_prefetch_arm	2015-07-20 23:10:29 +02:00
videodsp_init_arm.c
videodsp_init_armv5te.c
vorbisdsp_init_arm.c
vorbisdsp_neon.S
vp3dsp_init_arm.c	vp3: Change type of stride parameters to ptrdiff_t	2016-08-26 11:36:26 +02:00
vp3dsp_neon.S	arm: Add a missing # as prefix for an immediate constant	2014-01-07 19:30:13 +02:00
vp6dsp_init_arm.c	vp56: Separate VP5 and VP6 dsp initialization	2016-08-26 11:50:22 +02:00
vp6dsp_neon.S
vp8_armv6.S
vp8.h	arm: asm decode_block_coeffs_internal is vp8 specific	2014-04-04 10:39:29 +02:00
vp8dsp_armv6.S	vp8: Update some assembly comments left unchanged in `bd66f073fe`	2016-08-26 11:36:53 +02:00
vp8dsp_init_arm.c	On2 VP7 decoder	2014-04-04 04:00:11 +02:00
vp8dsp_init_armv6.c	On2 VP7 decoder	2014-04-04 04:00:11 +02:00
vp8dsp_init_neon.c	On2 VP7 decoder	2014-04-04 04:00:11 +02:00
vp8dsp_neon.S	arm: Fix a typo in a comment	2016-07-06 22:58:51 +03:00
vp8dsp.h	On2 VP7 decoder	2014-04-04 04:00:11 +02:00
vp9dsp_init_arm.c	arm: vp9: Add NEON loop filters	2016-11-11 14:16:42 +02:00
vp9itxfm_neon.S	arm: vp9: Add NEON itxfm routines	2016-11-11 11:09:05 +02:00
vp9lpf_neon.S	arm: vp9: Add NEON loop filters	2016-11-11 14:16:42 +02:00
vp9mc_neon.S	arm: vp9mc: Use a different helper register for PIC loads	2016-11-10 14:01:04 +02:00
vp56_arith.h