Janne Grunau
|
28a8b5413b
|
h264/aarch64: add intra loop filter neon asm
Add my NEON asm from x264, relicensed under the LGPL 2.1 or later. Ported
(x264 uses NV12 chroma) and optimized.
Cycle counts for checkasm --bench on a Snapdragon 820e:
h264_h_loop_filter_luma_intra_8bpp_c: 60.0
h264_h_loop_filter_luma_intra_8bpp_neon: 54.2
h264_v_loop_filter_luma_intra_8bpp_c: 148.3
h264_v_loop_filter_luma_intra_8bpp_neon: 73.8
h264_h_loop_filter_chroma_intra_8bpp_c: 27.8
h264_h_loop_filter_chroma_intra_8bpp_neon: 21.4
h264_h_loop_filter_chroma_mbaff_intra_8bpp_c: 15.8
h264_h_loop_filter_chroma_mbaff_intra_8bpp_neon: 15.7
h264_v_loop_filter_chroma_intra_8bpp_c: 45.8
h264_v_loop_filter_chroma_intra_8bpp_neon: 17.3
|
2019-01-26 12:05:10 +01:00 |
|
Janne Grunau
|
846c3d6aca
|
h264/aarch64: optimize neon loop filter
Exit as soon as possible if no filtering will be done.
Improves the checkasm --bench cycle count on a Snapdragon 820e:
h264_h_loop_filter_luma_8bpp_c: 72.4 -> 72.5
h264_h_loop_filter_luma_8bpp_neon: 97.1 -> 56.3
h264_v_loop_filter_luma_8bpp_c: 174.0 -> 173.5
h264_v_loop_filter_luma_8bpp_neon: 62.9 -> 60.9
h264_h_loop_filter_chroma_8bpp_c: 30.2 -> 30.3
h264_h_loop_filter_chroma_8bpp_neon: 51.6 -> 25.7
h264_v_loop_filter_chroma_8bpp_c: 57.3 -> 57.3
h264_v_loop_filter_chroma_8bpp_neon: 28.0 -> 24.0
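The early exit described above can be illustrated with a scalar C sketch. This is not the NEON code from the commit, and the commit message does not say which conditions the asm tests; the sketch assumes the standard H.264 per-sample deblocking thresholds (|p0-q0| < alpha, |p1-p0| < beta, |q1-q0| < beta). The names edge_needs_filtering and loop_filter_sketch are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if at least one of the n positions along
 * the edge passes the (assumed) H.264 alpha/beta threshold test.
 * pix points at q0; xstride steps across the edge, ystride along it. */
static int edge_needs_filtering(const uint8_t *pix, ptrdiff_t xstride,
                                ptrdiff_t ystride, int n,
                                int alpha, int beta)
{
    for (int i = 0; i < n; i++) {
        const uint8_t *p = pix + i * ystride;
        int p1 = p[-2 * xstride];
        int p0 = p[-1 * xstride];
        int q0 = p[0];
        int q1 = p[1 * xstride];

        if (abs(p0 - q0) < alpha &&
            abs(p1 - p0) < beta &&
            abs(q1 - q0) < beta)
            return 1;
    }
    return 0;
}

/* Hypothetical wrapper: returns 0 when it bails out early, 1 when the
 * filter arithmetic would have to run. */
static int loop_filter_sketch(const uint8_t *pix, ptrdiff_t xstride,
                              ptrdiff_t ystride, int n,
                              int alpha, int beta)
{
    /* With a zero threshold no sample can pass the test. */
    if (alpha == 0 || beta == 0)
        return 0;

    /* Nothing along the edge qualifies: skip the expensive part. */
    if (!edge_needs_filtering(pix, xstride, ystride, n, alpha, beta))
        return 0;

    /* The actual filter arithmetic would go here (omitted). */
    return 1;
}
```

The point of the sketch is only the control flow: the cheap threshold checks run first, and the function returns before doing any filter arithmetic when no sample would be modified.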
|
2019-01-26 12:05:10 +01:00 |
|
Janne Grunau
|
bb515e3a73
|
h264/aarch64: sign extend int stride in loop filter asm
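The commit message gives no details. As background, and assuming the stride reaches the asm as a 32-bit int: the AArch64 procedure call standard leaves the upper 32 bits of the argument register unspecified, so the value has to be sign-extended before it is used as a 64-bit address offset. The C sketch below only shows the arithmetic difference; the function names are hypothetical.

```c
#include <stdint.h>

/* What a sign extension (sxtw on AArch64) produces: the 32-bit value is
 * widened with its sign preserved, so a negative stride stays negative. */
static int64_t stride_sign_extended(int32_t stride)
{
    return (int64_t)stride;
}

/* What happens if the upper 32 bits are treated as zero instead: a
 * negative stride turns into a large positive offset. */
static uint64_t stride_zero_extended(int32_t stride)
{
    return (uint64_t)(uint32_t)stride;
}
```

For a stride of -16, the first function yields -16 while the second yields 4294967280, which would address memory far outside the picture buffer.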
|
2019-01-26 12:05:10 +01:00 |
|
Janne Grunau
|
f896bca03f
|
aarch64: h264 (bi)weight NEON optimizations
Ported from ARMv7 NEON.
|
2014-01-15 12:31:07 +01:00 |
|
Janne Grunau
|
36e3b1f2fd
|
aarch64: h264 loop filter NEON optimizations
Ported from ARMv7 NEON.
|
2014-01-15 12:31:04 +01:00 |
|