FFmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2024-10-19 21:13:25 +00:00

Author	SHA1	Message	Date
Reimar Döffinger	dcff15692d	hevcdsp_idct_neon.S: Avoid unnecessary mov. ret can be given an argument instead. This is also consistent with how other assembler code in FFmpeg does it. Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>	2023-07-29 16:05:23 +02:00
Rémi Denis-Courmont	82cb4b1c05	lavc/aarch64: remove bogus HAVE_VFP guard The IMDCT offset is only relevant for NEON optimisations. There are no VFP optimisations here that would justify the HAVE_VFP flag. In practice, this makes no difference because HAVE_NEON is practically always true for standard Armv8 platforms.	2023-07-15 22:56:30 +03:00
Logan Lyu	9557bf26b3	lavc/aarch64: new optimization for 8-bit hevc_epel_uni_w_hv put_hevc_epel_uni_w_hv4_8_c: 254.6 put_hevc_epel_uni_w_hv4_8_i8mm: 102.9 put_hevc_epel_uni_w_hv6_8_c: 411.6 put_hevc_epel_uni_w_hv6_8_i8mm: 221.6 put_hevc_epel_uni_w_hv8_8_c: 669.4 put_hevc_epel_uni_w_hv8_8_i8mm: 214.9 put_hevc_epel_uni_w_hv12_8_c: 1412.6 put_hevc_epel_uni_w_hv12_8_i8mm: 481.4 put_hevc_epel_uni_w_hv16_8_c: 2425.4 put_hevc_epel_uni_w_hv16_8_i8mm: 647.4 put_hevc_epel_uni_w_hv24_8_c: 5384.1 put_hevc_epel_uni_w_hv24_8_i8mm: 1450.6 put_hevc_epel_uni_w_hv32_8_c: 9470.9 put_hevc_epel_uni_w_hv32_8_i8mm: 2497.1 put_hevc_epel_uni_w_hv48_8_c: 20930.1 put_hevc_epel_uni_w_hv48_8_i8mm: 5635.9 put_hevc_epel_uni_w_hv64_8_c: 36682.9 put_hevc_epel_uni_w_hv64_8_i8mm: 9712.6 Signed-off-by: Martin Storsjö <martin@martin.st>	2023-07-14 21:19:12 +03:00
Logan Lyu	d48c89701c	lavc/aarch64: new optimization for 8-bit hevc_epel_h put_hevc_epel_h4_8_c: 67.1 put_hevc_epel_h4_8_i8mm: 21.1 put_hevc_epel_h6_8_c: 147.1 put_hevc_epel_h6_8_i8mm: 45.1 put_hevc_epel_h8_8_c: 237.4 put_hevc_epel_h8_8_i8mm: 72.1 put_hevc_epel_h12_8_c: 527.4 put_hevc_epel_h12_8_i8mm: 115.4 put_hevc_epel_h16_8_c: 943.6 put_hevc_epel_h16_8_i8mm: 153.9 put_hevc_epel_h24_8_c: 2105.4 put_hevc_epel_h24_8_i8mm: 384.4 put_hevc_epel_h32_8_c: 3631.4 put_hevc_epel_h32_8_i8mm: 519.9 put_hevc_epel_h48_8_c: 8082.1 put_hevc_epel_h48_8_i8mm: 1110.4 put_hevc_epel_h64_8_c: 14400.6 put_hevc_epel_h64_8_i8mm: 2057.1 put_hevc_qpel_h4_8_c: 124.9 put_hevc_qpel_h4_8_neon: 43.1 put_hevc_qpel_h4_8_i8mm: 33.1 put_hevc_qpel_h6_8_c: 269.4 put_hevc_qpel_h6_8_neon: 90.6 put_hevc_qpel_h6_8_i8mm: 61.4 put_hevc_qpel_h8_8_c: 477.6 put_hevc_qpel_h8_8_neon: 82.1 put_hevc_qpel_h8_8_i8mm: 99.9 put_hevc_qpel_h12_8_c: 1062.4 put_hevc_qpel_h12_8_neon: 226.9 put_hevc_qpel_h12_8_i8mm: 170.9 put_hevc_qpel_h16_8_c: 1880.6 put_hevc_qpel_h16_8_neon: 302.9 put_hevc_qpel_h16_8_i8mm: 251.4 put_hevc_qpel_h24_8_c: 4221.9 put_hevc_qpel_h24_8_neon: 893.9 put_hevc_qpel_h24_8_i8mm: 626.1 put_hevc_qpel_h32_8_c: 7437.6 put_hevc_qpel_h32_8_neon: 1189.9 put_hevc_qpel_h32_8_i8mm: 959.1 put_hevc_qpel_h48_8_c: 16838.4 put_hevc_qpel_h48_8_neon: 2727.9 put_hevc_qpel_h48_8_i8mm: 2163.9 put_hevc_qpel_h64_8_c: 29982.1 put_hevc_qpel_h64_8_neon: 4777.6 Signed-off-by: Martin Storsjö <martin@martin.st>	2023-07-14 21:19:12 +03:00
Logan Lyu	668eb4c00e	lavc/aarch64: new optimization for 8-bit hevc_epel_uni_w_v put_hevc_epel_uni_w_v4_8_c: 116.1 put_hevc_epel_uni_w_v4_8_neon: 48.6 put_hevc_epel_uni_w_v6_8_c: 248.9 put_hevc_epel_uni_w_v6_8_neon: 80.6 put_hevc_epel_uni_w_v8_8_c: 383.9 put_hevc_epel_uni_w_v8_8_neon: 91.9 put_hevc_epel_uni_w_v12_8_c: 806.1 put_hevc_epel_uni_w_v12_8_neon: 202.9 put_hevc_epel_uni_w_v16_8_c: 1411.1 put_hevc_epel_uni_w_v16_8_neon: 289.9 put_hevc_epel_uni_w_v24_8_c: 3168.9 put_hevc_epel_uni_w_v24_8_neon: 619.4 put_hevc_epel_uni_w_v32_8_c: 5632.9 put_hevc_epel_uni_w_v32_8_neon: 1161.1 put_hevc_epel_uni_w_v48_8_c: 12406.1 put_hevc_epel_uni_w_v48_8_neon: 2476.4 put_hevc_epel_uni_w_v64_8_c: 22001.4 put_hevc_epel_uni_w_v64_8_neon: 4343.9 Signed-off-by: Martin Storsjö <martin@martin.st>	2023-07-14 21:19:12 +03:00
Logan Lyu	0c604b1913	lavc/aarch64: new optimization for 8-bit hevc_epel_uni_w_h put_hevc_epel_uni_w_h4_8_c: 126.1 put_hevc_epel_uni_w_h4_8_i8mm: 41.6 put_hevc_epel_uni_w_h6_8_c: 222.9 put_hevc_epel_uni_w_h6_8_i8mm: 91.4 put_hevc_epel_uni_w_h8_8_c: 374.4 put_hevc_epel_uni_w_h8_8_i8mm: 102.1 put_hevc_epel_uni_w_h12_8_c: 806.1 put_hevc_epel_uni_w_h12_8_i8mm: 225.6 put_hevc_epel_uni_w_h16_8_c: 1414.4 put_hevc_epel_uni_w_h16_8_i8mm: 333.4 put_hevc_epel_uni_w_h24_8_c: 3128.6 put_hevc_epel_uni_w_h24_8_i8mm: 713.1 put_hevc_epel_uni_w_h32_8_c: 5519.1 put_hevc_epel_uni_w_h32_8_i8mm: 1118.1 put_hevc_epel_uni_w_h48_8_c: 12364.4 put_hevc_epel_uni_w_h48_8_i8mm: 2541.1 put_hevc_epel_uni_w_h64_8_c: 21925.9 put_hevc_epel_uni_w_h64_8_i8mm: 4383.6 Signed-off-by: Martin Storsjö <martin@martin.st>	2023-07-14 21:19:12 +03:00
Logan Lyu	e652e7dcda	lavc/aarch64: new optimization for 8-bit hevc_pel_uni_pixels put_hevc_pel_uni_pixels4_8_c: 35.9 put_hevc_pel_uni_pixels4_8_neon: 7.6 put_hevc_pel_uni_pixels6_8_c: 46.1 put_hevc_pel_uni_pixels6_8_neon: 20.6 put_hevc_pel_uni_pixels8_8_c: 53.4 put_hevc_pel_uni_pixels8_8_neon: 11.6 put_hevc_pel_uni_pixels12_8_c: 89.1 put_hevc_pel_uni_pixels12_8_neon: 25.9 put_hevc_pel_uni_pixels16_8_c: 106.4 put_hevc_pel_uni_pixels16_8_neon: 20.4 put_hevc_pel_uni_pixels24_8_c: 137.6 put_hevc_pel_uni_pixels24_8_neon: 47.1 put_hevc_pel_uni_pixels32_8_c: 173.6 put_hevc_pel_uni_pixels32_8_neon: 54.1 put_hevc_pel_uni_pixels48_8_c: 268.1 put_hevc_pel_uni_pixels48_8_neon: 117.1 put_hevc_pel_uni_pixels64_8_c: 346.1 put_hevc_pel_uni_pixels64_8_neon: 205.9 Signed-off-by: Martin Storsjö <martin@martin.st>	2023-07-14 21:19:12 +03:00
Logan Lyu	e79686be96	lavc/aarch64: new optimization for 8-bit hevc_qpel_h hevc_qpel_uni_w_hv Signed-off-by: Martin Storsjö <martin@martin.st>	2023-06-06 12:50:18 +03:00
Logan Lyu	15972cce8c	lavc/aarch64: new optimization for 8-bit hevc_qpel_uni_w_h Signed-off-by: Martin Storsjö <martin@martin.st>	2023-06-06 12:50:18 +03:00
Logan Lyu	0b7356c1b4	lavc/aarch64: new optimization for 8-bit hevc_pel_uni_w_pixels and qpel_uni_w_v Signed-off-by: Martin Storsjö <martin@martin.st>	2023-06-06 12:50:18 +03:00
xufuji456	bd2f00f665	codec/aarch64/hevc: add transform_luma_neon got 56% speed up (run_count=1000, CPU=Cortex A53) transform_4x4_luma_neon: 45 transform_4x4_luma_c: 103 Signed-off-by: xufuji456 <839789740@qq.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2023-04-14 12:07:57 +03:00
xufuji456	00a062b8d5	codec/aarch64/hevc:add idct_32x32_neon got 73% speed up (run_count=1000, CPU=Cortex A53) idct_32x32_neon: 4826 idct_32x32_c: 18236 idct_32x32_neon: 4824 idct_32x32_c: 18149 idct_32x32_neon: 4937 idct_32x32_c: 18333 Signed-off-by: Martin Storsjö <martin@martin.st>	2023-04-12 15:58:09 +03:00
J. Dekker	b564ad8eac	lavc/aarch64: add hevc deblock chroma 8-12bit Benched on Ampere Altra: hevc_h_loop_filter_chroma8_c: 367.7 hevc_h_loop_filter_chroma8_neon: 31.0 hevc_h_loop_filter_chroma10_c: 396.7 hevc_h_loop_filter_chroma10_neon: 27.5 hevc_h_loop_filter_chroma12_c: 377.0 hevc_h_loop_filter_chroma12_neon: 31.7 hevc_v_loop_filter_chroma8_c: 369.0 hevc_v_loop_filter_chroma8_neon: 55.0 hevc_v_loop_filter_chroma10_c: 389.0 hevc_v_loop_filter_chroma10_neon: 54.0 hevc_v_loop_filter_chroma12_c: 389.5 hevc_v_loop_filter_chroma12_neon: 53.0 Signed-off-by: J. Dekker <jdek@itanimul.li>	2023-04-06 06:54:26 +02:00
J. Dekker	37cde570bc	lavc/aarch64: add clip N macro Signed-off-by: J. Dekker <jdek@itanimul.li>	2023-03-22 14:48:13 +01:00
xufuji456	4b4de07721	libavcodec/hevc: add hevc idct4x4 neon of aarch64 Signed-off-by: Martin Storsjö <martin@martin.st>	2023-02-28 13:12:52 +02:00
Martin Storsjö	ec7fa13eb0	aarch64: hevcdsp_idct: Reuse preexisting macros for transposes Signed-off-by: Martin Storsjö <martin@martin.st>	2023-02-28 11:48:54 +02:00
Lynne	e0661fc805	dca_core: convert to lavu/tx Thanks to Martin Storsjö <martin@martin.st> for fixing and testing the arm32 and aarch64 changes.	2022-11-06 14:39:36 +01:00
J. Dekker	9bed814e1d	lavc/aarch64: add hevc horizontal qpel/uni/bi checkasm --benchmark on Ampere Altra (Neoverse N1): put_hevc_qpel_bi_h4_8_c: 170.7 put_hevc_qpel_bi_h4_8_neon: 64.5 put_hevc_qpel_bi_h6_8_c: 373.7 put_hevc_qpel_bi_h6_8_neon: 130.2 put_hevc_qpel_bi_h8_8_c: 662.0 put_hevc_qpel_bi_h8_8_neon: 138.5 put_hevc_qpel_bi_h12_8_c: 1529.5 put_hevc_qpel_bi_h12_8_neon: 422.0 put_hevc_qpel_bi_h16_8_c: 2735.5 put_hevc_qpel_bi_h16_8_neon: 560.5 put_hevc_qpel_bi_h24_8_c: 6015.7 put_hevc_qpel_bi_h24_8_neon: 1636.0 put_hevc_qpel_bi_h32_8_c: 10779.0 put_hevc_qpel_bi_h32_8_neon: 2204.5 put_hevc_qpel_bi_h48_8_c: 24375.0 put_hevc_qpel_bi_h48_8_neon: 4984.0 put_hevc_qpel_bi_h64_8_c: 42768.0 put_hevc_qpel_bi_h64_8_neon: 8795.7 put_hevc_qpel_h4_8_c: 149.0 put_hevc_qpel_h4_8_neon: 55.7 put_hevc_qpel_h6_8_c: 321.2 put_hevc_qpel_h6_8_neon: 106.0 put_hevc_qpel_h8_8_c: 578.7 put_hevc_qpel_h8_8_neon: 133.2 put_hevc_qpel_h12_8_c: 1279.0 put_hevc_qpel_h12_8_neon: 391.7 put_hevc_qpel_h16_8_c: 2286.2 put_hevc_qpel_h16_8_neon: 519.7 put_hevc_qpel_h24_8_c: 5100.7 put_hevc_qpel_h24_8_neon: 1546.2 put_hevc_qpel_h32_8_c: 9022.0 put_hevc_qpel_h32_8_neon: 2060.2 put_hevc_qpel_h48_8_c: 20293.5 put_hevc_qpel_h48_8_neon: 4656.7 put_hevc_qpel_h64_8_c: 36037.0 put_hevc_qpel_h64_8_neon: 8262.7 put_hevc_qpel_uni_h4_8_c: 162.2 put_hevc_qpel_uni_h4_8_neon: 61.7 put_hevc_qpel_uni_h6_8_c: 355.2 put_hevc_qpel_uni_h6_8_neon: 114.2 put_hevc_qpel_uni_h8_8_c: 651.0 put_hevc_qpel_uni_h8_8_neon: 135.7 put_hevc_qpel_uni_h12_8_c: 1412.5 put_hevc_qpel_uni_h12_8_neon: 402.7 put_hevc_qpel_uni_h16_8_c: 2551.0 put_hevc_qpel_uni_h16_8_neon: 533.5 put_hevc_qpel_uni_h24_8_c: 5782.2 put_hevc_qpel_uni_h24_8_neon: 1578.7 put_hevc_qpel_uni_h32_8_c: 10586.5 put_hevc_qpel_uni_h32_8_neon: 2102.2 put_hevc_qpel_uni_h48_8_c: 23812.0 put_hevc_qpel_uni_h48_8_neon: 4739.5 put_hevc_qpel_uni_h64_8_c: 42958.7 put_hevc_qpel_uni_h64_8_neon: 8366.5 Signed-off-by: J. Dekker <jdek@itanimul.li>	2022-10-25 14:56:38 +02:00
Reimar Döffinger	38cd829dce	aarch64: Implement stack spilling in a consistent way. Currently it is done in several different ways, which might cause needless dependencies or in case of tx_float_neon.S is incorrect. Reviewed-by: Martin Storsjö <martin@martin.st> Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>	2022-10-11 09:12:02 +02:00
Grzegorz Bernacki	8f4b000c37	lavc/aarch64: Add neon implementation for vsse_intra8 Provide optimized implementation for vsse_intra8 for arm64. Performance tests are shown below. - vsse_5_c: 87.7 - vsse_5_neon: 26.2 Benchmarks and tests are run with checkasm tool on AWS Graviton 3. Co-authored-by: Martin Storsjö <martin@martin.st> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-10-04 13:24:20 +03:00
Grzegorz Bernacki	bad67cb9fd	lavc/aarch64: Provide optimized implementation of vsse8 for arm64. Performance comparison tests are shown below. - vsse_1_c: 141.5 - vsse_1_neon: 32.5 Benchmarks and tests are run with checkasm tool on AWS Graviton 3. Signed-off-by: Grzegorz Bernacki <gjb@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-10-04 13:24:20 +03:00
Grzegorz Bernacki	faea56c9c7	lavc/aarch64: Provide neon implementation of nsse8 Add vectorized implementation of nsse8 function. Performance comparison tests are shown below. - nsse_1_c: 256.0 - nsse_1_neon: 82.7 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Grzegorz Bernacki <gjb@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-10-04 13:24:20 +03:00
Grzegorz Bernacki	f401a2af21	lavc/aarch64: Add neon implementation for pix_abs8 functions. Provide optimized implementation of pix_abs8 function for arm64. Performance comparison tests are shown below: pix_abs_1_1_c: 162.5 pix_abs_1_1_neon: 27.0 pix_abs_1_2_c: 174.0 pix_abs_1_2_neon: 23.5 pix_abs_1_3_c: 203.2 pix_abs_1_3_neon: 34.7 Benchmarks and tests are run with checkasm tool on AWS Graviton 3. Co-authored-by: Martin Storsjö <martin@martin.st> Signed-off-by: Grzegorz Bernacki <gjb@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-10-04 13:24:04 +03:00
Martin Storsjö	8089fe072e	aarch64: me_cmp: Avoid using the non-unrolled codepath for the minimum unroll size Signed-off-by: Martin Storsjö <martin@martin.st>	2022-09-29 10:29:11 +03:00
Martin Storsjö	6f2ad7f951	aarch64: me_cmp: Avoid redundant loads in ff_pix_abs16_y2_neon This avoids one redundant load per row; pix3 from the previous iteration can be used as pix2 in the next one. Before: Cortex A53 A72 A73 pix_abs_0_2_neon: 138.0 59.7 48.0 After: pix_abs_0_2_neon: 109.7 50.2 39.5 Signed-off-by: Martin Storsjö <martin@martin.st>	2022-09-29 10:29:10 +03:00
Andreas Rheinhardt	9beba05311	avcodec/fmtconvert: Remove unused AVCodecContext parameter Unused since `d74a8cb7e4`. Reviewed-by: Rémi Denis-Courmont <remi@remlab.net> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-21 20:26:40 +02:00
Hubert Mazur	b2732115dd	lavc/aarch64: Add neon implementation for pix_median_abs8 Provide optimized implementation for pix_median_abs8 function. Performance comparison tests are shown below. - median_sad_1_c: 277.0 - median_sad_1_neon: 82.0 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-09-21 12:57:56 +03:00
Hubert Mazur	e9a6170213	lavc/aarch64: Add neon implementation for vsad8_intra Provide optimized implementation for vsad8_intra function. Performance comparison tests are shown below. - vsad_5_c: 94.7 - vsad_5_neon: 20.7 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-09-21 12:57:56 +03:00
Hubert Mazur	0ee535b1db	lavc/aarch64: Add neon implementation for pix_median_abs16 Provide optimized implementation for pix_median_abs16 function. Performance comparison tests are shown below. - median_sad_0_c: 720.5 - median_sad_0_neon: 127.2 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-09-21 12:57:56 +03:00
Rémi Denis-Courmont	b52034270a	lavc/vorbisdsp: use ptrdiff_t rather than intptr_t ... for a difference between pointers.	2022-09-19 13:51:00 -03:00
Andreas Rheinhardt	a54e53a1c4	avcodec/vp8dsp: Constify src in vp8_mc_func Reviewed-by: Peter Ross <pross@xvid.org> Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-11 20:57:51 +02:00
Hubert Mazur	06b98e396a	lavc/aarch64: Provide neon implementation of nsse16 Add vectorized implementation of nsse16 function. Performance comparison tests are shown below. - nsse_0_c: 682.2 - nsse_0_neon: 116.5 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Co-authored-by: Martin Storsjö <martin@martin.st> Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-09-09 10:19:46 +03:00
Hubert Mazur	908abe8032	lavc/aarch64: Add neon implementation for vsse_intra16 Provide optimized implementation for vsse_intra16 for arm64. Performance tests are shown below. - vsse_4_c: 155.2 - vsse_4_neon: 36.2 Benchmarks and tests are run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-09-09 10:19:46 +03:00
Hubert Mazur	ce03ea3e79	lavc/aarch64: Add neon implementation for vsad_intra16 Provide optimized implementation for vsad_intra16 function for arm64. Performance comparison tests are shown below. - vsad_4_c: 177.5 - vsad_4_neon: 23.5 Benchmarks and tests are run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-09-09 10:19:46 +03:00
Hubert Mazur	c495a4b32d	lavc/aarch64: Add neon implementation of vsse16 Provide optimized implementation of vsse16 for arm64. Performance comparison tests are shown below. - vsse_0_c: 257.7 - vsse_0_neon: 59.2 Benchmarks and tests are run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-09-09 10:19:46 +03:00
Hubert Mazur	200f5e578f	lavc/aarch64: Add neon implementation for vsad16 Provide optimized implementation of vsad16 function for arm64. Performance comparison tests are shown below. - vsad_0_c: 285.2 - vsad_0_neon: 39.5 Benchmarks and tests are run with checkasm tool on AWS Graviton 3. Co-authored-by: Martin Storsjö <martin@martin.st> Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-09-09 10:19:46 +03:00
Lynne	f99d15cca0	arm/fft: disable NEON optimizations for 131072pt transforms This has been broken since the start, and it was only discovered when I started testing my replacement for the FFT. Disable it, since there's no point in fixing slower code that's about to be removed anyway. The vfp version is not affected.	2022-08-29 07:13:43 +02:00
J. Dekker	ce2f47318b	lavc/aarch64: hevc_add_res add 12bit variants hevc_add_res_4x4_12_c: 46.0 hevc_add_res_4x4_12_neon: 18.7 hevc_add_res_8x8_12_c: 194.7 hevc_add_res_8x8_12_neon: 25.2 hevc_add_res_16x16_12_c: 716.0 hevc_add_res_16x16_12_neon: 69.7 hevc_add_res_32x32_12_c: 3820.7 hevc_add_res_32x32_12_neon: 261.0 Signed-off-by: J. Dekker <jdek@itanimul.li>	2022-08-18 15:04:43 +02:00
Martin Storsjö	48be6616d0	aarch64: me_cmp: Remove a leftover unnecessary instruction This was missed in `a2e45ad407`. Signed-off-by: Martin Storsjö <martin@martin.st>	2022-08-18 12:14:53 +03:00
Hubert Mazur	70efa4d011	lavc/aarch64: Add neon implementation for pix_abs8 Provide optimized implementation of pix_abs8 function for arm64. Performance comparison tests are shown below. - pix_abs_1_0_c: 101.2 - pix_abs_1_0_neon: 22.5 - sad_1_c: 101.2 - sad_1_neon: 22.5 Benchmarks and tests are run with checkasm tool on AWS Graviton 3. Signed-off-by: Martin Storsjö <martin@martin.st>	2022-08-18 12:07:26 +03:00
Hubert Mazur	74312e80d7	lavc/aarch64: Add neon implementation for sse8 Provide optimized implementation of sse8 function for arm64. Performance comparison tests are shown below. - sse_1_c: 130.7 - sse_1_neon: 29.7 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-08-18 12:07:26 +03:00
Hubert Mazur	a2e45ad407	lavc/aarch64: Add neon implementation for pix_abs16_y2 Provide optimized implementation of pix_abs16_y2 function for arm64. Performance comparison tests are shown below. pix_abs_0_2_c: 317.2 pix_abs_0_2_neon: 37.5 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-08-18 12:07:26 +03:00
Hubert Mazur	d7abb7d143	lavc/aarch64: Add neon implementation for sse4 Provide neon implementation for sse4 function. Performance comparison tests are shown below. - sse_2_c: 80.7 - sse_2_neon: 31.0 Benchmarks and tests are run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-08-18 12:07:26 +03:00
Hubert Mazur	ad251fd262	lavc/aarch64: Add neon implementation for sse16 Provide neon implementation for sse16 function. Performance comparison tests are shown below. - sse_0_c: 268.2 - sse_0_neon: 43.5 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-08-18 12:07:26 +03:00
Martin Storsjö	60109d5b3d	aarch64: me_cmp: Fix the indentation of function declarations Signed-off-by: Martin Storsjö <martin@martin.st>	2022-08-18 12:07:26 +03:00
J. Dekker	aa9eabb7a5	lavc/aarch64: reformat add_res funcs Signed-off-by: J. Dekker <jdek@itanimul.li>	2022-08-16 14:00:34 +02:00
Andreas Rheinhardt	333b32af8e	avcodec/h264chroma: Constify src in h264_chroma_mc_func Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-08-05 03:02:13 +02:00
Andreas Rheinhardt	b3bbbb14d0	avcodec/hevcdsp: Constify src pointers Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-08-05 02:54:04 +02:00
Andreas Rheinhardt	abb85429f3	avcodec/me_cmp: Constify me_cmp_func buffer parameters Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-07-31 03:31:53 +02:00
Andreas Rheinhardt	af43da3e4d	avcodec/videodsp: Constify buf in VideoDSPContext.prefetch Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-07-31 03:14:34 +02:00
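
Several of the commit messages above quote paired checkasm timings (a `_c` reference and a `_neon` or `_i8mm` variant), and two of them summarize the pair as a percentage ("got 56% speed up", "got 73% speed up"). The sketch below shows one plausible way to read those figures, assuming the percentage is the relative reduction in the reported timing and that lower numbers are faster. The function names `time_reduction_percent` and `speedup_factor` are hypothetical helpers for illustration, not part of FFmpeg or checkasm.

```python
def time_reduction_percent(c_time: float, simd_time: float) -> float:
    """Share of the C reference's time saved by the optimized version, in percent.

    Assumes both values come from the same benchmark run and use the same unit.
    """
    return (c_time - simd_time) / c_time * 100.0


def speedup_factor(c_time: float, simd_time: float) -> float:
    """How many times faster the optimized version is than the C reference."""
    return c_time / simd_time


# Figures quoted in commits bd2f00f665 and 00a062b8d5 (Cortex A53).
luma = time_reduction_percent(103, 45)        # transform_4x4_luma: c vs neon
idct = time_reduction_percent(18236, 4826)    # idct_32x32: c vs neon (first run)

print(f"transform_4x4_luma: {luma:.1f}% less time, {speedup_factor(103, 45):.2f}x")
print(f"idct_32x32:         {idct:.1f}% less time, {speedup_factor(18236, 4826):.2f}x")
```

Under this reading, both results land within one percentage point of the figures stated in the commit messages. Note that a percentage reduction and a speedup factor are different measures: a 50% reduction in time corresponds to a 2x speedup, so the two should not be compared directly across commits.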

343 Commits