FFmpeg/libavcodec/aarch64
Martin Storsjö cdb1665f70 aarch64: Make transpose_4x4H do a regular transpose
Previously, ff_h264_idct_add_neon (originally from the arm version) used
a non-regular transpose so that it could use more instructions that
operate on 128-bit registers, i.e. pairs of 64-bit registers. The
aarch64 translation doesn't do this to the same extent, but kept the
same structure since it was a straight translation.

This reshuffles ff_h264_idct_add_neon, bringing it closer to
the C implementation, making the transpose_4x4H macro do a regular
transpose, usable for other algorithms as well.

Previously, the third and fourth outputs from transpose_4x4H were
swapped, and prior to cc29d96d5a, the third and fourth inputs as well.
In addition to swapping the outputs back, this also renumbers the
intermediate registers for better readability (making the register
order match transpose_4x8B).
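
As a rough illustration of what "regular transpose" means here, the
following is a scalar C model of a 4x4 transpose of 16-bit elements
built from trn1/trn2-style steps. It is a sketch, not the macro from
neon.S: the type vec4h, the helper names and the order of the steps are
assumptions made for this example, and only the lane semantics of
trn1/trn2 are taken from the A64 instruction set.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 64-bit NEON register holding four 16-bit lanes. */
typedef struct { int16_t h[4]; } vec4h;

/* trn1 on .4h lanes: interleave the even-numbered lanes of a and b. */
static vec4h trn1_4h(vec4h a, vec4h b)
{
    vec4h d = {{ a.h[0], b.h[0], a.h[2], b.h[2] }};
    return d;
}

/* trn2 on .4h lanes: interleave the odd-numbered lanes of a and b. */
static vec4h trn2_4h(vec4h a, vec4h b)
{
    vec4h d = {{ a.h[1], b.h[1], a.h[3], b.h[3] }};
    return d;
}

/* trn1 on .2s lanes: low 32-bit half of a, then low 32-bit half of b. */
static vec4h trn1_2s(vec4h a, vec4h b)
{
    vec4h d = {{ a.h[0], a.h[1], b.h[0], b.h[1] }};
    return d;
}

/* trn2 on .2s lanes: high 32-bit half of a, then high 32-bit half of b. */
static vec4h trn2_2s(vec4h a, vec4h b)
{
    vec4h d = {{ a.h[2], a.h[3], b.h[2], b.h[3] }};
    return d;
}

/* Regular transpose: on return, r[i] holds column i of the input rows. */
static void transpose_4x4h_model(vec4h r[4])
{
    /* Step 1: transpose each 2x2 block of 16-bit elements. */
    vec4h t4 = trn1_4h(r[0], r[1]);
    vec4h t5 = trn2_4h(r[0], r[1]);
    vec4h t6 = trn1_4h(r[2], r[3]);
    vec4h t7 = trn2_4h(r[2], r[3]);

    /* Step 2: exchange the 32-bit halves between the two row pairs. */
    r[0] = trn1_2s(t4, t6);
    r[1] = trn1_2s(t5, t7);
    r[2] = trn2_2s(t4, t6);
    r[3] = trn2_2s(t5, t7);
}

/* Self-check: input element (i, j) is 4*i + j, so output (i, j) is 4*j + i. */
static void check_transpose_4x4h_model(void)
{
    vec4h m[4] = {
        {{  0,  1,  2,  3 }},
        {{  4,  5,  6,  7 }},
        {{  8,  9, 10, 11 }},
        {{ 12, 13, 14, 15 }},
    };
    transpose_4x4h_model(m);
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            assert(m[i].h[j] == 4 * j + i);
}
```

In this model, a non-regular variant of the kind the message describes
would hand back r[2] and r[3] in swapped order, so every caller would
have to compensate for the swap.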

This runs with the same number of cycles as before.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-03-26 21:25:56 +02:00
asm-offsets.h arm64: port synth_filter_float_neon from arm 2015-12-14 16:45:01 +01:00
cabac.h
dcadsp_init.c dca: remove unused decode_hf function and quant_d tables 2015-12-24 13:58:18 +01:00
dcadsp_neon.S dca: remove unused decode_hf function and quant_d tables 2015-12-24 13:58:18 +01:00
fft_init_aarch64.c fft: Split MDCT bits off from FFT 2016-03-01 10:18:28 +01:00
fft_neon.S aarch64: Use .data.rel.ro for const data with relocations 2014-12-09 11:43:31 +02:00
fmtconvert_init.c arm64: int32_to_float_fmul neon asm 2015-12-14 16:45:02 +01:00
fmtconvert_neon.S arm64: int32_to_float_fmul neon asm 2015-12-14 16:45:02 +01:00
h264chroma_init_aarch64.c
h264cmc_neon.S h264: avoid using uninitialized memory in NEON chroma mc 2014-06-23 16:32:15 +02:00
h264dsp_init_aarch64.c
h264dsp_neon.S
h264idct_neon.S aarch64: Make transpose_4x4H do a regular transpose 2016-03-26 21:25:56 +02:00
h264pred_init.c h264: aarch64: intra prediction optimisations 2015-07-20 23:10:29 +02:00
h264pred_neon.S h264: aarch64: intra prediction optimisations 2015-07-20 23:10:29 +02:00
h264qpel_init_aarch64.c arm64: constify src in h264qpel dsp function definitions 2015-06-24 08:41:32 +02:00
h264qpel_neon.S
hpeldsp_init_aarch64.c
hpeldsp_neon.S
imdct15_init.c opus: Factor out imdct15 into a standalone component 2015-02-02 16:07:33 +01:00
imdct15_neon.S opus: Factor out imdct15 into a standalone component 2015-02-02 16:07:33 +01:00
Makefile fft: Split MDCT bits off from FFT 2016-03-01 10:18:28 +01:00
mdct_init.c fft: Split MDCT bits off from FFT 2016-03-01 10:18:28 +01:00
mdct_neon.S
mpegaudiodsp_init.c aarch64: NEON fixed/floating point MPADSP apply_window 2014-04-22 22:01:45 +02:00
mpegaudiodsp_neon.S aarch64: add ',' between assembler macro arguments where missing 2014-08-04 00:17:21 +02:00
neon.S aarch64: Make transpose_4x4H do a regular transpose 2016-03-26 21:25:56 +02:00
neontest.c
rv40dsp_init_aarch64.c
synth_filter_neon.S arm64: port synth_filter_float_neon from arm 2015-12-14 16:45:01 +01:00
vc1dsp_init_aarch64.c
videodsp_init.c
videodsp.S
vorbisdsp_init.c aarch64: NEON vorbis_inverse_coupling 2014-04-22 22:01:45 +02:00
vorbisdsp_neon.S aarch64: NEON vorbis_inverse_coupling 2014-04-22 22:01:45 +02:00