511 Commits

Author SHA1 Message Date
Ronald S. Bultje
d56668bd80 floatdsp: move scalarproduct_float from dsputil to avfloatdsp.
This makes the aac decoder and all voice codecs independent of dsputil.
2013-01-22 11:55:42 -08:00
Ronald S. Bultje
5959bfaca3 floatdsp: move butterflies_float from dsputil to avfloatdsp.
This makes wmadec/enc, twinvq and mpegaudiodec (i.e. mp2/mp3)
independent of dsputil.
2013-01-22 11:55:42 -08:00
Ronald S. Bultje
42d3246948 floatdsp: move vector_fmul_reverse from dsputil to avfloatdsp.
Now, nellymoserenc and aacenc no longer depend on dsputil. Independent
of this patch, wmaprodec also does not depend on dsputil, so I removed
it from there also.
2013-01-22 11:55:42 -08:00
Ronald S. Bultje
55aa03b9f8 floatdsp: move vector_fmul_add from dsputil to avfloatdsp. 2013-01-22 11:55:42 -08:00
Ronald S. Bultje
1768e43ceb vorbisdsp: change block_size type from int to intptr_t.
This saves one instruction in the x86-64 assembly.
2013-01-20 22:26:42 -08:00
Janne Grunau
68f18f0351 videodsp_armv5te: remove #if HAVE_ARMV5TE_EXTERNAL
libavutil/arm/asm.S sets '.arch' depending on HAVE_ARMV5TE so that
assembling armv5te code will always succeed even if the default -march
flag does not support it. HAVE_ARMV5TE_EXTERNAL tests assembling code
with the default arch.
Fixes the missing symbol ff_prefetch_arm with --cpu= not including
armv5te.

CC: libav-stable@libav.org
2013-01-20 15:20:00 +01:00
Ronald S. Bultje
fef906c77c Move vorbis_inverse_coupling from dsputil to vorbisdspcontext.
Conveniently (together with Justin's earlier patches), this makes
our vorbis decoder entirely independent of dsputil.
2013-01-19 22:21:10 -08:00
Ronald S. Bultje
aeaf268e52 vp3: integrate clear_blocks with idct of previous block.
This is identical to what e.g. vp8 does, and prevents the function call
overhead (plus dependency on dsputil for this particular function).

Arm asm updated by Janne Grunau <janne-libav@jannau.net>.

Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2013-01-19 22:04:55 -08:00
Justin Ruggles
e034cc6c60 lavc: Move vector_fmul_window to AVFloatDSPContext
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2013-01-16 10:45:45 +01:00
Luca Barbato
6906b19346 lavc: add missing files for arm
Across the many retouches, these did not make it into the main commit.
2012-12-20 14:07:23 +01:00
Ronald S. Bultje
8c53d39e7f lavc: introduce VideoDSPContext
Move some functions from dsputil. The idea is that videodsp contains
functions that are useful for a large and varied set of video decoders.
Currently, it contains emulated_edge_mc() and prefetch().

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2012-12-20 13:40:45 +01:00
Diego Biurrun
523c7bd23c misc typo, style and wording fixes 2012-12-18 13:36:51 +01:00
Mans Rullgard
b326755989 arm: rename ARMVFP config symbol to VFP
This is consistent with usual ARM nomenclature as well as with the
VFPV3 and NEON symbols which both lack the ARM prefix.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-12-07 16:54:04 +00:00
Mans Rullgard
a7831d509f arm: use HAVE*_INLINE/EXTERNAL macros for conditional compilation
These macros reflect the actual capabilities required here.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-12-07 16:54:03 +00:00
Mans Rullgard
92dad6687f arm: fix use of uninitialised value in ff_fft_fixed_init_arm()
When initialising an FFTContext for a plain FFT, mdct_bits is not set
and can contain a garbage value.  Since nbits is always valid and, for
MDCT operation, equals mdct_bits - 2, checking it instead avoids using
an uninitialised value while having the same effect.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-12-07 13:11:57 +00:00
Justin Ruggles
284ea790d8 dsputil: move vector_fmul_scalar() to AVFloatDSPContext in libavutil 2012-11-26 11:29:06 -05:00
Ronald S. Bultje
95c89da36e Use ptrdiff_t instead of int for intra pred "stride" function parameter.
This way, SIMD-optimized functions don't have to sign-extend their
stride argument manually to be able to do pointer arithmetic.
2012-10-29 17:49:13 -07:00
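The effect of the stride type change above can be sketched in C. The helper name `read_pixel` and the buffer layout are hypothetical, chosen only for illustration; they are not taken from the patch. With a `ptrdiff_t` stride, the row offset `y * stride` is already pointer-width, so a negative stride (a picture walked bottom-up) needs no manual sign extension before the pointer arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fetch the pixel at column x, row y.
 * Because stride is ptrdiff_t, y * stride is computed at pointer width,
 * so a negative stride indexes backwards without any explicit widening. */
static uint8_t read_pixel(const uint8_t *src, ptrdiff_t stride, int x, int y)
{
    assert(src != NULL);
    return src[y * stride + x];
}
```

Passing a pointer to the last row together with a negative stride walks the picture upwards; with an `int` stride, hand-written 64-bit SIMD code would first have to sign-extend the argument.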
Mans Rullgard
1846ddf0a7 ARM: fix overreads in neon h264 chroma mc
The loops were reading ahead one line, which could end up outside the
buffer for reference blocks at the edge of the picture.  Removing
this readahead has no measurable performance impact.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-10-20 01:28:38 +01:00
Diego Biurrun
9734b8ba56 Move avutil tables only used in libavcodec to libavcodec. 2012-10-11 18:29:36 +02:00
Jean-Baptiste Kempf
507dce2536 arm: call arm-specific rv34dsp init functions under if (ARCH_ARM)
Assign NEON specific function pointers after runtime check via
av_get_cpu_flags().

Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2012-10-10 15:28:50 +02:00
Diego Biurrun
ac56ff9cc9 build: non-x86: Only compile mpegvideo optimizations when necessary 2012-10-09 14:45:59 +02:00
Mans Rullgard
5e826fd65e ARM: set Tag_ABI_align_preserved in all asm files
All our ARM asm preserves alignment so setting this attribute
in a common location is simpler.  This removes numerous warnings
when linking with armcc.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-10-02 19:47:56 +01:00
Mans Rullgard
a27a690fac ARM: swap source operands in some add instructions
This allows using a 16-bit opcode when generating Thumb2 code.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-09-20 17:07:18 +01:00
Mans Rullgard
7689eea49a flacdsp: arm optimised lpc filter 2012-09-15 23:54:21 +01:00
Anton Khirnov
36ef5369ee Replace all CODEC_ID_* with AV_CODEC_ID_* 2012-08-07 16:00:24 +02:00
Mans Rullgard
e6cd698955 ARMv6: vp8: fix stack allocation with Apple's assembler
In the GNU assembler, a relational expression, bizarrely, has the
value -1 if true, whereas in Apple's it is +1.  This patch makes
sure the correct expression is used in both cases.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-04 00:59:14 +01:00
Mans Rullgard
9829a81bcd ARM: vp56: allow inline asm to build with clang
The clang integrated assembler does not support pre-UAL syntax,
while gcc requires pre-UAL syntax for ARM code.  A patch[1] for
clang to support the old syntax as well has been ignored since
January.

This patch chooses the syntax appropriate for each compiler,
allowing both to build the code.  Notably, this change allows
building for iphone with the latest Apple Xcode update.

[1] http://llvm.org/bugs/show_bug.cgi?id=11855

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-04 00:59:14 +01:00
Mans Rullgard
faa788227f ARM: use =const syntax instead of explicit literal pools
Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-01 10:32:24 +01:00
Mans Rullgard
998170913c ARM: use standard syntax for all LDRD/STRD instructions
The standard syntax requires two destination registers for
LDRD/STRD instructions.  Some versions of the GNU assembler
allow using only one with the second implicit, others are
more strict.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-01 10:32:24 +01:00
Mans Rullgard
28f9ab7029 vp3: move idct and loop filter pointers to new vp3dsp context
This moves all VP3-specific function pointers from dsputil to a
new vp3dsp context.  There is no reason to ever use the VP3 IDCT
where an MPEG2 IDCT is expected or vice versa.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-07-18 10:32:19 +01:00
Mans Rullgard
ab9f987661 build: add CONFIG_VP3DSP, reduce repetition in OBJS lists
Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-07-18 10:32:18 +01:00
Mans Rullgard
62634158b7 ARM: generate position independent code to access data symbols
This creates proper position independent code when accessing
data symbols if CONFIG_PIC is set.

References to external symbols should now use the movrelx macro.
Some additional code changes are required since this macro may
need a register to hold the GOT pointer.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-07-01 11:25:06 +01:00
Justin Ruggles
cb5042d02c float_dsp: Move vector_fmac_scalar() from libavcodec to libavutil 2012-06-18 18:01:14 -04:00
Justin Ruggles
d5a7229ba4 Add a float DSP framework to libavutil
Move vector_fmul() from DSPContext to AVFloatDSPContext.
2012-06-08 13:14:38 -04:00
Justin Ruggles
94d2b0d2fd ARM: Move asm.S from libavcodec to libavutil
This will allow for easier implementation of ARM-optimized functions in
libraries other than libavcodec.
2012-06-08 13:14:38 -04:00
Mans Rullgard
e54e6f25cf arm/neon: dsputil: use correct size specifiers on vld1/vst1
Change the size specifiers to match the actual element sizes
of the data.  This makes no practical difference with strict
alignment checking disabled (the default) other than somewhat
documenting the code.  With strict alignment checking on, it
avoids trapping the unaligned loads.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-05-10 22:56:37 +01:00
Mans Rullgard
2eba6898c9 arm: dsputil: prettify some conditional instructions in put_pixels macros
Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-05-10 22:56:09 +01:00
Mans Rullgard
cbc7d60afa arm: dsputil: fix overreads in put/avg_pixels functions
The vertically interpolating variants of these functions read
ahead one line to optimise the loop.  On the last line processed,
this might be outside the buffer.  Fix these invalid reads by
processing the last line outside the loop.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-05-10 14:39:34 +01:00
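The fix described above, handling the last line outside the loop so that the read-ahead never leaves the buffer, might look roughly like this in C. This is a sketch only: the real change is in ARM assembly, and the function name, the fixed width of 8 and the local row buffers standing in for registers are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { W = 8 }; /* assumed block width for this sketch */

/* Rounded average of two vertically adjacent rows. */
static void avg_row(uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
    for (int x = 0; x < W; x++)
        dst[x] = (uint8_t)((a[x] + b[x] + 1) >> 1);
}

/* Hypothetical vertical half-pel "put": output row y is the average of
 * source rows y and y+1, so exactly rows 0..h of the source are needed. */
static void put_pixels8_y2_sketch(uint8_t *dst, const uint8_t *src,
                                  ptrdiff_t stride, int h)
{
    uint8_t cur[W], next[W]; /* stand-ins for preloaded registers */

    assert(h >= 1);
    memcpy(cur,  src,          W); /* row 0 */
    memcpy(next, src + stride, W); /* row 1 */

    /* The loop reads ahead one row per iteration, but stops one line
     * early so that the furthest row it loads is row h. */
    for (int y = 0; y < h - 1; y++) {
        avg_row(dst + y * stride, cur, next);
        memcpy(cur,  next, W);
        memcpy(next, src + (y + 2) * stride, W);
    }

    /* Last line is processed outside the loop, with no further load. */
    avg_row(dst + (h - 1) * stride, cur, next);
}
```

Had the loop run `h` times with the same body, its final iteration would load row `h + 1`, which is the kind of overread the commit removes.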
Mans Rullgard
96f7590efd aacps: NEON optimisations
Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-05-05 22:04:21 +01:00
Mans Rullgard
3d11c2d76d vp8: armv6: fix non-armv6t2 build
The assembler may fail to place literal pools close enough to
instructions referencing them.  An explicit .ltorg directive
fixes this.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-04-25 23:16:31 +01:00
Mans Rullgard
e4ac031233 vp8: armv6 optimisations
Based on patch by Ronald S. Bultje <rsbultje@gmail.com>,
partially ported from libvpx.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-04-25 21:41:39 +01:00
Mans Rullgard
b692d246ea vp8: arm: separate ARMv6 functions from NEON
This is a preparation for complete ARMv6 optimisations.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-04-25 21:41:39 +01:00
Mans Rullgard
dac78fd1d7 ARM: add some compatibility macros
This adds some macros simplifying Thumb and pre-v6T2 compatibility.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-04-25 21:41:39 +01:00
Mans Rullgard
d526c5338d ARM: allow runtime masking of CPU features
This allows masking CPU features with the -cpuflags avconv option
which is useful for testing different optimisations without rebuilding.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-04-22 12:30:45 +01:00
Mans Rullgard
2bcbd98459 Remove lowres video decoding
This feature is complex, of questionable utility, and slows down
normal decoding.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-04-21 18:56:19 +01:00
Diego Biurrun
7bb3a302fe build: Consistently handle conditional compilation for all optimization OBJS. 2012-04-12 09:00:49 +02:00
Christophe GISQUET
272b252c01 rv40dsp: implement prescaled versions for biweight.
Quite often, the original weights are multiples of 512. By prescaling them
by 1/512 when they are computed (once per frame), no intermediate shifting
is needed, and no prescaling on each call either.

The x86 code already used that trick.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-04-10 10:06:48 -07:00
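The prescaling idea above can be illustrated with a small C sketch. The exact rounding constant, the final shift and the function names are assumptions for illustration and are not taken from the rv40dsp code; the point is only that, when both weights are multiples of 512, dividing them by 512 once up front gives the same result as shifting each product by 9 on every pixel.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed form of the unscaled path: weights carry a factor of 512,
 * so every product is shifted down by 9 before the final rounding. */
static uint8_t biweight_unscaled(int w1, int w2, uint8_t a, uint8_t b)
{
    assert(w1 >= 0 && w2 >= 0);
    return (uint8_t)((((w1 * a) >> 9) + ((w2 * b) >> 9) + 16) >> 5);
}

/* Prescaled path: the caller has already divided the weights by 512
 * (once per frame), so no intermediate shift is needed per pixel. */
static uint8_t biweight_prescaled(int w1, int w2, uint8_t a, uint8_t b)
{
    assert(w1 >= 0 && w2 >= 0);
    return (uint8_t)((w1 * a + w2 * b + 16) >> 5);
}
```

For weights that are not multiples of 512 the two paths can differ in the low bits, which is why a separate prescaled variant is selected only when the condition holds.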
Diego Biurrun
3dde147ff9 cosmetics: Consistently place static, inline and av_cold attributes/keywords. 2012-04-04 14:54:13 +02:00
Janne Grunau
363bd1c62c remove iwmmxt optimizations
They were broken since August of 2010 without anyone noticing until
three weeks ago. Nobody cares about it anymore and hopefully Marvell
will support NEON like in the PXA978 from now on.
2012-03-12 22:46:56 +01:00
Christophe GISQUET
7e1ce6a6ac dsputil: remove shift parameter from scalarproduct_int16
There is only one caller, which does not need the shifting. Other use cases
are situations where different roundings would be needed.

The x86 and neon versions are modified accordingly.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-03-07 10:29:52 -08:00
Ronald S. Bultje
bd66f073fe vp8: change int stride to ptrdiff_t stride.
On 64bit platforms with 32bit int, this means we won't have to sign-
extend the integer anymore.
2012-03-02 10:31:50 -08:00
Christophe GISQUET
2e74a5abc2 SBR DSP: use intptr_t for the ixh parameter.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-02-23 15:48:40 -08:00
Ronald S. Bultje
3ab9a2a557 rv34: change most "int stride" into "ptrdiff_t stride".
This prevents having to sign-extend on 64-bit systems with 32-bit ints,
such as x86-64. Also fixes crashes on systems where we don't do it and
arguments are not in registers, such as Win64 for all weight functions.
2012-02-20 14:58:25 -08:00
Martin Storsjö
efd29844eb mpegvideo: Add ff_ prefix to nonstatic functions
Signed-off-by: Martin Storsjö <martin@martin.st>
2012-02-15 22:07:23 +02:00
Martin Storsjö
9cf0841ef3 dsputil: Add ff_ prefix to the dsputil*_init* functions
Signed-off-by: Martin Storsjö <martin@martin.st>
2012-02-15 22:06:34 +02:00
Diego Biurrun
aa06d65693 arm: Add missing #include to vp8.h to fix a make checkheaders warning. 2012-02-09 12:26:47 +01:00
Diego Biurrun
32f3c541bc doxygen: Do not include license boilerplates in Doxygen comment blocks. 2012-02-06 19:39:24 +01:00
Mans Rullgard
cd2f98f365 ARM: ac3: fix ac3_bit_alloc_calc_bap_armv6
This function was broken when the start bin was not at the start
of a band.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-02-02 18:50:42 +00:00
Mans Rullgard
be822d77b6 aacsbr: ARM NEON optimised sbrdsp functions
Overall speedup of HE-AAC decoding 2.3x on Cortex-A8, 1.2x on A9.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-01-28 14:56:18 +00:00
Felipe Contreras
c3d5e290ca ARM: fix build with FFT enabled and MDCT disabled
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-01-20 16:14:01 +00:00
Janne Grunau
9e12002f11 rv34: add NEON rv34_idct_add
Overall almost 4% faster, idct_add down from 350 to 85 cycles, idct_dc_add
down from 83 to 30 cycles.

squash: rv34 idct rearrange partial register loads
2012-01-16 19:26:41 +01:00
Christophe GISQUET
9ba9c34024 rv34: 1-pass inter MB reconstruction
Implement 1-pass inverse transform and reconstruction for inter blocks.
2012-01-16 19:26:41 +01:00
Mans Rullgard
71b3a63e9c ARM: fix Thumb-mode simple_idct_arm
The alignment directive must obviously precede the label.
This was never noticed in ARM mode since the location is
already aligned there.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-01-13 19:09:59 +00:00
Mans Rullgard
5c5e1ea3cd ARM: 4-byte align start of all asm functions
Due to apparent bugs in the GNU assembler and/or linker, relocations
can be incorrectly processed if the alignment of a Thumb instruction
is changed in the output file compared to the input object.

This fixes crashes in h264 decoding with Thumb enabled. No effect in
ARM mode since everything is 4-byte aligned there.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-01-13 19:09:59 +00:00
Mans Rullgard
81dc6a2a3c ARM: rv34: fix asm syntax in dc transform functions
Signed-off-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2012-01-12 22:11:13 +01:00
Janne Grunau
e1e369049e rv34: NEON optimised dc only inverse transform
30-50% faster than the C implementation, 0.5% overall speedup on
bourne.rmvb.
2012-01-12 18:33:55 +01:00
Christophe GISQUET
98f24ecd6c rv34: joint coefficient decoding and dequantization
Perform dequantization while decoding coefficients instead of performing it
on the entire coefficients buffer.

Since quantized coefficients are very sparse, this usually causes a small
speedup. Speedup of around 1% on Panda board compared to the NEON code
removed here. Global speedup is probably around 3%.

Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>
2012-01-04 10:30:01 +01:00
Mans Rullgard
11b1db2759 rv40: NEON optimised weak loop filter
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-12-16 14:36:01 +00:00
Mans Rullgard
b536c7a3e1 ARM: fix external symbol refs in rv40 asm
External symbol references need prefixes on some systems.
This should fix build errors on Darwin.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-12-15 11:02:59 +00:00
Mans Rullgard
f7de52354f ARM: dca: disable optimised decode_blockcodes() for old gcc
Old gcc versions have trouble compiling this function, and
no simple, targeted test is possible.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-12-15 01:02:58 +00:00
Mans Rullgard
71ce76027d rv40: NEON optimised loop filter strength selection
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-12-14 11:26:30 +00:00
Mans Rullgard
4722a03c75 rv34: NEON optimised 4x4 dequant
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-12-13 12:06:21 +00:00
Mans Rullgard
392107ad07 rv40: NEON optimised rv40 qpel motion compensation
Based on patch by Janne Grunau.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-12-07 22:38:14 +00:00
Janne Grunau
6c88988866 rv40: NEON optimised weighted prediction
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-12-06 13:48:25 +00:00
Janne Grunau
f5c05b9aa5 rv40: NEON optimised chroma MC
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-12-06 13:48:25 +00:00
Mans Rullgard
f054a82727 ARM: move NEON H264 chroma mc to a separate file
This allows sharing code with the rv40 version of these functions.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-12-06 13:48:24 +00:00
Janne Grunau
42d32cf53c rv34: NEON optimised inverse transform functions
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-12-06 13:48:24 +00:00
Mans Rullgard
59807fee6d ARM: h264dsp_neon cosmetics
- Replace 'ip' with 'r12'.
- Use correct size designators for vld1/vst1.
- Whitespace fixes.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-12-02 19:59:18 +00:00
Janne Grunau
a760f530bb ARM: make some NEON macros reusable
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-12-02 19:59:18 +00:00
Mans Rullgard
3adba2de3d ARM: fix indentation in ff_dsputil_init_neon()
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-12-01 19:41:36 +00:00
Mans Rullgard
96fef6cf31 ARM: NEON put/avg_pixels8/16 cosmetics
This makes whitespace and register names consistent with
the style used in more recent code.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-12-01 19:41:36 +00:00
Mans Rullgard
716f1705e9 ARM: add remaining NEON avg_pixels8/16 functions 2011-12-01 19:41:36 +00:00
Mans Rullgard
94267ddfb2 ARM: clean up NEON put/avg_pixels macros
Although this adds a few lines, the macro calls are less convoluted.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-12-01 19:41:35 +00:00
Mans Rullgard
00a856e3f9 dca: ARMv6 optimised decode_blockcode()
This is a hand-tuned version of the code with impossible parts of
the FASTDIV function omitted.

2-5% faster overall on Cortex-A8.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-11-25 13:19:53 +00:00
Mans Rullgard
3a0b72dee0 ARM: remove needless .text/.align directives
The 'function' macro already includes the appropriate
directives.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-11-23 15:06:50 +00:00
Mans Rullgard
8ee2b4672f ARM: add explicit .arch and .fpu directives to asm.S
This prevents build errors when compiler and assembler default
targets differ.  Ideally each file would declare the highest
level it requires.  This is however not easily possible as it
complicates assembling pre-armv6t2 code in Thumb-2 mode.

HAVE_NEON is used as indicator for ARMv7-A since no other
symbol exists for this and NEON is only available in this
variant.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-11-22 12:13:02 +00:00
Diego Biurrun
ce33320b30 Remove redundant filename self-references inside files.
Filenames are brittle across renames and add no useful information.
2011-11-08 17:52:56 +01:00
Anton Khirnov
acffe45732 mpegvideo: remove some unused variables from MpegEncContext. 2011-10-23 14:13:40 +02:00
Ronald S. Bultje
c2d337429c H264: change weight/biweight functions to take a height argument.
Neon parts by Mans Rullgard <mans@mansr.com>.
2011-10-21 01:00:45 -07:00
Baptiste Coudurier
76741b0e56 h264: 4:2:2 intra decoding support
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-10-21 01:00:41 -07:00
Mans Rullgard
6308729e68 ARM: check for inline asm 'y' operand modifier support
The inline asm added in bf5d46d uses the 'y' modifier which
is only supported from gcc 4.5.  This check allows building
with older compilers.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-10-03 08:56:24 +01:00
Mans Rullgard
bf5d46d8e6 dca: NEON optimised high freq VQ decoding
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-09-30 19:01:23 +01:00
Mans Rullgard
baf6b738f2 ARM: NEON optimised vector_fmac_scalar()
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-09-28 15:56:09 +01:00
Anton Khirnov
297d9cb3dc mpeg12enc: add intra_vlc private option.
Deprecate CODEC_FLAG2_INTRA_VLC.
2011-08-31 13:19:14 +02:00
Måns Rullgård
9a83adaf34 arm: Avoid using the movw instruction needlessly
This fixes building for ARM11 without Thumb2.

Signed-off-by: Martin Storsjö <martin@martin.st>
2011-08-03 11:56:58 +03:00
Martin Storsjö
d0a2f0af9d Move an int64_t down in MpegEncContext
This allows using the same arm assembler offsets for both EABI
and the mach-o ABI.

Signed-off-by: Martin Storsjö <martin@martin.st>
2011-08-03 11:56:56 +03:00
Mans Rullgard
cbd58a872d dsputil: remove some unused functions
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-07-27 16:05:49 +01:00
Mans Rullgard
a617c6aaa3 dsputil: update per-arch init funcs for non-h264 high bit depth
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-07-21 18:10:58 +01:00
Mans Rullgard
874f1a901d dsputil: template get_pixels() for different bit depths
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-07-21 18:10:58 +01:00
Mans Rullgard
e7a972e113 simple_idct: add 10-bit version
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-07-20 17:49:48 +01:00
Diego Biurrun
8342a82680 arm: remove disabled function dct_unquantize_h263_inter_iwmmxt() 2011-07-16 19:15:01 +02:00
Mans Rullgard
11043d80f6 ARM: use const macro to define constant data in asm
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-07-10 17:56:06 +01:00
Mans Rullgard
fce1e43410 ARM: workaround for bug in GNU assembler
Some versions of the GNU assembler do not handle 64-bit
immediate operands containing arithmetic.  Writing the
value out in full works correctly.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-07-05 18:39:23 +01:00
Mans Rullgard
3824ef08e0 ARM: allow unaligned buffer in fixed-point NEON FFT4
This function is called with only 8-byte alignment from
imdct for size 16.  The fft4 function is not called for
the larger FFT or MDCT sizes, so this has no impact on
typical uses.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-07-04 20:36:35 +01:00
Mans Rullgard
5dd045ebc1 ARM: ac3: update ff_ac3_extract_exponents_neon per 8b7b2d6
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-07-02 18:24:41 +01:00
Mans Rullgard
8aa63f0b31 ARM: NEON optimised vector_clip_int32()
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-07-02 18:24:41 +01:00
Mans Rullgard
a3e1f80e8b ARM: remove check for PLD instruction
PLD is present in ARMv5TE and later, which is checked for separately.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-06-29 21:57:03 +01:00
Mans Rullgard
8986fddc2b ARM: allow building in Thumb2 mode
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-06-23 07:31:54 +01:00
Mans Rullgard
88ff180ad6 ARM: update ff_h264_idct8_add4_neon for 4:4:4 changes
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-06-15 13:19:40 +01:00
Mans Rullgard
e897a633cd ARM: factor some repetitive code into macros
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-06-14 10:43:54 +01:00
Jason Garrett-Glaser
c90b94424c 4:4:4 H.264 decoding support
Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.
2011-06-13 21:16:30 -07:00
Mans Rullgard
9776e25db9 ARM: jrevdct_arm: simplify stack usage
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-06-13 12:30:22 +01:00
Mans Rullgard
13743c7ab0 ARM: jrevdct_arm: use push/pop mnemonics
Use push/pop instead of stmdb/ldmia for stack operations.  This
is the preferred syntax.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-06-13 12:30:22 +01:00
Mans Rullgard
77cdfde73e ARM: jrevdct_arm: misc cleanup
- use 'const' macro to define coeff table
- add missing endfunc
- remove superfluous directives

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-06-13 12:30:22 +01:00
Mans Rullgard
5c46ad1da0 ARM: optimised mpadsp_apply_window_fixed
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-06-13 11:33:44 +01:00
Mans Rullgard
21c6512542 ARM: remove MUL64 and MAC64 inline asm
Current GCC versions know how to generate these instructions
properly and avoiding inline asm gives better code.  The MULH
function for ARMv5 uses the same instruction and is also not
needed any more.

The MLS64 macro remains since negating an input would normally
not be allowed as it would fail for INT_MIN.  In our uses, the
inputs never have this value and thus negating is safe.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-06-06 17:33:40 +01:00
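The INT_MIN remark above can be made concrete with a short C sketch. The function names below are hypothetical plain-C stand-ins, not the project's macros: a multiply-subtract written as `d - a * b` is well defined for every 32-bit input, whereas rewriting it as a multiply-accumulate with `-a` is only valid when `a` is not `INT32_MIN`, since that negation overflows.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit multiply-accumulate: d + a * b, widened before multiplying. */
static int64_t mac64_sketch(int64_t d, int32_t a, int32_t b)
{
    return d + (int64_t)a * b;
}

/* 64-bit multiply-subtract: d - a * b. No operand is negated, so this
 * is well defined even for a == INT32_MIN. */
static int64_t mls64_sketch(int64_t d, int32_t a, int32_t b)
{
    return d - (int64_t)a * b;
}
```

A compiler cannot in general turn `mls64_sketch(d, a, b)` into `mac64_sketch(d, -a, b)` because of that one input, which matches the reason the message gives for keeping the dedicated macro.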
Mans Rullgard
594fbe42c6 ARM: remove MULL inline asm
Reasonable gcc versions get this one right on their own.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-06-04 21:33:23 +01:00
Mans Rullgard
8e112df409 ARM: ac3dsp: optimised update_bap_counts()
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-06-01 15:45:13 +01:00
Mans Rullgard
c51695dbf6 ARM: fix MUL64 inline asm for pre-armv6
Prior to ARMv6, the destination registers of the SMULL instruction
must be distinct from the first source register.  Marking the
output early-clobber ensures it is allocated unique registers.

This restriction is dropped in ARMv6 and later, so allowing overlap
between input and output registers there might give better code.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-05-31 22:21:00 +01:00
Mans Rullgard
6bb70dfd74 ARM: simplify inline asm with 64-bit operands
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-05-30 21:19:57 +01:00
Mans Rullgard
371266daa3 ARM: enable UAL syntax in asm.S
This enables UAL syntax for all asm files instead of only those
which happen to be incompatible with the old, deprecated syntax.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-05-29 15:42:56 +01:00
Mans Rullgard
edfa89b260 ARM: unbreak build
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-05-28 18:41:20 +01:00
Justin Ruggles
6ca23db9cc ac3enc: modify mantissa bit counting to keep bap counts for all values of bap
instead of just 0 to 4.

This does all the actual bit counting as a final step.
2011-05-28 12:39:28 -04:00
Mans Rullgard
7d8c17b5f6 ARM: aacdec: fix constraints on inline asm
This adds output operands for modified memory allowing the
volatile qualifiers to be dropped.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-05-28 15:00:17 +01:00
Mans Rullgard
84e4804ad0 ARM: remove unnecessary volatile from inline asm
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-05-28 15:00:17 +01:00
Mans Rullgard
5726ec171b ARM: add "cc" clobbers to inline asm where needed
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-05-28 15:00:17 +01:00
Mans Rullgard
79aeade6f6 ARM: disable ff_vector_fmul_vfp on VFPv3 systems
This function uses old-style vector operations deprecated in VFPv3.
Some implementations, e.g. Cortex-A9, support them only through
slow software emulation.  Cortex-A8 does have this functionality
in hardware, but as it also has NEON, this function is not used
there regardless.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-05-27 20:33:06 +01:00
Diego Biurrun
153382e1b6 multiple inclusion guard cleanup
Add missing multiple inclusion guards; clean up #endif comments;
add missing library prefixes; keep guard names consistent.
2011-05-21 13:48:10 +02:00
Martin Aumüller
b1eb7a1204 arm: properly mark external symbol call
Surround memset and ff_vp8_dct_cat_prob by X() in order to fix iOS build

Includes patch by Luca Barbato <lu_zero@gentoo.org>.

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2011-05-14 10:38:23 +02:00
Ronald S. Bultje
b27b54de31 arm/h264pred: add missing argument type. 2011-05-10 08:44:49 -04:00
Oskar Arvidsson
19a0729b4c Adds 8-, 9- and 10-bit versions of some of the functions used by the h264 decoder.
This patch lets e.g. dsputil_init choose dsp functions with respect to
the bit depth to decode. The naming scheme of bit depth dependent
functions is <base name>_<bit depth>[_<prefix>] (i.e. the old
clear_blocks_c is now named clear_blocks_8_c).

Note: Some of the functions for high bit depth are not dependent on the
bit depth, but only on the pixel size. This leaves some room for
optimizing binary size.

Preparatory patch for high bit depth h264 decoding support.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-05-10 07:24:36 -04:00
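The naming scheme described above lends itself to macro templating. The following is a minimal sketch under stated assumptions: the macro, the coefficient types per bit depth and the selector function are hypothetical and only mimic the `<base name>_<bit depth>_c` pattern; they are not copied from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical template: one clear_block_<depth>_c per bit depth.
 * The coefficient type per depth is an assumption of this sketch. */
#define DEFINE_CLEAR_BLOCK(depth, coef_type)              \
static void clear_block_ ## depth ## _c(void *block)      \
{                                                         \
    memset(block, 0, 64 * sizeof(coef_type));             \
}

DEFINE_CLEAR_BLOCK(8,  int16_t)
DEFINE_CLEAR_BLOCK(9,  int32_t)
DEFINE_CLEAR_BLOCK(10, int32_t)

typedef void (*clear_block_fn)(void *block);

/* Hypothetical init-time selection by the bit depth being decoded. */
static clear_block_fn select_clear_block(int bit_depth)
{
    switch (bit_depth) {
    case 8:  return clear_block_8_c;
    case 9:  return clear_block_9_c;
    case 10: return clear_block_10_c;
    default: return NULL; /* unsupported depth in this sketch */
    }
}
```

In this sketch the 9-bit and 10-bit variants compile to identical code because they share a coefficient type, which illustrates the note about functions that depend only on pixel size rather than on bit depth.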
Mans Rullgard
5f2e6c0fd1 ac3enc: NEON optimised extract_exponents
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-04-05 01:11:16 +01:00
Mans Rullgard
f7653904c8 ARM: NEON fixed-point forward MDCT
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-04-03 22:39:52 +01:00
Mans Rullgard
dba9852935 ARM: NEON fixed-point FFT
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-04-03 22:39:52 +01:00
Mans Rullgard
aa05f2126e ac3enc: ARM optimised ac3_compute_mantissa_size
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-04-01 22:46:21 +01:00
Mans Rullgard
182826c884 ac3: armv6 optimised bit_alloc_calc_bap
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-04-01 22:46:05 +01:00
Mans Rullgard
d782bca415 ac3enc: NEON optimised float_to_fixed24
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-03-29 19:40:13 +01:00
Mans Rullgard
d743065e18 ARM: fix ff_apply_window_int16_neon() prototype
The length argument should be unsigned.  No change in code.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-03-24 20:43:47 +00:00
Mans Rullgard
2d3b21ffb9 ARM: NEON optimised apply_window_int16()
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-03-24 19:24:03 +00:00
Mans Rullgard
245c78313f ac3enc: NEON optimised shift functions 2011-03-24 16:30:54 +00:00
Mans Rullgard
f4855a904e ac3enc: NEON optimised ac3_max_msb_abs_int16 and ac3_exponent_min 2011-03-24 16:30:49 +00:00
Mans Rullgard
0aded9484d Move dct and rdft definitions to separate files
This leaves fft.h with only the core FFT and MDCT definitions
thus making it more manageable.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-03-20 17:15:33 +00:00
Mans Rullgard
2912e87a6c Replace FFmpeg with Libav in licence headers
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-03-19 13:33:20 +00:00
Mans Rullgard
0b32da90f8 ARM: VP8: fix build on systems with global symbol prefix
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-02-27 13:53:26 +00:00
Mans Rullgard
8b454c352f ARM: fix vp8 neon with pic enabled
The assembler emits literal pools too far from the load instructions,
so we must do it explicitly at a suitable location.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-02-27 13:53:21 +00:00
Loren Merritt
e6b1ed693a FFT: factor a shuffle out of the inner loop and merge it into fft_permute.
6% faster SSE FFT on Conroe, 2.5% on Penryn.

Signed-off-by: Janne Grunau <janne-ffmpeg@jannau.net>
2011-02-13 15:36:39 +01:00
Mans Rullgard
a7878c9f73 VP8: ARM optimised decode_block_coeffs_internal
Approximately 5% faster on Cortex-A8.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-02-11 15:48:11 +00:00
Mans Rullgard
7da48fd011 ARM optimised vp56_rac_get_prob()
Approximately 3% faster on Cortex-A8.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-02-11 15:48:10 +00:00
Mans Rullgard
a1c1d3c003 VP8: ARM NEON optimisations for dsp functions
This adds NEON optimised versions of all functions in VP8DSPContext.
Based on initial work by Rob Clark.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-02-07 16:08:23 +00:00
Mans Rullgard
b9a639ddd6 ARM: add helper macro for declaring constant data
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-02-02 11:35:51 +00:00
Justin Ruggles
c73d99e672 Separate format conversion DSP functions from DSPContext.
This will be beneficial for use with the audio conversion API without
requiring it to depend on all of dsputil.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-02-02 02:44:53 +00:00
Justin Ruggles
80ba1ddb58 Remove unneeded add bias from 3 functions.
DSPContext.vector_fmul_window()
DCADSPContext.lfe_fir()
SynthFilterContext.synth_filter_float()

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-01-31 20:28:42 +00:00
Mans Rullgard
d461a47317 Rearrange MpegEncContext to simplify access from asm
This moves the fields needed by asm near the top, before any
structs or other members which complicate the offset calculation.
Modifying other structs will no longer require updating the offsets,
and the asm code is slightly simpler due to the smaller offsets.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-01-29 17:14:29 +00:00
Mans Rullgard
0745116c10 ARM: update MpegEncContext offsets 2011-01-29 04:39:39 +00:00
Mans Rullgard
78f318be59 ARM: NEON: fix overflow in h264 16x16 planar pred
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-01-24 14:59:46 +00:00
Justin Ruggles
6eabb0d3ad Change DSPContext.vector_fmul() from dst=dst*src to dst=src0*src1.
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-01-22 17:53:27 +00:00
Justin Ruggles
56f8952b25 Move lpc_compute_autocorr() from DSPContext to a new struct LPCContext.
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-01-21 19:58:59 +00:00
Janne Grunau
2c3589bfda consolidate .gitignore patterns into a single file
Signed-off-by: Janne Grunau <janne-ffmpeg@jannau.net>
2011-01-18 21:32:05 +01:00
Janne Grunau
348b8218f7 convert svn:ignore properties to .gitignore files
Signed-off-by: Janne Grunau <janne-ffmpeg@jannau.net>
2011-01-17 15:50:14 +01:00
Martin Storsjö
31561a98ae Fix arm asm offsets for arm/mach-o
Originally committed as revision 26287 to svn://svn.ffmpeg.org/ffmpeg/trunk
2011-01-09 15:23:00 +00:00
Luca Barbato
183cdf7163 Update asm offsets for arm
This unbreaks the ffmpeg build on arm/elf; arm/mach-o still needs an update

Originally committed as revision 26286 to svn://svn.ffmpeg.org/ffmpeg/trunk
2011-01-09 14:21:35 +00:00
Måns Rullgård
75c490f467 ARM: disable movw/movt for relocated values on Apple platforms
Apparently Apple platforms do not handle movw/movt relocations
properly, leading to runtime crashes in code using them.

Originally committed as revision 25150 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-20 21:50:46 +00:00
Måns Rullgård
4a6cc8fa25 ARM: fix NEON h264_idct_add8
Originally committed as revision 25121 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-14 17:11:51 +00:00
Luca Barbato
6f9932476d Update H263_AIC asm offset for the apple variant
Originally committed as revision 25099 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-10 19:25:42 +00:00
Stefano Sabatini
c6c98d0897 Move mm_support() from libavcodec to libavutil, make it a public
function and rename it to av_get_cpu_flags().

Originally committed as revision 25076 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-08 15:07:14 +00:00
Stefano Sabatini
7160bb716b Rename FF_MM_ symbols related to CPU features flags as AV_CPU_FLAG_
symbols, and move them from libavcodec/avcodec.h to libavutil/cpu.h.

Originally committed as revision 25040 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-04 09:59:08 +00:00
Måns Rullgård
94f8b2d799 ARM: update struct offsets
Originally committed as revision 24923 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-25 14:45:17 +00:00
Måns Rullgård
c0ec9918b0 Remove global mm_flags variable
Originally committed as revision 24909 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-24 17:47:05 +00:00
Jason Garrett-Glaser
4a384de5b8 Split h264dsp and h264pred in configure.
Many H.264 derivatives, like RV40 and VP8, use the H.264 prediction functions
but not the weight/loopfilter functions.
This should reduce the size of builds with one of these derivatives but without
H.264 decoding itself.

Originally committed as revision 24741 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-07 23:10:25 +00:00
Måns Rullgård
fa2d5d54b9 ARM: NEON H264 8x8 IDCT
Parts by David Conrad.

Originally committed as revision 24706 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-05 19:45:57 +00:00
Måns Rullgård
2eef529195 ARM: update struct offsets
Originally committed as revision 24686 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-03 22:29:38 +00:00
Loren Merritt
1ee076b1b1 more credits to D. J. Bernstein for fft
Originally committed as revision 24308 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-18 20:06:42 +00:00
Måns Rullgård
751484372d ARM: NEON H264 chroma loop filter 3 cycles faster
Originally committed as revision 24249 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-15 21:59:27 +00:00
Måns Rullgård
8c55333c99 ARM: remove two insns from NEON chroma loop filter
Originally committed as revision 24243 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-15 06:45:11 +00:00
Aurelien Jacobs
42d1e7a287 fix VP5/6 neon dependencies
Originally committed as revision 24160 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-10 14:26:37 +00:00
Måns Rullgård
96088566ee ARM: remove unnecessary .previous directive
Originally committed as revision 24096 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-07 20:09:45 +00:00
Måns Rullgård
278caa6ad3 ARM: set section to .text in 'function' macro
This ensures code always goes into the .text section and avoids the
need to specify it explicitly after changing sections.

Originally committed as revision 24095 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-07 20:09:41 +00:00
Måns Rullgård
108ac7f290 ARM: hide a .size directive on non-ELF targets
Originally committed as revision 24094 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-07 20:09:37 +00:00
Måns Rullgård
588d28ac08 Remove vestiges of radix-2 FFT
Patch (mostly) by Loren Merritt

Originally committed as revision 23957 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-01 23:21:42 +00:00
Måns Rullgård
a4edc5a9df ARM: add mov32 macro
Originally committed as revision 23888 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-29 14:48:45 +00:00
Måns Rullgård
480cb7edd3 ARM: (mostly) whitespace cosmetics
Originally committed as revision 23887 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-29 14:48:41 +00:00
Eli Friedman
b3858964d6 Add const to some pointer parameters.
Patch by Eli Friedman,  eli D friedman A gmail

Originally committed as revision 23826 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-27 15:11:38 +00:00
Måns Rullgård
f30d51d74f ARM: fix build with TI compiler
The TI compiler defines __eabi__ to signal that ARM EABI is in use.
We must check for this in addition to the gcc macro __ARM_EABI__.

Originally committed as revision 23804 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-26 18:34:56 +00:00
Ronald S. Bultje
a815602aa3 Reindent after r23716.
Originally committed as revision 23717 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-22 19:17:41 +00:00
David Conrad
3ad289fca7 Add intra prediction functions for VP8.
Patch by David Conrad <lessen42 gmail com> and myself.

Originally committed as revision 23716 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-22 19:17:09 +00:00
Måns Rullgård
c0f8ee0fd7 ARM: struct offsets for Apple ABI
Originally committed as revision 23438 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-02 22:41:32 +00:00
Måns Rullgård
30d87675f1 ARM: remove some unnecessary ifdefs, fix implicit declaration warnings
Originally committed as revision 23437 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-02 22:07:54 +00:00
Måns Rullgård
68dacb4e3b ARM: check struct offsets only when they are used
The offsets differ depending on configuration, so only check them when
they will actually be used.  Presently, this is when NEON is enabled.

Originally committed as revision 23436 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-02 22:05:25 +00:00
Måns Rullgård
a76eec3b78 ARM: fail build if hardcoded struct offsets are wrong
Originally committed as revision 23427 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-02 18:49:53 +00:00
David Conrad
6a7d7b88af arm neon: Add missing mangle to external symbol
Originally committed as revision 23418 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-01 20:59:06 +00:00
Måns Rullgård
73404a44c1 ARM: NEON clear_block[s]
Originally committed as revision 23412 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-01 17:21:16 +00:00
Måns Rullgård
41331b65f2 ARM: NEON optimised dct_unquantize_h263_{intra,inter}
Originally committed as revision 23386 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-05-29 15:29:40 +00:00
David Conrad
c0fda017d1 vp3: 10l Fix DC-only IDCT for C and ARM too
Originally committed as revision 23359 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-05-28 07:22:04 +00:00
Måns Rullgård
5635985c26 ARM: NEON optimised VP6 edge filter
Originally committed as revision 22993 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-04-30 21:30:27 +00:00
Måns Rullgård
84368aa629 ARM: fix build for darwin/iphone
References to external symbols in asm code need prefixes.

Originally committed as revision 22949 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-04-22 21:19:32 +00:00
David Conrad
eb6a6cd788 vp3: DC-only IDCT
2-4% faster overall decode

Originally committed as revision 22896 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-04-17 02:04:30 +00:00
Måns Rullgård
b591c7af31 10l: fix build on non-NEON ARM
Originally committed as revision 22867 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-04-13 00:48:49 +00:00
Måns Rullgård
08255107cf DCA: ARM/NEON optimised lfe_fir
Originally committed as revision 22863 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-04-12 20:45:33 +00:00
Måns Rullgård
f01210a691 ARM: fix NEON synth_filter_float with hardfp calls
Originally committed as revision 22852 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-04-12 13:28:59 +00:00
Måns Rullgård
e73d1a5efc ARM: NEON optimised synth_filter_float
2.7x faster DCA decoding on Cortex-A8

Originally committed as revision 22828 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-04-10 16:27:56 +00:00
Måns Rullgård
a8bb9ea532 ARM: NEON optimised RDFT
Originally committed as revision 22641 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-03-23 03:35:02 +00:00
Måns Rullgård
3bd74e9243 Simplify arch-specific object file lists
Originally committed as revision 22570 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-03-16 21:23:03 +00:00
Måns Rullgård
43f60eba19 Move arch-specific makefile parts into $arch/Makefile
Originally committed as revision 22569 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-03-16 21:22:59 +00:00
Måns Rullgård
4693b031a3 Move H264 dsputil functions into their own struct
This moves the H264-specific functions from DSPContext to the new
H264DSPContext.  The code is made conditional on CONFIG_H264DSP
which is set by the codecs requiring it.

The qpel and chroma MC functions are not moved as these are used by
non-h264 code.

Originally committed as revision 22565 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-03-16 01:17:00 +00:00
Martin Storsjö
18c31f6ff8 Only use .size in ARM assembly when targeting ELF
This fixes compilation on mingw32ce

Originally committed as revision 22437 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-03-10 21:39:58 +00:00
Måns Rullgård
a7e7d40c2e ARM: set size of asm functions in object files
Originally committed as revision 22404 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-03-09 16:17:56 +00:00
Måns Rullgård
4a89e0a675 ARM: add some missing includes
Originally committed as revision 22340 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-03-08 19:59:54 +00:00
Måns Rullgård
5bacc3ad57 ARM: move mpegvideo prototypes to a header file
Originally committed as revision 22309 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-03-08 02:36:07 +00:00
Måns Rullgård
1429224b04 Move FFT parts from dsputil.h to fft.h
Originally committed as revision 22235 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-03-06 14:34:46 +00:00
Kostya Shishkov
9b3c455c50 ARM: NEON scalarproduct_int16 and scalarproduct_and_madd_int16
Patch by Kostya, minor fixes by me.

Originally committed as revision 21958 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-22 12:20:31 +00:00
Måns Rullgård
a87b2f6df4 ARM: add missing preserve8 directives
Originally committed as revision 21952 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-22 00:34:05 +00:00
Måns Rullgård
41c2bd0a26 ARMv6 optimised pix_sum
Originally committed as revision 21705 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-09 16:13:52 +00:00
Måns Rullgård
66ec243d95 ARMv6 optimised pix_norm1
Originally committed as revision 21704 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-09 16:13:49 +00:00
Måns Rullgård
0c28474c92 ARMv6 optimised sse16
Originally committed as revision 21703 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-09 16:13:45 +00:00
Måns Rullgård
3132614305 ARMv6 optimised diff_pixels
Originally committed as revision 21702 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-09 16:13:41 +00:00
Måns Rullgård
f73a626ae4 ARMv6 optimised get_pixels
Originally committed as revision 21701 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-09 16:13:38 +00:00
Måns Rullgård
d2578ff9f1 ARMv6 optimised pix_abs8
Originally committed as revision 21700 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-09 16:13:34 +00:00
Måns Rullgård
74cc33c235 ARMv6 optimised pix_abs16_y2
Originally committed as revision 21699 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-09 16:13:31 +00:00
Måns Rullgård
39a760f678 ARMv6 optimised pix_abs16_x2
Originally committed as revision 21698 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-09 16:13:29 +00:00
Måns Rullgård
e6056a9008 ARMv6 optimised pix_abs16
Originally committed as revision 21697 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-09 16:13:26 +00:00
Måns Rullgård
38e016a7c9 ARMv6 optimised put_pixels functions except xy2 variants
Originally committed as revision 21696 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-09 16:13:21 +00:00
Måns Rullgård
1c6f46be03 Add missing guards and includes to arm/aac.h
Originally committed as revision 21247 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-16 15:54:01 +00:00
Måns Rullgård
798339fb46 AAC: ARM/NEON asm for VMUL2/4 functions
Originally committed as revision 21219 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-15 02:58:24 +00:00
Måns Rullgård
c5d6cd5c81 ARM: 1l c&p fix: do not set pred16x16_plane for rv40
Originally committed as revision 20705 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-12-02 17:21:43 +00:00
Måns Rullgård
702b5885a1 ARM: NEON optimised H264 16x16, 8x8 pred
Originally committed as revision 20704 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-12-02 14:56:45 +00:00
Måns Rullgård
5dad039bf7 ARM: small tweak of NEON H264 IDCT
Originally committed as revision 20697 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-12-02 00:37:39 +00:00
Måns Rullgård
1025d19dd7 ARM: NEON 2xN chroma MC
Originally committed as revision 20696 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-12-02 00:37:36 +00:00
Måns Rullgård
04e7f6d2d0 ARM: NEON 16x16 and 8x8 avg qpel MC
Originally committed as revision 20695 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-12-02 00:37:33 +00:00
Måns Rullgård
0115b3eadb ARM: align stack in NEON h264 mc functions
A certain rotten fruit operating system doesn't provide the 8-byte stack
alignment required by the standard ARM ABI, so align it manually.

Originally committed as revision 20208 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-10-11 16:16:08 +00:00
Måns Rullgård
3e6015cc18 ARM: simplify movrel definition as CONFIG_PIC is now set for shared libs
Originally committed as revision 20204 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-10-11 10:15:48 +00:00
Måns Rullgård
12bf71b691 ARM: whitespace cosmetics
Originally committed as revision 20191 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-10-07 21:35:24 +00:00
Måns Rullgård
bef966e341 ARM: NEON avg_pixels8 and avg_h264_qpel8_mc00
Originally committed as revision 20190 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-10-07 21:35:19 +00:00
Måns Rullgård
2ad4c241c8 ARM: make function names all-lowercase
Originally committed as revision 20186 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-10-06 21:55:41 +00:00
Måns Rullgård
cf57bea6fb ARM: enable ARMv4 add_pixels_clamped
Somehow this function was never used.

Originally committed as revision 20185 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-10-06 21:55:37 +00:00
Måns Rullgård
153f49570f ARM: ARMv6 optimised add_pixels_clamped()
Originally committed as revision 20184 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-10-06 21:55:35 +00:00
Måns Rullgård
c8315e9186 ARM: whitespace cosmetics
Originally committed as revision 20183 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-10-06 21:55:30 +00:00
Måns Rullgård
55c0e1e6d2 ARM: add ff_ prefix to lots of functions
Originally committed as revision 20167 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-10-04 13:13:12 +00:00
Måns Rullgård
9abcc9a6f4 ARM: cosmetics
Originally committed as revision 20166 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-10-04 13:13:08 +00:00
Måns Rullgård
f67e0b824f ARM: replace some #if with if()
Originally committed as revision 20165 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-10-04 13:13:06 +00:00
Måns Rullgård
701c618f7d ARM: clean up file/function naming conventions
Originally committed as revision 20164 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-10-04 13:13:02 +00:00
Måns Rullgård
84d430f85a ARM: clean up dsputil initialisation
- Move v5 and v6 initialisation to separate files.
- Move NEON IDCT selection to ff_dsputil_init_neon()

Originally committed as revision 20163 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-10-04 13:12:55 +00:00
Måns Rullgård
1febba1e62 ARM: shorten some long macro names
Originally committed as revision 20159 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-10-03 18:22:56 +00:00
Måns Rullgård
2e823300a6 ARM: update ldm/stm instructions to modern syntax
Originally committed as revision 20158 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-10-03 18:22:52 +00:00
Måns Rullgård
abff992d36 ARM: whitespace cosmetics
Originally committed as revision 20157 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-10-03 18:22:49 +00:00
Måns Rullgård
c61e40b728 ARM: use plain labels for pc-relative addressing
Originally committed as revision 20152 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-10-02 23:11:03 +00:00
Måns Rullgård
b44c6d8edb ARM: remove unnecessary .fpu neon directives
Originally committed as revision 20151 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-10-02 19:35:12 +00:00
Måns Rullgård
fd818a21c7 ARM: use undocumented .syntax directive to enable UAL syntax
Originally committed as revision 20150 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-10-02 19:35:07 +00:00
Måns Rullgård
e654b7c29e ARM: apply extern symbol prefix where needed
Originally committed as revision 20147 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-10-02 08:42:00 +00:00
Måns Rullgård
ec71a8e00b ARM: NEON optimised vector_fmul_add
Originally committed as revision 20063 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-09-27 16:52:05 +00:00
Måns Rullgård
f331cec47d ARM: NEON optimised vector_clipf
Originally committed as revision 20031 to svn://svn.ffmpeg.org/ffmpeg/trunk
2009-09-26 19:55:21 +00:00