Based on a patch from Christophe Gisquet.
Unrolling of the m == 0 case avoids a possible use of the uninitilized
value sum when s->predictor_history is not set. I failed to find a
sample for it. It also reduced the cycle count from 220 to 150 on
sandy bridge, x86_64 linux, gcc 4.8.2 compared to his patch.
The scaling factor is constant so it is faster to scale the
FIR coefficients in the tables during compilation.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
The x86 runs short on registers because numerous elements are not static.
In addition, splitting them allows more optimized code, at least for x86.
Arm asm changes by Janne Grunau.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
For the callable function (as opposed to the inline one):
C SSE SSE2 SSE4
Win32: 47 42 29 26
Win64: 30 33 25 23
The SSE version is neither compiled nor set for ARCH_X86_64, as the
inlinable function takes over.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
It is currently declared as a macro who is set to inlinable functions,
among which a Neon and a default C implementations.
Add a DSP parameter to each inline function, unused except by the
default C implementation which calls a function from the DSP context.
On an Arrandale CPU, gain for an inlined SSE2 function vs. a call:
- Win32: 29 to 26 cycles
- Win64: 25 to 23 cycles
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
When downmixing 2.1 to 2-channel, if the 2.0 portion is Lt/Rt, sum-difference or dual mono, the actual output will be the same (with the LFE either mixed-in or discarded).
Also, when downmixing an arbitrary layout to 2-channel, if the bitstream contains custom downmix coefficients targeting Lt/Rt, then the output will be Lt/Rt rather than regular Stereo.
The check for (prim_channels > 2) before calling dca_downmix made these
cases unreachable, but now 2.1 layouts will go through the downmix code.
Having dual mono, Lt/Rt and sum-difference layouts print errors when
regular Stereo doesn't seems pointless.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
It was based on an old, seemingly incorrect specification, so default
coefficients were always used anyway.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
When the extra rear channel is present but unused, the
s->channel_order_tab[] value for that channel is -1. The QMF can be
skipped for the extra channel, and doing so avoids an out-of-array read
on s->samples_chanptr[].
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>