C++ does not allow to mix different enums, so e.g. code comparing
ACodecID with CodecID would fail to compile with gcc.
This very evil hack should fix this problem.
In 16-bit arithmetic, x * 0xffffc is simply x * -4 with extra overflows,
(and the constant was probably meant to be 0xfffc). Combined with the
shift, this simplifies to -x >> 1. Finally, clearing the low two bits
with a 32-bit mask and switching to a 32-bit type allows more efficient
code on 32-bit machines.
Signed-off-by: Mans Rullgard <mans@mansr.com>
At both places this function is called, mb_[xy] == s->mb_[xy]
making the call together with following code equivalent to
simply assigning zeros.
Signed-off-by: Mans Rullgard <mans@mansr.com>
The main benefit of inlining this function is from constant
propagation for the 'field_based' argument. Instead of inlining
all calls, create two versions of the function for field_based
values of 0 and 1.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This file defines a single, huge function, MPV_motion(), which
although being declared inline is not actually inlined by the
compiler (for good reason). There is thus no sense in defining
this function in a header file, resulting in multiple copies of
it in the final library.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This adds a hidden config variable for the mpegvideo.o dependency
and selects from the codecs which require it.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This macro is only used in two places, both in libavcodec, so this
is a more sensible place for it.
Two small tweaks to the macro are made:
- removing the trailing semicolon
- dropping unnecessary 'volatile' from the x86 asm
Signed-off-by: Mans Rullgard <mans@mansr.com>
It expects maximum value to be 32767 but calculations in scale_vector()
which uses this function can give it ABS(-32768) which leads to wrong
result and thus clipping is needed.
yasm tolerates mismatch between movd/movq and source register size,
adjusting the instruction according to the register. nasm is more
strict.
Signed-off-by: Mans Rullgard <mans@mansr.com>
The check is bogus since the nuv frameheader is already skipped
and the (decompressed) RTjpeg header is checked.
This reverts commit f6afacdb3b.
CC: libav-stable@libav.org
If there was a failure inflating, or reinitializing
the zstream, the current frame's buffer would be lost.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
In the GNU assembler, a relational expression, bizarrely, has the
value -1 if true, whereas in Apple's it is +1. This patch makes
sure the correct expression is used in both cases.
Signed-off-by: Mans Rullgard <mans@mansr.com>
The clang integrated assembler does not support pre-UAL syntax,
while gcc requires pre-UAL syntax for ARM code. A patch[1] for
clang to support the old syntax as well has been ignored since
January.
This patch chooses the syntax appropriate for each compiler,
allowing both to build the code. Notably, this change allows
building for iphone with the latest Apple Xcode update.
[1] http://llvm.org/bugs/show_bug.cgi?id=11855
Signed-off-by: Mans Rullgard <mans@mansr.com>
Refactoring mmx2/mmxext YASM code with cpuflags will force renames.
So switching to a consistent naming scheme beforehand is sensible.
The name "mmxext" is more official and widespread and also the name
of the CPU flag, as reported e.g. by the Linux kernel.
For left HFYU prediction, we predict from the buffer buf+1 using 8- or
16-byte reads. This means that aligning the buffer by 16 bytes is in
itself not sufficient, because if the width itself is 16- or 8-byte
aligned, the buffer will not be padded, and thus a read of size 16 at
buf+1 will overflow boundaries at the right edge. Padding the buffer by
1 byte is sufficient to not overflow its boundaries.
Fixes bug 342.
This makes add_hfyu_left_prediction_sse4() handle sources that are not
16-byte aligned in its own function rather than by proxying the call to
add_hfyu_left_prediction_ssse3(). This fixes a crash on Win64, since the
sse4 version clobberes xmm6, but the ssse3 version (which uses MMX regs)
does not restore it, thus leading to XMM clobbering and RSP being off.
Fixes bug 342.
The scaling process for obtaining direct MVs from co-located field MVs
is the same for interlaced field and progressive pictures.
Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>
In VC-1 interlaced field pictures, chroma motion vectors can extend beyond
picture boundary even if luma vectors are bounded. The problem shows up
only for hpel interpolated MVs, and may be due to the way motion vectors
are scaled / cropped.
Thanks to Konstantin Shishkov for suggesting the fix. This fixes
long-known segfaults in MC-VC1.ts from videolan streams archive.
Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>
Currently there is a wild mix of 3dn2/3dnow2/3dnowext. Switching to
"3dnowext", which is a more common name of the CPU flag, as reported
e.g. by the Linux kernel, unifies this.
Fixed codebook mode in 5300 rate may write up to SUBFRAME_LEN + 4 and
that is considered normal by the reference decoder. Without that additional
padding it might overwrite first elements of LPC history.
Some calculations were changed in b6a3849 to use mmsize, which was not correct
for the AVX version, which uses INIT_YMM and therefore has mmsize == 32.
Fixes Bug 341.
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
This allows building dct-test even if aandcttab.o is not pulled in
by any enabled codec. The DCT with which these tables are used does
not use them directly, so building it without the tables is possible.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Reordering the members in this struct reduces the holes required
to maintain alignment. With this order, the only remaining, and
unavoidable, hole is 3 bytes following left_nnz.
Signed-off-by: Mans Rullgard <mans@mansr.com>
These functions are not faster than other mmx implementations on
any hardware I have been able to test on, and they are horribly
inaccurate. There is thus no reason to ever use them.
Signed-off-by: Mans Rullgard <mans@mansr.com>
The standard syntax requires two destination registers for
LDRD/STRD instructions. Some versions of the GNU assembler
allow using only one with the second implicit, others are
more strict.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This completes the conversion of h264dsp to yasm; note that h264 also
uses some dsputil functions, most notably qpel. Performance-wise, the
yasm-version is ~10 cycles faster (182->172) on x86-64, and ~8 cycles
faster (201->193) on x86-32.
These decoders use a special non-MPEG2 IDCT. Call it directly
instead of going through dsputil. There is never any reason
to use a regular IDCT with these decoders or to use the EA IDCT
with other codecs.
This also fixes the bizarre situation of eamad and eatqi decoding
incorrectly if eatgq is disabled.
Signed-off-by: Mans Rullgard <mans@mansr.com>
There is no sense in pulling in this monster struct just for
a handful of fields. The code does not call any functions
expecting an MpegEncContext.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Without this, cglobal will expand "z" to "zh" to access the high byte
in a register's word, which causes a name collision with the ZH(x) macro
further up in this file.
This fixes out of array writes
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>