It was done only in check_mvset(), while mv_scale() is called also by
dist_scale().
Sample-Id: 00001579-google
Reported-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
The directional intra predictors either don't care about order (dc, h,
dc_left, tm), or they prefer inverted order (vr, dr, hd). This allows
more efficient SIMD implementations.
The spec doesn't describe how it should be decoded, so this is probably
the safest thing to do. Fixes valgrind errors on fuzzed11.ivf, and fixes
the valgrind errors on fuzzed10.ivf in a different way.
When request_channel_layout is 0,
all substreams should be decoded.
Thanks to Michael Niedermayer for spotting.
Also fix a mismatch between the parser and
decoder when request_channel_layout is a
subset of Stereo.
* commit '5c1c6e82261b856214499b9fef3a08bf3ff6e0ae':
dca: include dcadsp.h in {arm,x86}/dca.h for checkheaders
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '0cffd6fff59f192120dc93aa6c3cb8180f5506e3':
x86: use the inline int8x8_fmul_int32 only if inline SSE2 is available
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Don't decode further substreams if request_channel_layout
is a subset of the current substream's channel_layout.
Before, we would only discard further substreams if
request_channel_layout matched the substream's
channel_layout exactly, thus decoding additional
channels which the caller would probably end up downmixing.
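The subset test described above can be sketched with a bitmask check. This
is only an illustration under stated assumptions: the helper name and its
parameters are hypothetical and not taken from the decoder, and the
treatment of a request of 0 follows the rule that all substreams are
decoded in that case.

```c
#include <stdint.h>

/* Hypothetical helper, not the decoder's actual code.
 * Returns 1 if no further substreams need to be decoded. */
static int can_stop_after_substream(uint64_t request_channel_layout,
                                    uint64_t substream_layout)
{
    /* A request of 0 means "no preference": decode all substreams. */
    if (!request_channel_layout)
        return 0;

    /* Stop once every requested channel bit is present in this
     * substream, i.e. the request is a subset of its layout. */
    return (request_channel_layout & ~substream_layout) == 0;
}
```

With an exact-match comparison, a mono request against a stereo substream
would keep decoding; with the subset test it stops.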
The x86 runs short on registers because numerous elements are not static.
In addition, splitting them allows more optimized code, at least for x86.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
It is currently declared as a macro that is set to one of several inlinable
functions, among them a NEON and a default C implementation.
Add a DSP parameter to each inline function, unused except by the
default C implementation which calls a function from the DSP context.
On an Arrandale CPU, gain for an inlined SSE2 function vs. a call:
- Win32: 29 to 26 cycles
- Win64: 25 to 23 cycles
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '5bcbb516f2ff45290ef7995b081762e668693672':
arm: Add X() around all references to extern symbols
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Fixes use of uninitialized memory
Fixes: 93728afd9aa074ba14a09bfd93a632fd-asan_static-oob_124a17d_1445_cov_1021181966_DBLK_D_VIXS_1.bit
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The x86 runs short on registers because numerous elements are not static.
In addition, splitting them allows more optimized code, at least for x86.
Arm asm changes by Janne Grunau.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
For the callable function (as opposed to the inline one):
         C   SSE  SSE2  SSE4
Win32:  47    42    29    26
Win64:  30    33    25    23
The SSE version is neither compiled nor set for ARCH_X86_64, as the
inlinable function takes over.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
It is currently declared as a macro that is set to one of several inlinable
functions, among them a NEON and a default C implementation.
Add a DSP parameter to each inline function, unused except by the
default C implementation which calls a function from the DSP context.
On an Arrandale CPU, gain for an inlined SSE2 function vs. a call:
- Win32: 29 to 26 cycles
- Win64: 25 to 23 cycles
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
Fixes use of uninitialized memory
Fixes out of array read
Fixes assertion failure
Fixes part of cb307d24befbd109c6f054008d6777b5/asan_static-oob_124a175_1445_cov_2355279992_DBLK_D_VIXS_1.bit
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Fixes inconsistencies
Fixes use of uninitialized memory
Fixes part of cb307d24befbd109c6f054008d6777b5/asan_static-oob_124a175_1445_cov_2355279992_DBLK_D_VIXS_1.bit
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
vp8: Use 2 registers for dst_stride and src_stride in neon bilin filter
Conflicts:
libavcodec/arm/vp8dsp_neon.S
Merged-by: Michael Niedermayer <michaelni@gmx.at>
benchmarked on sandybridge x86_64:
1358232 decicycles in flac_lpc_32_c
1244575 decicycles in flac_lpc_32_sse4, James Almer's patch
650045 decicycles in flac_lpc_32_sse4, this patch
I haven't tested the edge cases such as odd block lengths
odd block length tested-by: James Almer <jamrial@gmail.com>
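For reference, the decicycle counts above work out to a speedup of about
2.09x over the C version and about 1.91x over the earlier SSE4 patch. The
helper below is a hypothetical illustration of that arithmetic only; it is
not part of the patch.

```c
/* Hypothetical helper: ratio of two cycle counts.
 * A result above 1.0 means the optimized version is faster. */
static double speedup(double baseline_decicycles, double optimized_decicycles)
{
    return baseline_decicycles / optimized_decicycles;
}
```

speedup(1358232, 650045) compares against flac_lpc_32_c, and
speedup(1244575, 650045) against the earlier flac_lpc_32_sse4.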
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Enable compilation on machines with an old libfdk-aac.
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Also adjust header #include order and some comments.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
* commit '1f097d168d9cad473dd44010a337c1413a9cd198':
h264: reset data partitioning at the beginning of each decode call
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '5de64bb34d68d6c224dca90003172d7a27958825':
utvideoenc: Add support for the new BT.709 FourCCs for YCbCr
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'b25e84b7399bd91605596b67d761d3464dbe8a6e':
hevc: check that the VCL NAL types are the same for all slice segments of a frame
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Prevents using GetBitContexts with data from previous calls.
Fixes access to freed memory.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC:libav-stable@libav.org
With CLI usage, the decoder might not have set the colorspace during
encoder init; a manual colorspace override might be needed in such
cases.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This avoids them being cleared before the full initialization has finished
Fixes out of array read
Fixes: asan_heap-oob_f0c5e6_7071_cov_1605985132_mov_h264_aac__Demo_FlagOfOurFathers.mov
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Such changes are forbidden in H.264 and lead to race conditions
Fixes out of array read
Fixes: signal_sigsegv_f9796a_1613_cov_3114610371_FM1_BT_B.h264
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Fixes out of array read
Fixes: asan_static-oob_1efed25_1887_cov_2013541199_HeyYa_RA10_AAC_192K_30s.rm
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Fixes out of array read
Fixes: signal_sigsegv_1326a09_1752_cov_245452111_GRTH301.HNS
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Even though the most common framerate for RoQ is 30fps,
the format supports other framerates too.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '09e2203b8ba6943d5c0fe6d73b65b145c3fdf98e':
hevc: Consider first quantization group any reference to 0, 0
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Fixes use of uninitialized memory
Fixes out of array read
Fixes: asan_static-oob_123cee5_2630_cov_1869071233_PICSIZE_A_Bossen_1.bin
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
It is not necessarily an error when a chunk does not cover a whole block.
Messages did not reflect the actual situation either.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
A dependent slice cannot have address 0.
Prevent an out of array bound load in ff_hevc_cabac_init().
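A minimal sketch of such a check, assuming the flag and the address have
already been parsed from the slice header. The function and parameter names
are hypothetical, and the -1 return value merely stands in for whatever
error code the decoder uses.

```c
/* Hypothetical validation helper, not the decoder's actual code.
 * A dependent slice segment relies on a preceding slice segment,
 * so it cannot be the first one in the picture (address 0). */
static int check_dependent_slice_addr(int dependent_slice_segment_flag,
                                      int slice_segment_addr)
{
    if (dependent_slice_segment_flag && slice_segment_addr == 0)
        return -1; /* reject the slice before any state is set up */
    return 0;
}
```

Rejecting the slice this early means later code never has to look up data
belonging to a slice segment before address 0.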
Sample-Id: 00001406-google
Reported-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
According to my understanding of T-REC-H.265-201304 chapter 8.6.1.
Sample-Id: 00001438-google
Reported-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
This case could occur when cutting and concatenating bitstreams
Fixes out of array read
Fixes: asan_heap-oob_1b33fdd_2849_cov_478905890_SA10143.vc1
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Fixes out of array access
No releases should be affected
Depends on 7c3700cd1d, do not backport without this one
Fixes: asan_heap-oob_14a37fe_9111_cov_1692584941_test4.amv
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Fixes out of array access
Fixes: asan_heap-oob_19c7a94_6470_cov_1453611734_luckynight-partial.tak
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '5312818524484a995433b986a2a7a6602572d4db':
atrac3plus: Make initialization dependent on channel count rather than channel map
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Makes it easier to recreate an AVCodecContext for ATRAC3+ decoding,
which is needed in multimedia frameworks, as well as in general cases
where demuxing and decoding are separate entities.
Should fix crashes or corrupt output on pre-SSE2 CPUs when they were
using SSE2-code (e.g. AMD Athlon XP 2400+ or Intel Pentium III) in
hfix or hvar single-edge (left/right) extension functions.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
* qatar/master:
mpeg: Drop unused parameters from ff_draw_horiz_band()
Conflicts:
libavcodec/mpegvideo.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
9680 decicycles in loop_filter_v_88_16_c, 4193765 runs, 539 skips
9233 decicycles in loop_filter_h_88_16_c, 4193751 runs, 553 skips
1929 decicycles in ff_vp9_loop_filter_v_88_16_ssse3, 4194118 runs, 186 skips
2738 decicycles in ff_vp9_loop_filter_h_88_16_ssse3, 4193861 runs, 443 skips
5.978 → 5.417 overall decode time on ped1080p.webm (-threads 1)
Adding SSE2 support should be relatively trivial (just a matter of
changing the pshufb [mask_mix] with something else), patch welcome.
This was suggested by Rodeo on IRC
<Rodeo> for consistency with the rest, MODE_7_1_FRONT_CENTER would be AV_CH_LAYOUT_7POINT1_WIDE_BACK (since LS+RS is mapped to back channels in other modes)
Reviewed-by: Jean First <jeanfirst@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This was suggested by Rodeo on IRC
<Rodeo> sorry, I meant MODE_7_1_REAR_SURROUND would probably be AV_CH_LAYOUT_7POINT1
Reviewed-by: Jean First <jeanfirst@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Introduce 2 additional registers for stride3 and mstride3 to allow
direct accesses (lea drops).
3931 → 3827 decicycles in ff_vp9_loop_filter_v_16_16_ssse3
Also uses defines to clarify the code.
Fixes invalid reads and crashes in vp90-2-05-resize.webm and fuzzed6.ivf.
The output is still not identical to what libvpx does (because we don't
actually scale in MC).
Reviewed-by: ubitux
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Prevents some invalid memory accesses after resolution change in
vp90-2-05-resize.webm, and libvpx does this too.
Reviewed-by: ubitux
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
7.1(wide) and 7.1(wide-side) channel layouts are supported in fdk_aac since October 2013 (commit fa3eba1644)
Signed-off-by: Jean First <jeanfirst@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This fixes the speed regression from 20626f53e9
and still checks sufficiently to prevent out of allocated memory accesses
due to the index
Before:
1823 decicycles in mpeg2_fast_decode_block_non_intra, 8388493 runs, 115 skips
After:
1808 decicycles in mpeg2_fast_decode_block_non_intra, 8388494 runs, 114 skips
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This fixes the speed regression from 20626f53e9
and still checks sufficiently to prevent out of allocated memory accesses
due to the index
Before:
1681 decicycles in mpeg2_fast_decode_block_intra, 4194238 runs, 66 skips
After:
1658 decicycles in mpeg2_fast_decode_block_intra, 4194248 runs, 56 skips
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Before this patch, we explicitly modify rsp, which isn't necessarily
universally acceptable, since the space under the stack pointer might
be modified in things like signal handlers. Therefore, use an explicit
register to hold the stack pointer relative to the bottom of the stack
(i.e. rsp). This will also clear out valgrind errors about the use of
uninitialized data that started occurring after the idct16x16/ssse3
optimizations were first merged.
- The memcpy was completely wrong because
s->prob_ctx[s->framectxid].coef is a [4][2][2][6][6][3] array, whereas
s->prob.coef is a [4][2][2][6][6][11] array.
- The additional check was committed to ffmpeg by Ronald S. Bultje.
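To illustrate why a single memcpy cannot work here: the two arrays differ
in their innermost dimension, so the rows do not line up in memory. One
plausible way to copy between them is row by row, as sketched below. The
struct and function names are hypothetical; only the array dimensions come
from the message above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the two contexts; only the dimensions
 * are taken from the commit message. */
typedef struct { uint8_t coef[4][2][2][6][6][3];  } SavedProbs;
typedef struct { uint8_t coef[4][2][2][6][6][11]; } ActiveProbs;

/* Copy each 3-entry saved row into the first 3 entries of the
 * matching 11-entry active row; the other 8 entries are untouched. */
static void load_saved_coef(ActiveProbs *dst, const SavedProbs *src)
{
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 2; j++)
            for (int k = 0; k < 2; k++)
                for (int l = 0; l < 6; l++)
                    for (int m = 0; m < 6; m++)
                        memcpy(dst->coef[i][j][k][l][m],
                               src->coef[i][j][k][l][m],
                               sizeof(src->coef[i][j][k][l][m]));
}
```

A flat memcpy of sizeof(src->coef) bytes would instead pack the 3-entry
rows back to back into the start of the destination, scrambling every row
after the first.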
Signed-off-by: Anton Khirnov <anton@khirnov.net>