10 Commits

Author SHA1 Message Date
Niklas Haas
22530ad1ce lavc/h274: transpose IDCT
This is mathematically equivalent to what we were doing before, but
gives subtly different results due to rounding (rows first vs columns
first). Doing it this way makes our film grain database generation match
the reference implementation and now produces bit-exact outputs in my
testing.

Rename the transposed variables to be a bit less confusing.
2023-10-03 00:27:14 +02:00
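To illustrate why the pass order in 22530ad1ce matters, here is a minimal sketch of a separable integer transform with a rounding step between the two passes. Everything in it is an assumption made for illustration: the 2x2 Hadamard basis `C` stands in for the real DCT matrix, and `round_half`, `transform_cols_first` and `transform_rows_first` are hypothetical names, not the functions in lavc/h274.

```c
#define N 2

/* Toy 2x2 basis (Hadamard); a stand-in for a real DCT matrix. */
static const int C[N][N] = { { 1, 1 }, { 1, -1 } };

/* Hypothetical intermediate rounding: v / 2, halves rounded toward +infinity. */
static int round_half(int v)
{
    return v >= 0 ? (v + 1) / 2 : -(-v / 2);
}

/* Vertical pass first: Y = round(C * X) * C^T */
static void transform_cols_first(int X[N][N], int Y[N][N])
{
    int T[N][N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            int acc = 0;
            for (int k = 0; k < N; k++)
                acc += C[i][k] * X[k][j];
            T[i][j] = round_half(acc);
        }
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            int acc = 0;
            for (int k = 0; k < N; k++)
                acc += T[i][k] * C[j][k];   /* C^T[k][j] == C[j][k] */
            Y[i][j] = acc;
        }
}

/* Horizontal pass first: Y = C * round(X * C^T) */
static void transform_rows_first(int X[N][N], int Y[N][N])
{
    int U[N][N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            int acc = 0;
            for (int k = 0; k < N; k++)
                acc += X[i][k] * C[j][k];
            U[i][j] = round_half(acc);
        }
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            int acc = 0;
            for (int k = 0; k < N; k++)
                acc += C[i][k] * U[k][j];
            Y[i][j] = acc;
        }
}
```

For the input `{{1, 1}, {0, 0}}` the column-first variant yields `{{2, 0}, {2, 0}}` while the row-first variant yields `{{1, 0}, {1, 0}}`. Without the intermediate rounding both orders would agree, which is the sense in which the two are "mathematically equivalent" yet not bit-exact.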
Niklas Haas
48fc414c7c lavc/h274: fix comment (cosmetic)
Either the average, or the sum right-shifted. Not the average
right-shifted.
2023-09-28 17:11:23 +02:00
Niklas Haas
616e9d2413 lavc/h274: correct grain DB indices
The spec specified indices in the order [x][y], but our code follows the
traditional C convention of [y][x]. This was not correctly accounted for
when calculating the base index of the grain database access.
2023-09-28 17:11:23 +02:00
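The index-order mismatch described in 616e9d2413 can be sketched as follows. The table dimensions `DB_W` and `DB_H` and the helpers `db_at` and `db_offset` are hypothetical and chosen only for illustration; the point is that a spec-style `[x][y]` lookup has to be swapped when the C array is declared row-major as `[y][x]`, and that the same swap applies to any flat base offset.

```c
#include <assert.h>

#define DB_W 4   /* hypothetical width  (x extent) */
#define DB_H 3   /* hypothetical height (y extent) */

/* Spec-style lookup "db[x][y]" translated to a row-major C array db[y][x]. */
static int db_at(int db[DB_H][DB_W], int x, int y)
{
    assert(x >= 0 && x < DB_W);
    assert(y >= 0 && y < DB_H);
    return db[y][x];
}

/* Flat offset of the same element when the table is stored contiguously. */
static int db_offset(int x, int y)
{
    assert(x >= 0 && x < DB_W);
    assert(y >= 0 && y < DB_H);
    return y * DB_W + x;   /* not x * DB_H + y, which is the [x][y] layout */
}
```

Calling `db_at(db, 3, 1)` reads `db[1][3]`; passing the spec's indices through unchanged would instead address `db[3][1]`, which is out of bounds for this shape.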
Niklas Haas
338a5fcdbe lavc/h274: fix PRNG definition
The spec specifies x^31 + x^3 + 1 as the polynomial, but the diagram in
Figure 1-1 omits the +1 offset. The initial implementation was based on
the diagram, but this is wrong (produces subtly incorrect results).
2023-09-28 17:11:23 +02:00
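A minimal sketch of the difference described in 338a5fcdbe, under loudly stated assumptions: the state is kept in a 32-bit word, the taps sit at bits 30 and 2, and the "+1" term is modelled as XORing a constant 1 into the feedback bit. The function name `lfsr_step` and the `with_offset` switch are hypothetical and exist only to contrast the two readings; this is not presented as the code in lavc/h274.

```c
#include <stdint.h>

/*
 * One step of a shift-register PRNG with taps at bits 30 and 2.
 * ASSUMPTION: the "+1" is modelled as a constant 1 XORed into the feedback.
 * with_offset = 0 gives the diagram-only reading, with_offset = 1 adds the +1.
 */
static uint32_t lfsr_step(uint32_t state, int with_offset)
{
    uint32_t feedback = (state >> 30) ^ (state >> 2);
    if (with_offset)
        feedback ^= 1u;
    return (state << 1) | (feedback & 1u);
}
```

Starting from state 1, the diagram-only reading steps to 2 while the reading with the offset steps to 3, so under this model the two sequences diverge from the very first step.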
Michael Niedermayer
98aec8c1b8 avcodec/h274: Fix signed left shift
Fixes: 39463/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_H264_fuzzer-5736517629247488

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-10-09 11:42:16 +02:00
Michael Niedermayer
991b3deea9 avcodec/h274: fix bad left shifts
Fixes: left shift of negative value -3
Fixes: 37788/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_H264_fuzzer-6024714540154880

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-09-26 17:21:59 +02:00
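The two fuzzing fixes above concern left shifts of signed values; in C99 and C11, left-shifting a negative signed integer is undefined behavior. One common way to avoid it, shown here as a sketch rather than as the change actually made in these commits, is to multiply by a power of two instead. The helper name `scale_pow2` and the bound on `n` are assumptions made for illustration.

```c
#include <assert.h>

/* Scale v by 2^n without left-shifting a possibly negative value. */
static int scale_pow2(int v, unsigned n)
{
    assert(n < 16);        /* keeps 1 << n small in this sketch */
    return v * (1 << n);   /* well-defined as long as the product fits in int */
}
```

With this helper, `scale_pow2(-3, 2)` evaluates to -12, whereas the expression `-3 << 2` is what a sanitizer reports as "left shift of negative value -3".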
Niklas Haas
a543d075cd avcodec/h274: trim unnecessarily large array
We only ever read to idx+3, so 256 values are overkill.

Signed-off-by: James Almer <jamrial@gmail.com>
2021-09-12 11:07:40 -03:00
Niklas Haas
52c35d648c avcodec/h274: don't read from uninitialized array members
This bug flew under the radar because, in practice, these values are
0-initialized for the first invocation. But for subsequent invocations
(with different h/v values), reading from the uninitialized parts of
`out` is undefined behavior.

Avoid this by simply adjusting the iteration range of the following
loops. Has the added benefit of being a minor speedup.

Signed-off-by: James Almer <jamrial@gmail.com>
2021-09-12 11:07:40 -03:00
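The fix in 52c35d648c, narrowing the iteration range of the loops that consume a partially written buffer, can be sketched like this. The block size `BLK`, the parameters `h` and `w`, and both function names are hypothetical; the sketch only shows the pattern of reading back exactly the region that was written.

```c
#include <assert.h>

#define BLK 8   /* hypothetical block size */

/* Writes only the top-left h x w part of out; the rest is left untouched. */
static void fill_partial(int out[BLK][BLK], int h, int w)
{
    assert(h >= 0 && h <= BLK && w >= 0 && w <= BLK);
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            out[y][x] = y * BLK + x;
}

/* Follow-up pass restricted to the region that was actually written. */
static long sum_written(int out[BLK][BLK], int h, int w)
{
    long sum = 0;
    assert(h >= 0 && h <= BLK && w >= 0 && w <= BLK);
    for (int y = 0; y < h; y++)       /* y < h, not y < BLK */
        for (int x = 0; x < w; x++)   /* x < w, not x < BLK */
            sum += out[y][x];
    return sum;
}
```

Looping over the full `BLK` x `BLK` range in the second function would read elements that `fill_partial` never wrote. Bounding both loops by `h` and `w` avoids that and also does less work.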
Lynne
033105a739 h274: remove optimization pragma
This results in warnings on compilers which don't support it. Objections
were raised about it during the review process but went unnoticed, and
the speed benefit is highly compiler- and version-specific, and also not
very critical.

We generally hand-write assembly to optimize loops like that, rather
than use compiler magic, and for a 40% gain in the best-case scenario,
it's simply not worth it.

Plus, tree vectorization is still problematic with GCC and disabled by default
for a good reason, so enabling it locally is sketchy.
2021-08-28 15:13:55 +02:00
Niklas Haas
6bc29a6b57 avcodec/h274: add film grain synthesis routine
This could arguably also be a vf, but I decided to put it here since
decoders are technically required to apply film grain during the output
step, and I would rather avoid requiring users to insert the correct
film grain synthesis filter on their own.

The code, while in C, is written in a way that unrolls/vectorizes fairly
well under -O3, and is reasonably cache friendly. On my CPU, a single
thread pushes about 400 FPS at 1080p.

Apart from hand-written assembly, one possible avenue of improvement
would be to change the access order to compute the grain row-by-row
rather than in 8x8 blocks. This requires some redundant PRNG calls, but
would make the algorithm more cache-oblivious.

The implementation has been written to the wording of SMPTE RDD 5-2006
as faithfully as I can manage. However, apart from passing a visual
inspection, no guarantee of correctness can be made due to the lack of
any publicly available reference implementation against which to
compare it.

Signed-off-by: Niklas Haas <git@haasn.dev>
Signed-off-by: James Almer <jamrial@gmail.com>
2021-08-24 09:58:52 -03:00