Commit Graph

737 Commits

Author SHA1 Message Date
Andreas Rheinhardt
a3db9f62a4 avformat/matroskadec: Introduce a "last known good" position
Currently, resyncing during reading packets works as follows:
The current position is recorded, then a call to matroska_parse_cluster
is made and if said call fails, the demuxer tries to resync from the
earlier position. If the call doesn't fail, but also doesn't deliver a
packet, then this is looped.

There are two problems with this approach:
1. The Matroska file format aims to be forward-compatible; to achieve
this, a demuxer should simply ignore and skip elements it doesn't
know about. But it is not possible to reliably distinguish unknown
elements from junk. If matroska_parse_cluster encounters an unknown
element, it can therefore not simply error out; instead it returns zero
and the loop is iterated which includes an update of the position that
is intended to be used in case of errors, i.e. the element that is
skipped is not searched for level 1 element ids to resync to at all if
later calls to matroska_parse_cluster return an error.
Notice that in case that sync has been lost there can be a chain of
several unknown/possibly junk elements before an error is detected.

2. Even if a call to matroska_parse_cluster delivers a packet, this does
not mean that everything is fine. E.g. it might be that some of the
block's data is missing and that the data that was presumed to be from
the block just read actually contains the beginning of the next element.
This will only be apparent at the next call of matroska_read_packet,
which uses the (false) end of the earlier block as resync position so
that in the (not unlikely) case that the call to matroska_parse_cluster
fails, the data believed to be part of the earlier block is not searched
for a level 1 element to resync to.

To counter this, a "last known good" position is introduced. When an
element id that is known to be allowed at this position in the hierarchy
(according to the syntax currently in use for parsing) is read and some
further checks (regarding the length of the element and its containing
master element) are passed, then the beginning of the current element is
treated as a "good" position and recorded as such in the
MatroskaDemuxContext. Because of 2., only the start of the element is
treated as a "good" position, not the whole element. If an error occurs
later during parsing of clusters, the resync process starts at the last
known good position.

Given that when the header is damaged the subsequent resync never skips over
data and is therefore unaffected by both issues, the "last known good"
concept is not used there.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-07-16 16:16:58 -03:00
Andreas Rheinhardt
559e3422c7 avformat/matroskadec: Refactor some functions
Since the changes to the parsing of SimpleBlocks, both ebml_parse_id and
ebml_parse_elem are only called from one place, so that it is possible
to inline these two function calls. This is done, but not completely:
ebml_parse_id still exists in a modified form. This is done in
preparation for a further patch regarding the handling of
unknown-length elements.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-07-16 16:16:58 -03:00
Andreas Rheinhardt
8a286e745d avformat/matroskadec: Use proper levels after discontínuity
The earlier code set the level to zero upon seeking and after a
discontinuity although in both cases parsing (re)starts at a level 1
element.

Also set the segment's length to unkown if an error occured in order not
to drop any valid data that happens to be beyond the designated end of
the segment.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-07-16 16:16:58 -03:00
Andreas Rheinhardt
310f326b43 avformat/matroskadec: Add function to reset status
This function will be useful later to reset the status (e.g. current
level and the already parsed id).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-07-16 16:16:57 -03:00
Andreas Rheinhardt
27f40b1dcd avformat/matroskadec: Don't abort resyncing upon seek failure
When an error happens, the Matroska demuxer tries to resync to level 1
elements from an earlier position onwards. If the seek to said earlier
position fails, the demuxer currently treats this as an unrecoverable
error. And that behaviour is suboptimal as said failure is nothing
unrecoverable or unexpected (when the input isn't seekable).
It is preferable to simply resync from the earliest position available
(i.e. the start of the AVIOContext's buffer) onwards if the seek failed.

Here are some scenarios that might be treated as unrecoverable errors
by the current code if the input isn't seekable. They all have in
common that the current position is so far away from the desired
position that the seek can't be fulfilled from the AVIOContext's buffer:

1. Blocks (both SimpleBlocks as well as a Block in a BlockGroup) for
which reading them as binary EBML elements succeeds, but whose parsing
triggers an error (e.g. an invalid TrackNumber). In this case the
earlier position from which resyncing begins is at the start of the block
(or even earlier).
2. BlockGroups, whose parsing fails in one of the latter elements. Just
as in 1., the start of the BlockGroup (the target of the seek) might be
so far away from the current position that it is no longer in the
buffer.
3. At the beginning of parsing a cluster, the cluster is parsed until a
SimpleBlock or a BlockGroup is encountered. So if the input is damaged
between the beginning of the cluster and the first occurrence of a
SimpleBlock/BlockGroup and if said damage makes the demuxer read/skip so
much data that the beginning of the cluster is no longer in the buffer,
demuxing will currently fail completely.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-07-16 16:16:57 -03:00
Andreas Rheinhardt
eb33be188d matroskadec: Fix overflow introduced in a569a7b3
This commit fixes an overflow introduced in a569a7b3 that affected EBML
elements that the Matroska demuxer doesn't want to parse like CRC-32
elements. The return value of avio_skip (the new position on success or
an AVERROR on failure) has been assigned to an integer which meant that
new positions in the range of 2GB to 4GB-1 etc. were considered errors.

Fixes ticket #8001.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2019-07-06 14:48:54 -03:00
Andreas Rheinhardt
ff5ea59f7b avformat/matroskadec: Improve error/EOF checks III
Up until now, when an element was skipped, it was relied upon
ffio_limit to make sure that there is enough data available to skip.
ffio_limit itself relies upon the availability of the file's size. As
this needn't be available, the check has been refined: First one byte
less than intended is skipped, then another byte is read, followed by a
check of the error flags.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-06-24 22:19:03 -03:00
Andreas Rheinhardt
a569a7b3bb avformat/matroskadec: Improve read error/EOF checks II
This commit fixes a number of bugs:

1. There was no check that no read error/EOF occured during
ebml_read_uint, ebml_read_sint and ebml_read_float.
2. ebml_read_ascii and ebml_read_binary did sometimes not forward
error codes; instead they simply returned AVERROR(EIO).
3. In particular, AVERROR_EOF hasn't been used and no dedicated error
message for it existed. This has been changed.

In order to reduce code duplication, the new error code NEEDS_CHECKING
has been introduced which makes ebml_parse check the AVIOContext's
status for errors.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-06-24 22:19:03 -03:00
Andreas Rheinhardt
239c7369e0 avformat/matroskadec: Improve read error/EOF checks I
ebml_read_num had a number of flaws:

1. The check for read errors/EOF was totally wrong. E.g. an EBML number
beginning with the invalid 0x00 would be considered a read error,
although it is just invalid data.
2. The check for read errors/EOF was done just once, after reading the
first byte of the EBML number. But errors/EOF can happen inbetween, of
course, and this wasn't checked.
3. There was no way to distinguish when EOF should be an error (because
the data has to be there) for which an error message should be emitted
and when it is not necessarily an error (namely during parsing of EBML
IDs). Such a possibility has been added and used.

All this was fixed; furthermore, the error messages for invalid EBML
numbers were improved and useless initializations were removed.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-06-24 22:19:03 -03:00
Andreas Rheinhardt
a27e5398e2 avformat/matroskadec: Properly check return values
Up until now, webm_dash_manifest_cues used the return values of
ebml_read_num and ebml_read_length without checking for errors,
i.e. return values < 0. This has been changed.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-06-24 22:19:03 -03:00
Andreas Rheinhardt
1215b3a5f3 avformat/matroskadec: Don't zero unnecessarily
It is only necessary to zero the initial allocated memory used to store
the size of laced frames if the block used Xiph lacing. Otherwise no
unintialized data was ever used, so use av_malloc instead of av_mallocz.

Also use the correct type for the allocations.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-06-24 20:59:33 -03:00
Andreas Rheinhardt
bc3306fd5b avformat/matroskadec: Treat SimpleBlock as EBML_BIN
Up until now, the SimpleBlock was treated specially: It basically had
its own EBML category and it was also included in the BlockGroup EBML
syntax (although a SimpleBlock must not exist in a BlockGroup according
to the Matroska specifications). The latter fact also meant that
a MatroskaBlock's buffer was always unreferenced twice.
This has been changed: The type of a SimpleBlock is now an EBML_BIN.
The only way in which SimpleBlocks are still different is that they
share their associated structure with another unit (namely BlockGroup).
This is also used to unref the block: It is always unreferenced via the
BlockGroup syntax.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-06-23 20:11:24 -03:00
Andreas Rheinhardt
ffa64a4db8 avformat/matroskadec: Don't keep old blocks
Before this commit, the Matroska muxer would read a block when required
to do so, parse the block, create and return the necessary AVPackets and
yet keep the blocks (in a dynamically allocated list), although they
aren't used at all any more. This has been changed. There is no list any
more and the block is immediately discarded after parsing.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-06-23 00:47:50 -03:00
Andreas Rheinhardt
f3ca3e7f19 avformat/matroskadec: Remove redundant initialization
Every new element of an EbmlList is zeroed initially in
ebml_parse_elem, so that in particular a SimpleBlock's duration is
initialized to zero. Therefore it is unnecessary to initialize this
field again (for SimpleBlocks) in matroska_parse_cluster_incremental.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-06-23 00:47:50 -03:00
Andreas Rheinhardt
43c3cebbd4 avformat/matroskadec: Set offset of first cluster
By default, the data_offset member of the AVFormatInternal of the
AVFormatContext associated with the MatroskaDemuxContext has not been
initialized explicitly by any Matroska-specific function, so that it was
initialized by default to the offset at the end of matroska_read_header,
i.e. usually to the offset of the length field of the first encountered
cluster. This meant that in case that the Matroska-specific seek-code
fails because there are no index entries for the target track a seek to
data_offset would be performed and ordinary parsing would start from
there which is nonsense: The length field would be treated as EBML ID and
(if the length field is not longer than four bytes (EBML numbers that
long are rejected as invalid EBML IDs)) whatever comes next would be
treated as its EBML size although it simply isn't.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-06-23 00:47:50 -03:00
Andreas Rheinhardt
36aceb6174 avformat/matroskadec: Get rid of cluster size field assumption
The earlier code relied on the length of clusters always being coded on
eight bytes as was the behaviour of libavformat's Matroska muxer until
recently. But given that our own Matroska muxer now (and mkvmerge from
time immemorial) creates files that don't conform to this assumption,
it is high time to get rid of this assumption.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-06-23 00:47:50 -03:00
Andreas Rheinhardt
70baf729b5 avformat/matroskadec: Remove non-incremental parsing of clusters
When the new incremental parser was introduced, the old parser was
kept, because the new parser was unable to handle the way SSA packets
are put into Matroska. But since 2014 (since c7d8dbad) this is no
longer needed, so that the old parser can be completely removed.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-06-22 23:45:41 -03:00
Andreas Rheinhardt
e5ec131856 avformat/matroskadec: Use generic size check for signed integers
and drop the redundant checks contained in ebml_read_uint and
ebml_read_sint.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-06-22 23:43:38 -03:00
Andreas Rheinhardt
07d4056052 avformat/matroskadec: Don't copy attached pictures
This commit replaces copying attached pictures by using references to
the already existing buffers.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2019-06-22 23:39:17 -03:00
Andreas Rheinhardt
410a0824f0 avformat/matroskadec: Compactify structure
Matroska EBML IDs can be only four bytes long maximally, so it is
natural to use uint32_t for them. By doing this and rearranging the
elements of the MatroskaLevel1Element structure, one can reduce the size
of said structure.

Notice that this field is not read via the generic reading process for
EBML_UINT, so one is not forced to use an uint64_t for it.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-06-07 19:58:15 +02:00
Andreas Rheinhardt
f767c68b34 avformat/matroskadec: Correct outdated error message
This error message is outdated since d31fb1a9.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-06-07 19:58:15 +02:00
Andreas Rheinhardt
c6bb825e72 avformat/matroskadec: Remove unused variables
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-06-07 19:58:15 +02:00
Andreas Rheinhardt via ffmpeg-devel
18a851aca7 avformat/matroskadec: Improve length check
The earlier code had three flaws:

1. The case of an unknown-sized element inside a finite-sized element
(which is against the specifications) was not caught.

2. The error message wasn't helpful: It compared the length of the child
with the offset of the end of the parent and claimed that the first
exceeds the latter, although that is not necessarily true.

3. Unknown-sized elements that are not parsed can't be skipped. Given
that according to the Matroska specifications only the segment and the
clusters can be of unknown-size, this is handled by not allowing any
other units to have infinite size whereas the earlier code would seek
back by 1 byte upon encountering an infinite-size element that ought
to be skipped.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@googlemail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-04-05 12:05:47 +02:00
Carl Eugen Hoyos
4d8875ec23 lavf: Constify the probe function argument.
Reviewed-by: Lauri Kasanen
Reviewed-by: Tomas Härdin
2019-03-21 11:42:17 +01:00
Steve Lhomme
a2c7702ccf avformat:matroskadec: use a define to mark the EBML length is unknown
Unifying the way the EBML unknown length is signaled, rather than using two
incompatible values. UINT64_MAX cannot be read as a valid EBML length with the
current code.

Co-authored-by: Steve Lhomme <robux4@ycbcr.xyz>
Co-authored-by: Dale Curtis <dalecurtis@chromium.org>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-02-24 20:31:07 +01:00
Steve Lhomme
9326117bf6 avformat/matroskadec: Check parents remaining length
This was found through the Hacker One program on VLC but is not a security issue in libavformat
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-02-17 10:29:42 +01:00
Michael Niedermayer
d1afa7284c avformat/matroskadec: Do not leak queued packets on sync errors
Fixes: memleak
Fixes: clusterfuzz-testcase-minimized-audio_decoder_fuzzer-5649187601121280

Reported-by: Chris Cunningham <chcunningham@google.com>
Tested-by: Chris Cunningham <chcunningham@google.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-02-17 09:12:06 +01:00
Carl Eugen Hoyos
73abde61bb lavf/matroskadec: Do not use strncat() to limit copying a one-char constant.
Instead add the character to the snprintf above as suggested by Mark.

Silences a warning:
libavformat/matroskadec.c: In function 'webm_dash_manifest_cues':
libavformat/matroskadec.c:3947:13: warning: 'strncat' specified bound 1 equals source length [-Wstringop-overflow=]
             strncat(buf, ",", 1);
             ^~~~~~~~~~~~~~~~~~~~
2018-12-11 00:40:26 +01:00
Carl Eugen Hoyos
6871c17173 lavf/matroskadec: Simplify string length calculation.
FFmpeg relies on sizeof(char) == 1.
2018-10-19 20:36:55 +02:00
James Almer
8d5604a69a avformat/av1: update ff_isom_write_av1c() to the latest revision of the spec
This will get ISOBMFF and Matroska up to date with the revised AV1 Codec
Configuration Box spec.
For now keep propagating raw OBUs as extradata until all libavcodec modules
are adapted to handle AV1CodecConfigurationRecord formatted extradata.

Tested-by: Thomas Daede <bztdlinux@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2018-08-17 15:09:02 -03:00
James Almer
9703b7d05f avformat/matroskadec: reference the existing data buffer when creating packets
Newly allocated data buffers (wavpack, prores, compressed buffers)
are padded to meet the requirements of AVPacket.

About 10x speed up in matroska_parse_frame().

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
2018-04-06 21:14:50 -03:00
James Almer
b8e75a2a08 avformat/matroskadec: factor the prores packet parsing code out
Simplifies code in matroska_parse_frame(). This is in preparation for
the following patch.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
2018-04-06 21:14:50 -03:00
James Almer
a61886650b avformat/matroskadec: use refcounted buffers in EbmlBin
Data in EbmlBin objects is never changed after being read from the
input file (save for two specific cases with encoded CodePrivate), so
using AVBufferRef we can prevent unnecessary copy of data by instead
creating new references to said constant data.

Signed-off-by: James Almer <jamrial@gmail.com>
2018-04-06 21:14:49 -03:00
James Almer
4f55b94663 avformat/matroskadec: address some more missing AVPacket frees
Fixes memleaks.

Signed-off-by: James Almer <jamrial@gmail.com>
2018-04-04 13:53:12 -03:00
James Almer
2f0e0deadc avformat/matroskadec: address a missing AVPacket free
Fixes memleaks.

Signed-off-by: James Almer <jamrial@gmail.com>
2018-04-04 10:54:14 -03:00
James Almer
78b96be758 avformat/matroskadec: use AVPacketList to queue packets
It's more robust and efficient.

Signed-off-by: James Almer <jamrial@gmail.com>
2018-04-04 00:15:38 -03:00
James Almer
f4f39582e7 avformat/matroskadec: fix return value
err is already an AVERROR.

Signed-off-by: James Almer <jamrial@gmail.com>
2018-02-20 10:26:21 -03:00
James Almer
88eb368f42 avformat/matroskadec: free the packet on webvtt side data allocation failure
Fixes potential memory leaks

Signed-off-by: James Almer <jamrial@gmail.com>
2018-02-20 10:25:54 -03:00
James Almer
acdea9e7c5 avformat/matroskadec: ignore CodecPrivate if the stream is VP9
Defined in a recent revision of https://www.webmproject.org/docs/container/

This prevents storing the contents of CodecPrivate into extradata for
a codec that doesn't need nor expect any. It will among other things
prevent matroska specific binary data from being dumped onto other
formats during remuxing.

Signed-off-by: James Almer <jamrial@gmail.com>
2018-02-19 22:13:13 -03:00
James Almer
63b5d04e33 avformat/matroskadec: force full frame parsing of MLP/TrueHD streams
There's at least one known file with a TrueHD stream that hasn't
been correctly muxed, and requires full frame parsing and repack.

Signed-off-by: James Almer <jamrial@gmail.com>
2018-01-29 23:09:08 -03:00
Marton Balint
18ac642359 avformat: migrate to AVFormatContext->url
Signed-off-by: Marton Balint <cus@passwd.hu>
2018-01-28 23:06:43 +01:00
Nikolas Bowe
e07649e618 avformat/matroskadec: Fix float-cast-overflow undefined behavior in matroska_parse_tracks()
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-01-19 22:33:31 +01:00
James Almer
f5c67c431e Merge commit '55fe72a841ba306370e68e86c88f34b4456aa4dd'
* commit '55fe72a841ba306370e68e86c88f34b4456aa4dd':
  matroskadec: don't warn about unknown spherical medata when none is present

Merged-by: James Almer <jamrial@gmail.com>
2017-11-12 00:46:52 -03:00
James Almer
55fe72a841 matroskadec: don't warn about unknown spherical medata when none is present
track->video.projection.type is set to 0 (a Matroska specific "No spherical
metadata present" value, with no related AVSphericalMapping) by default on
files without the element.

This removes bogus warnings on every single matroska file without Spherical
metadata.

Signed-off-by: James Almer <jamrial@gmail.com>
2017-11-03 19:41:08 -03:00
James Almer
68727a1dc0 Merge commit '251849f06ce36ce8dc076e0fca2922119fa7e39e'
* commit '251849f06ce36ce8dc076e0fca2922119fa7e39e':
  mkv: Add support for Spherical Video elements

See 445204cd57

Merged-by: James Almer <jamrial@gmail.com>
2017-10-17 16:47:30 -03:00
Steven Liu
cc25a887c5 avformat/matroskadec: fix resource leak
Fixes Coverity CID: 1405453

Reviewed-by: wm4 <nfxjfg@googlemail.com>
Reviewed-by: Hendrik Leppkes <h.leppkes@gmail.com>
Signed-off-by: Steven Liu <lq@chinaffmpeg.org>
2017-05-07 11:29:08 +08:00
James Almer
095147ae06 avformat/matroskadec: export Content Light Level metadata
Based on a patch by Hendrik Leppkes

Signed-off-by: James Almer <jamrial@gmail.com>
2017-04-30 20:04:50 -03:00
Derek Buitenhuis
6ba1c9bf7e webm_dash_manifest_demuxer: Fix initialization range for files with cues at the front
The WebM DASH spec states:
    The Initialization Segment shall not contain Clusters or Cues.
    The Segment Index corresponds to the Cues.

Previously, it included the cues if they were at the front.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2017-04-23 17:52:58 +01:00
Derek Buitenhuis
8e6b9ef473 webm_dash_manifest_demuxer: Fix UB in cue timestamp string code and make it actually work
Output was apparently not tested for correctness. Passing overlapping
memory to snprintf causes undefined behavior, and usually resulted in
only the very last timestamp being written to metadata, and not a list
at all.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2017-04-23 17:51:41 +01:00
James Zern
20aeee4fc9 matroskadec,cosmetics: fix a couple typos
Signed-off-by: James Zern <jzern@google.com>
2017-04-17 10:59:31 -07:00