FFmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2024-09-23 23:06:41 +00:00

Author	SHA1	Message	Date
Luca Barbato	e245d4f45c	dca: Validate the channel map Having a mismatch between the number of channels in the stream and those in the channel map will lead to a segfault or worse. Bug-Id: 1016 CC: libav-stable@libav.org Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2017-03-21 12:08:49 +01:00
Konda Raju	3df77b58e3	nvenc: Allow different const qps for I, P and B frames Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2017-03-21 12:06:44 +01:00
wm4	1a7ddba576	lavc: vdpau: add support for new hw_frames_ctx and hw_device_ctx API This supports retrieving the device from a provided hw_frames_ctx, and automatically creating a hw_frames_ctx if hw_device_ctx is set. The old API is not deprecated yet. The user can still use av_vdpau_bind_context() (with or without setting hw_frames_ctx), or use the API before that by allocating and setting hwaccel_context manually.	2017-03-20 23:15:43 +00:00
wm4	16a163b55a	lavc: Add hwaccel_flags field to AVCodecContext This "reuses" the flags introduced for the av_vdpau_bind_context() API function, and makes them available to all hwaccels. This does not affect the current vdpau API, as av_vdpau_bind_context() should obviously override the AVCodecContext.hwaccel_flags flags for the sake of compatibility.	2017-03-20 23:15:43 +00:00
Diego Biurrun	cfee5e1a0f	build: Add missing object dependency for extract_extradata bitstream filter	2017-03-20 13:16:51 +01:00
Martin Storsjö	7995ebfad1	arm/aarch64: vp9: Fix vertical alignment Align the second/third operands as they usually are. Due to the wildly varying sizes of the written out operands in aarch64 assembly, the column alignment is usually not as clear as in arm assembly. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-16 23:09:00 +02:00
Diego Biurrun	681a86aba6	x86: fft: Port to cpuflags	2017-03-14 17:23:32 +01:00
Diego Biurrun	e9bb77fb10	x86: h264: Simplify DEQUANT macro with cpuflags	2017-03-14 17:23:32 +01:00
Diego Biurrun	307eb1a8ee	x86: vp8dsp: port FILTER_BILINEAR macro to cpuflags	2017-03-14 17:23:32 +01:00
Diego Biurrun	994c4bc107	x86util: Port all macros to cpuflags Also do some small cosmetic changes: Drop pointless _MMX suffix from ABSD2 macro name, drop pointless check for MMX support, we always assume MMX is available in our SIMD code, fix spelling.	2017-03-14 17:23:32 +01:00
Anton Khirnov	522d850e68	h264_cavlc: check the value of run_before Section 9.2.3.2 of the spec implies that run_before must not be larger than zeros_left. Fixes invalid reads with corrupted files. CC: libav-stable@libav.org Bug-Id: 1000 Found-By: Kamil Frankowicz	2017-03-12 20:42:13 +01:00
Anton Khirnov	83b2b34d06	h2645_parse: use the bytestream2 API for packet splitting The code does some nontrivial jumping around in the buffer, so it is safer to use a checked API rather than do everything manually. Fixes a bug in nalff parsing, where the length field is currently not counted in the buffer size check, resulting in possible overreads with invalid files. CC: libav-stable@libav.org Bug-Id: 1002 Found-By: Kamil Frankowicz	2017-03-12 20:42:12 +01:00
Anton Khirnov	b76f6a76c6	h264dec: initialize field_started to 0 on each decode call It might be incorrectly set to 1 if the previous call exited with an error. Bug-Id: 1019 CC: libav-stable@libav.org	2017-03-12 20:42:12 +01:00
Martin Storsjö	3a0d5e206d	arm/aarch64: vp9itxfm: Skip loading the min_eob pointer when it won't be used In the half/quarter cases where we don't use the min_eob array, defer loading the pointer until we know it will be needed. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-11 22:07:30 +02:00
Martin Storsjö	98ee855ae0	arm: vp9itxfm: Template the quarter/half idct32 function This reduces the number of lines and reduces the duplication. Also simplify the eob check for the half case. If we are in the half case, we know we at least will need to do the first three slices, we only need to check eob for the fourth one, so we can hardcode the value to check against instead of loading from the min_eob array. Since at most one slice can be skipped in the first pass, we can unroll the loop for filling zeros completely, as it was done for the quarter case before. This allows skipping loading the min_eob pointer when using the quarter/half cases. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-03-11 22:07:12 +02:00
Kieran Kunhya	5f794aa165	Add Cineform HD Decoder Decodes YUV 4:2:2 10-bit and RGB 12-bit files. Older files with more subbands, skips, Bayer, alpha not supported. Further fixes and refactorings by Anton Khirnov <anton@khirnov.net>, Diego Biurrun <diego@biurrun.de>, Vittorio Giovara <vittorio.giovara@gmail.com> Signed-off-by: Diego Biurrun <diego@biurrun.de>	2017-03-09 18:37:29 +01:00
Konda Raju	f6790b5e10	add initial QP value options Signed-off-by: Diego Biurrun <diego@biurrun.de>	2017-03-09 17:24:00 +01:00
wm4	8a60bba0ae	avcodec: clarify some decoding/encoding API details Make it clear that there is no timing-dependent behavior. In particular, there is no state in which both input and output are denied, and where you have to wait for a while yourself to make progress (apparently some hardware decoders like to do this). Avoid wording that makes references to time. It shouldn't be mistaken for some kind of asynchronous API (like POSIX read() can return EAGAIN if there is no new input yet). It's a state machine, so try to use appropriate terms. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2017-03-09 17:07:24 +01:00
Vittorio Giovara	b44bd7ee7f	pixlet: Fix architecture-dependent code and values The constants used in the decoder used floating point precision, and this caused different values to be generated on different architectures. Additionally on big endian machines, the fate test would output bytes in native order, which is different from the one hardcoded in the test. So, eradicate floating point numbers and use fixed point (32.32) arithmetic everywhere, replacing constants with precomputed integer values, and force the pixel format output to be the same in the fate test. Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	2017-03-06 18:15:02 -05:00
Diego Biurrun	6eef263aca	x86: Merge align directives into SECTION_RODATA declarations where possible	2017-03-05 14:26:06 +01:00
Ganapathy Kasi	3303f86467	nvenc: Remove qmin and qmax constraints for nvenc vbr qmin and qmax are not necessary for nvenc vbr. Also fix for using 2 pass vbr mode for slow preset through ctx->flag NVENC_TWO_PASSES. Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2017-03-04 08:23:28 +01:00
Paul B Mahol	aba5b94859	Add Apple Pixlet decoder Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	2017-03-01 11:52:29 -05:00
Diego Biurrun	39e208f4d4	build: Generalize yasm/nasm-related variable names None of them are specific to the YASM assembler.	2017-03-01 10:18:15 +01:00
Diego Biurrun	fde7ee8710	x86: hevc: Add missing colons after assembly labels This fixes several warnings of the sort warning: label alone on a line without a colon might be in error	2017-03-01 09:23:42 +01:00
Michael Niedermayer	d7b2bb5391	h264_sei: Check actual presence of picture timing SEI message Signed-off-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	2017-02-28 10:32:50 -05:00
Ben Chang	d8f36a6aa3	nvenc: Fix the preset mapping list The map is a sparse array and does not need an empty element to terminate it. The empty element is stored after the last one inserted in the list, overwriting whichever element was next with zeros. Bug-Id: 1029 Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2017-02-28 11:54:02 +01:00
Anton Khirnov	984736dd9e	lavc: make sure not to return EAGAIN from codecs This error is treated specially by the API. CC: libav-stable@libav.org	2017-02-25 09:57:44 +01:00
Anton Khirnov	b2788fe934	svq3: fix the slice size check Currently it incorrectly compares bits with bytes. Also, move the check right before where it's relevant, so that the correct number of remaining bits is used. CC: libav-stable@libav.org	2017-02-25 09:57:43 +01:00
John Stebbins	248dc5c164	h264dec: fix dropped initial SEI recovery point	2017-02-24 08:24:13 -07:00
Martin Storsjö	b8f66c0838	aarch64: vp9itxfm: Reorder iadst16 coeffs This matches the order they are in the 16 bpp version. There they are in this order, to make sure we access them in the same order they are declared, easing loading only half of the coefficients at a time. This makes the 8 bpp version match the 16 bpp version better. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-24 00:04:34 +02:00
Martin Storsjö	08074c092d	arm: vp9itxfm: Reorder iadst16 coeffs This matches the order they are in the 16 bpp version. There they are in this order, to make sure we access them in the same order they are declared, easing loading only half of the coefficients at a time. This makes the 8 bpp version match the 16 bpp version better. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-24 00:04:33 +02:00
Martin Storsjö	09eb88a12e	aarch64: vp9itxfm: Reorder the idct coefficients for better pairing All elements are used pairwise, except for the first one. Previously, the 16th element was unused. Move the unused element to the second slot, to make the later element pairs not split across registers. This simplifies loading only parts of the coefficients, reducing the difference to the 16 bpp version. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-24 00:04:32 +02:00
Martin Storsjö	de06bdfe6c	arm: vp9itxfm: Reorder the idct coefficients for better pairing All elements are used pairwise, except for the first one. Previously, the 16th element was unused. Move the unused element to the second slot, to make the later element pairs not split across registers. This simplifies loading only parts of the coefficients, reducing the difference to the 16 bpp version. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-24 00:04:31 +02:00
Martin Storsjö	65aa002d54	aarch64: vp9itxfm: Avoid reloading the idct32 coefficients The idct32x32 function actually pushed d8-d15 onto the stack even though it didn't clobber them; there are plenty of registers that can be used to allow keeping all the idct coefficients in registers without having to reload different subsets of them at different stages in the transform. After this, we still can skip pushing d12-d15. Before: vp9_inv_dct_dct_32x32_sub32_add_neon: 8128.3 After: vp9_inv_dct_dct_32x32_sub32_add_neon: 8053.3 Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-24 00:03:44 +02:00
Martin Storsjö	402546a172	arm: vp9itxfm: Avoid reloading the idct32 coefficients The idct32x32 function actually pushed q4-q7 onto the stack even though it didn't clobber them; there are plenty of registers that can be used to allow keeping all the idct coefficients in registers without having to reload different subsets of them at different stages in the transform. Since the idct16 core transform avoids clobbering q4-q7 (but clobbers q2-q3 instead, to avoid needing to back up and restore q4-q7 at all in the idct16 function), and the lanewise vmul needs a register in the q0-q3 range, we move the stored coefficients from q2-q3 into q4-q5 while doing idct16. While keeping these coefficients in registers, we still can skip pushing q7. Before: Cortex A7 A8 A9 A53 vp9_inv_dct_dct_32x32_sub32_add_neon: 18553.8 17182.7 14303.3 12089.7 After: vp9_inv_dct_dct_32x32_sub32_add_neon: 18470.3 16717.7 14173.6 11860.8 Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-24 00:03:43 +02:00
Martin Storsjö	575e31e931	arm: vp9lpf: Implement the mix2_44 function with one single filter pass For this case, with 8 inputs but only changing 4 of them, we can fit all 16 input pixels into a q register, and still have enough temporary registers for doing the loop filter. The wd=8 filters would require too many temporary registers for processing all 16 pixels at once though. Before: Cortex A7 A8 A9 A53 vp9_loop_filter_mix2_v_44_16_neon: 289.7 256.2 237.5 181.2 After: vp9_loop_filter_mix2_v_44_16_neon: 221.2 150.5 177.7 138.0 Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-24 00:03:09 +02:00
Martin Storsjö	3bf9c48320	aarch64: vp9lpf: Use dup+rev16+uzp1 instead of dup+lsr+dup+trn1 This is one cycle faster in total, and three instructions fewer. Before: vp9_loop_filter_mix2_v_44_16_neon: 123.2 After: vp9_loop_filter_mix2_v_44_16_neon: 122.2 Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-24 00:03:00 +02:00
Martin Storsjö	c582cb8537	arm/aarch64: vp9lpf: Keep the comparison to E within 8 bit The theoretical maximum value of E is 193, so we can just saturate the addition to 255. Before: Cortex A7 A8 A9 A53 A53/AArch64 vp9_loop_filter_v_4_8_neon: 143.0 127.7 114.8 88.0 87.7 vp9_loop_filter_v_8_8_neon: 241.0 197.2 173.7 140.0 136.7 vp9_loop_filter_v_16_8_neon: 497.0 419.5 379.7 293.0 275.7 vp9_loop_filter_v_16_16_neon: 965.2 818.7 731.4 579.0 452.0 After: vp9_loop_filter_v_4_8_neon: 136.0 125.7 112.6 84.0 83.0 vp9_loop_filter_v_8_8_neon: 234.0 195.5 171.5 136.0 133.7 vp9_loop_filter_v_16_8_neon: 490.0 417.5 377.7 289.0 271.0 vp9_loop_filter_v_16_16_neon: 951.2 814.7 732.3 571.0 446.7 Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-24 00:02:36 +02:00
Diego Biurrun	ed6a891c36	Place attribute_deprecated in the right position for struct declarations libavcodec/vaapi.h:58:1: warning: attribute 'deprecated' is ignored, place it after "struct" to apply attribute to type declaration [-Wignored-attributes]	2017-02-23 12:23:20 +01:00
Diego Biurrun	00b160af11	nvenc: Fix nvec vs. nvenc typo	2017-02-20 09:50:03 +01:00
Mark Thompson	7cb9296db8	webp: Fix alpha decoding This was broken by `4e528206bc` - the webp decoder was assuming that it could set the output pixfmt of the vp8 decoder directly, but after that change it no longer could because ff_get_format() was used instead. This adds an internal get_format() callback to webp use of the vp8 decoder to override the pixfmt appropriately.	2017-02-18 19:53:20 +00:00
Mark Thompson	17aeee5832	vaapi_encode: Discard output buffer if picture submission fails Previously this was leaking, though it actually hit an assert making sure that the buffer had already been cleared when freeing the picture.	2017-02-16 20:58:42 +00:00
Martin Storsjö	030de53e9c	libopenh264dec: Let the framework use the h264_mp4toannexb bitstream filter This avoids a lot of boilerplate code within the decoder wrapper itself. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-15 23:05:58 +02:00
Mark Thompson	5dd9a4b88b	vaapi: Implement device-only setup In this case, the user only supplies a device and the frame context is allocated internally by lavc.	2017-02-13 21:44:43 +00:00
Mark Thompson	44f2eda39f	lavc: Add device context field to AVCodecContext For use by codec implementations which can allocate frames internally.	2017-02-13 20:14:27 +00:00
Martin Storsjö	07b5136c48	aarch64: vp9lpf: Fix broken indentation/vertical alignment Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-12 21:57:23 +02:00
Martin Storsjö	b0806088d3	aarch64: vp9lpf: Interleave the start of flat8in into the calculation above This adds lots of extra .ifs, but speeds it up by a couple cycles, by avoiding stalls. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-11 22:54:18 +02:00
Martin Storsjö	e18c39005a	arm: vp9lpf: Interleave the start of flat8in into the calculation above This adds lots of extra .ifs, but speeds it up by a couple cycles, by avoiding stalls. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-11 22:54:01 +02:00
Luca Barbato	9c2d36fcaf	dv: Convert to the new bitstream reader	2017-02-11 20:29:44 +01:00
Luca Barbato	ba30b74686	aac: Validate the sbr sample rate before using the value Avoid a floating point exception. Bug-Id: 1027 CC: libav-stable@libav.org	2017-02-11 20:23:11 +01:00
Anton Khirnov	f44ec22e09	lavc: use av_cpu_max_align() instead of hardcoding alignment requirements	2017-02-11 11:37:45 +01:00
Martin Storsjö	435cd7bc99	arm: vp9lpf: Use orrs instead of orr+cmp Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-11 00:44:04 +02:00
Martin Storsjö	e1f9de86f4	arm/aarch64: vp9lpf: Calculate !hev directly Previously we first calculated hev, and then negated it. Since we were able to schedule the negation in the middle of another calculation, we don't see any gain in all cases. Before: Cortex A7 A8 A9 A53 A53/AArch64 vp9_loop_filter_v_4_8_neon: 147.0 129.0 115.8 89.0 88.7 vp9_loop_filter_v_8_8_neon: 242.0 198.5 174.7 140.0 136.7 vp9_loop_filter_v_16_8_neon: 500.0 419.5 382.7 293.0 275.7 vp9_loop_filter_v_16_16_neon: 971.2 825.5 731.5 579.0 453.0 After: vp9_loop_filter_v_4_8_neon: 143.0 127.7 114.8 88.0 87.7 vp9_loop_filter_v_8_8_neon: 241.0 197.2 173.7 140.0 136.7 vp9_loop_filter_v_16_8_neon: 497.0 419.5 379.7 293.0 275.7 vp9_loop_filter_v_16_16_neon: 965.2 818.7 731.4 579.0 452.0 Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-11 00:43:59 +02:00
Martin Storsjö	3fcf788fbb	aarch64: vp9itxfm: Optimize 16x16 and 32x32 idct dc by unrolling This work is sponsored by, and copyright, Google. Before: Cortex A53 vp9_inv_dct_dct_16x16_sub1_add_neon: 235.3 vp9_inv_dct_dct_32x32_sub1_add_neon: 555.1 After: vp9_inv_dct_dct_16x16_sub1_add_neon: 180.2 vp9_inv_dct_dct_32x32_sub1_add_neon: 475.3 Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-11 00:31:58 +02:00
Martin Storsjö	a76bf8cf12	arm: vp9itxfm: Optimize 16x16 and 32x32 idct dc by unrolling This work is sponsored by, and copyright, Google. Before: Cortex A7 A8 A9 A53 vp9_inv_dct_dct_16x16_sub1_add_neon: 273.0 189.5 211.7 235.8 vp9_inv_dct_dct_32x32_sub1_add_neon: 752.0 459.2 862.2 553.9 After: vp9_inv_dct_dct_16x16_sub1_add_neon: 226.5 145.0 225.1 171.8 vp9_inv_dct_dct_32x32_sub1_add_neon: 721.2 415.7 727.6 475.0 Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-11 00:31:52 +02:00
Martin Storsjö	388e0d2515	aarch64: vp9mc: Calculate less unused data in the 4 pixel wide horizontal filter No measured speedup on a Cortex A53, but other cores might benefit. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-11 00:08:50 +02:00
Martin Storsjö	fea92a4b57	arm: vp9mc: Calculate less unused data in the 4 pixel wide horizontal filter Before: Cortex A7 A8 A9 A53 vp9_put_8tap_smooth_4h_neon: 378.1 273.2 340.7 229.5 After: vp9_put_8tap_smooth_4h_neon: 352.1 222.2 290.5 229.5 Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-11 00:08:37 +02:00
Martin Storsjö	5e0c2158fb	aarch64: vp9mc: Simplify the extmla macro parameters Fold the field lengths into the macro. This makes the macro invocations much more readable, when the lines are shorter. This also makes it easier to use only half the registers within the macro. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-11 00:08:29 +02:00
Martin Storsjö	bc25897630	utvideodec: Add a missing include This was missing from `77c23704c7`, fixing building. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-10 09:31:49 +02:00
Timo Rothenpieler	a52976c0fe	nvenc: make gpu indices independent of supported capabilities Do not allocate a CUDA context for every available gpu. Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2017-02-09 23:29:32 +01:00
Derek Buitenhuis	77c23704c7	avcodec: Mark some codecs with threadsafe init as such Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com> Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2017-02-09 23:28:18 +01:00
Martin Storsjö	0c0b87f12d	aarch64: vp9itxfm: Fix incorrect vertical alignment Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 23:57:06 +02:00
Martin Storsjö	8476eb0d3a	aarch64: vp9itxfm: Update a comment to refer to a register with a different name Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 23:57:02 +02:00
Martin Storsjö	3dd7827258	aarch64: vp9itxfm: Use the right lane sizes in 8x8 for improved readability Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 23:56:59 +02:00
Martin Storsjö	ed8d293306	aarch64: vp9itxfm: Use a single lane ld1 instead of ld1r where possible The ld1r is a leftover from the arm version, where this trick is beneficial on some cores. Use a single-lane load where we don't need the semantics of ld1r. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 23:56:54 +02:00
Martin Storsjö	4da4b2b87f	aarch64: vp9itxfm: Share instructions for loading idct coeffs in the 8x8 function Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 23:56:50 +02:00
Martin Storsjö	3933b86bb9	arm: vp9itxfm: Share instructions for loading idct coeffs in the 8x8 function Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 23:56:44 +02:00
Martin Storsjö	a63da4511d	aarch64: vp9itxfm: Do separate functions for half/quarter idct16 and idct32 This work is sponsored by, and copyright, Google. This avoids loading and calculating coefficients that we know will be zero, and avoids filling the temp buffer with zeros in places where we know the second pass won't read. This gives a pretty substantial speedup for the smaller subpartitions. The code size increases from 14740 bytes to 24292 bytes. The idct16/32_end macros are moved above the individual functions; the instructions themselves are unchanged, but since new functions are added at the same place where the code is moved from, the diff looks rather messy. Before: vp9_inv_dct_dct_16x16_sub1_add_neon: 236.7 vp9_inv_dct_dct_16x16_sub2_add_neon: 1051.0 vp9_inv_dct_dct_16x16_sub4_add_neon: 1051.0 vp9_inv_dct_dct_16x16_sub8_add_neon: 1051.0 vp9_inv_dct_dct_16x16_sub12_add_neon: 1387.4 vp9_inv_dct_dct_16x16_sub16_add_neon: 1387.6 vp9_inv_dct_dct_32x32_sub1_add_neon: 554.1 vp9_inv_dct_dct_32x32_sub2_add_neon: 5198.5 vp9_inv_dct_dct_32x32_sub4_add_neon: 5198.6 vp9_inv_dct_dct_32x32_sub8_add_neon: 5196.3 vp9_inv_dct_dct_32x32_sub12_add_neon: 6183.4 vp9_inv_dct_dct_32x32_sub16_add_neon: 6174.3 vp9_inv_dct_dct_32x32_sub20_add_neon: 7151.4 vp9_inv_dct_dct_32x32_sub24_add_neon: 7145.3 vp9_inv_dct_dct_32x32_sub28_add_neon: 8119.3 vp9_inv_dct_dct_32x32_sub32_add_neon: 8118.7 After: vp9_inv_dct_dct_16x16_sub1_add_neon: 236.7 vp9_inv_dct_dct_16x16_sub2_add_neon: 640.8 vp9_inv_dct_dct_16x16_sub4_add_neon: 639.0 vp9_inv_dct_dct_16x16_sub8_add_neon: 842.0 vp9_inv_dct_dct_16x16_sub12_add_neon: 1388.3 vp9_inv_dct_dct_16x16_sub16_add_neon: 1389.3 vp9_inv_dct_dct_32x32_sub1_add_neon: 554.1 vp9_inv_dct_dct_32x32_sub2_add_neon: 3685.5 vp9_inv_dct_dct_32x32_sub4_add_neon: 3685.1 vp9_inv_dct_dct_32x32_sub8_add_neon: 3684.4 vp9_inv_dct_dct_32x32_sub12_add_neon: 5312.2 vp9_inv_dct_dct_32x32_sub16_add_neon: 5315.4 vp9_inv_dct_dct_32x32_sub20_add_neon: 7154.9 vp9_inv_dct_dct_32x32_sub24_add_neon: 7154.5 vp9_inv_dct_dct_32x32_sub28_add_neon: 8126.6 vp9_inv_dct_dct_32x32_sub32_add_neon: 8127.2 Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 12:32:03 +02:00
Martin Storsjö	5eb5aec475	arm: vp9itxfm: Do a simpler half/quarter idct16/idct32 when possible This work is sponsored by, and copyright, Google. This avoids loading and calculating coefficients that we know will be zero, and avoids filling the temp buffer with zeros in places where we know the second pass won't read. This gives a pretty substantial speedup for the smaller subpartitions. The code size increases from 12388 bytes to 19784 bytes. The idct16/32_end macros are moved above the individual functions; the instructions themselves are unchanged, but since new functions are added at the same place where the code is moved from, the diff looks rather messy. Before: Cortex A7 A8 A9 A53 vp9_inv_dct_dct_16x16_sub1_add_neon: 273.0 189.5 212.0 235.8 vp9_inv_dct_dct_16x16_sub2_add_neon: 2102.1 1521.7 1736.2 1265.8 vp9_inv_dct_dct_16x16_sub4_add_neon: 2104.5 1533.0 1736.6 1265.5 vp9_inv_dct_dct_16x16_sub8_add_neon: 2484.8 1828.7 2014.4 1506.5 vp9_inv_dct_dct_16x16_sub12_add_neon: 2851.2 2117.8 2294.8 1753.2 vp9_inv_dct_dct_16x16_sub16_add_neon: 3239.4 2408.3 2543.5 1994.9 vp9_inv_dct_dct_32x32_sub1_add_neon: 758.3 456.7 864.5 553.9 vp9_inv_dct_dct_32x32_sub2_add_neon: 10776.7 7949.8 8567.7 6819.7 vp9_inv_dct_dct_32x32_sub4_add_neon: 10865.6 8131.5 8589.6 6816.3 vp9_inv_dct_dct_32x32_sub8_add_neon: 12053.9 9271.3 9387.7 7564.0 vp9_inv_dct_dct_32x32_sub12_add_neon: 13328.3 10463.2 10217.0 8321.3 vp9_inv_dct_dct_32x32_sub16_add_neon: 14176.4 11509.5 11018.7 9062.3 vp9_inv_dct_dct_32x32_sub20_add_neon: 15301.5 12999.9 11855.1 9828.2 vp9_inv_dct_dct_32x32_sub24_add_neon: 16482.7 14931.5 12650.1 10575.0 vp9_inv_dct_dct_32x32_sub28_add_neon: 17589.5 15811.9 13482.8 11333.4 vp9_inv_dct_dct_32x32_sub32_add_neon: 18696.2 17049.2 14355.6 12089.7 After: vp9_inv_dct_dct_16x16_sub1_add_neon: 273.0 189.5 211.7 235.8 vp9_inv_dct_dct_16x16_sub2_add_neon: 1203.5 998.2 1035.3 763.0 vp9_inv_dct_dct_16x16_sub4_add_neon: 1203.5 998.1 1035.5 760.8 vp9_inv_dct_dct_16x16_sub8_add_neon: 1926.1 1610.6 1722.1 1271.7 vp9_inv_dct_dct_16x16_sub12_add_neon: 2873.2 2129.7 2285.1 1757.3 vp9_inv_dct_dct_16x16_sub16_add_neon: 3221.4 2520.3 2557.6 2002.1 vp9_inv_dct_dct_32x32_sub1_add_neon: 753.0 457.5 866.6 554.6 vp9_inv_dct_dct_32x32_sub2_add_neon: 7554.6 5652.4 6048.4 4920.2 vp9_inv_dct_dct_32x32_sub4_add_neon: 7549.9 5685.0 6046.9 4925.7 vp9_inv_dct_dct_32x32_sub8_add_neon: 8336.9 6704.5 6604.0 5478.0 vp9_inv_dct_dct_32x32_sub12_add_neon: 10914.0 9777.2 9240.4 7416.9 vp9_inv_dct_dct_32x32_sub16_add_neon: 11859.2 11223.3 9966.3 8095.1 vp9_inv_dct_dct_32x32_sub20_add_neon: 15237.1 13029.4 11838.3 9829.4 vp9_inv_dct_dct_32x32_sub24_add_neon: 16293.2 14379.8 12644.9 10572.0 vp9_inv_dct_dct_32x32_sub28_add_neon: 17424.3 15734.7 13473.0 11326.9 vp9_inv_dct_dct_32x32_sub32_add_neon: 18531.3 17457.0 14298.6 12080.0 Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 12:32:00 +02:00
Martin Storsjö	79d332ebbd	aarch64: vp9itxfm: Move the load_add_store macro out from the itxfm16 pass2 function This allows reusing the macro for a separate implementation of the pass2 function. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 12:31:56 +02:00
Martin Storsjö	47b3c2c18d	arm: vp9itxfm: Move the load_add_store macro out from the itxfm16 pass2 function This allows reusing the macro for a separate implementation of the pass2 function. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 12:31:53 +02:00
Martin Storsjö	115476018d	aarch64: vp9itxfm: Make the larger core transforms standalone functions This work is sponsored by, and copyright, Google. This reduces the code size of libavcodec/aarch64/vp9itxfm_neon.o from 19496 to 14740 bytes. This gives a small slowdown of a couple of tens of cycles, but makes it more feasible to add more optimized versions of these transforms. Before: vp9_inv_dct_dct_16x16_sub4_add_neon: 1036.7 vp9_inv_dct_dct_16x16_sub16_add_neon: 1372.2 vp9_inv_dct_dct_32x32_sub4_add_neon: 5180.0 vp9_inv_dct_dct_32x32_sub32_add_neon: 8095.7 After: vp9_inv_dct_dct_16x16_sub4_add_neon: 1051.0 vp9_inv_dct_dct_16x16_sub16_add_neon: 1390.1 vp9_inv_dct_dct_32x32_sub4_add_neon: 5199.9 vp9_inv_dct_dct_32x32_sub32_add_neon: 8125.8 Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 12:31:45 +02:00
Martin Storsjö	0331c3f5e8	arm: vp9itxfm: Make the larger core transforms standalone functions This work is sponsored by, and copyright, Google. This reduces the code size of libavcodec/arm/vp9itxfm_neon.o from 15324 to 12388 bytes. This gives a small slowdown of a couple tens of cycles, up to around 150 cycles for the full case of the largest transform, but makes it more feasible to add more optimized versions of these transforms. Before: Cortex A7 A8 A9 A53 vp9_inv_dct_dct_16x16_sub4_add_neon: 2063.4 1516.0 1719.5 1245.1 vp9_inv_dct_dct_16x16_sub16_add_neon: 3279.3 2454.5 2525.2 1982.3 vp9_inv_dct_dct_32x32_sub4_add_neon: 10750.0 7955.4 8525.6 6754.2 vp9_inv_dct_dct_32x32_sub32_add_neon: 18574.0 17108.4 14216.7 12010.2 After: vp9_inv_dct_dct_16x16_sub4_add_neon: 2060.8 1608.5 1735.7 1262.0 vp9_inv_dct_dct_16x16_sub16_add_neon: 3211.2 2443.5 2546.1 1999.5 vp9_inv_dct_dct_32x32_sub4_add_neon: 10682.0 8043.8 8581.3 6810.1 vp9_inv_dct_dct_32x32_sub32_add_neon: 18522.4 17277.4 14286.7 12087.9 Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-09 12:31:40 +02:00
Martin Storsjö	57ec83e424	omx: Use the EOS flag to handle flushing at the end This avoids having to count the number of frames sent to the codec and the number of output packets received; instead just wait until the encoder returns a buffer with the EOS flag set. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-08 11:50:57 +02:00
Diego Biurrun	a25dac976a	Use bitstream_init8() where appropriate	2017-02-07 18:27:21 +01:00
Alexandra Hájková	f7ec7f546f	wma: Convert to the new bitstream reader	2017-02-06 15:13:34 +01:00
Martin Storsjö	58d87e0f49	aarch64: vp9itxfm: Restructure the idct32 store macros This avoids concatenation, which can't be used if the whole macro is wrapped within another macro. This is also arguably more readable. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-05 13:05:32 +02:00
Martin Storsjö	3bc5b28d5a	arm: vp9itxfm: Avoid .irp when it doesn't save any lines This makes it more readable. Signed-off-by: Martin Storsjö <martin@martin.st>	2017-02-05 12:59:19 +02:00
Diego Biurrun	7abdd026df	asm: Consistently uppercase SECTION markers	2017-02-03 11:37:53 +01:00
Alexandra Hájková	c29da01ac9	svq3: Convert to the new bitstream reader	2017-02-02 17:06:17 +01:00
wm4	577326d430	lavc: deprecate refcounted_frames field No deprecation guards, because the old decode API (for which this field is needed) doesn't have any either. This field should be removed together with the old decode calls. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2017-02-01 10:47:46 +01:00
Anton Khirnov	fd9212f2ed	Mark some arrays that never change as const.	2017-02-01 10:42:59 +01:00
Alexandra Hájková	ab2539bd37	ffv1: Convert to the new bitstream reader	2017-01-31 17:54:11 +01:00
Alexandra Hájková	2d72219554	h261dec: Convert to the new bitstream reader	2017-01-31 17:54:11 +01:00
Alexandra Hájková	2b94ed12de	shorten: Convert to the new bitstream reader	2017-01-31 17:54:11 +01:00
Alexandra Hájková	5a6da49dd0	ralf: Convert to the new bitstream reader	2017-01-31 17:54:11 +01:00
Alexandra Hájková	d85b37a955	loco: Convert to the new bitstream reader	2017-01-31 17:54:10 +01:00
Alexandra Hájková	0f94de8a09	fic: Convert to the new bitstream reader	2017-01-31 17:54:10 +01:00
Alexandra Hájková	6b1f559f9a	dirac: Convert to the new bitstream reader	2017-01-31 17:54:10 +01:00
Alexandra Hájková	ffc00df0a6	cavs: Convert to the new bitstream reader	2017-01-31 17:54:10 +01:00
Alexandra Hájková	0c89ff82e9	aic: Convert to the new bitstream reader	2017-01-31 17:54:10 +01:00
Diego Biurrun	d4c2103bd3	golomb: Convert to the new bitstream reader	2017-01-31 17:46:19 +01:00
Andreas Cadhalpun	612cc07128	pgssubdec: reset rle_data_len/rle_remaining_len on allocation error The code relies on their validity and otherwise can try to access a NULL object->rle pointer, causing segmentation faults. Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com> Signed-off-by: Diego Biurrun <diego@biurrun.de>	2017-01-31 09:35:54 +01:00
Mark Thompson	ca62236a89	vaapi_encode: Add VP8 support	2017-01-30 23:03:46 +00:00
Mark Thompson	ff35aa8ca4	vaapi_encode: Pass framerate parameters to driver Only do this when building for a recent VAAPI version - initial driver implementations were confused about the interpretation of the framerate field, but hopefully this will be consistent everywhere once 0.40.0 is released.	2017-01-30 22:52:54 +00:00
Mark Thompson	eddfb57210	vaapi_h264: Enable VBR mode Default to using VBR when a target bitrate is set, unless the max rate is also set and matches the target. Changes to the Intel driver mean that min_qp is also respected in this case, so set a codec default to unset the value rather than using the current default inherited from the MPEG-4 part 2 encoder.	2017-01-30 22:52:54 +00:00
Mark Thompson	f033ba470f	vaapi_encode: Support VBR mode This includes a backward-compatibility hack to choose CBR anyway on old drivers which have no VBR support, so that existing programs will continue to work when their options now map to VBR.	2017-01-30 22:52:54 +00:00
Mark Thompson	ca6ae3b77a	vaapi_encode: Add MPEG-2 support	2017-01-29 13:28:31 +00:00
Alexandra Hájková	381a4e31a6	tak: Convert to the new bitstream reader	2017-01-25 11:06:58 +01:00
Diego Biurrun	2e0e150144	magicyuv: Convert to the new bitstream reader	2017-01-25 10:38:43 +01:00
Diego Biurrun	b061f298f7	truemotion2rt: Convert to the new bitstream reader	2017-01-25 09:55:36 +01:00
Alexandra Hájková	e7f24c9ffc	wavpack: Convert to the new bitstream reader	2017-01-25 09:55:35 +01:00
Alexandra Hájková	6668bc80b5	mpc: Convert to the new bitstream reader	2017-01-25 09:55:33 +01:00
Alexandra Hájková	fd8de7f2d8	dxtory: Convert to the new bitstream reader	2017-01-20 10:18:32 +01:00
Alexandra Hájková	4d49a4c550	apedec: Convert to the new bitstream reader	2017-01-20 10:18:32 +01:00
Anton Khirnov	b4a911c189	mpegvideoenc: make a table const	2017-01-19 09:52:21 +01:00
Anton Khirnov	296eff4d9d	zmbvenc: get rid of a global table	2017-01-19 09:52:10 +01:00
Derek Buitenhuis	00b775dda2	hevc: Mark as having threadsafe init Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com> Signed-off-by: Anton Khirnov <anton@khirnov.net>	2017-01-19 09:51:15 +01:00
Alexandra Hájková	54dcd22885	als: Convert to the new bitstream reader	2017-01-17 09:52:11 +01:00
Luca Barbato	fb59f87ce7	nvenc: Explicitly push the cuda context on encoding Make sure that NVENC does not misbehave if other cuda usages happen in the application.	2017-01-17 07:37:12 +01:00
Alexandra Hájková	4795e4f61f	alac: Convert to the new bitstream reader	2017-01-13 10:27:03 +01:00
Luca Barbato	f8f7ad758d	qsv: Set the correct range for la_depth Setting an invalid range for it makes the encoder behave inconsistently.	2017-01-13 08:42:10 +01:00
Anton Khirnov	1202b71269	theora: export cropping information instead of handling it internally	2017-01-12 16:29:17 +01:00
Anton Khirnov	c3e84820d6	h264dec: export cropping information instead of handling it internally	2017-01-12 16:29:12 +01:00
Anton Khirnov	4fded0480f	h264dec: be more explicit in handling container cropping The current condition can trigger in cases where it shouldn't, with unexpected results. Make sure that: - container cropping is really based on the original dimensions from the caller - those dimensions are discarded on size change The code is still quite hacky and eventually should be deprecated and removed, with the decision about which cropping is used delegated to the caller.	2017-01-12 16:28:05 +01:00
Anton Khirnov	a02ae1c683	hevcdec: export cropping information instead of handling it internally	2017-01-12 16:27:56 +01:00
Anton Khirnov	019ab88a95	lavc: add an option for exporting cropping information to the caller Also, add generic code for handling cropping, so the decoders can export just the cropping size and not bother with the rest.	2017-01-12 16:24:15 +01:00
Anton Khirnov	b68e353136	qsvdec: do not sync PIX_FMT_QSV surfaces Introducing enforced sync points in arbitrary places is bad for performance. Since the vast majority of receiving code (QSV VPP or encoders, retrieving frames through hwcontext) will do the syncing, this change should not be visible to most callers. But bumping micro just in case. This is also consistent with what VAAPI hwaccel does.	2017-01-12 16:21:39 +01:00
Steve Lhomme	ac3c3ee678	dxva2: allow an empty array of ID3D11VideoDecoderOutputView We can pick the correct slice index directly from the ID3D11VideoDecoderOutputView casted from data[3]. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2017-01-12 16:19:13 +01:00
Steve Lhomme	f67235a28c	dxva2: get the slice number directly from the surface in D3D11VA No need to loop through the known surfaces, we'll use the requested surface anyway. The loop is only done for DXVA2. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2017-01-12 16:09:41 +01:00
Mark Thompson	89725a8512	vaapi_h264: Scale log2_max_pic_order_cnt_lsb with max_b_frames Before this change, it was possible to overflow pic_order_cnt_lsb and generate a stream with invalid POC numbering. This makes sure that the field is large enough that a single IDR B* P sequence uses fewer than half the available POC lsb values.	2017-01-11 23:03:58 +00:00
Mark Thompson	a3c3a5eac2	vaapi_encode: Support forcing IDR frames via AVFrame.pict_type	2017-01-11 23:03:58 +00:00
Mark Thompson	37fab0661a	vaapi_encode: Fix GOP sizing This change makes the configured GOP size be respected exactly - previously the value could be exceeded slightly due to flaws in the frame type selection logic.	2017-01-11 23:03:58 +00:00
Alexandra Hájková	bd6496fa07	interplayvideo: Convert to the new bitstream reader	2017-01-09 15:21:47 +01:00
Alexandra Hájková	4e25051031	adx: Convert to the new bitstream reader	2017-01-09 15:21:47 +01:00
Alexandra Hájková	9aec009f65	dvbsubdec: Convert to the new bitstream reader	2017-01-09 15:21:47 +01:00
Alexandra Hájková	d7fe11634c	motionpixels: Convert to the new bitstream reader	2017-01-09 15:18:16 +01:00
Anton Khirnov	f1af37b510	h264dec: make ff_h264_decode_init() static It is not called from outside h264dec.c anymore.	2017-01-09 13:21:13 +01:00
Anton Khirnov	e7de05f98f	h264dec: drop a redundant check Cropping parameters are already checked for validity during SPS parsing, no need to check them again.	2017-01-09 13:21:13 +01:00
Steve Lhomme	2835e9a9fd	hevcdec: add P010 support for D3D11VA Given it's the same API as DXVA2 I don't know why the same output was not enabled for both. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2017-01-09 10:48:54 +01:00
Steve Lhomme	0ac2d86c47	dxva2: Factorize DXVA context validity test into a single macro Signed-off-by: Diego Biurrun <diego@biurrun.de>	2017-01-08 16:41:24 +01:00
Steve Lhomme	f8a42d4f26	dxva2: Make ff_dxva2_get_surface() static and drop its name prefix Signed-off-by: Diego Biurrun <diego@biurrun.de>	2017-01-08 16:41:07 +01:00
Jun Zhao	9b1db2d338	vaapi_h264: Fix POC on IDR frames In H.264 section 8.2.1, we have that "The bitstream shall not contain data that result in Min(TopFieldOrderCnt, BottomFieldOrderCnt) not equal to 0 for a coded IDR frame". This fixes the encoder to always conform to this - previously the POC values formed an unbroken sequence, not resetting to zero on IDR frames. Signed-off-by: Mark Thompson <sw@jkqxz.net>	2017-01-04 21:52:06 +00:00
Mark Thompson	d08e02d929	vaapi_h265: Fix build failure with old libva without 10-bit surfaces 10-bit surface support was added in libva 1.6.2, earlier versions support H.265 encoding in 8-bit only.	2017-01-04 21:49:41 +00:00
Martin Storsjö	85ad5ea72c	aarch64: vp9mc: Fix a comment to refer to a register with the right name Signed-off-by: Martin Storsjö <martin@martin.st>	2017-01-03 14:16:10 +02:00
Martin Storsjö	65074791e8	aarch64: vp9dsp: Fix vertical alignment in the init file Signed-off-by: Martin Storsjö <martin@martin.st>	2017-01-03 14:15:58 +02:00
Martin Storsjö	c536e5e869	arm: vp9mc: Fix vertical alignment of operands Signed-off-by: Martin Storsjö <martin@martin.st>	2017-01-03 14:15:45 +02:00
Diego Biurrun	53618054b6	parser: Add missing #include for printing ISO C99 conversion specifiers	2016-12-25 13:22:50 +01:00
Diego Biurrun	0b77a59336	Use correct printf conversion specifiers for POSIX integer types	2016-12-23 19:30:00 +01:00
Diego Biurrun	92db508307	build: Generate pkg-config files from Make and not from configure This moves work from the configure to the Make stage where it can be parallelized and ensures that pkgconfig files are updated when library versions change. Bug-Id: 449	2016-12-22 12:30:54 +01:00
Diego Biurrun	f9edc734e0	ratecontrol: Drop xvid-rc-related struct members unused after `a6901b9c6`	2016-12-21 11:13:20 +01:00
Ruta Gadkari	5b26d3b789	nvenc: Update check for lookahead By default it is -1. Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2016-12-21 06:16:52 +01:00
Martin Storsjö	a0c443a398	aarch64: vp9itxfm: Use the offset parameter to movrel This fixes build failures for iOS, broken since `cad42fadcd`. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-12-19 22:49:51 +02:00
Alexandra Hájková	fc322d6a70	tta: Convert to the new bitstream reader	2016-12-19 13:52:36 +01:00
Alexandra Hájková	00c72a1e01	mlp: Convert to the new bitstream reader	2016-12-19 13:22:29 +01:00
Alexandra Hájková	fa64aea12e	unary: Convert to the new bitstream reader	2016-12-19 12:35:05 +01:00
Anton Khirnov	45286a625c	h264dec: make sure to only end a field if it has been started Calling ff_h264_field_end() when the per-field state is not properly initialized leads to all kinds of undefined behaviour. CC: libav-stable@libav.org Bug-Id: 977 978 992	2016-12-19 08:15:58 +01:00
Anton Khirnov	c2fa6bb0e8	mpeg12dec: move setting first_field to mpeg_field_start() For field picture, the first_field is set based on its previous value. Before this commit, first_field is set when reading the picture coding extension. However, in corrupted files there may be multiple picture coding extension headers, so the final value of first_field that is actually used during decoding can be wrong. That can lead to various undefined behaviour, like predicting from a non-existing field. Fix this problem, by setting first_field in mpeg_field_start(), which should be called exactly once per field. CC: libav-stable@libav.org Bug-ID: 999	2016-12-19 08:15:49 +01:00
Anton Khirnov	e807491fc6	mpeg12dec: avoid signed overflow in bitrate calculation CC: libav-stable@libav.org Bug-Id: 981 Found-By: Agostino Sarubbo	2016-12-19 08:15:42 +01:00
Anton Khirnov	58405de095	mpegvideo_parser: avoid signed overflow in bitrate calculation CC: libav-stable@libav.org Bug-Id: 981 Found-By: Agostino Sarubbo	2016-12-19 08:15:07 +01:00

21573 Commits