FFmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2024-09-21 13:56:55 +00:00

Author	SHA1	Message	Date
Martin Storsjö	a67ae67083	arm: vp9: Add NEON itxfm routines This work is sponsored by, and copyright, Google. For the transforms up to 8x8, we can fit all the data (including temporaries) in registers and just do a straightforward transform of all the data. For 16x16, we do a transform of 4x16 pixels in 4 slices, using a temporary buffer. For 32x32, we transform 4x32 pixels at a time, in two steps of 4x16 pixels each. Examples of relative speedup compared to the C version, from checkasm: Cortex A7 A8 A9 A53 vp9_inv_adst_adst_4x4_add_neon: 3.39 5.83 4.17 4.01 vp9_inv_adst_adst_8x8_add_neon: 3.79 4.86 4.23 3.98 vp9_inv_adst_adst_16x16_add_neon: 3.33 4.36 4.11 4.16 vp9_inv_dct_dct_4x4_add_neon: 4.06 6.16 4.59 4.46 vp9_inv_dct_dct_8x8_add_neon: 4.61 6.01 4.98 4.86 vp9_inv_dct_dct_16x16_add_neon: 3.35 3.44 3.36 3.79 vp9_inv_dct_dct_32x32_add_neon: 3.89 3.50 3.79 4.42 vp9_inv_wht_wht_4x4_add_neon: 3.22 5.13 3.53 3.77 Thus, the speedup vs C code is around 3-6x. This is mostly marginally faster than the corresponding routines in libvpx on most cores, tested with their 32x32 idct (compared to vpx_idct32x32_1024_add_neon). These numbers are slightly in libvpx's favour since their version doesn't clear the input buffer like ours do (although the effect of that on the total runtime probably is negligible.) Cortex A7 A8 A9 A53 vp9_inv_dct_dct_32x32_add_neon: 18436.8 16874.1 14235.1 11988.9 libvpx vpx_idct32x32_1024_add_neon 20789.0 13344.3 15049.9 13030.5 Only on the Cortex A8, the libvpx function is faster. On the other cores, ours is slightly faster even though ours has got source block clearing integrated. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-11 11:09:05 +02:00
Ronald S. Bultje	0b37cd09a6	checkasm: add vp9dsp.itxfm_add tests. This includes fixes by Henrik Gramner. The forward transforms are derived from the reference encoder. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-11 11:09:05 +02:00
Mark Thompson	fd0fae6037	pthread_frame: Unreference hw_frames_ctx on per-thread codec contexts When decoding with threads enabled, the get_format callback will be called with one of the per-thread codec contexts rather than with the outer context. If a hwaccel is in use too, this will add a reference to the hardware frames context on that codec context, which will then propagate to all of the other per-thread contexts for decoding. Once the decoder finishes, however, the per-thread contexts are not freed normally, so these references leak.	2016-11-10 20:36:11 +00:00
Martin Storsjö	11623217e3	arm: vp9mc: Use a different helper register for PIC loads This fixes crashes since `557c1675cf` in linux PIC builds. Previously, movrelx silently used r12 as helper register, which doesn't work when r12 is the destination register. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-10 14:01:04 +02:00
Martin Storsjö	824e8c2840	arm: Clear the gp register alias at the end of functions We reset .Lpic_gp to zero at the start of each function, which means that the logic within movrelx for clearing gp when necessary will be missed. This fixes using movrelx in different functions with a different helper register. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-10 14:01:04 +02:00
Diego Biurrun	905cdcaa9d	examples/decode_audio: Add missing header for av_free()	2016-11-10 10:33:19 +01:00
Martin Storsjö	6a62795d40	aarch64: h264idct: Use the offset parameter to movrel Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-10 11:18:22 +02:00
Martin Storsjö	557c1675cf	arm: vp9mc: Minor adjustments from review of the aarch64 version This work is sponsored by, and copyright, Google. The speedup for the large horizontal filters is surprisingly big on A7 and A53, while there's a minor slowdown (almost within measurement noise) on A8 and A9. Cortex A7 A8 A9 A53 orig: vp9_put_8tap_smooth_64h_neon: 20270.0 14447.3 19723.9 10910.9 new: vp9_put_8tap_smooth_64h_neon: 20165.8 14466.5 19730.2 10668.8 Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-10 11:18:22 +02:00
Martin Storsjö	383d96aa22	aarch64: vp9: Add NEON optimizations of VP9 MC functions This work is sponsored by, and copyright, Google. These are ported from the ARM version; it is essentially a 1:1 port with no extra added features, but with some hand tuning (especially for the plain copy/avg functions). The ARM version isn't very register starved to begin with, so there's not much to be gained from having more spare registers here - we only avoid having to clobber callee-saved registers. Examples of runtimes vs the 32 bit version, on a Cortex A53: ARM AArch64 vp9_avg4_neon: 27.2 23.7 vp9_avg8_neon: 56.5 54.7 vp9_avg16_neon: 169.9 167.4 vp9_avg32_neon: 585.8 585.2 vp9_avg64_neon: 2460.3 2294.7 vp9_avg_8tap_smooth_4h_neon: 132.7 125.2 vp9_avg_8tap_smooth_4hv_neon: 478.8 442.0 vp9_avg_8tap_smooth_4v_neon: 126.0 93.7 vp9_avg_8tap_smooth_8h_neon: 241.7 234.2 vp9_avg_8tap_smooth_8hv_neon: 690.9 646.5 vp9_avg_8tap_smooth_8v_neon: 245.0 205.5 vp9_avg_8tap_smooth_64h_neon: 11273.2 11280.1 vp9_avg_8tap_smooth_64hv_neon: 22980.6 22184.1 vp9_avg_8tap_smooth_64v_neon: 11549.7 10781.1 vp9_put4_neon: 18.0 17.2 vp9_put8_neon: 40.2 37.7 vp9_put16_neon: 97.4 99.5 vp9_put32_neon/armv8: 346.0 307.4 vp9_put64_neon/armv8: 1319.0 1107.5 vp9_put_8tap_smooth_4h_neon: 126.7 118.2 vp9_put_8tap_smooth_4hv_neon: 465.7 434.0 vp9_put_8tap_smooth_4v_neon: 113.0 86.5 vp9_put_8tap_smooth_8h_neon: 229.7 221.6 vp9_put_8tap_smooth_8hv_neon: 658.9 621.3 vp9_put_8tap_smooth_8v_neon: 215.0 187.5 vp9_put_8tap_smooth_64h_neon: 10636.7 10627.8 vp9_put_8tap_smooth_64hv_neon: 21076.8 21026.9 vp9_put_8tap_smooth_64v_neon: 9635.0 9632.4 These are generally about as fast as the corresponding ARM routines on the same CPU (at least on the A53), in most cases marginally faster. The speedup vs C code is pretty much the same as for the 32 bit case; on the A53 it's around 6-13x for the larger 8tap filters. The exact speedup varies a little, since the C versions generally don't end up exactly as slow/fast as on 32 bit. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-10 11:15:56 +02:00
Martin Storsjö	c44a8a3eab	aarch64: Add an offset parameter to the movrel macro With apple tools, the linker fails with errors like these, if the offset is negative: ld: in section __TEXT,__text reloc 8: symbol index out of range for architecture arm64 Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-10 11:06:08 +02:00
Martin Storsjö	a4cfcddcb0	vp9: Make the subpel filters non-static Make them aligned, to allow efficient access to them from simd. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-10 11:05:57 +02:00
James Almer	98cae966c7	matroskaenc: write updated STREAMINFO metadata for FLAC streams if available FLAC streams originating from the FLAC encoder send updated and more complete STREAMINFO metadata as part of the last packet, so write that to CodecPrivate instead of the incomplete one available in extradata during init. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-10 09:15:24 +01:00
James Almer	f4bf236338	matroskaenc: fix muxing AAC streams when using aac_adtstoasc bsf aac_adtstoasc makes the aac extradata available only after the first packet is filtered, and as packet side data. Assume extradata will be available as part of the first packet if avpriv_mpeg4audio_get_config() fails the first time due to missing extradata and reserve space for the OutputSampleRate element in the Tracks master. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-10 09:01:18 +01:00
Anton Khirnov	84f225684c	pthread_frame: properly propagate the hw frame context across frame threads	2016-11-10 09:00:11 +01:00
Diego Biurrun	72a19f4013	mpegaudiodsp: aarch64: Adjust function prototype after `2caa93b813`	2016-11-10 00:13:48 +01:00
Diego Biurrun	2dd464868c	configure: Move license checks directly after command line parsing This will allow to error out immediately if incompatible options are passed on the command line instead of running time-consuming tests.	2016-11-09 20:51:56 +01:00
Diego Biurrun	c78495d1cd	configure: Log name and parameters of all helper functions where it makes sense	2016-11-09 20:51:56 +01:00
Diego Biurrun	8a6e7a67cb	configure: Use check_cpp in CPP flags tests	2016-11-09 20:51:56 +01:00
Diego Biurrun	831005b230	configure: Log correct test name and use correct filter when testing objective C flags	2016-11-09 20:51:56 +01:00
Diego Biurrun	fe7bc1f16a	configure: Do not unconditionally check for (and enable) xlib This avoids unnecessarily linking against xlib.	2016-11-09 20:51:55 +01:00
Diego Biurrun	d1a91ebe49	configure: Print list of enabled programs Also drop a related and now redundant informative output line.	2016-11-09 20:51:55 +01:00
Diego Biurrun	576c9003ae	configure: Improve output wording Also drop a redundant output line.	2016-11-09 20:51:55 +01:00
Diego Biurrun	a3483f7993	avconv: Drop stray leftover debug output	2016-11-09 20:51:55 +01:00
Diego Biurrun	67deba8a41	Use avpriv_report_missing_feature() where appropriate	2016-11-08 17:54:34 +01:00
Diego Biurrun	59d2b00d20	configure: Add --quiet command line parameter to suppress informative output	2016-11-08 17:32:57 +01:00
Diego Biurrun	4537647c04	fate: checkasm: Split monolithic test into individual components	2016-11-08 17:32:25 +01:00
Diego Biurrun	9498237049	checkasm: Add --test parameter to check only specific components Inspired by a patch from Martin Storsjö <martin@martin.st>.	2016-11-08 17:32:25 +01:00
Vittorio Giovara	de6e2ff3dd	mov: Read multiple stsd from DV Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	2016-11-08 11:22:29 -05:00
Vittorio Giovara	47a795727f	hevc: Support extradata changes from multiple stsd Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	2016-11-08 11:22:29 -05:00
Vittorio Giovara	2fe30b4743	hevc: Allow parsing external extradata buffers	2016-11-08 11:22:29 -05:00
Vittorio Giovara	5be2153111	hevc: Move hevc_decode_extradata before frame decoding Avoids a forward-declaration in the following commit. Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	2016-11-08 11:22:29 -05:00
Vittorio Giovara	bed2c4b265	lavc: Add hevc main10 profile to avconv cli	2016-11-08 11:22:29 -05:00
Vittorio Giovara	17dac56b8f	lavu: Rename ycgco color space appropriately Planes are ordered as the name suggests now. Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	2016-11-08 11:22:29 -05:00
Diego Biurrun	0361e4dcb4	h264_qpel: x86: Move function with only one instance out of template macro libavcodec/x86/h264_qpel.c:392:785: warning: unused function 'ff_avg_h264_qpel8or16_hv1_lowpass_mmxext' [-Wunused-function]	2016-11-08 17:21:02 +01:00
Diego Biurrun	88f0cf8cd3	avplay: Correct function pointer assignments in options array avplay.c:2928:5: warning: ISO C forbids initialization between function pointer and ‘void *’ [-Wpedantic]	2016-11-08 17:20:30 +01:00
Diego Biurrun	943533d64c	avconv: Correct function pointer assignments in options array Fixes several warnings of the type avconv_opt.c:2356:5: warning: ISO C forbids initialization between function pointer and ‘void *’ [-Wpedantic]	2016-11-08 16:48:41 +01:00
Andreas Cadhalpun	43de8b328b	lzf: update pointer p after realloc This fixes heap-use-after-free detected by AddressSanitizer. Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com> Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2016-11-07 22:42:00 +01:00
Luca Barbato	ab839054e6	swscale: Add GRAY12	2016-11-07 22:42:00 +01:00
Luca Barbato	7471352f19	pixfmt: Add GRAY12	2016-11-07 22:42:00 +01:00
Anton Khirnov	4ab61cd983	qsv{enc,dec}: extend the internal frame allocator Handle the internal frame requests, which is required by the HEVC encoding plugin. Signed-off-by: Maxym Dmytrychenko <maxym.dmytrychenko@intel.com>	2016-11-07 12:48:00 +01:00
Anton Khirnov	00aeedd841	qsv{dec,enc}: use a struct as a memory id with internal memory allocator This will allow implementing the allocator more fully, which is needed by the HEVC encoder plugin with video memory input. Signed-off-by: Maxym Dmytrychenko <maxym.dmytrychenko@intel.com>	2016-11-07 12:47:54 +01:00
Anton Khirnov	404e51478e	qsv{dec,enc}: always use an internal mfxFrameSurface1 For encoding, this avoids modifying the input surface, which we are not allowed to do. This will also be useful in the following commits. Signed-off-by: Maxym Dmytrychenko <maxym.dmytrychenko@intel.com>	2016-11-07 12:47:46 +01:00
Anton Khirnov	e8bbacbf52	hwcontext_qsv: support frame mapping Signed-off-by: Maxym Dmytrychenko <maxym.dmytrychenko@intel.com>	2016-11-07 12:47:40 +01:00
Anton Khirnov	8ea15afbf2	hwcontext_qsv: transfer data through the child context when VPP fails Uploading/downloading data through VPP may not work for some formats, in that case we can still try to call av_hwframe_transfer_data() on the child context. Signed-off-by: Maxym Dmytrychenko <maxym.dmytrychenko@intel.com>	2016-11-07 12:47:33 +01:00
Anton Khirnov	b91ce48600	hwcontext_qsv: do not fail when download/upload VPP session creation fails Certain pixel formats (e.g. P8) might not be supported for download/upload through VPP operations, but can still be used otherwise. Signed-off-by: Maxym Dmytrychenko <maxym.dmytrychenko@intel.com>	2016-11-07 12:47:26 +01:00
Anton Khirnov	b115a35ea6	hwcontext_qsv: add support for the P8 format When using GPU surfaces with QSV, one needs to supply a frame allocator, which will be invoked to pass surface pools to libmfx. For encoding, this allocator gets invoked not only for the pool of input frames, but also for a separate pool of (apparently) reconstructed frames and another pool of MFX_FOURCC_P8, which on Windows needs to return D3DFMT_P8 D3D surfaces. Those are probably used to store the encoded bitstream on the GPU. Signed-off-by: Maxym Dmytrychenko <maxym.dmytrychenko@intel.com>	2016-11-07 12:47:20 +01:00
Anton Khirnov	10065d9324	hwcontext_dxva2: add support for the P8 format This format is used internally by the QSV encoder to store the encoded bitstream. Signed-off-by: Maxym Dmytrychenko <maxym.dmytrychenko@intel.com>	2016-11-07 12:47:14 +01:00
Anton Khirnov	9109737654	hwcontext_dxva2: frame mapping support Signed-off-by: Maxym Dmytrychenko <maxym.dmytrychenko@intel.com>	2016-11-07 12:46:59 +01:00
Hendrik Leppkes	fabfbfe571	dxva2: fix surface selection when compiled with both d3d11va and dxva2 Fixes a regression introduced in `be630b1e08` Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-11-07 10:05:12 +01:00
Derek Buitenhuis	db0b3dccb3	libx265: Add option to force IDR frames This is in the same vein as `380146924e`. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-07 10:16:10 +02:00

44217 Commits