FFmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2024-10-07 02:03:25 +00:00

Author	SHA1	Message	Date
Michael Niedermayer	e387c9d5dd	Merge remote-tracking branch 'qatar/master' * qatar/master: (22 commits) rv40dsp x86: use only one register, for both increment and loop counter rv40dsp: implement prescaled versions for biweight. avconv: use default channel layouts when they are unknown avconv: parse channel layout string nutdec: K&R formatting cosmetics vda: Signal 4 byte NAL headers to the decoder regardless of what's in the extradata mem: Consistently return NULL for av_malloc(0) vf_overlay: implement poll_frame() vf_scale: support named constants for sws flags. lavc doxy: add all installed headers to doxy groups. lavc doxy: add avfft to the main lavc group. lavc doxy: add remaining avcodec.h functions to a misc doxygen group. lavc doxy: add AVPicture functions to a doxy group. lavc doxy: add resampling functions to a doxy group. lavc doxy: replace \ with / lavc doxy: add encoding functions to a doxy group. lavc doxy: add decoding functions to a doxy group. lavc doxy: fix formatting of AV_PKT_DATA_{PARAM_CHANGE,H263_MB_INFO} lavc doxy: add AVPacket-related stuff to a separate doxy group. lavc doxy: add core functions/definitions to a doxy group. ... Conflicts: ffmpeg.c libavcodec/avcodec.h libavcodec/vda.c libavcodec/x86/rv40dsp.asm libavfilter/vf_scale.c libavformat/nutdec.c libavutil/mem.c tests/ref/acodec/pcm_s24daud Merged-by: Michael Niedermayer <michaelni@gmx.at>	2012-04-10 22:53:25 +02:00
Christophe GISQUET	2130bd8f5b	rv40dsp x86: use only one register, for both increment and loop counter Around 10 cycles faster for luma. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-10 10:07:09 -07:00
Christophe GISQUET	272b252c01	rv40dsp: implement prescaled versions for biweight. Quite often, the original weights are multiple of 512. By prescaling them by 1/512 when they are computed (once per frame), no intermediate shifting is needed, and no prescaling on each call either. The x86 code already used that trick. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-04-10 10:06:48 -07:00
Reimar Döffinger	da1ba4e88b	Fix NASM compilation. movd needs explicit register size prefix for NASM. Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>	2012-02-03 20:42:30 +01:00
Christophe Gisquet	e5c9de2ab7	rv40: x86 SIMD for biweight Provide MMX, SSE2 and SSSE3 versions, with a fast-path when the weights are multiples of 512 (which is often the case when the values round up nicely). *_TIMER report for the 16x16 and 8x8 cases: C: 9015 decicycles in 16, 524257 runs, 31 skips 2656 decicycles in 8, 524271 runs, 17 skips MMX: 4156 decicycles in 16, 262090 runs, 54 skips 1206 decicycles in 8, 262131 runs, 13 skips MMX on fast-path: 2760 decicycles in 16, 524222 runs, 66 skips 995 decicycles in 8, 524252 runs, 36 skips SSE2: 2163 decicycles in 16, 262131 runs, 13 skips 832 decicycles in 8, 262137 runs, 7 skips SSE2 with fast path: 1783 decicycles in 16, 524276 runs, 12 skips 711 decicycles in 8, 524283 runs, 5 skips SSSE3: 2117 decicycles in 16, 262136 runs, 8 skips 814 decicycles in 8, 262143 runs, 1 skips SSSE3 with fast path: 1315 decicycles in 16, 524285 runs, 3 skips 578 decicycles in 8, 524286 runs, 2 skips This means around a 4% speedup for some sequences. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-01-30 23:58:25 +01:00

5 Commits