FFmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2024-10-05 20:43:26 +00:00

Author	SHA1	Message	Date
Diego Biurrun	a734fa575f	Remove disabled non-optimized code variants.	2011-04-29 20:01:13 +02:00
Michael Niedermayer	52a81cd0e4	Fix add_paeth_prediction_mmx for rgb48 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2011-04-27 20:08:37 +02:00
Michael Niedermayer	afd2371d5c	merge read and and in add_paeth_prediction Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2011-04-27 20:08:37 +02:00
Baptiste Coudurier	6d4c49a2af	Move png mmx functions into x86/png_mmx.c, remove them from DSPContext. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2011-04-27 20:08:09 +02:00
Michael Niedermayer	d7e5aebae7	Merge remote branch 'qatar/master' * qatar/master: (23 commits) ac3enc: correct the flipped sign in the ac3_fixed encoder Eliminate pointless '#if 1' statements without matching '#else'. Add AVX FFT implementation. Increase alignment of av_malloc() as needed by AVX ASM. Update x86inc.asm from x264 to allow AVX emulation using SSE and MMX. mjpeg: Detect overreads in mjpeg_decode_scan() and error out. documentation: extend documentation for ffmpeg -aspect option APIChanges: update commit hashes for recent additions. lavc: deprecate FF_*_TYPE macros in favor of AV_PICTURE_TYPE_* enums aac: add headers needed for log2f() lavc: remove FF_API_MB_Q cruft lavc: remove FF_API_RATE_EMU cruft lavc: remove FF_API_HURRY_UP cruft pad: make the filter parametric vsrc_movie: add key_frame and pict_type. vsrc_movie: fix leak in request_frame() lavfi: add key_frame and pict_type to AVFilterBufferRefVideo. vsrc_buffer: add sample_aspect_ratio fields to arguments. lavfi: add fieldorder filter scale: make the filter parametric ... Conflicts: Changelog doc/filters.texi ffmpeg.c libavcodec/ac3dec.h libavcodec/dsputil.c libavfilter/avfilter.h libavfilter/vf_scale.c libavfilter/vf_yadif.c libavfilter/vsrc_buffer.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2011-04-27 03:51:04 +02:00
Vitor Sessak	9d35fa520e	Add AVX FFT implementation. Signed-off-by: Reinhard Tartler <siretart@tauware.de>	2011-04-26 18:25:24 +02:00
Vitor Sessak	33cbfa6fa3	Update x86inc.asm from x264 to allow AVX emulation using SSE and MMX. Signed-off-by: Reinhard Tartler <siretart@tauware.de>	2011-04-26 18:18:22 +02:00
Carl Eugen Hoyos	5c0068758f	Fix compilation with --disable-yasm.	2011-04-12 17:40:18 +02:00
Oskar Arvidsson	8dbe585641	Adds 8-, 9- and 10-bit versions of some of the functions used by the h264 decoder. This patch lets e.g. dsputil_init choose dsp functions with respect to the bit depth to decode. The naming scheme of bit depth dependent functions is <base name>_<bit depth>[_<prefix>] (i.e. the old clear_blocks_c is now named clear_blocks_8_c). Note: Some of the functions for high bit depth are not dependent on the bit depth, but only on the pixel size. This leaves some room for optimizing binary size. Preparatory patch for high bit depth h264 decoding support. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2011-04-10 22:33:42 +02:00
Michael Niedermayer	3c8493074b	Merge remote-tracking branch 'newdev/master' * newdev/master: dsputil: allow to skip drawing of top/bottom edges. Split fate-psx-str-v3 into a video-only and audio-only test. Conflicts: libavcodec/dsputil.c libavcodec/mpegvideo.c libavcodec/snow.c libavcodec/x86/dsputil_mmx.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2011-03-27 01:40:18 +01:00
Alexander Strange	1500be13f2	dsputil: allow to skip drawing of top/bottom edges.	2011-03-26 17:45:38 -04:00
Michael Niedermayer	2fd41c9067	Merge remote-tracking branch 'newdev/master' * newdev/master: avio: make udp_set_remote_url/get_local_port internal. asfdec: also subtract preroll when reading simple index object matroskaenc: remove a variable that's unused after `bc17bd9`. avio: cosmetics - nicer vertical alignment. Remove unnecessary icc version checks Disable 'attribute "foo" ignored' warnings from icc rtsp: Don't use a locale dependent format string Add xd55 codec tag for XDCAM HD422 720p25 CBR files. configure: get libavcodec version from new version.h header lavc: move the version macros to a new installed header. matroskaenc: simplify get_aac_sample_rates by using ff_mpeg4audio_get_config Do not use format string "%0.3f" for RTSP Range field. Add apply_window_int16() to DSPContext with x86-optimized versions and use it in the ac3_fixed encoder. Document usage of import libraries created by dlltool configure: Set the correct lib target for arm/wince dlltool fate: simplify regression-funcs.sh fate: add support for multithread testing Conflicts: libavformat/rtspdec.c libavutil/attributes.h libavutil/internal.h libavutil/mem.h Merged-by: Michael Niedermayer <michaelni@gmx.at>	2011-03-24 02:16:11 +01:00
Justin Ruggles	e6e9823488	Add apply_window_int16() to DSPContext with x86-optimized versions and use it in the ac3_fixed encoder.	2011-03-22 21:08:30 -04:00
Michael Niedermayer	d375c10400	Fake-Merge remote-tracking branch 'ffmpeg-mt/master'	2011-03-22 22:36:57 +01:00
Michael Niedermayer	d4a50a2100	Merge remote-tracking branch 'newdev/master' Merged-by: Michael Niedermayer <michaelni@gmx.at>	2011-03-21 03:33:28 +01:00
Mans Rullgard	0aded9484d	Move dct and rdft definitions to separate files This leaves fft.h with only the core FFT and MDCT definitions thus making it more manageable. Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-03-20 17:15:33 +00:00
Mans Rullgard	2912e87a6c	Replace FFmpeg with Libav in licence headers Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-03-19 13:33:20 +00:00
Justin Ruggles	0f999cfddb	ac3enc: add float_to_fixed24() with x86-optimized versions to AC3DSPContext and use in scale_coefficients() for the floating-point AC-3 encoder.	2011-03-17 16:46:48 -04:00
Justin Ruggles	79414257e2	mathops: fix MULL() when the compiler does not inline the function. If the function is not inlined, an immediate cannot be used for the shift parameter, so the %cl register must be used instead in that case. This fixes compilation for x86-32 using gcc with --disable-optimizations.	2011-03-15 20:49:37 -04:00
Justin Ruggles	aaff3b312e	mathops: change "g" constraint to "rm" in x86-32 version of MUL64(). The 1-arg imul instruction cannot take an immediate argument, only a register or memory argument.	2011-03-15 13:43:47 -04:00
Justin Ruggles	b181b8fb96	mathops: convert MULL/MULH/MUL64 to inline functions rather than macros. This fixes unexpected name collisions that were occurring with variables declared within the macros. It also fixes the fate-acodec-ac3_fixed regression test on x86-32.	2011-03-15 13:43:47 -04:00
Justin Ruggles	f1efbca5e9	ac3enc: add SIMD-optimized shifting functions for use with the fixed-point AC3 encoder.	2011-03-14 08:45:31 -04:00
Mans Rullgard	a5444fee06	Add CONFIG_AC3DSP symbol to simplify makefiles Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-03-12 11:35:26 +00:00
Ronald S. Bultje	bf6fa73245	dsputil_mmx.c: remove ff_vector128. Remove ff_vector128, it is identical to ff_pb_80.	2011-02-19 10:51:15 -05:00
Ronald S. Bultje	12802ec060	dsputil: move VC1-specific stuff into VC1DSPContext.	2011-02-17 17:35:35 -05:00
Justin Ruggles	1f004fc512	ac3dsp: Change punpckhqdq to movhlps in ac3_max_msb_abs_int16(). Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-02-16 14:08:34 -05:00
Justin Ruggles	fbb6b49dab	ac3enc: Add x86-optimized function to speed up log2_tab(). AC3DSPContext.ac3_max_msb_abs_int16() finds the maximum MSB of the absolute value of each element in an array of int16_t. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-02-13 16:49:39 -05:00
Loren Merritt	e6b1ed693a	FFT: factor a shuffle out of the inner loop and merge it into fft_permute. 6% faster SSE FFT on Conroe, 2.5% on Penryn. Signed-off-by: Janne Grunau <janne-ffmpeg@jannau.net>	2011-02-13 15:36:39 +01:00
Justin Ruggles	dda3f0ef48	Add x86-optimized versions of exponent_min(). Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-02-10 15:32:47 -05:00
Ronald S. Bultje	17cf7c68ed	Fix ff_emu_edge_core_sse() on Win64. Fix emu_edge_v_extend_15 to be <128 bytes on Win64, by being more strict on the size of registers and which registers are being used for operations where multiple are available. This fixes segfaults in emulated_edge() function calls on Win64.	2011-02-08 18:25:12 -05:00
Justin Ruggles	c73d99e672	Separate format conversion DSP functions from DSPContext. This will be beneficial for use with the audio conversion API without requiring it to depend on all of dsputil. Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-02-02 02:44:53 +00:00
Alex Converse	770c410fbb	Fix ff_imdct_calc_sse() on gcc-4.6 Gcc 4.6 only preserves the first value when using an array with an "m" constraint. Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-02-02 02:40:05 +00:00
Ronald S. Bultje	81f2a3f4ff	Implement a SIMD version of emulated_edge_mc() for x86. From ~550 cycles (C version) to 170 (SSE/x86-64), 206 (MMX/x86-32) and 196 (SSE2/x86-32) cycles.	2011-01-31 20:55:56 -05:00
Justin Ruggles	d19b744a36	cosmetics: indentation Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-01-31 20:30:15 +00:00
Justin Ruggles	80ba1ddb58	Remove unneeded add bias from 3 functions. DSPContext.vector_fmul_window() DCADSPContext.lfe_fir() SynthFilterContext.synth_filter_float() Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-01-31 20:28:42 +00:00
Mans Rullgard	80944df720	x86: fix overflow in h264 8x8 planar prediction Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-01-24 23:24:28 +00:00
Justin Ruggles	6eabb0d3ad	Change DSPContext.vector_fmul() from dst=dst*src to dest=src0*src1. Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-01-22 17:53:27 +00:00
Justin Ruggles	1c189fc533	cosmetics related to LPC changes. Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-01-21 19:59:08 +00:00
Justin Ruggles	77a78e9bdc	Separate window function from autocorrelation. Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-01-21 19:59:08 +00:00
Justin Ruggles	56f8952b25	Move lpc_compute_autocorr() from DSPContext to a new struct LPCContext. Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-01-21 19:58:59 +00:00
Ronald S. Bultje	b9c7f66e6d	Fix horizontal/horizontal_up 8x8l intra prediction x86/simd functions. The original functions did not work correctly for edge pixels, e.g. when CODEC_FLAG_EMU_EDGE is set, leading to corrupt output in e.g. VLC. Based on a patch by Daniel Kang <daniel d kang gmail com>. Signed-off-by: Ronald S. Bultje <rsbultje gmail com>	2011-01-19 20:34:42 -05:00
Mans Rullgard	ef4a65149d	Replace ASMALIGN() with .p2align This macro has unconditionally used .p2align for a long time and serves no useful purpose.	2011-01-18 20:48:24 +00:00
Mans Rullgard	ac3c9d0169	x86: remove VLA in ac3_downmix_sse	2011-01-18 20:48:24 +00:00
Janne Grunau	2c3589bfda	consolidate .gitignore patterns into a single file Signed-off-by: Janne Grunau <janne-ffmpeg@jannau.net>	2011-01-18 21:32:05 +01:00
Janne Grunau	348b8218f7	convert svn:ignore properties to .gitignore files Signed-off-by: Janne Grunau <janne-ffmpeg@jannau.net>	2011-01-17 15:50:14 +01:00
Ronald S. Bultje	1b3e43e4fd	Fix overflow in pred16x16_plane x86 simd code. Fixes issue 2547. Originally committed as revision 26381 to svn://svn.ffmpeg.org/ffmpeg/trunk	2011-01-15 22:00:44 +00:00
Ronald S. Bultje	ec3233a855	Fix ff_pw_3 alignment. Originally committed as revision 26344 to svn://svn.ffmpeg.org/ffmpeg/trunk	2011-01-14 23:26:34 +00:00
Jason Garrett-Glaser	19fb234e4a	H.264: split luma dc idct out and implement MMX/SSE2 versions About 2.5x the speed. NOTE: the way that the asm code handles large qmuls is a bit suboptimal. If x264-style dequant was used (separate shift and qmul values), it might be possible to get some extra speed. Originally committed as revision 26336 to svn://svn.ffmpeg.org/ffmpeg/trunk	2011-01-14 21:34:25 +00:00
Daniel Kang	004357a11f	Fix compilation on x86-32 with --disable-optimizations, fixes issue 2127. Patch by Daniel Kang, daniel.d.kang at gmail Originally committed as revision 26204 to svn://svn.ffmpeg.org/ffmpeg/trunk	2011-01-03 11:30:04 +00:00
Daniel Kang	0790caba60	Fix invalid reads in valgrind fate, patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26177 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-31 01:29:06 +00:00
Daniel Kang	536e9b2f58	Port pred8x8l_down_left_mmxext (H.264 intra prediction) from x264 (authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26162 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 23:48:44 +00:00
Daniel Kang	720ea2d5b2	Port pred4x4_down_right_mmxext (H.264 intra prediction) from x264 (authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26159 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 21:55:51 +00:00
Daniel Kang	d0aebe23e2	Port pred4x4_vertical_right_mmxext (H.264 intra prediction) from x264 (authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26158 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 21:52:41 +00:00
Daniel Kang	76497232ef	Port pred4x4_horizontal_down_mmxext (H.264 intra prediction) from x264 (authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26157 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 21:49:57 +00:00
Daniel Kang	e9c576a467	Port pred4x4_horizontal_up_mmxext (H.264 intra prediction) from x264 (authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26156 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 21:42:33 +00:00
Daniel Kang	92f441ae86	Port pred4x4_vertical_left_mmxext (H.264 intra prediction) from x264 (authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26155 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 21:35:34 +00:00
Ronald S. Bultje	e8d98764cc	Merge a few superfluous CONFIG_GPL checks. Originally committed as revision 26154 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 21:30:47 +00:00
Ronald S. Bultje	42a59278cf	Whitespace cosmetics. Originally committed as revision 26152 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 20:43:15 +00:00
Daniel Kang	57b1f334d1	Port pred8x8l_horizontal_down_sse2/ssse3 (H.264 intra prediction) from x264 (authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26151 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 20:42:15 +00:00
Daniel Kang	04cbdf3d24	Port pred8x8l_horizontal_down_mmxext (H.264 intra prediction) from x264 (authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26150 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 20:38:06 +00:00
Daniel Kang	98c6053cd0	Port pred8x8l_horizontal_up_mmxext/ssse3 (H.264 intra prediction) from x264 (authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26149 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 20:35:31 +00:00
Daniel Kang	ecc7efbbb6	Port pred8x8l_vertical_left_sse2/ssse3 (H.264 intra prediction) from x264 (authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26148 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 20:06:22 +00:00
Daniel Kang	bdd93f1b25	Port pred8x8l_vertical_right_sse2/ssse3 (H.264 intra prediction) from x264 (authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26147 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 19:54:05 +00:00
Daniel Kang	f25112fc09	Port pred8x8l_vertical_right_mmxext (H.264 intra prediction) from x264 (authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26146 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 19:46:09 +00:00
Daniel Kang	602a4cb25a	Port pred8x8l_down_right_sse2/ssse3 (H.264 intra prediction) from x264 (authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26145 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 19:19:49 +00:00
Daniel Kang	e916acbcd1	Port pred8x8l_down_right_mmxext (H.264 intra prediction) from x264 (authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26143 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 19:12:02 +00:00
Daniel Kang	c249e66576	Port pred8x8l_down_left_sse2/ssse3 (H.264 intra prediction) from x264 (authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26142 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 19:02:50 +00:00
Daniel Kang	ee1ba9c326	Port pred8x8l_vertical_mmxext/ssse3 (H.264 intra prediction) from x264 to FFmpeg. Original authors: Holger Lubitz <holger lubitz org>, Jason Garrett-Glaser <darkshikari gmail com> (approves LGPL relicensing for this code) and Loren Merritt <lorenm at u dot washington dot edu> (approves LGPL relicensing for this code). Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26140 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 18:46:40 +00:00
Daniel Kang	04207ef353	Port pred8x8l_horizontal_mmxext/ssse3 (H.264 intra prediction) from x264 to FFmpeg. Original authors: Holger Lubitz <holger lubitz org>, Jason Garrett-Glaser <darkshikari gmail com> (approves LGPL relicensing for this code) and Loren Merritt <lorenm at u dot washington dot edu> (approves LGPL relicensing for this code). Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26139 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 18:40:53 +00:00
Daniel Kang	abab14eac0	Port pred8x8l_dc_mmx/ssse3 (H.264 intra prediction) from x264 to FFmpeg. Original authors: Holger Lubitz <holger lubitz org>, Jason Garrett-Glaser <darkshikari gmail com> (approves LGPL relicensing for this code) and Loren Merritt <lorenm at u dot washington dot edu> (approves LGPL relicensing for this code). Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26138 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 18:33:10 +00:00
Daniel Kang	2e93fd4b5e	Port pred8x8l_top_dc_mmxext/ssse3 (H.264 intra prediction) from x264 to FFmpeg. Original authors: Holger Lubitz <holger lubitz org>, Jason Garrett-Glaser <darkshikari gmail com> (approves LGPL relicensing for this code) and Loren Merritt <lorenm at u dot washington dot edu> (approves LGPL relicensing for this code). Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26137 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 18:11:27 +00:00
Ronald S. Bultje	54a959e483	Move PRED4x4_LOWPASS up so it can be used in 8x8l predict functions while keeping the functions ordered in the source file (i.e. cosmetics). Originally committed as revision 26136 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 18:04:57 +00:00
Ronald S. Bultje	a2dfe8d18d	Port pred8x8_dc_mmxext (H.264 intra prediction) from x264 to FFmpeg. Original authors: Holger Lubitz <holger lubitz org>, Jason Garrett-Glaser <darkshikari gmail com> (approves LGPL relicensing for this code) and Loren Merritt <lorenm at u dot washington dot edu> (approves LGPL relicensing for this code). Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26135 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 18:00:26 +00:00
Ronald S. Bultje	83ff3f72e5	Add missing authors to copyright headers. Originally committed as revision 26133 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 17:45:26 +00:00
Daniel Kang	725a3f9dfb	Port pred8x8_top_dc_mmxext (H.264 intra prediction) from x264 to FFmpeg. Original authors: Holger Lubitz <holger lubitz org>, Jason Garrett-Glaser <darkshikari gmail com> (approves LGPL relicensing for this code) and Loren Merritt <lorenm at u dot washington dot edu> (approves LGPL relicensing for this code). Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26132 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 17:42:34 +00:00
Ronald S. Bultje	98928c83e0	Mark recently added pred4x4_down_left_mmxext as CONFIG_GPL. Although Holger initially said he'd be OK with relicensing, he also said he wanted to have another look at the patch, and then he went on vacation, so let's play it safe for now. We can consider removing this again later. Originally committed as revision 26131 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-29 17:34:00 +00:00
Daniel Kang	911b32f482	Port pred4x4_down_left_mmxext (H.264 intra prediction) from x264 to FFmpeg. LGPL relicensing approved by original authors: Holger Lubitz <holger lubitz org>, Jason Garrett-Glaser <darkshikari gmail com> and Loren Merritt <lorenm at u dot washington dot edu>. Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI 2010. Originally committed as revision 26087 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-24 22:43:07 +00:00
Ronald S. Bultje	8d147f1f60	For rounding in chroma MC SSSE3, use 16-byte pw_3/4 instead of reading 8 bytes and then using movlhps to dup it into the higher half of the register. Originally committed as revision 26086 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-24 17:23:22 +00:00
Baptiste Coudurier	90f1f3bf00	In yadif filter, declare asm constants directly to avoid dependency on libavcodec Originally committed as revision 25895 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-06 00:14:15 +00:00
Baptiste Coudurier	9e95999e2a	10l, add ff_pw_1 to dsputil_mmx for yadif sse2 Originally committed as revision 25881 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-04 13:06:06 +00:00
avcoder	1761272ba9	Use SECTION .text for yasm code. Patch by avcoder, ffmpeg gmail Originally committed as revision 25859 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-12-01 13:12:39 +00:00
Ramiro Polla	4f9d25ddc8	dnxhd_mmx: prefer xmm registers below xmm6 when they are available Originally committed as revision 25634 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-11-02 03:09:16 +00:00
İsmail Dönmez	80e33d2451	dsputil: Use explicit movzbl instead of movzx This fixes compilation with the latest clang trunk version. Patch by İsmail Dönmez, ismail at namtrac dot org Originally committed as revision 25628 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-11-01 19:35:51 +00:00
Ramiro Polla	a4ece893e1	lpc_mmx: add xmm registers to clobber list Originally committed as revision 25620 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-10-31 23:37:15 +00:00
Ramiro Polla	e5d5407e26	lpc_mmx: merge some asm blocks These blocks depended on the compiler keeping xmm registers untouched between them. Originally committed as revision 25619 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-10-31 23:36:26 +00:00
Ramiro Polla	eed299b897	sad16_sse2: merge 2 asm blocks Originally committed as revision 25617 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-10-31 21:20:20 +00:00
Ramiro Polla	153ca56b38	xmm_clobbers: list xmm registers first in clobber list suncc does not like the leading commas inside the macro, but it has no problem with trailing commas. Originally committed as revision 25615 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-10-31 18:14:48 +00:00
Ramiro Polla	ba40452095	idct_sse2_xvid: only mark xmm>=8 as clobbered on x86_64 Originally committed as revision 25614 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-10-31 16:28:28 +00:00
Ramiro Polla	05c018078c	motion_est_mmx: prefer xmm registers below xmm6 when they are available Originally committed as revision 25612 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-10-31 15:07:21 +00:00
Ramiro Polla	5d543a3d13	dsputil_mmx: add xmm registers to clobber list Originally committed as revision 25611 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-10-31 13:57:58 +00:00
Ramiro Polla	e2d13c5882	cosmetics: split long line Originally committed as revision 25610 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-10-31 13:46:17 +00:00
Ramiro Polla	0d729e0de2	fdct_mmx: add xmm registers to clobber list Originally committed as revision 25609 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-10-31 13:45:04 +00:00
Ramiro Polla	616735eb97	idct_sse2_xvid: add xmm registers to clobber list Originally committed as revision 25608 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-10-31 13:17:43 +00:00
Ramiro Polla	9943f3b91c	mpegvideo_mmx: add xmm registers to clobber list Originally committed as revision 25607 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-10-31 13:15:16 +00:00
Ramiro Polla	559738eff3	dsputil_mmx: prefer xmm registers below xmm6 when they are available Originally committed as revision 25606 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-10-31 13:13:53 +00:00
Ramiro Polla	51d592dbcb	h264dsp: add xmm registers to clobber list Originally committed as revision 25604 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-10-30 17:14:22 +00:00
Ramiro Polla	ac19f4a3e8	indent Originally committed as revision 25598 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-10-28 18:31:30 +00:00
Ramiro Polla	cae05859e1	h264dsp: merge some more asm blocks Originally committed as revision 25597 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-10-28 18:22:21 +00:00
Ramiro Polla	c6a908be58	dct32: mark xmm registers in clobber list in ff_dct32_float_sse() Originally committed as revision 25569 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-10-25 20:29:29 +00:00
Ramiro Polla	b32c9ca9a3	h264dsp: merge some asm blocks Some code was initializing some xmm registers in one asm block and using them in the following block, assuming they wouldn't be changed in between blocks. Originally committed as revision 25568 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-10-25 18:02:02 +00:00
Reimar Döffinger	6c2142809c	Add d modifier to asm argument to fix nasm compilation. Originally committed as revision 25397 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-10-07 19:18:18 +00:00
Ramiro Polla	326bf69acc	fft: mark xmm registers as clobbered in ff_imdct_calc_sse Originally committed as revision 25363 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-10-06 01:27:02 +00:00
Ronald S. Bultje	dd68d4db43	MMX, MMX2, SSE2 and SSSE3 optimizations for pred16x16/8x8_plane H264 intra prediction (plus some with different rounding for svq3/rv40). Speedup (for SSSE3) about ~6-fold, 3.6% faster overall with cathedral sample. Originally committed as revision 25361 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-10-05 22:06:18 +00:00
İsmail Dönmez	9276bdddca	snowdsp: Explicitly state the operand sizes Fixes compilation with clang's builtin assembler Patch by İsmail Dönmez, ismail at namtrac dot org Originally committed as revision 25331 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-10-04 13:08:13 +00:00
Ronald S. Bultje	a52ffc3f54	Move static inline function to a macro, so that constant propagation in inline asm works for gcc-3.x also (hopefully). Should fix gcc-3.x FATE breakage after r25254. Originally committed as revision 25262 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-29 17:42:26 +00:00
Eli Friedman	329d689f75	Use sse2 variant of put_pixels16() for no_rnd also. Provides a minor speed increase to e.g. vc1, snow and mpeg decoding. Patch by Eli Friedman <eli dot friedman gmail com>. Originally committed as revision 25259 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-29 15:34:43 +00:00
Ronald S. Bultje	cd17285e6c	Merge b_idx and edge variables, and optimize the ASM to directly load variables from memory locations/offsets depending on b_idx plus constants, rather than having gcc do this. This saves several lea calls and together saves about 10 cycles in h264_loop_filter_strength_mmx2(). Originally committed as revision 25256 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-29 14:04:39 +00:00
Ronald S. Bultje	0cc8a5d088	Remove mv_mask variable. Replace the related pand -1/0 instructions by either a pxor, or remove the instruction altogether. Altogether, this saves 1 instruction. Originally committed as revision 25255 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-29 14:03:30 +00:00
Ronald S. Bultje	c0673f2cf4	Remove d_idx as a variable, and instead load it as a constant in the asm. This has no measurable speed effect because the surrounding code doesn't take advantage of this yet. Originally committed as revision 25254 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-29 14:02:32 +00:00
Ronald S. Bultje	2c3135f6d3	Unroll inner bidir loop in h264_loop_filter_strength_mmx2(), which gets rid of the d_idx variable and therefore allows for future optimizations. No speed difference by this commit itself. Originally committed as revision 25253 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-29 13:35:24 +00:00
Ronald S. Bultje	4b81511cab	Unloop the outer loop in h264_loop_filter_strength_mmx2(), which allows inlining various constants within the loop code. 20 cycles faster on cathedral sample. Originally committed as revision 25252 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-29 13:34:20 +00:00
Reimar Döffinger	02b424d9c8	Add d suffix to movd target register to make it work with nasm. Originally committed as revision 25206 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-26 09:15:18 +00:00
Reimar Döffinger	dc77e985b7	Split and then simplify address generation macro. Allows nasm to work for this code. Originally committed as revision 25205 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-26 09:08:11 +00:00
Ronald S. Bultje	7e117771cd	Remove unused variable. Originally committed as revision 25173 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-24 15:31:46 +00:00
Ronald S. Bultje	ae11291865	Unroll loop in h264_idct_add16intra_sse2(). Basically identical to r25171, this inlines scan8[] and removes loop setup. 15% faster, 0.4% overall. See "[PATCH] unroll loop in h264_idct_add8_sse2()" thread on ML. Originally committed as revision 25172 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-24 14:07:23 +00:00
Ronald S. Bultje	4bca677494	Unroll loop in h264_idct_add8_sse2(). This means we can inline scan8[] in the code directly also and remove loop setup. 20% faster in function, 0.8% overall. See "[PATCH] unroll loop in h264_idct_add8_sse2()" thread on ML. Originally committed as revision 25171 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-24 14:05:45 +00:00
Måns Rullgård	c0bc8b9afb	x86: disable SSE functions using stack when stack is not aligned This fixes crashes with ICC 10.1. Originally committed as revision 25153 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-21 17:57:21 +00:00
Måns Rullgård	f41237c9db	x86: remove hack disabling sse2 h264 loop filter with 32-bit icc Originally committed as revision 25146 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-18 20:44:32 +00:00
Ronald S. Bultje	ada65af9d1	Don't access upper 32 bits of a 32-bit int on 64-bit systems. Originally committed as revision 25140 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-17 12:24:22 +00:00
Ronald S. Bultje	6c3d021891	Properly add HAVE_YASM around yasmified symbols. Should fix compile error on configurations using --disable-yasm. Originally committed as revision 25138 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-17 03:01:57 +00:00
Ronald S. Bultje	e2e341048e	Move hadamard_diff{,16}_{mmx,mmx2,sse2,ssse3}() from inline asm to yasm, which will hopefully solve the Win64/FATE failures caused by these functions. Originally committed as revision 25137 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-17 01:56:06 +00:00
Ronald S. Bultje	d0acc2d2e9	Move sse16_sse2() from inline asm to yasm. It is one of the functions causing Win64/FATE issues. Originally committed as revision 25136 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-17 01:44:17 +00:00
Ronald S. Bultje	1d16a1cf99	Rename h264_idct_sse2.asm to h264_idct.asm; move inline IDCT asm from h264dsp_mmx.c to h264_idct.asm (as yasm code). Because the loops are now coded in asm instead of C, this is (depending on the function) up to 50% faster for cases where gcc didn't do a great job at looping. Since h264_idct_add8() is now faster than the manual loop setup in h264.c, in-asm idct calling can now be enabled for chroma as well (see r16207). For MMX, this is 5% faster. For SSE2 (which isn't done for chroma if h264.c does the looping), this makes it up to 50% faster. Speed gain overall is ~0.5-1.0%. Originally committed as revision 25119 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-14 13:36:26 +00:00
Jason Garrett-Glaser	8acb554aff	LGPL SSE2 H.264 iDCT This leaves no more GPL-only H.264 decoding asm code. Approved by Loren. Originally committed as revision 25092 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-10 02:25:12 +00:00
Stefano Sabatini	c6c98d0897	Move mm_support() from libavcodec to libavutil, make it a public function and rename it to av_get_cpu_flags(). Originally committed as revision 25076 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-08 15:07:14 +00:00
Reimar Döffinger	b1c32fb5e5	Use "d" suffix for general-purpose registers used with movd. This increases compatibility with nasm and is also more consistent, e.g. with h264_intrapred.asm and h264_chromamc.asm that already do it that way. Originally committed as revision 25042 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-05 10:10:16 +00:00
Stefano Sabatini	7160bb716b	Rename FF_MM_ symbols related to CPU features flags as AV_CPU_FLAG_ symbols, and move them from libavcodec/avcodec.h to libavutil/cpu.h. Originally committed as revision 25040 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-04 09:59:08 +00:00
Ronald S. Bultje	2c166c3af1	Port latest x264 deblock asm (before they moved to using NV12 as internal format), LGPL'ed with permission from Jason and Loren. This includes mmx2 code, so remove inline asm from h264dsp_mmx.c accordingly. Originally committed as revision 25031 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-03 16:52:46 +00:00
Eli Friedman	a10a9f5cd0	Fix typo in r25019. Patch by Eli Friedman <eli.friedman at gmail dot com>. Originally committed as revision 25022 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-01 23:19:36 +00:00
Ronald S. Bultje	615da9b1d9	Unscrew breakage after my last commit because of symbol prefixes. Originally committed as revision 25020 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-01 21:10:19 +00:00
Ronald S. Bultje	a33a2562c1	Rename h264_weight_sse2.asm to h264_weight.asm; add 16x8/8x16/8x4 non-square biweight code to sse2/ssse3; add sse2 weight code; and use that same code to create mmx2 functions also, so that the inline asm in h264dsp_mmx.c can be removed. OK'ed by Jason on IRC. Originally committed as revision 25019 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-01 20:56:16 +00:00
Ronald S. Bultje	14bc1f2485	Split h264dsp_mmx.c (which was #included in dsputil_mmx.c) in h264_qpel_mmx.c, still #included in dsputil_mmx.c and is part of DSPContext, and h264dsp_mmx.c, which represents H264DSPContext and is now compiled on its own. Originally committed as revision 25018 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-01 20:48:59 +00:00
Ronald S. Bultje	5929b3a651	Fix vertical align. Originally committed as revision 25009 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-31 12:32:24 +00:00
Ronald S. Bultje	79ce0f002e	Fix compilation failure if yasm is disabled (missing vp3 symbols). Originally committed as revision 24992 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-30 20:30:40 +00:00
Ronald S. Bultje	de1c253bab	Split intra prediction initialization (i.e. assigning of function pointers) into its own file, it doesn't belong in h264dsp_mmx.c (much less so in dsputil_mmx.c). Originally committed as revision 24990 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-30 16:34:13 +00:00
Ronald S. Bultje	d0eb5a1174	Move H264 chroma MC from inline asm to yasm. This fixes VP3/5/6 and VC-1 fate failures on Win64. Originally committed as revision 24989 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-30 16:31:04 +00:00
Ronald S. Bultje	e9f5f020c6	Move VP3 IDCT functions from inline ASM to YASM. This fixes part of the VP3/5/6 issues on Win64. Originally committed as revision 24988 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-30 16:25:46 +00:00
Ronald S. Bultje	7e7c4b6008	Put ff_ prefix on non-static {put_signed,put,add}_pixels_clamped_mmx() functions. Originally committed as revision 24987 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-30 16:22:27 +00:00
Loren Merritt	19d929f9a3	cosmetics in imdct_sse Originally committed as revision 24958 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-28 21:03:13 +00:00
Ronald S. Bultje	4eca52ed19	Fix typos when converting inline asm to yasm, fixes MMX-only fate-ea-vp61. Originally committed as revision 24948 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-26 14:33:39 +00:00
Ronald S. Bultje	6697bc33e2	Revert r24931, it broke Win32 and some BSD compiles (yay fate). Originally committed as revision 24934 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-25 20:36:35 +00:00
Ronald S. Bultje	72f642400b	Mark xmm6 and xmm7 as clobbered in ff_vp3_idct_sse2(), which is contributing to the VP6 fate failures on Win64. Originally committed as revision 24931 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-25 19:57:05 +00:00
Måns Rullgård	69dad87c48	VP6: fix vp6_filter_diag4_mmx/sse on 64-bit The stride can be negative and must be sign extended before being used in pointer arithmetic. Originally committed as revision 24926 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-25 15:41:11 +00:00
Ronald S. Bultje	89fa3504ed	Move vp6_filter_diag4() x86 SIMD code from inline ASM to YASM. This should help in fixing the Win64 fate failures. Originally committed as revision 24922 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-25 13:44:16 +00:00
Ronald S. Bultje	3a0885146c	Move vp6_filter_diag4() from DSPContext to VP56DSPContext. Originally committed as revision 24921 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-25 13:42:28 +00:00
Måns Rullgård	c0ec9918b0	Remove global mm_flags variable Originally committed as revision 24909 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-24 17:47:05 +00:00
Ronald S. Bultje	3611c45ab7	Mark xmm registers as clobbered in simple loopfilter. Should fix the last two VP8-related fate failures on Win64. Originally committed as revision 24908 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-24 16:52:27 +00:00
Alex Converse	cb4f12466b	imdct/x86: Use "s->mdct_size" instead of "1 << s->mdct_bits". It generates smaller cleaner code. Originally committed as revision 24887 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-23 15:51:09 +00:00
Ronald S. Bultje	684d608bde	Fix segfaults in VP8 SIMD code on Win64 (and FATE/win64 failures). Originally committed as revision 24871 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-23 02:41:22 +00:00
Alex Converse	78b5c97d3e	Convert ff_imdct_half_sse() to yasm. This is to avoid split asm sections that attempt to preserve some registers between sections. Originally committed as revision 24869 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-22 14:39:58 +00:00
Jason Garrett-Glaser	05c04cdf54	VP5/6/8: ~7% faster arithmetic decoding Grab from the bitstream in 16-bit chunks instead of 8-bit chunks. TODO: grab in 32-bit chunks on 64-bit systems. Originally committed as revision 24783 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-12 01:11:32 +00:00
Jason Garrett-Glaser	4a384de5b8	Split h264dsp and h264pred in configure. Many H.264 derivatives, like RV40 and VP8, use the H.264 prediction functions but not the weight/loopfilter functions. This should reduce the size of builds with one of these derivatives but without H.264 decoding itself. Originally committed as revision 24741 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-07 23:10:25 +00:00
Jason Garrett-Glaser	98fe09df7b	Add file missing in r24702 Originally committed as revision 24703 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-05 00:49:48 +00:00
Eli Friedman	c12d6955e2	H.264: SSE2/SSSE3 weighted prediction asm Patch by Eli Friedman <eli.friedman at gmail dot com> Originally committed as revision 24702 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-05 00:13:38 +00:00
Måns Rullgård	f079a64aea	Move cavs dsp functions to their own struct Originally committed as revision 24685 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-03 20:59:00 +00:00
Jason Garrett-Glaser	8b9b5e085f	VP5/6/8: add one inline missed in r24677 Originally committed as revision 24682 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-03 11:21:22 +00:00
Jason Garrett-Glaser	827d43bb9d	VP8: move zeroing of luma DC block into the WHT Lets us do the zeroing in asm instead of C. Also makes it consistent with the way the regular iDCT code does it. Originally committed as revision 24668 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-02 20:18:09 +00:00
Ronald S. Bultje	6341838f3c	Use word-writing instead of dword-writing (with two cached but otherwise unchanged bytes) in the horizontal simple loopfilter. This makes the filter quite a bit faster in itself (~30 cycles less on Core1), probably mostly because we don't need a complex 4x4 transpose, but only a simple byte interleave. Also allows using pextrw on SSE4, which speeds up even more (e.g. 25% faster on Core i7). Originally committed as revision 24638 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-31 23:13:15 +00:00
Vitor Sessak	fa738b3ad1	Remove x86/mmx.h. It is not used anymore and has been deprecated for years. Originally committed as revision 24618 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-31 16:20:45 +00:00
Vitor Sessak	de4bc44abb	Convert deinterlacing MMX code to YASM Originally committed as revision 24615 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-31 14:50:51 +00:00
Vitor Sessak	740dfe7012	Fix compilation in x86_64. I broke it with r24580. Originally committed as revision 24582 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-29 22:45:21 +00:00
Vitor Sessak	2c3dda6838	Translate libmpeg2 MMX IDCT to plain asm Originally committed as revision 24580 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-29 22:19:54 +00:00
Ronald S. Bultje	ab4d031889	Use pmaddubsw for the mbedge_filter (>=ssse3), 6-10 cycles faster. Originally committed as revision 24514 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-26 21:18:19 +00:00
Jason Garrett-Glaser	e25dee602f	VP8: Much faster SSE2 MC 5-10% faster or more on Phenom, Athlon 64, and some others. Helps some on pre-SSSE3 Intel chips as well, but not as much. Originally committed as revision 24513 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-26 19:34:00 +00:00
Ronald S. Bultje	48adb7e7a4	Enable no-loop memory/register saving for ssse3/sse4 also. Originally committed as revision 24511 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-26 14:07:57 +00:00
Ronald S. Bultje	2a180c69ea	Save a register (or regsize of stackspace for x86-32) for the no-loop mbedge loopfilter functions, by re-using space that holds a variable that we no longer need. Originally committed as revision 24510 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-26 14:00:15 +00:00
Ronald S. Bultje	bcd4aa6498	Use nested ifs instead of &&, which appears to not work with %ifidn (i.e. this construct was always enabled, even for <ssse3 versions). Originally committed as revision 24509 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-26 13:56:51 +00:00
Ronald S. Bultje	2208053bd3	Split pextrw macro-spaghetti into several opt-specific macros, this will make future new optimizations (imagine a sse5) much easier. Also fix a bug where we used the direction (%2) rather than optimization (%1) to enable this, which means it wasn't ever actually used... Originally committed as revision 24507 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-26 13:50:59 +00:00
Ronald S. Bultje	6de5b7c6b8	Fix obvious bug in assignment. Somehow, the test vectors don't test this... Originally committed as revision 24489 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-25 02:42:40 +00:00
Ronald S. Bultje	e3f7bf774c	Fix SPLATB_REG mess. Used to be a if/elseif/elseif/elseif spaghetti, so this splits it into small optimization-specific macros which are selected for each DSP function. The advantage of this approach is that the sse4 functions now use the ssse3 codepath also without needing an explicit sse4 codepath. Originally committed as revision 24487 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-24 19:33:05 +00:00
Eli Friedman	3611e7a309	Inline asm for VP56 arith coder This is a lot more reliable to get cmov rather than trying to trick gcc into generating it, useful since it's 2% faster overall. Patch by Eli Friedman <eli.friedman at gmail> Originally committed as revision 24471 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-23 21:46:30 +00:00
Jason Garrett-Glaser	3ae079a3c8	VP8: optimize DC-only chroma case in the same way as luma. Add MMX idct_dc_add4uv function for this case. ~40% faster chroma idct. Originally committed as revision 24455 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-23 06:02:52 +00:00
Jason Garrett-Glaser	51c9156438	VP8 asm: cosmetics (spacing) Originally committed as revision 24453 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-23 03:02:56 +00:00
Jason Garrett-Glaser	8a467b2d44	VP8: 30% faster idct_mb Take shortcuts based on statistically common situations. Add 4-at-a-time idct_dc function (mmx and sse2) since rows of 4 DC-only DCT blocks are common. TODO: tie this more directly into the MB mode, since the DC-level transform is only used for non-splitmv blocks? Originally committed as revision 24452 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-23 02:58:27 +00:00
Jason Garrett-Glaser	c25c776708	VP8: clear DCT blocks in iDCT instead of using clear_blocks. ~0.3% faster overall. Originally committed as revision 24448 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-23 00:07:16 +00:00
Ronald S. Bultje	dc5eec8085	Use pextrw for SSE4 mbedge filter result writing, speedup 5-10cycles on CPUs supporting it. Originally committed as revision 24437 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-22 19:59:34 +00:00
Ronald S. Bultje	003243c3c2	Fix and enable horizontal >=SSE2 mbedge loopfilter. Originally committed as revision 24409 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-22 01:35:26 +00:00
Loren Merritt	c7b1d9768c	relicense h264 deblock sse2 to lgpl Originally committed as revision 24408 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-22 00:39:49 +00:00
Loren Merritt	532e769701	sync yasm macros from x264 Originally committed as revision 24406 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-21 22:45:16 +00:00
Jason Garrett-Glaser	8731dbd890	Eliminate one instruction in VP8 dc_add_sse4 Originally committed as revision 24405 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-21 22:41:37 +00:00
Jason Garrett-Glaser	7dd224a42d	Various VP8 x86 deblocking speedups SSSE3 versions, improve SSE2 versions a bit. SSE2/SSSE3 mbedge h functions are currently broken, so explicitly disable them. Originally committed as revision 24403 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-21 22:11:03 +00:00
Jason Garrett-Glaser	b8b231b5dc	Make mmx VP8 WHT faster Avoid pextrw, since it's slow on many older CPUs. Now it doesn't require mmxext either. Originally committed as revision 24397 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-21 20:51:01 +00:00
David Conrad	af521abc28	Add header declarations for mmx/sse constants missing them Originally committed as revision 24381 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-21 10:02:07 +00:00
David Conrad	c7eec58170	Move ff_pw_* from vc1dsp_mmx.c to dsputil_mmx.c Should fix compilation with icc and should help prevent any future duplicates Originally committed as revision 24380 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-21 10:02:03 +00:00
Ronald S. Bultje	e9e456d850	VP8 MBedge loopfilter MMX/MMX2/SSE2 functions for both luma (width=16) and chroma (width=8). Originally committed as revision 24378 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-20 22:58:56 +00:00
Ronald S. Bultje	268821e76e	Chroma (width=8) inner loopfilter MMX/MMX2/SSE2 for VP8 decoder. Originally committed as revision 24377 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-20 22:04:18 +00:00
Ronald S. Bultje	c60ed66dbe	Revert r24339 (it causes fate failures on x86-64) - I'll figure out what's wrong with it tomorrow or so, then re-submit. Originally committed as revision 24341 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-19 23:57:09 +00:00
Ronald S. Bultje	6526976f0c	Remove FF_MM_SSE2/3 flags for CPUs where this is generally not faster than regular MMX code. Examples of this are the Core1 CPU. Instead, set a new flag, FF_MM_SSE2/3SLOW, which can be checked for particular SSE2/3 functions that have been checked specifically on such CPUs and are actually faster than their MMX counterparts. In addition, use this flag to enable particular VP8 and LPC SSE2 functions that are faster than their MMX counterparts. Based on a patch by Loren Merritt <lorenm AT u washington edu>. Originally committed as revision 24340 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-19 22:38:23 +00:00
Ronald S. Bultje	1878f685c0	Implement chroma (width=8) inner loopfilter MMX/MMX2/SSE2 functions. Originally committed as revision 24339 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-19 21:53:28 +00:00
Ronald S. Bultje	fb9bdf048c	Be more efficient with registers or stack memory. Saves 8/16 bytes stack for x86-32, or 2 MM registers on x86-64. Originally committed as revision 24338 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-19 21:45:36 +00:00
Ronald S. Bultje	3facfc99da	Change function prototypes for width=8 inner and mbedge loopfilter functions so that it does both U and V planes at the same time. This will have speed advantages when using SSE2 (or higher) optimizations, since we can do both the U and V rows together in a single xmm register. This also renames filter16 to filter16y and filter8 to filter8uv so that it's more obvious what each function is used for. Originally committed as revision 24337 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-19 21:18:04 +00:00
Loren Merritt	1ee076b1b1	more credits to D. J. Bernstein for fft Originally committed as revision 24308 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-18 20:06:42 +00:00
Ronald S. Bultje	819b2dd2b1	Attempt to fix x86-64 testsuite on fate. Originally committed as revision 24275 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-16 21:35:30 +00:00
Ronald S. Bultje	6f323f1251	Remove duplicate define. Originally committed as revision 24272 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-16 19:54:47 +00:00
Ronald S. Bultje	889b2c26ee	Revert 24270, it contained some stuff that shouldn't have been in there. Originally committed as revision 24271 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-16 19:54:25 +00:00
Ronald S. Bultje	2356a7834b	Remove duplicate define. Originally committed as revision 24270 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-16 19:42:32 +00:00
Ronald S. Bultje	ede1b9665a	Give x86 r%d registers names, this will simplify implementation of the chroma inner loopfilter, and it also allows us to save one register on x86-64/sse2. Originally committed as revision 24269 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-16 19:38:10 +00:00
Ronald S. Bultje	526e831a46	Change return statement, the REP_RET is a mistake since the else case (x86-64, sse2) doesn't actually loop, so REP_RET isn't necessary. Originally committed as revision 24268 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-16 18:29:14 +00:00
Ronald S. Bultje	a711eb4829	VP8 H/V inner loopfilter MMX/MMXEXT/SSE2 optimizations. Originally committed as revision 24250 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-15 23:02:34 +00:00
David Conrad	faa26db28b	MMX/SSE VC1 loop filter Originally committed as revision 24208 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-11 22:53:01 +00:00

488 Commits