Commit Graph

44 Commits

Author SHA1 Message Date
Michael Niedermayer
e776ee8f29 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  lavr: fix handling of custom mix matrices
  fate: force pix_fmt in lagarith-rgb32 test
  fate: add tests for lagarith lossless video codec.
  ARMv6: vp8: fix stack allocation with Apple's assembler
  ARM: vp56: allow inline asm to build with clang
  fft: 3dnow: fix register name typo in DECL_IMDCT macro
  x86: dct32: port to cpuflags
  x86: build: replace mmx2 by mmxext
  Revert "wmapro: prevent division by zero when sample rate is unspecified"
  wmapro: prevent division by zero when sample rate is unspecified
  lagarith: fix color plane inversion for YUY2 output.
  lagarith: pad RGB buffer by 1 byte.
  dsputil: make add_hfyu_left_prediction_sse4() support unaligned src.

Conflicts:
	doc/APIchanges
	libavcodec/lagarith.c
	libavfilter/x86/gradfun.c
	libavutil/cpu.h
	libavutil/version.h
	libswscale/utils.c
	libswscale/version.h
	libswscale/x86/yuv2rgb.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-08-04 23:51:43 +02:00
Diego Biurrun
239fdf1b4a x86: build: replace mmx2 by mmxext
Refactoring mmx2/mmxext YASM code with cpuflags will force renames.
So switching to a consistent naming scheme beforehand is sensible.
The name "mmxext" is more official and widespread and also the name
of the CPU flag, as reported e.g. by the Linux kernel.
2012-08-03 22:51:05 +02:00
Michael Niedermayer
24823a761c Merge remote-tracking branch 'qatar/master'
* qatar/master:
  qdm2: remove broken and disabled dump_context() debug function
  x86: h264_intrapred: use newly introduced SPLAT* and PSHUFLW macros
  x86inc: add SPLATB_LOAD, SPLATB_REG, PSHUFLW macros
  x86inc: modify ALIGN to not generate long nops on i586
  x86: h264_intrapred: port to cpuflag macros
  avplay: update input filter pointer when the filtergraph is reset.
  avconv: fix parsing of -force_key_frames option.
  h264: use templates to avoid excessive inlining
  xtea: Make the count parameter match the documentation
  blowfish: Make the count parameter match the documentation
  mpegvideo: Don't use ff_mspel_motion() for vc1
  xtea: invert branch and loop precedence
  blowfish: invert branch and loop precedence
  flvdec: optionally trust the metadata
  avconv: Set audio filter time base to the sample rate
  vp8: Add ifdef guards around the sse2 loopfilter in the sse2slow branch too

Conflicts:
	ffmpeg.c
	ffplay.c
	libavcodec/h264.c
	libavcodec/mpegvideo_common.h

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-07-05 21:55:31 +02:00
Martin Storsjö
07eeeb1d4f vp8: Add ifdef guards around the sse2 loopfilter in the sse2slow branch too
This was missed in the previous commit in 70a1c800.

Signed-off-by: Martin Storsjö <martin@martin.st>
2012-07-05 09:39:01 +03:00
Michael Niedermayer
039e9fe01c Merge remote-tracking branch 'qatar/master'
* qatar/master: (29 commits)
  lavfi: reclassify showfiltfmts as a TESTPROG
  graph2dot: fix printf format specifier
  swscale: yuv2planeX 8bit >=sse2 functions need aligned stack on x86-32.
  vp8: loopfilter >=sse2 functions need aligned stack on x86-32.
  amr: remove shift out of the AMR_BIT() macro.
  dsputilenc: group yasm and inline asm function pointer assignment.
  mov: use forward declaration of a function instead of a table.
  Clarify Doxygen comment for FF_API_* #defines.
  configure: simplify get_version()
  Create version.h headers for libraries that lack them
  gitignore: Use full path instead of relative path to specify patterns
  mpegvideo: remove VLAs
  Add XTEA encryption support in libavutil
  Add Blowfish encryption support in libavutil
  eval: Add the isinf() function and tests for it
  flacdec: move lpc filter to flacdsp
  flacdec: split off channel decorrelation as flacdsp
  avplay: Add an option for not limiting the input buffer size
  FATE: add a test for WMA cover art.
  FATE: add a test for apetag cover art
  ...

Conflicts:
	.gitignore
	configure
	ffplay.c
	libavcodec/Makefile
	libavcodec/error_resilience.c
	libavcodec/mpegvideo.c
	libavcodec/ratecontrol.c
	libavdevice/avdevice.h
	libavfilter/Makefile
	libavfilter/filtfmts.c
	libavfilter/version.h
	libavformat/mov.c
	libavformat/version.h
	libavutil/Makefile
	libavutil/avutil.h
	libavutil/version.h
	libswscale/swscale.h
	libswscale/x86/swscale_mmx.c
	tests/fate/libavutil.mak
	tests/lavfi-regression.sh
	tools/graph2dot.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-07-04 21:03:28 +02:00
Martin Storsjö
70a1c8000f vp8: loopfilter >=sse2 functions need aligned stack on x86-32.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-07-04 08:25:50 -07:00
Michael Niedermayer
2af8f2cea6 Merge remote-tracking branch 'qatar/master'
* qatar/master: (27 commits)
  cmdutils: use new avcodec_is_decoder/encoder() functions.
  lavc: make codec_is_decoder/encoder() public.
  lavc: deprecate AVCodecContext.sub_id.
  libcdio: add a forgotten AVClass to the private context.
  swscale: remove "cpu flags" from -sws_flags description.
  proresenc: give user a possibility to alter some encoding parameters
  vorbisenc: add output buffer overwrite protection
  libopencore-amrnbenc: fix end-of-stream handling
  ra144enc: fix end-of-stream handling
  nellymoserenc: zero any leftover packet bytes
  nellymoserenc: use proper MDCT overlap delay
  qpeg: Use bytestream2 functions to prevent buffer overreads.
  swscale: make %rep unconditional.
  vp8: convert simple loopfilter x86 assembly to use named arguments.
  vp8: convert idct x86 assembly to use named arguments.
  vp8: convert mc x86 assembly to use named arguments.
  vp8: convert loopfilter x86 assembly to use cpuflags().
  vp8: convert idct/mc x86 assembly to use cpuflags().
  swscale: remove now unnecessary hack.
  x86inc: don't "bake" stack_offset in named arguments.
  ...

Conflicts:
	cmdutils.c
	doc/APIchanges
	libavcodec/mpeg12.c
	libavcodec/options.c
	libavcodec/qpeg.c
	libavcodec/utils.c
	libavcodec/version.h
	libavdevice/libcdio.c
	tests/lavf-regression.sh

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-03-05 00:15:55 +01:00
Ronald S. Bultje
e25be47154 vp8: convert idct/mc x86 assembly to use cpuflags().
2012-03-03 20:39:59 -08:00
Michael Niedermayer
268098d8b2 Merge remote-tracking branch 'qatar/master'
* qatar/master: (29 commits)
  amrwb: remove duplicate arguments from extrapolate_isf().
  amrwb: error out early if mode is invalid.
  h264: change underread for 10bit QPEL to overread.
  matroska: check buffer size for RM-style byte reordering.
  vp8: disable mmx functions with sse/sse2 counterparts on x86-64.
  vp8: change int stride to ptrdiff_t stride.
  wma: fix invalid buffer size assumptions causing random overreads.
  Windows Media Audio Lossless decoder
  rv10/20: Fix slice overflow with checked bitstream reader.
  h263dec: Disallow width/height changing with frame threads.
  rv10/20: Fix a buffer overread caused by losing track of the remaining buffer size.
  rmdec: Honor .RMF tag size rather than assuming 18.
  g722: Fix the QMF scaling
  r3d: don't set codec timebase.
  electronicarts: set timebase for tgv video.
  electronicarts: parse the framerate for cmv video.
  ogg: don't set codec timebase
  electronicarts: don't set codec timebase
  avs: don't set codec timebase
  wavpack: Fix an integer overflow
  ...

Conflicts:
	libavcodec/arm/vp8dsp_init_arm.c
	libavcodec/fraps.c
	libavcodec/h264.c
	libavcodec/mpeg4videodec.c
	libavcodec/mpegvideo.c
	libavcodec/msmpeg4.c
	libavcodec/pnmdec.c
	libavcodec/qpeg.c
	libavcodec/rawenc.c
	libavcodec/ulti.c
	libavcodec/vcr1.c
	libavcodec/version.h
	libavcodec/wmalosslessdec.c
	libavformat/electronicarts.c
	libswscale/ppc/yuv2rgb_altivec.c
	tests/ref/acodec/g722
	tests/ref/fate/ea-cmv

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-03-03 00:23:10 +01:00
Ronald S. Bultje
45549339bc vp8: disable mmx functions with sse/sse2 counterparts on x86-64.
x86-64 is guaranteed to have at least SSE2, therefore the MMX/MMX2
functions will never be used in practice.
2012-03-02 10:32:05 -08:00
Ronald S. Bultje
bd66f073fe vp8: change int stride to ptrdiff_t stride.
On 64bit platforms with 32bit int, this means we won't have to sign-
extend the integer anymore.
2012-03-02 10:31:50 -08:00
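Note on bd66f073fe: the effect of taking the stride as ptrdiff_t rather than int can be sketched in plain C. The helper below is hypothetical (sum_column is not a function from the tree); it only shows that a pointer-sized stride can be used in address arithmetic directly, including negative strides, without widening a 32-bit int first.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the FFmpeg/Libav sources.
 * Sums the first byte of each of `rows` rows, where consecutive rows
 * are `stride` bytes apart. Because stride is ptrdiff_t, the product
 * y * stride is already pointer-sized; with an int stride a 64-bit
 * target would have to sign-extend it before the address computation. */
static int sum_column(const uint8_t *src, ptrdiff_t stride, int rows)
{
    int sum = 0;
    for (int y = 0; y < rows; y++)
        sum += src[y * stride];   /* stride may be negative (bottom-up layout) */
    return sum;
}
```

For a 4-byte-wide, 3-row buffer, calling sum_column(img, 4, 3) walks down the first column, while sum_column(img + 11, -4, 3) walks up the last column from the bottom row.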
Michael Niedermayer
dd8ffc1925 Merge remote-tracking branch 'qatar/master'
* qatar/master: (47 commits)
  lavc: hide private symbols.
  lavc: deprecate img_get_alpha_info().
  lavc: use avpriv_ prefix for ff_toupper4.
  lavc: use avpriv_ prefix for ff_copy_bits and align_put_bits.
  lavc: use avpriv_ prefix for ff_ac3_parse_header.
  lavc: use avpriv_ prefix for ff_frame_rate_tab.
  lavc: rename ff_find_start_code to avpriv_mpv_find_start_code
  lavc: use avpriv_ prefix for ff_split_xiph_headers.
  lavc: use avpriv_ prefix for ff_dirac_parse_sequence_header.
  lavc: use avpriv_ prefix for some dv symbols used in lavf.
  lavc: use avpriv_ prefix for some flac symbols used in lavf.
  lavc: use avpriv_ prefix for some mpeg4audio symbols used in lavf.
  lavc: use avpriv_ prefix for some mpegaudio symbols used in lavf.
  lavc: use avpriv_ prefix for ff_aac_parse_header().
  lavf: hide private symbols.
  lavf: use avpriv_ prefix for some dv functions.
  lavf: use avpriv_ prefix for ff_new_chapter().
  avcodec: add CODEC_CAP_DELAY note to avcodec_decode_audio3() documentation
  avcodec: clarify the CODEC_CAP_DELAY note in avcodec_decode_video2()
  avcodec: clarify documentation of CODEC_CAP_DELAY
  ...

Conflicts:
	configure
	doc/general.texi
	libavcodec/Makefile
	libavcodec/aacdec.c
	libavcodec/allcodecs.c
	libavcodec/avcodec.h
	libavcodec/dv.c
	libavcodec/dvdata.c
	libavcodec/dvdata.h
	libavcodec/libspeexenc.c
	libavcodec/mpegvideo.c
	libavcodec/version.h
	libavformat/avidec.c
	libavformat/dv.c
	libavformat/dv.h
	libavformat/flvenc.c
	libavformat/mov.c
	libavformat/mp3enc.c
	libavformat/oggparsespeex.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2011-10-21 02:01:26 +02:00
Diego Biurrun
265980dabc x86: Move some variable declarations below the appropriate #ifdef.
This avoids some unused variable warnings with YASM disabled.
2011-10-20 16:19:27 +02:00
Mans Rullgard
2912e87a6c Replace FFmpeg with Libav in licence headers
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-03-19 13:33:20 +00:00
Stefano Sabatini
c6c98d0897 Move mm_support() from libavcodec to libavutil, make it a public
function and rename it to av_get_cpu_flags().

Originally committed as revision 25076 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-08 15:07:14 +00:00
Stefano Sabatini
7160bb716b Rename FF_MM_ symbols related to CPU features flags as AV_CPU_FLAG_
symbols, and move them from libavcodec/avcodec.h to libavutil/cpu.h.

Originally committed as revision 25040 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-04 09:59:08 +00:00
Måns Rullgård
c0ec9918b0 Remove global mm_flags variable
Originally committed as revision 24909 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-24 17:47:05 +00:00
Jason Garrett-Glaser
827d43bb9d VP8: move zeroing of luma DC block into the WHT
Lets us do the zeroing in asm instead of C.
Also makes it consistent with the way the regular iDCT code does it.

Originally committed as revision 24668 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-02 20:18:09 +00:00
Ronald S. Bultje
6341838f3c Use word-writing instead of dword-writing (with two cached but otherwise
unchanged bytes) in the horizontal simple loopfilter. This makes the filter
quite a bit faster in itself (~30 cycles less on Core1), probably mostly
because we don't need a complex 4x4 transpose, but only a simple byte
interleave. Also allows using pextrw on SSE4, which speeds up even more
(e.g. 25% faster on Core i7).

Originally committed as revision 24638 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-31 23:13:15 +00:00
Jason Garrett-Glaser
3ae079a3c8 VP8: optimize DC-only chroma case in the same way as luma.
Add MMX idct_dc_add4uv function for this case.
~40% faster chroma idct.

Originally committed as revision 24455 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 06:02:52 +00:00
Jason Garrett-Glaser
8a467b2d44 VP8: 30% faster idct_mb
Take shortcuts based on statistically common situations.
Add 4-at-a-time idct_dc function (mmx and sse2) since rows of 4 DC-only DCT
blocks are common.
TODO: tie this more directly into the MB mode, since the DC-level transform is
only used for non-splitmv blocks?

Originally committed as revision 24452 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 02:58:27 +00:00
Jason Garrett-Glaser
c25c776708 VP8: clear DCT blocks in iDCT instead of using clear_blocks.
~0.3% faster overall.

Originally committed as revision 24448 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 00:07:16 +00:00
Ronald S. Bultje
dc5eec8085 Use pextrw for SSE4 mbedge filter result writing, speedup 5-10cycles on
CPUs supporting it.

Originally committed as revision 24437 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-22 19:59:34 +00:00
Ronald S. Bultje
003243c3c2 Fix and enable horizontal >=SSE2 mbedge loopfilter.
Originally committed as revision 24409 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-22 01:35:26 +00:00
Jason Garrett-Glaser
7dd224a42d Various VP8 x86 deblocking speedups
SSSE3 versions, improve SSE2 versions a bit.
SSE2/SSSE3 mbedge h functions are currently broken, so explicitly disable them.

Originally committed as revision 24403 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-21 22:11:03 +00:00
Jason Garrett-Glaser
b8b231b5dc Make mmx VP8 WHT faster
Avoid pextrw, since it's slow on many older CPUs.
Now it doesn't require mmxext either.

Originally committed as revision 24397 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-21 20:51:01 +00:00
Ronald S. Bultje
e9e456d850 VP8 MBedge loopfilter MMX/MMX2/SSE2 functions for both luma (width=16)
and chroma (width=8).

Originally committed as revision 24378 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-20 22:58:56 +00:00
Ronald S. Bultje
268821e76e Chroma (width=8) inner loopfilter MMX/MMX2/SSE2 for VP8 decoder.
Originally committed as revision 24377 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-20 22:04:18 +00:00
Ronald S. Bultje
c60ed66dbe Revert r24339 (it causes fate failures on x86-64) - I'll figure out what's
wrong with it tomorrow or so, then re-submit.

Originally committed as revision 24341 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-19 23:57:09 +00:00
Ronald S. Bultje
6526976f0c Remove FF_MM_SSE2/3 flags for CPUs where this is generally not faster than
regular MMX code. An example of this is the Core1 CPU. Instead, set a new flag,
FF_MM_SSE2/3SLOW, which can be checked for particular SSE2/3 functions that
have been checked specifically on such CPUs and are actually faster than
their MMX counterparts.

In addition, use this flag to enable particular VP8 and LPC SSE2 functions
that are faster than their MMX counterparts.

Based on a patch by Loren Merritt <lorenm AT u washington edu>.

Originally committed as revision 24340 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-19 22:38:23 +00:00
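Note on 6526976f0c: the selection logic described in that message can be sketched roughly as follows. The flag names and values here (DEMO_MM_*) and the pick_impl() helper are assumptions made for illustration; the real FF_MM_* constants and the actual init code are not shown in this log.

```c
/* Illustrative flag bits only; not the real FF_MM_* values. */
enum {
    DEMO_MM_MMX      = 1 << 0,
    DEMO_MM_SSE2     = 1 << 1,  /* SSE2 present and generally faster than MMX */
    DEMO_MM_SSE2SLOW = 1 << 2   /* SSE2 present, but generally not faster    */
};

typedef enum { IMPL_C, IMPL_MMX, IMPL_SSE2 } impl_t;

/* Hypothetical chooser for one DSP function.
 * sse2_wins_on_slow_cpus is nonzero only for functions whose SSE2
 * version was measured to beat MMX even on "slow SSE2" CPUs. */
static impl_t pick_impl(int cpu_flags, int sse2_wins_on_slow_cpus)
{
    impl_t impl = IMPL_C;

    if (cpu_flags & DEMO_MM_MMX)
        impl = IMPL_MMX;

    /* Use SSE2 when it is generally fast, or when the CPU is flagged
     * as slow but this particular function is known to benefit. */
    if ((cpu_flags & DEMO_MM_SSE2) ||
        ((cpu_flags & DEMO_MM_SSE2SLOW) && sse2_wins_on_slow_cpus))
        impl = IMPL_SSE2;

    return impl;
}
```

Under this sketch, a CPU reporting only MMX and the slow-SSE2 flag keeps the MMX version by default and gets the SSE2 version only for functions explicitly marked as faster there.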
Ronald S. Bultje
1878f685c0 Implement chroma (width=8) inner loopfilter MMX/MMX2/SSE2 functions.
Originally committed as revision 24339 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-19 21:53:28 +00:00
Ronald S. Bultje
3facfc99da Change function prototypes for width=8 inner and mbedge loopfilter functions
so that it does both U and V planes at the same time. This will have speed
advantages when using SSE2 (or higher) optimizations, since we can do both
the U and V rows together in a single xmm register.

This also renames filter16 to filter16y and filter8 to filter8uv so that it's
more obvious what each function is used for.

Originally committed as revision 24337 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-19 21:18:04 +00:00
Ronald S. Bultje
a711eb4829 VP8 H/V inner loopfilter MMX/MMXEXT/SSE2 optimizations.
Originally committed as revision 24250 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-15 23:02:34 +00:00
Ronald S. Bultje
f2a30bd840 Simple H/V loopfilter for VP8 in MMX, MMX2 and SSE2 (yay for yasm macros).
Originally committed as revision 24029 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-03 19:26:30 +00:00
Jason Garrett-Glaser
b06855f18a SSSE3 versions of vp8 width4 bilinear MC functions
Originally committed as revision 24013 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-03 00:48:12 +00:00
Jason Garrett-Glaser
dcc602d802 SSSE3 versions of width4 VP8 6-tap MC functions
Also make some small changes to saturation order of 4-tap SSSE3 MC to fix a
non-bitexactness bug.

Patch mostly by Eli Friedman <eli.friedman AT gmail DOT com>.

Originally committed as revision 23965 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-02 05:27:41 +00:00
Jason Garrett-Glaser
8434fc26eb Fix 100L in vp8dsp asm init
Originally committed as revision 23946 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-01 22:09:22 +00:00
Ronald S. Bultje
2dd2f71692 MMX idct_add for VP8.
Originally committed as revision 23886 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-29 14:43:11 +00:00
Jason Garrett-Glaser
004cda8e79 Add mmxext version of VP8 DC Hadamard transform
Originally committed as revision 23878 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-29 01:41:59 +00:00
Baptiste Coudurier
50f70541d3 Change MMXEXT to MMX2, MMXEXT is deprecated
Originally committed as revision 23865 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-28 21:12:00 +00:00
Jason Garrett-Glaser
0fecad09fe Add x86 asm functions for VP8 put_pixels
Originally committed as revision 23858 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-28 19:14:40 +00:00
Jason Garrett-Glaser
a173aa8940 Add MMX, SSE2, SSSE3 asm for VP8 bilinear MC
Originally committed as revision 23857 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-28 18:56:24 +00:00
David Conrad
30bdefd1de Fix build without yasm
Originally committed as revision 23816 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-27 02:52:43 +00:00
Jason Garrett-Glaser
0178d14fe5 First shot at VP8 optimizations:
- MMXEXT, SSE2 and SSSE3 MC functions
- MMX and SSE4 IDCT dc_add functions

Patch by Jason Garrett-Glaser <darkshikari gmail com> and myself.

Originally committed as revision 23815 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-27 02:01:45 +00:00