Commit Graph

46 Commits

Author SHA1 Message Date
Ronald S. Bultje
1acd7d594c h264: integrate clear_blocks calls with IDCT.
The non-intra-pcm branch in hl_decode_mb (simple, 8bpp) goes from 700
to 672 cycles, and the complete loop of decode_mb_cabac and hl_decode_mb
(in the decode_slice loop) goes from 1759 to 1733 cycles on the clip
tested (cathedral), i.e. almost 30 cycles per mb faster.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-02-19 16:25:50 +01:00
Michael Niedermayer
ac8987591f Merge commit '88bd7fdc821aaa0cbcf44cf075c62aaa42121e3f'
* commit '88bd7fdc821aaa0cbcf44cf075c62aaa42121e3f':
  Drop DCTELEM typedef

Conflicts:
	libavcodec/alpha/dsputil_alpha.h
	libavcodec/alpha/motion_est_alpha.c
	libavcodec/arm/dsputil_init_armv6.c
	libavcodec/bfin/dsputil_bfin.h
	libavcodec/bfin/pixels_bfin.S
	libavcodec/cavs.c
	libavcodec/cavsdec.c
	libavcodec/dct-test.c
	libavcodec/dnxhdenc.c
	libavcodec/dsputil.c
	libavcodec/dsputil.h
	libavcodec/dsputil_template.c
	libavcodec/eamad.c
	libavcodec/h264_cavlc.c
	libavcodec/h264idct_template.c
	libavcodec/mpeg12.c
	libavcodec/mpegvideo.c
	libavcodec/mpegvideo.h
	libavcodec/mpegvideo_enc.c
	libavcodec/ppc/dsputil_altivec.c
	libavcodec/proresdsp.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-01-23 17:44:56 +01:00
Diego Biurrun
88bd7fdc82 Drop DCTELEM typedef
It does not help as an abstraction and adds dsputil dependencies.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2013-01-22 18:32:56 -08:00
Michael Niedermayer
42d3fea65f Merge commit 'af7d13ee4a4bf8d708f9b0598abb8f6e22b76de1'
* commit 'af7d13ee4a4bf8d708f9b0598abb8f6e22b76de1':
  asink_nullsink: plug a memory leak.
  x86: h264_idct: port to cpuflags
  x86: cpu: Drop unused HAVE_RWEFLAGS condition

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-11-28 13:32:17 +01:00
Diego Biurrun
2e89aeed65 x86: h264_idct: port to cpuflags 2012-11-28 00:28:09 +01:00
Michael Niedermayer
a1b5c9634e Merge remote-tracking branch 'qatar/master'
* qatar/master:
  x86: mmx2 ---> mmxext in asm constructs

Conflicts:
	libavcodec/x86/h264_chromamc_10bit.asm
	libavcodec/x86/h264_deblock.asm
	libavcodec/x86/h264dsp_init.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-11-14 12:34:30 +01:00
Diego Biurrun
26301caaa1 x86: mmx2 ---> mmxext in asm constructs 2012-11-14 00:58:51 +01:00
Michael Niedermayer
28c0678eb7 Merge commit 'be923ed659016350592acb9b3346f706f8170ac5'
* commit 'be923ed659016350592acb9b3346f706f8170ac5':
  x86: fmtconvert: port to cpuflags
  x86: MMX2 ---> MMXEXT in macro names

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-10-31 14:16:18 +01:00
Michael Niedermayer
3174616f59 Merge commit '6860b4081d046558c44b1b42f22022ea341a2a73'
* commit '6860b4081d046558c44b1b42f22022ea341a2a73':
  x86: include x86inc.asm in x86util.asm
  cng: Reindent some incorrectly indented lines
  cngdec: Allow flushing the decoder
  cngdec: Make the dbov variable have the right unit
  cngdec: Fix the memset size to cover the full array
  cngdec: Update the LPC coefficients after averaging the reflection coefficients
  configure: fix print_config() with broke awks

Conflicts:
	libavcodec/x86/ac3dsp.asm
	libavcodec/x86/dct32.asm
	libavcodec/x86/deinterlace.asm
	libavcodec/x86/dsputil.asm
	libavcodec/x86/dsputilenc.asm
	libavcodec/x86/fft.asm
	libavcodec/x86/fmtconvert.asm
	libavcodec/x86/h264_chromamc.asm
	libavcodec/x86/h264_deblock.asm
	libavcodec/x86/h264_deblock_10bit.asm
	libavcodec/x86/h264_idct.asm
	libavcodec/x86/h264_idct_10bit.asm
	libavcodec/x86/h264_intrapred.asm
	libavcodec/x86/h264_intrapred_10bit.asm
	libavcodec/x86/h264_weight.asm
	libavcodec/x86/vc1dsp.asm
	libavcodec/x86/vp3dsp.asm
	libavcodec/x86/vp56dsp.asm
	libavcodec/x86/vp8dsp.asm

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-10-31 13:43:33 +01:00
Diego Biurrun
588fafe7f3 x86: MMX2 ---> MMXEXT in macro names 2012-10-31 01:04:55 +01:00
Diego Biurrun
6860b4081d x86: include x86inc.asm in x86util.asm
This is necessary to allow refactoring some x86util macros with cpuflags.
2012-10-31 00:37:42 +01:00
Diego Biurrun
04581c8c77 x86: yasm: Use complete source path for macro helper %includes
This is more consistent with the way we handle C #includes and
it simplifies the build system.
2012-10-31 00:37:42 +01:00
Michael Niedermayer
2fc7c818cb Merge remote-tracking branch 'qatar/master'
* qatar/master:
  x86: fix build with nasm 2.08
  x86: use nop cpu directives only if supported
  x86: fix rNmp macros with nasm
  build: add trailing / to yasm/nasm -I flags
  x86: use 32-bit source registers with movd instruction
  x86: add colons after labels

Conflicts:
	Makefile
	libavutil/x86/x86inc.asm

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-08-07 23:04:55 +02:00
Mans Rullgard
a3df4781f4 x86: add colons after labels
nasm prints a warning if the colon is missing.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-07 15:20:56 +01:00
Michael Niedermayer
b4780d03d0 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  x86: h264_idct: Rename x264_add8x4_idct_sse2 --> h264_add8x4_idct_sse2
  rational: add av_inv_q() returning the inverse of an AVRational
  dpx: Make start offset unsigned
  lavfi: properly signal out-of-memory error in ff_filter_samples
  cosmetics: Fix a few switched periods and linebreaks
  zerocodec: Fix memleak in decode_frame
  zerocodec: Cosmetics

Conflicts:
	ffmpeg.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-08-05 22:17:02 +02:00
Diego Biurrun
2096857551 x86: h264_idct: Rename x264_add8x4_idct_sse2 --> h264_add8x4_idct_sse2 2012-08-05 21:40:49 +02:00
Michael Niedermayer
ca19862d38 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  libxvid: remove disabled code
  qdm2: make a table static const
  qdm2: simplify bitstream reader setup for some subpacket types
  qdm2: use get_bits_left()
  build: Consistently handle conditional compilation for all optimization OBJS.
  avpacket, bfi, bgmc, rawenc: K&R prettyprinting cosmetics
  msrle: convert MS RLE decoding function to bytestream2.
  x86inc improvements for 64-bit

Conflicts:
	common.mak
	libavcodec/avpacket.c
	libavcodec/bfi.c
	libavcodec/msrledec.c
	libavcodec/qdm2.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-13 00:39:19 +02:00
Henrik Gramner
729f90e268 x86inc improvements for 64-bit
Add support for all x86-64 registers
Prefer caller-saved register over callee-saved on WIN64
Support up to 15 function arguments

Also (by Ronald S. Bultje)
Fix up our asm to work with new x86inc.asm.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
2012-04-11 15:47:00 -04:00
Michael Niedermayer
f2b20b7a8b Merge remote-tracking branch 'qatar/master'
* qatar/master:
  pixdesc: mark pseudopaletted formats with a special flag.
  avconv: switch to avcodec_encode_video2().
  libx264: implement encode2().
  libx264: split extradata writing out of encode_nals().
  lavc: add avcodec_encode_video2() that encodes from an AVFrame -> AVPacket
  cmdutils: update copyright year to 2012.
  swscale: sign-extend integer function argument to qword on x86-64.
  x86inc: support yasm -f win64 flag also.
  h264: manually save/restore XMM registers for functions using INIT_MMX.
  x86inc: allow manual use of WIN64_SPILL_XMM.
  aacdec: Use correct speaker order for 7.1.
  aacdec: Remove incorrect comment.
  aacdec: Simplify output configuration.
  Remove Sun medialib glue code.
  dsputil: set STRIDE_ALIGN to 16 for x86 also.
  pngdsp: swap argument inversion.

Conflicts:
	cmdutils.c
	configure
	doc/APIchanges
	ffmpeg.c
	libavcodec/aacdec.c
	libavcodec/dsputil.h
	libavcodec/libx264.c
	libavcodec/mlib/dsputil_mlib.c
	libavcodec/utils.c
	libavfilter/vf_scale.c
	libavutil/avutil.h
	libswscale/mlib/yuv2rgb_mlib.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-02-09 01:27:12 +01:00
Ronald S. Bultje
ce1e250ee9 h264: manually save/restore XMM registers for functions using INIT_MMX.
On Win64, these registers are callee-save, so not saving/restoring them
correctly is a violation of ABI and can lead to crashes or corrupt data.
2012-02-08 10:31:14 -08:00
Michael Niedermayer
e37f161e66 Merge remote-tracking branch 'qatar/master'
* qatar/master: (71 commits)
  movenc: Allow writing to a non-seekable output if using empty moov
  movenc: Support adding isml (smooth streaming live) metadata
  libavcodec: Don't crash in avcodec_encode_audio if time_base isn't set
  sunrast: Document the different Sun Raster file format types.
  sunrast: Add a check for experimental type.
  libspeexenc: use AVSampleFormat instead of deprecated/removed SampleFormat
  lavf: remove disabled FF_API_SET_PTS_INFO cruft
  lavf: remove disabled FF_API_OLD_INTERRUPT_CB cruft
  lavf: remove disabled FF_API_REORDER_PRIVATE cruft
  lavf: remove disabled FF_API_SEEK_PUBLIC cruft
  lavf: remove disabled FF_API_STREAM_COPY cruft
  lavf: remove disabled FF_API_PRELOAD cruft
  lavf: remove disabled FF_API_NEW_STREAM cruft
  lavf: remove disabled FF_API_RTSP_URL_OPTIONS cruft
  lavf: remove disabled FF_API_MUXRATE cruft
  lavf: remove disabled FF_API_FILESIZE cruft
  lavf: remove disabled FF_API_TIMESTAMP cruft
  lavf: remove disabled FF_API_LOOP_OUTPUT cruft
  lavf: remove disabled FF_API_LOOP_INPUT cruft
  lavf: remove disabled FF_API_AVSTREAM_QUALITY cruft
  ...

Conflicts:
	doc/APIchanges
	libavcodec/8bps.c
	libavcodec/avcodec.h
	libavcodec/libx264.c
	libavcodec/mjpegbdec.c
	libavcodec/options.c
	libavcodec/sunrast.c
	libavcodec/utils.c
	libavcodec/version.h
	libavcodec/x86/h264_deblock.asm
	libavdevice/libdc1394.c
	libavdevice/v4l2.c
	libavformat/avformat.h
	libavformat/avio.c
	libavformat/avio.h
	libavformat/aviobuf.c
	libavformat/dv.c
	libavformat/mov.c
	libavformat/utils.c
	libavformat/version.h
	libavformat/wtv.c
	libavutil/Makefile
	libavutil/file.c
	libswscale/x86/input.asm
	libswscale/x86/swscale_mmx.c
	libswscale/x86/swscale_template.c
	tests/ref/lavf/ffm

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-01-28 07:53:34 +01:00
Ronald S. Bultje
3b15a6d742 config.asm: change %ifdef directives to %if directives.
This allows combining multiple conditionals in a single statement.
2012-01-27 10:19:57 +08:00
Kieran Kunhya
b1766c170c Move x264asm to libavutil.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2011-10-19 20:26:55 +02:00
Michael Niedermayer
1a34478b71 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  Fix NASM include directive
  dsputil_mmx: Honor HAVE_AMD3DNOW
  lavf,lavd: remove all usage of AVFormatParameters from demuxers.
  jack: add 'channels' private option.
  VC-1: fix reading of custom PAR.
  Remove redundant and dubious video codec detection by its extradata
  mpeg12: remove repeat-field code disabled since May 2002
  patch checklist: suggest fate instead of regression tests
  Turn on resampling on sudden size change instead of bailing out during recode.
  avtools: reinitialise filter chain when input video stream changes dimensions

Conflicts:
	Makefile
	avconv.c
	doc/developer.texi
	ffplay.c
	libavcodec/x86/dsputil_mmx.c
	libavdevice/libdc1394.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2011-08-15 23:35:53 +02:00
Dave Yeo
cc73511e8e Fix NASM include directive
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-08-15 11:24:35 -07:00
Michael Niedermayer
0cb233cf46 Merge commit 'b2c087871dafc7d030b2d48457ddff597dfd4925'
* commit 'b2c087871dafc7d030b2d48457ddff597dfd4925':
  Move x86util.asm from libavcodec/ to libavutil/.
  Move x86inc.asm to libavutil/.
  APIchanges: note error_recognition in lavf
  lavf: add support for error_recognition, use it in avidec, and bump minor API version
  avconv: change semantics of -map
  avconv: get rid of new* options.
  cmdutils: allow precisely specifying a stream for AVOptions.
  configure: add missing CFLAGS to fix building on the HURD
  libx264: Include hint for possible values for configuring libx264
  cmdutils: allow ':'-separated modifiers in option names.
  avconv: make -map_metadata work consistently with the other options
  avconv: remove deprecated options.
  avconv: make -map_chapters accept only the input file index.
  Make a copy of ffmpeg under a new name -- avconv.
  ffmpeg: add a warning stating that the program is deprecated.
  Add weighted motion compensation for RV40 B-frames
  RV3/4: calculate B-frame motion weights once per frame
  Move RV3/4-specific DSP functions into their own context
  mjpeg: propagate decode errors from ff_mjpeg_decode_sos and ff_mjpeg_decode_dqt
  h264: notice memory allocation failure

Conflicts:
	.gitignore
	Makefile
	cmdutils.c
	configure
	doc/ffplay.texi
	doc/ffprobe.texi
	doc/ffserver.texi
	libavcodec/libx264.c
	libavformat/avformat.h
	libavformat/avidec.c
	libavformat/version.h
	tests/lavf-regression.sh
	tests/lavfi-regression.sh

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2011-08-13 02:56:08 +02:00
Ronald S. Bultje
b2c087871d Move x86util.asm from libavcodec/ to libavutil/.
This allows using it in swscale also.
2011-08-12 11:43:03 -07:00
Ronald S. Bultje
3a39195b1d Move x86inc.asm to libavutil/.
This allows using it in libswscale/ also.
2011-08-12 11:43:02 -07:00
Michael Niedermayer
faba79e080 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  mxfdec: Include FF_INPUT_BUFFER_PADDING_SIZE when allocating extradata.
  H.264: tweak some other x86 asm for Atom
  probe: Fix insane flow control.
  mpegts: remove invalid error check
  s302m: use nondeprecated audio sample format API
  lavc: use designated initialisers for all codecs.
  x86: cabac: add operand size suffixes missing from 6c32576

Conflicts:
	libavcodec/ac3enc_float.c
	libavcodec/flacenc.c
	libavcodec/frwu.c
	libavcodec/pictordec.c
	libavcodec/qtrleenc.c
	libavcodec/v210enc.c
	libavcodec/wmv2dec.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2011-07-30 06:46:08 +02:00
Jason Garrett-Glaser
a3bf7b864a H.264: tweak some other x86 asm for Atom 2011-07-29 12:24:15 -07:00
Michael Niedermayer
c137fdd778 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  swscale: remove misplaced comment.
  ffmpeg: fix streaming to ffserver.
  swscale: split out RGB48 output functions from yuv2packed[12X]_c().
  build: move vpath directives to main Makefile
  swscale: fix JPEG-range YUV scaling artifacts.
  build: move ALLFFLIBS to a more logical place
  ARM: factor some repetitive code into macros
  Fix SVQ3 after adding 4:4:4 H.264 support
  H.264: fix CODEC_FLAG_GRAY
  4:4:4 H.264 decoding support
  ac3enc: fix allocation of floating point samples.

Conflicts:
	ffmpeg.c
	libavcodec/dsputil_template.c
	libavcodec/h264.c
	libavcodec/mpegvideo.c
	libavcodec/snow.c
	libswscale/swscale.c
	libswscale/swscale_internal.h

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2011-06-15 02:15:25 +02:00
Jason Garrett-Glaser
c90b94424c 4:4:4 H.264 decoding support
Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.
2011-06-13 21:16:30 -07:00
Jason Garrett-Glaser
504811baea Roll back 4:4:4 H.264 for now
Needs some ARM/PPC asm modifications.
2011-06-13 13:38:46 -07:00
Jason Garrett-Glaser
c9c493872c 4:4:4 H.264 decoding support
Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.
2011-06-13 12:21:39 -07:00
Michael Niedermayer
cd8cb54990 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  ARM: ac3dsp: optimised update_bap_counts()
  mpegaudiodec: Fix av_dlog() invocation.
  h264/10bit: add HAVE_ALIGNED_STACK checks.
  Update 8-bit H.264 IDCT function names to reflect bit-depth.
  Add IDCT functions for 10-bit H.264.
  mpegaudioenc: Fix broken av_dlog statement.
  Employ correct printf format specifiers, mostly in debug output.
  ARM: fix MUL64 inline asm for pre-armv6

Conflicts:
	libavcodec/mpegaudioenc.c
	libavformat/ape.c
	libavformat/mxfdec.c
	libavformat/r3d.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2011-06-02 05:12:10 +02:00
Daniel Kang
348493db60 Update 8-bit H.264 IDCT function names to reflect bit-depth.
Signed-off-by: Ronald S. Bultje <rbultje@google.com>
2011-05-31 15:02:32 -07:00
Michael Niedermayer
b4bcd1e2f1 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  Fix compilation of iirfilter-test.
  libx264: handle closed GOP codec flag
  lavf: remove duplicate assignment in avformat_alloc_context.
  lavf: use designated initializers for AVClasses.
  flvdec: clenup debug code
  asfdec: fix possible overread on broken files.
  asfdec: do not fall back to binary/generic search
  asfdec: reindent after previous commit c7bd5ed
  asfdec: fallback to binary search internally
  mpegaudio: add _fixed suffix to some names
  Modify x86util.asm to ease transitioning to 10-bit H.264 assembly.
  dct: build dct32 as separate object files
  qdm2: include correct header for rdft

Conflicts:
	ffpresets/libx264-fast.ffpreset
	ffpresets/libx264-fast_firstpass.ffpreset
	ffpresets/libx264-faster.ffpreset
	ffpresets/libx264-faster_firstpass.ffpreset
	ffpresets/libx264-medium.ffpreset
	ffpresets/libx264-medium_firstpass.ffpreset
	ffpresets/libx264-placebo.ffpreset
	ffpresets/libx264-placebo_firstpass.ffpreset
	ffpresets/libx264-slow.ffpreset
	ffpresets/libx264-slow_firstpass.ffpreset
	ffpresets/libx264-slower.ffpreset
	ffpresets/libx264-slower_firstpass.ffpreset
	ffpresets/libx264-superfast.ffpreset
	ffpresets/libx264-superfast_firstpass.ffpreset
	ffpresets/libx264-ultrafast.ffpreset
	ffpresets/libx264-ultrafast_firstpass.ffpreset
	ffpresets/libx264-veryfast.ffpreset
	ffpresets/libx264-veryfast_firstpass.ffpreset
	ffpresets/libx264-veryslow.ffpreset
	ffpresets/libx264-veryslow_firstpass.ffpreset
	libavformat/flvdec.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2011-05-18 05:42:42 +02:00
Daniel Kang
d0005d347d Modify x86util.asm to ease transitioning to 10-bit H.264 assembly.
Arguments for variable size instructions are added to many macros, along
with other various changes. The x86util.asm code was ported from x264.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2011-05-17 20:44:48 +02:00
Michael Niedermayer
5a153604c9 Merge remote branch 'qatar/master'
* qatar/master:
  Fix FSF address copy paste error in some license headers.
  Add an aac sample which uses LTP to fate-aac.
DUPLICATE  [PATCH] Update pixdesc_be fate refs after adding 9/10bit YUV420P formats.
  arm: properly mark external symbol call

Conflicts:
	libavcodec/x86/ac3dsp.asm
	libavcodec/x86/deinterlace.asm
	libavcodec/x86/dsputil_yasm.asm
	libavcodec/x86/dsputilenc_yasm.asm
	libavcodec/x86/fft_mmx.asm
	libavcodec/x86/fmtconvert.asm
	libavcodec/x86/h264_chromamc.asm
	libavcodec/x86/h264_deblock.asm
	libavcodec/x86/h264_idct.asm
	libavcodec/x86/h264_intrapred.asm
	libavcodec/x86/h264_weight.asm
	libavcodec/x86/vc1dsp_yasm.asm
	libavcodec/x86/vp3dsp.asm
	libavcodec/x86/vp56dsp.asm
	libavcodec/x86/vp8dsp.asm
	libavcodec/x86/x86util.asm
	libswscale/ppc/swscale_template.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2011-05-15 04:44:07 +02:00
Diego Biurrun
888fa31eca Fix FSF address copy paste error in some license headers. 2011-05-14 21:32:31 +02:00
Mans Rullgard
2912e87a6c Replace FFmpeg with Libav in licence headers
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-03-19 13:33:20 +00:00
Jason Garrett-Glaser
19fb234e4a H.264: split luma dc idct out and implement MMX/SSE2 versions
About 2.5x the speed.

NOTE: the way that the asm code handles large qmuls is a bit suboptimal.
If x264-style dequant was used (separate shift and qmul values), it might
be possible to get some extra speed.

Originally committed as revision 26336 to svn://svn.ffmpeg.org/ffmpeg/trunk
2011-01-14 21:34:25 +00:00
Reimar Döffinger
02b424d9c8 Add d suffix to movd target register to make it work with nasm.
Originally committed as revision 25206 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-26 09:15:18 +00:00
Ronald S. Bultje
ae11291865 Unroll loop in h264_idct_add16intra_sse2(). Basically identical to r25171, this
inlines scan8[] and removes loop setup. 15% faster, 0.4% overall.

See "[PATCH] unroll loop in h264_idct_add8_sse2()" thread on ML.

Originally committed as revision 25172 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-24 14:07:23 +00:00
Ronald S. Bultje
4bca677494 Unroll loop in h264_idct_add8_sse2(). This means we can inline scan8[] in the
code directly also and remove loop setup. 20% faster in function, 0.8% overall.

See "[PATCH] unroll loop in h264_idct_add8_sse2()" thread on ML.

Originally committed as revision 25171 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-24 14:05:45 +00:00
Ronald S. Bultje
1d16a1cf99 Rename h264_idct_sse2.asm to h264_idct.asm; move inline IDCT asm from
h264dsp_mmx.c to h264_idct.asm (as yasm code). Because the loops are now
coded in asm instead of C, this is (depending on the function) up to 50%
faster for cases where gcc didn't do a great job at looping.

Since h264_idct_add8() is now faster than the manual loop setup in h264.c,
in-asm idct calling can now be enabled for chroma as well (see r16207). For
MMX, this is 5% faster. For SSE2 (which isn't done for chroma if h264.c does
the looping), this makes it up to 50% faster. Speed gain overall is ~0.5-1.0%.

Originally committed as revision 25119 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-14 13:36:26 +00:00