Commit Graph

237 Commits

Author SHA1 Message Date
Ramiro Polla
326bf69acc fft: mark xmm registers as clobbered in ff_imdct_calc_sse
Originally committed as revision 25363 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-06 01:27:02 +00:00
Ronald S. Bultje
dd68d4db43 MMX, MMX2, SSE2 and SSSE3 optimizations for pred16x16/8x8_plane H264 intra
prediction (plus some with different rounding for svq3/rv40). Speedup (for
SSSE3) about ~6-fold, 3.6% faster overall with cathedral sample.

Originally committed as revision 25361 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-05 22:06:18 +00:00
İsmail Dönmez
9276bdddca snowdsp: Explicitly state the operand sizes
Fixes compilation with clang's builtin assembler

Patch by İsmail Dönmez, ismail at namtrac dot org

Originally committed as revision 25331 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-10-04 13:08:13 +00:00
Ronald S. Bultje
a52ffc3f54 Move static inline function to a macro, so that constant propagation in
inline asm works for gcc-3.x also (hopefully). Should fix gcc-3.x FATE
breakage after r25254.

Originally committed as revision 25262 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-29 17:42:26 +00:00
Eli Friedman
329d689f75 Use sse2 variant of put_pixels16() for no_rnd also. Provides a minor speed
increase to e.g. vc1, snow and mpeg decoding.

Patch by Eli Friedman <eli dot friedman gmail com>.

Originally committed as revision 25259 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-29 15:34:43 +00:00
Ronald S. Bultje
cd17285e6c Merge b_idx and edge variables, and optimize the ASM to directly load variables
from memory locations/offsets depending on b_idx plus constants, rather than
having gcc do this. This saves several lea calls and together saves about
10 cycles in h264_loop_filter_strength_mmx2().

Originally committed as revision 25256 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-29 14:04:39 +00:00
Ronald S. Bultje
0cc8a5d088 Remove mv_mask variable. Replace the related pand -1/0 instructions by either
a pxor, or remove the instruction alltogether. Altogether, this saves 1
instruction.

Originally committed as revision 25255 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-29 14:03:30 +00:00
Ronald S. Bultje
c0673f2cf4 Remove d_idx as a variable, and instead load it as a constant in the asm.
This has no measurable speed effect because the surrounding code doesn't
take advantage of this yet.

Originally committed as revision 25254 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-29 14:02:32 +00:00
Ronald S. Bultje
2c3135f6d3 Unroll inner bidir loop in h264_loop_filter_strength_mmx2(), which gets rid
of the d_idx variable and therefore allows for future optimizations. No speed
difference by this commit itself.

Originally committed as revision 25253 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-29 13:35:24 +00:00
Ronald S. Bultje
4b81511cab Unloop the outer loop in h264_loop_filter_strength_mmx2(), which allows
inlining various constants within the loop code. 20 cycles faster on
cathedral sample.

Originally committed as revision 25252 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-29 13:34:20 +00:00
Reimar Döffinger
02b424d9c8 Add d suffix to movd target register to make it work with nasm.
Originally committed as revision 25206 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-26 09:15:18 +00:00
Reimar Döffinger
dc77e985b7 Split and then simplify address generation macro.
Allows nasm to work for this code.

Originally committed as revision 25205 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-26 09:08:11 +00:00
Ronald S. Bultje
7e117771cd Remove unused variable.
Originally committed as revision 25173 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-24 15:31:46 +00:00
Ronald S. Bultje
ae11291865 Unroll loop in h264_idct_add16intra_sse2(). Basically identical to r25171, this
inlines scan8[] and removes loop setup. 15% faster, 0.4% overall.

See "[PATCH] unroll loop in h264_idct_add8_sse2()" thread on ML.

Originally committed as revision 25172 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-24 14:07:23 +00:00
Ronald S. Bultje
4bca677494 Unroll loop in h264_idct_add8_sse2(). This means we can inline scan8[] in the
code directly also and remove loop setup. 20% faster in function, 0.8% overall.

See "[PATCH] unroll loop in h264_idct_add8_sse2()" thread on ML.

Originally committed as revision 25171 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-24 14:05:45 +00:00
Måns Rullgård
c0bc8b9afb x86: disable SSE functions using stack when stack is not aligned
This fixes crashes with ICC 10.1.

Originally committed as revision 25153 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-21 17:57:21 +00:00
Måns Rullgård
f41237c9db x86: remove hack disabling sse2 h264 loop filter with 32-bit icc
Originally committed as revision 25146 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-18 20:44:32 +00:00
Ronald S. Bultje
ada65af9d1 Don't access upper 32 bits of a 32-bit int on 64-bit systems.
Originally committed as revision 25140 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-17 12:24:22 +00:00
Ronald S. Bultje
6c3d021891 Properly add HAVE_YASM around yasmified symbols. Should fix compile error
on configurations using --disable-yasm.

Originally committed as revision 25138 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-17 03:01:57 +00:00
Ronald S. Bultje
e2e341048e Move hadamard_diff{,16}_{mmx,mmx2,sse2,ssse3}() from inline asm to yasm,
which will hopefully solve the Win64/FATE failures caused by these functions.

Originally committed as revision 25137 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-17 01:56:06 +00:00
Ronald S. Bultje
d0acc2d2e9 Move sse16_sse2() from inline asm to yasm. It is one of the functions causing
Win64/FATE issues.

Originally committed as revision 25136 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-17 01:44:17 +00:00
Ronald S. Bultje
1d16a1cf99 Rename h264_idct_sse2.asm to h264_idct.asm; move inline IDCT asm from
h264dsp_mmx.c to h264_idct.asm (as yasm code). Because the loops are now
coded in asm instead of C, this is (depending on the function) up to 50%
faster for cases where gcc didn't do a great job at looping.

Since h264_idct_add8() is now faster than the manual loop setup in h264.c,
in-asm idct calling can now be enabled for chroma as well (see r16207). For
MMX, this is 5% faster. For SSE2 (which isn't done for chroma if h264.c does
the looping), this makes it up to 50% faster. Speed gain overall is ~0.5-1.0%.

Originally committed as revision 25119 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-14 13:36:26 +00:00
Jason Garrett-Glaser
8acb554aff LGPL SSE2 H.264 iDCT
This leaves no more GPL-only H.264 decoding asm code.

Approved by Loren.

Originally committed as revision 25092 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-10 02:25:12 +00:00
Stefano Sabatini
c6c98d0897 Move mm_support() from libavcodec to libavutil, make it a public
function and rename it to av_get_cpu_flags().

Originally committed as revision 25076 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-08 15:07:14 +00:00
Reimar Döffinger
b1c32fb5e5 Use "d" suffix for general-purpose registers used with movd.
This increases compatibilty with nasm and is also more consistent,
e.g. with h264_intrapred.asm and h264_chromamc.asm that already
do it that way.

Originally committed as revision 25042 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-05 10:10:16 +00:00
Stefano Sabatini
7160bb716b Rename FF_MM_ symbols related to CPU features flags as AV_CPU_FLAG_
symbols, and move them from libavcodec/avcodec.h to libavutil/cpu.h.

Originally committed as revision 25040 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-04 09:59:08 +00:00
Ronald S. Bultje
2c166c3af1 Port latest x264 deblock asm (before they moved to using NV12 as internal
format), LGPL'ed with permission from Jason and Loren. This includes mmx2
code, so remove inline asm from h264dsp_mmx.c accordingly.

Originally committed as revision 25031 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-03 16:52:46 +00:00
Eli Friedman
a10a9f5cd0 Fix typo in r25019.
Patch by Eli Friedman <eli.friedman at gmail dot com>.

Originally committed as revision 25022 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-01 23:19:36 +00:00
Ronald S. Bultje
615da9b1d9 Unscrew breakage after my last commit because of symbol prefixes.
Originally committed as revision 25020 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-01 21:10:19 +00:00
Ronald S. Bultje
a33a2562c1 Rename h264_weight_sse2.asm to h264_weight.asm; add 16x8/8x16/8x4 non-square
biweight code to sse2/ssse3; add sse2 weight code; and use that same code to
create mmx2 functions also, so that the inline asm in h264dsp_mmx.c can be
removed. OK'ed by Jason on IRC.

Originally committed as revision 25019 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-01 20:56:16 +00:00
Ronald S. Bultje
14bc1f2485 Split h264dsp_mmx.c (which was #included in dsputil_mmx.c) in h264_qpel_mmx.c,
still #included in dsputil_mmx.c and is part of DSPContext, and h264dsp_mmx.c,
which represents H264DSPContext and is now compiled on its own.

Originally committed as revision 25018 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-01 20:48:59 +00:00
Ronald S. Bultje
5929b3a651 Fix vertical align.
Originally committed as revision 25009 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-31 12:32:24 +00:00
Ronald S. Bultje
79ce0f002e Fix compilation failure if yasm is disabled (missing vp3 symbols).
Originally committed as revision 24992 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-30 20:30:40 +00:00
Ronald S. Bultje
de1c253bab Split intra prediction initialization (i.e. assigning of function pointers)
into its own file, it doesn't belong in h264dsp_mmx.c (much less so in
dsputil_mmx.c).

Originally committed as revision 24990 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-30 16:34:13 +00:00
Ronald S. Bultje
d0eb5a1174 Move H264 chroma MC from inline asm to yasm. This fixes VP3/5/6 and VC-1
fate failures on Win64.

Originally committed as revision 24989 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-30 16:31:04 +00:00
Ronald S. Bultje
e9f5f020c6 Move VP3 IDCT functions from inline ASM to YASM. This fixes part of the VP3/5/6
issues on Win64.

Originally committed as revision 24988 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-30 16:25:46 +00:00
Ronald S. Bultje
7e7c4b6008 Put ff_ prefix on non-static {put_signed,put,add}_pixels_clamped_mmx()
functions.

Originally committed as revision 24987 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-30 16:22:27 +00:00
Loren Merritt
19d929f9a3 cosmetics in imdct_sse
Originally committed as revision 24958 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-28 21:03:13 +00:00
Ronald S. Bultje
4eca52ed19 Fix typos when converting inline asm to yasm, fixes MMX-only fate-ea-vp61.
Originally committed as revision 24948 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-26 14:33:39 +00:00
Ronald S. Bultje
6697bc33e2 Revert r24931, it broke Win32 and some BSD compiles (yay fate).
Originally committed as revision 24934 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-25 20:36:35 +00:00
Ronald S. Bultje
72f642400b Mark xmm6 and xmm7 as clobbered in ff_vp3_idct_sse2(), which is contributing
to the VP6 fate failures on Win64.

Originally committed as revision 24931 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-25 19:57:05 +00:00
Måns Rullgård
69dad87c48 VP6: fix vp6_filter_diag4_mmx/sse on 64-bit
The stride can be negative and must be sign extended before being
used in pointer arithmetic.

Originally committed as revision 24926 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-25 15:41:11 +00:00
Ronald S. Bultje
89fa3504ed Move vp6_filter_diag4() x86 SIMD code from inline ASM to YASM. This should
help in fixing the Win64 fate failures.

Originally committed as revision 24922 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-25 13:44:16 +00:00
Ronald S. Bultje
3a0885146c Move vp6_filter_diag4() from DSPContext to VP56DSPContext.
Originally committed as revision 24921 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-25 13:42:28 +00:00
Måns Rullgård
c0ec9918b0 Remove global mm_flags variable
Originally committed as revision 24909 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-24 17:47:05 +00:00
Ronald S. Bultje
3611c45ab7 Mark xmm registers as clobbered in simple loopfilter. Should fix the last
two VP8-related fate failures on Win64.

Originally committed as revision 24908 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-24 16:52:27 +00:00
Alex Converse
cb4f12466b imdct/x86: Use "s->mdct_size" instead of "1 << s->mdct_bits".
It generates smaller cleaner code.

Originally committed as revision 24887 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-23 15:51:09 +00:00
Ronald S. Bultje
684d608bde Fix segfaults in VP8 SIMD code on Win64 (and FATE/win64 failures).
Originally committed as revision 24871 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-23 02:41:22 +00:00
Alex Converse
78b5c97d3e Convert ff_imdct_half_sse() to yasm.
This is to avoid split asm sections that attempt to preserve some
registers between sections.

Originally committed as revision 24869 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-22 14:39:58 +00:00
Jason Garrett-Glaser
05c04cdf54 VP5/6/8: ~7% faster arithmetic decoding
Grab from the bitstream in 16-bit chunks instead of 8-bit chunks.
TODO: grab in 32-bit chunks on 64-bit systems.

Originally committed as revision 24783 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-12 01:11:32 +00:00