74 Commits

Author SHA1 Message Date
Ronald S. Bultje
e9e456d850 VP8 MBedge loopfilter MMX/MMX2/SSE2 functions for both luma (width=16)
and chroma (width=8).

Originally committed as revision 24378 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-20 22:58:56 +00:00
Ronald S. Bultje
268821e76e Chroma (width=8) inner loopfilter MMX/MMX2/SSE2 for VP8 decoder.
Originally committed as revision 24377 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-20 22:04:18 +00:00
Ronald S. Bultje
c60ed66dbe Revert r24339 (it causes fate failures on x86-64) - I'll figure out what's
wrong with it tomorrow or so, then re-submit.

Originally committed as revision 24341 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-19 23:57:09 +00:00
Ronald S. Bultje
1878f685c0 Implement chroma (width=8) inner loopfilter MMX/MMX2/SSE2 functions.
Originally committed as revision 24339 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-19 21:53:28 +00:00
Ronald S. Bultje
fb9bdf048c Be more efficient with registers or stack memory. Saves 8/16 bytes of
stack on x86-32, or 2 MM registers on x86-64.

Originally committed as revision 24338 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-19 21:45:36 +00:00
Ronald S. Bultje
3facfc99da Change function prototypes for width=8 inner and mbedge loopfilter functions
so that they process both U and V planes at the same time. This will have speed
advantages when using SSE2 (or higher) optimizations, since we can do both
the U and V rows together in a single xmm register.

This also renames filter16 to filter16y and filter8 to filter8uv so that it's
more obvious what each function is used for.

Originally committed as revision 24337 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-19 21:18:04 +00:00
Ronald S. Bultje
819b2dd2b1 Attempt to fix x86-64 testsuite on fate.
Originally committed as revision 24275 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-16 21:35:30 +00:00
Ronald S. Bultje
6f323f1251 Remove duplicate define.
Originally committed as revision 24272 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-16 19:54:47 +00:00
Ronald S. Bultje
889b2c26ee Revert 24270, it contained some stuff that shouldn't have been in there.
Originally committed as revision 24271 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-16 19:54:25 +00:00
Ronald S. Bultje
2356a7834b Remove duplicate define.
Originally committed as revision 24270 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-16 19:42:32 +00:00
Ronald S. Bultje
ede1b9665a Give x86 r%d registers names. This will simplify implementation of the chroma
inner loopfilter, and it also allows us to save one register on x86-64/sse2.

Originally committed as revision 24269 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-16 19:38:10 +00:00
Ronald S. Bultje
526e831a46 Change return statement, the REP_RET is a mistake since the else case (x86-64,
sse2) doesn't actually loop, so REP_RET isn't necessary.

Originally committed as revision 24268 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-16 18:29:14 +00:00
Ronald S. Bultje
a711eb4829 VP8 H/V inner loopfilter MMX/MMXEXT/SSE2 optimizations.
Originally committed as revision 24250 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-15 23:02:34 +00:00
Ronald S. Bultje
f2a30bd840 Simple H/V loopfilter for VP8 in MMX, MMX2 and SSE2 (yay for yasm macros).
Originally committed as revision 24029 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-03 19:26:30 +00:00
Jason Garrett-Glaser
b06855f18a SSSE3 versions of vp8 width4 bilinear MC functions
Originally committed as revision 24013 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-03 00:48:12 +00:00
Jason Garrett-Glaser
dcc602d802 SSSE3 versions of width4 VP8 6-tap MC functions
Also make some small changes to saturation order of 4-tap SSSE3 MC to fix a
non-bitexactness bug.

Patch mostly by Eli Friedman <eli.friedman AT gmail DOT com>.

Originally committed as revision 23965 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-02 05:27:41 +00:00
Jason Garrett-Glaser
82a8d0f114 Use add instead of lshift in mmxext vp8 idct
Originally committed as revision 23891 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-29 17:23:17 +00:00
Ronald S. Bultje
565344e7e4 Remove unused macros (duplicates from the now-LGPL x86util.asm).
Originally committed as revision 23890 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-29 17:04:29 +00:00
Ronald S. Bultje
2dd2f71692 MMX idct_add for VP8.
Originally committed as revision 23886 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-29 14:43:11 +00:00
Jason Garrett-Glaser
004cda8e79 Add mmxext version of VP8 DC Hadamard transform
Originally committed as revision 23878 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-29 01:41:59 +00:00
Jason Garrett-Glaser
a912da761d Fix VP8 bilinear mc on x86_64
Originally committed as revision 23872 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-28 22:13:14 +00:00
Jason Garrett-Glaser
0fecad09fe Add x86 asm functions for VP8 put_pixels
Originally committed as revision 23858 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-28 19:14:40 +00:00
Jason Garrett-Glaser
a173aa8940 Add MMX, SSE2, SSSE3 asm for VP8 bilinear MC
Originally committed as revision 23857 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-28 18:56:24 +00:00
Jason Garrett-Glaser
0178d14fe5 First shot at VP8 optimizations:
- MMXEXT, SSE2 and SSSE3 MC functions
- MMX and SSE4 IDCT dc_add functions

Patch by Jason Garrett-Glaser <darkshikari gmail com> and myself.

Originally committed as revision 23815 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-27 02:01:45 +00:00
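
Note on 3facfc99da: the message reasons that a width=8 chroma row is only 8 bytes, so a U row and a V row together fill one 16-byte xmm register and can be filtered in a single pass. The Python sketch below illustrates only that packing idea. It is not FFmpeg code: every function name is hypothetical, and the per-byte saturating add is merely a stand-in for the real loopfilter arithmetic, which this log does not show.

```python
# Illustrative sketch only; names are hypothetical and not taken from FFmpeg.

XMM_BYTES = 16   # width of one SSE2 xmm register in bytes
CHROMA_ROW = 8   # width=8 chroma row, one byte per pixel


def pack_uv(u_row: bytes, v_row: bytes) -> bytes:
    """Place an 8-byte U row and an 8-byte V row side by side in one 16-byte vector."""
    if len(u_row) != CHROMA_ROW or len(v_row) != CHROMA_ROW:
        raise ValueError("each chroma row must be exactly 8 bytes")
    return u_row + v_row


def unpack_uv(vec: bytes) -> tuple[bytes, bytes]:
    """Split a 16-byte vector back into its U half and V half."""
    if len(vec) != XMM_BYTES:
        raise ValueError("vector must be exactly 16 bytes")
    return vec[:CHROMA_ROW], vec[CHROMA_ROW:]


def saturating_add(vec: bytes, delta: int) -> bytes:
    """Stand-in for one SIMD operation: per-byte add, clamped to 0..255.

    ASSUMPTION: this is a placeholder, not the VP8 loopfilter itself.
    """
    return bytes(min(255, max(0, b + delta)) for b in vec)


def filter_uv_together(u_row: bytes, v_row: bytes, delta: int) -> tuple[bytes, bytes]:
    """Run the stand-in operation once over the packed U+V vector."""
    return unpack_uv(saturating_add(pack_uv(u_row, v_row), delta))


if __name__ == "__main__":
    u = bytes(range(8))
    v = bytes(range(248, 256))
    print(filter_uv_together(u, v, 5))
```

One call on the packed vector gives the same bytes as filtering the U and V rows separately, which is the property that lets a 128-bit implementation halve the number of passes over chroma.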