Commit Graph

144 Commits

Author SHA1 Message Date
Clément Bœsch
92cb9a3869 Merge commit '9064777dbb335ab4809ae09e3fdcc0245f925cdc'
* commit '9064777dbb335ab4809ae09e3fdcc0245f925cdc':
  checkasm: add HEVC test for testing IDCT DC

Merged-by: Clément Bœsch <cboesch@gopro.com>
2017-02-02 11:40:58 +01:00
Clément Bœsch
a0860b0a38 Merge commit '6f9e34baea4f6f484392e4e67f606a0835d07b73'
* commit '6f9e34baea4f6f484392e4e67f606a0835d07b73':
  arm: Check for support for the .fpu directive

Merged-by: Clément Bœsch <cboesch@gopro.com>
2017-02-02 11:22:04 +01:00
Clément Bœsch
9f1c81e5ec Merge commit '71a0472114574993df7035f4de9aa007e03817b8'
* commit '71a0472114574993df7035f4de9aa007e03817b8':
  checkasm: arm: report the first clobbered register in checkasm_checked_call

Also includes 446353ea18, 59aeed93e4, and 37961044c6 to avoid breaking
too much stuff.

Merged-by: Clément Bœsch <u@pkh.me>
2017-01-24 19:21:29 +01:00
Martin Storsjö
388f6e6715 arm: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32
This work is sponsored by, and copyright, Google.

Previously all subpartitions except the eob=1 (DC) case ran with
the same runtime:

                                     Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub16_add_neon:   3188.1   2435.4   2499.0   1969.0
vp9_inv_dct_dct_32x32_sub32_add_neon:  18531.7  16582.3  14207.6  12000.3

By skipping individual 4x16 or 4x32 pixel slices in the first pass,
we reduce the runtime of these functions like this:

vp9_inv_dct_dct_16x16_sub1_add_neon:     274.6    189.5    211.7    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:    2064.0   1534.8   1719.4   1248.7
vp9_inv_dct_dct_16x16_sub4_add_neon:    2135.0   1477.2   1736.3   1249.5
vp9_inv_dct_dct_16x16_sub8_add_neon:    2446.7   1828.7   1993.6   1494.7
vp9_inv_dct_dct_16x16_sub12_add_neon:   2832.4   2118.3   2266.5   1735.1
vp9_inv_dct_dct_16x16_sub16_add_neon:   3211.7   2475.3   2523.5   1983.1
vp9_inv_dct_dct_32x32_sub1_add_neon:     756.2    456.7    862.0    553.9
vp9_inv_dct_dct_32x32_sub2_add_neon:   10682.2   8190.4   8539.2   6762.5
vp9_inv_dct_dct_32x32_sub4_add_neon:   10813.5   8014.9   8518.3   6762.8
vp9_inv_dct_dct_32x32_sub8_add_neon:   11859.6   9313.0   9347.4   7514.5
vp9_inv_dct_dct_32x32_sub12_add_neon:  12946.6  10752.4  10192.2   8280.2
vp9_inv_dct_dct_32x32_sub16_add_neon:  14074.6  11946.5  11001.4   9008.6
vp9_inv_dct_dct_32x32_sub20_add_neon:  15269.9  13662.7  11816.1   9762.6
vp9_inv_dct_dct_32x32_sub24_add_neon:  16327.9  14940.1  12626.7  10516.0
vp9_inv_dct_dct_32x32_sub28_add_neon:  17462.7  15776.1  13446.2  11264.7
vp9_inv_dct_dct_32x32_sub32_add_neon:  18575.5  17157.0  14249.3  12015.1

I.e. in general a very minor overhead for the full subpartition case due
to the additional loads and cmps, but a significant speedup for the cases
when we only need to process a small part of the actual input data.

In common VP9 content in a few inspected clips, 70-90% of the non-dc-only
16x16 and 32x32 IDCTs only have nonzero coefficients in the upper left
8x8 or 16x16 subpartitions respectively.

This is cherrypicked from libav commit
9c8bc74c2b.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-01-14 21:13:30 +01:00
Ronald S. Bultje
1c8fbd7b90 checkasm/vp9: benchmark all sub-IDCTs (but not WHT or ADST). 2016-12-27 10:02:33 -05:00
Hendrik Leppkes
286d8bae61 Merge commit '7b1ae0e73ab7f7c5eabc70dbe2e579127c6e154f'
* commit '7b1ae0e73ab7f7c5eabc70dbe2e579127c6e154f':
  checkasm/arm: preserve the stack alignment checkasm_checked_call

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-17 15:21:32 +01:00
Hendrik Leppkes
c0af1ee90d Merge commit '80fbb7becae530167373fe5178966b7d7604306e'
* commit '80fbb7becae530167373fe5178966b7d7604306e':
  checkasm: vp8.mc: initialize the full src buffer after ec32574209

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-17 15:20:10 +01:00
Hendrik Leppkes
90b72f6bda Merge commit '8c816c0c9b12fdefd9046415e97df299880bc9b8'
* commit '8c816c0c9b12fdefd9046415e97df299880bc9b8':
  checkasm/arm: align the clobber check data properly for ldrd

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-17 15:06:10 +01:00
Hendrik Leppkes
4fe013fc70 Merge commit 'ec32574209f36467ef0d22c21a7e811ba98c15b6'
* commit 'ec32574209f36467ef0d22c21a7e811ba98c15b6':
  checkasm: vp8: mc: test unequal width/height for partitions

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-17 15:05:25 +01:00
Hendrik Leppkes
47f75839e4 Merge commit 'f8d17d53957056c053a46f9320fa7ae6fe1479a5'
* commit 'f8d17d53957056c053a46f9320fa7ae6fe1479a5':
  checkasm: Add tests for vp8dsp

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-14 15:29:08 +01:00
Hendrik Leppkes
f75035b06f Merge commit 'e48746deec48e9ff195841bc3266b4e153a878cd'
* commit 'e48746deec48e9ff195841bc3266b4e153a878cd':
  checkasm: h264dsp: Move the x and y variables into the randomize_buffer macro

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-13 23:02:39 +01:00
Hendrik Leppkes
6fc74934de Merge commit 'dc7501e524dc3270335749302c7aa449973625f3'
* commit 'dc7501e524dc3270335749302c7aa449973625f3':
  checkasm: Issue emms after benchmarking functions

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-10-07 13:18:05 +02:00
Martin Storsjö
2e95054ebb checkasm: h264dsp: Initialize the padding area
This fixes valgrind warnings about conditional jumps based on
uninitialized data (even though the uninitialized data only ever
was compared with a direct copy of the same uninitialized data).

Signed-off-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-08-11 19:55:16 +02:00
James Almer
54a0a52be1 checkasm/vp9dsp: use declare_func_emms in check_loopfilter
Fixes checkasm failures on mmxext functions

Signed-off-by: James Almer <jamrial@gmail.com>
2016-07-26 22:16:21 -03:00
Alexandra Hájková
9064777dbb checkasm: add HEVC test for testing IDCT DC
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-07-22 19:08:12 +02:00
Martin Storsjö
6f9e34baea arm: Check for support for the .fpu directive
When targeting COFF (windows), clang doesn't support this
directive (while binutils supports it for all targets).

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-07-21 12:52:10 +03:00
Martin Storsjö
37961044c6 checkasm: arm: Ignore changes to bits 0-4 and 7 of FPSCR
These bits are set by exceptions in NEON instructions.

Also print the differing bits when FPSCR is clobbered,
and use bic instead of lsl, for clearing the topmost bits.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-07-17 21:48:17 +03:00
Janne Grunau
59aeed93e4 cheackasm/arm: remove NEON instructions from checkasm_checked_call_vfp
Fixes AS error on non NEON builds introduced in 71a0472114. Also
set the fpu directly to vfp in checkasm.S to cause build errors on NEON
builds.
2016-07-17 11:28:21 +02:00
Martin Storsjö
446353ea18 checkasm: arm: Don't start new const blocks for each string
Each const block needs to be terminated by one endconst
invocation so either call endconst after each, or just
declare plain labels to the later strings.

This fixes errors such as this, on some binutils versions:

checkasm.S:38: Error: Macro `endconst' was already defined

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-07-17 12:21:19 +03:00
Janne Grunau
71a0472114 checkasm: arm: report the first clobbered register in checkasm_checked_call 2016-07-16 12:57:18 +02:00
Janne Grunau
7b1ae0e73a checkasm/arm: preserve the stack alignment checkasm_checked_call
The stack used by checkasm_checked_call_vfp was a multiple of 4 when the
checked function is called. AAPCS requires a double word (8 byte)
aligned stack public interfaces. Since both calls are public interfaces
the stack is misaligned when the checked is called.

Might fix the SIGBUS error in the armv7-linux-clang-3.7 fate config.
2016-07-13 22:18:53 +02:00
Janne Grunau
80fbb7beca checkasm: vp8.mc: initialize the full src buffer after ec32574209
Fixes "Use of uninitialised value" valgrind warnings in checkasm.
2016-07-13 22:18:52 +02:00
Matthieu Bouron
a91c330a29 Merge commit '105998fb5ca3c343f5c8cb39ce3197f87a5e4d36'
* commit '105998fb5ca3c343f5c8cb39ce3197f87a5e4d36':
  checkasm: Add tests for h264 idct

Merged-by: Matthieu Bouron <matthieu.bouron@stupeflix.com>
2016-07-13 17:22:29 +02:00
Matthieu Bouron
495a40cecb tests/checkasm: reduce cosmetic diff with libav
Chunk was not merged in ca5ec2bf51.
2016-07-13 17:11:58 +02:00
Janne Grunau
8c816c0c9b checkasm/arm: align the clobber check data properly for ldrd
Should fix the SIGBUS in the armv7-linux-clang-3.7 fate target.
2016-07-10 13:35:41 +02:00
Janne Grunau
ec32574209 checkasm: vp8: mc: test unequal width/height for partitions 2016-07-10 13:35:41 +02:00
Martin Storsjö
f8d17d5395 checkasm: Add tests for vp8dsp
The tests are inspired by similar tests for vp9 by
Ronald Bultje.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-07-08 14:10:46 +03:00
Michael Niedermayer
fb6b6b5166 tests/checkasm/pixblockdsp: Test 8 byte aligned positions
The code is documented as to require 8byte alignment

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-07-02 22:21:53 +02:00
Martin Storsjö
67cb2c0f73 checkasm: hevc: Iterate over features first, then over bitdepths
This avoids listing the same feature multiple times in the
test output. Previously the output contained something like this:

SSE2:
 - hevc_mc.qpel              [OK]
 - hevc_mc.epel              [OK]
 - hevc_mc.unweighted_pred   [OK]
 - hevc_mc.qpel              [OK]
 - hevc_mc.epel              [OK]
 - hevc_mc.unweighted_pred   [OK]

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-06-29 21:12:05 +03:00
Martin Storsjö
e48746deec checkasm: h264dsp: Move the x and y variables into the randomize_buffer macro
This avoids the risk of accidentally clobbering such variables outside
of the macro if the same variables are used there.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-06-28 14:24:04 +03:00
Martin Storsjö
e57de6faa1 checkasm: h264dsp: Initialize the padding area
This fixes valgrind warnings about conditional jumps based on
uninitialized data (even though the uninitialized data only ever
was compared with a direct copy of the same uninitialized data).

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-06-28 14:24:01 +03:00
Clément Bœsch
5558ff3a9f Merge commit '257f00ec1ab06a2a161f535036c6512f3fc8e801'
* commit '257f00ec1ab06a2a161f535036c6512f3fc8e801':
  Split global .gitignore file into per-directory files

Merged-by: Clément Bœsch <clement@stupeflix.com>
2016-06-22 11:28:51 +02:00
Clément Bœsch
8ef57a0d61 Merge commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb'
* commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb':
  cosmetics: Fix spelling mistakes

Merged-by: Clément Bœsch <u@pkh.me>
2016-06-21 21:55:34 +02:00
Martin Storsjö
dc7501e524 checkasm: Issue emms after benchmarking functions
The functions may not clean up properly after using MMX
registers. For the normal testing calls, the checkasm_checked_call
functions will do the cleanup (and check that functions that
should clean up do it as well), but when benchmarking functions
that don't clean up, we don't currently properly clean up at all.

This causes issues if a benchmarked function is followed by testing
of a function that is supposed to not clobber the MMX/FPU state but
doesn't touch it at all.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-06-21 22:09:29 +03:00
Martin Storsjö
105998fb5c checkasm: Add tests for h264 idct
The tests are inspired by similar tests for vp9 by
Ronald Bultje.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-06-17 21:37:56 +03:00
Diego Biurrun
257f00ec1a Split global .gitignore file into per-directory files 2016-05-13 14:55:56 +02:00
Derek Buitenhuis
ca5ec2bf51 Merge commit '01621202aad7e27b2a05c71d9ad7a19dfcbe17ec'
* commit '01621202aad7e27b2a05c71d9ad7a19dfcbe17ec':
  build: miscellaneous cosmetics

Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2016-05-09 16:25:28 +01:00
Vittorio Giovara
41ed7ab45f cosmetics: Fix spelling mistakes
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2016-05-04 18:16:21 +02:00
Michael Niedermayer
3c0511f29e tests/checkasm/vf_colorspace: Make bpp_mask const
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-04-13 22:39:41 +02:00
Michael Niedermayer
4d59d075a9 tests/checkasm/vf_colorspace: Fix dst array sizes
Suggested & Approved by: BBB
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-04-12 23:46:52 +02:00
Ronald S. Bultje
5ce703a6bf vf_colorspace: x86-64 SIMD (SSE2) optimizations. 2016-04-12 16:42:48 -04:00
Diego Biurrun
01621202aa build: miscellaneous cosmetics
Restore alphabetical order in lists, break overly long lines, do some
prettyprinting, add some explanatory section comments, group parts
together that belong together logically.
2016-04-07 15:26:08 +02:00
Derek Buitenhuis
ca408cf557 Merge commit '7c82d31cbe9fc5d5a321ad49c14a472bd629b50f'
* commit '7c82d31cbe9fc5d5a321ad49c14a472bd629b50f':
  checkasm: Use standard multiple inclusion guards

Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2016-02-24 17:36:52 +00:00
James Almer
26034929d5 checkasm: bench each vf_blend mode once
Also bench a smaller buffer. This drastically reduces --bench runtime
and reports smaller, more readable numbers.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
2016-02-22 13:54:07 -03:00
James Almer
76af0c7877 checkasm: fix dependencies for vf_blend tests
They will now compile if avcodec is disabled

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2016-02-19 16:31:55 -03:00
Diego Biurrun
7c82d31cbe checkasm: Use standard multiple inclusion guards 2016-02-18 15:35:44 +01:00
Timothy Gu
ebf648d490 checkasm/vf_blend: Decrease iteration count
The test is already slow.
2016-02-14 10:48:24 -08:00
Timothy Gu
a953a2991e checkasm: Add vf_blend tests 2016-02-14 10:46:56 -08:00
Timothy Gu
180f9a0958 all: Make header guard names consistent 2016-01-31 15:44:11 -08:00
foo86
ae5b2c5250 avcodec/dca: add new decoder based on libdcadec 2016-01-31 17:09:38 +01:00