28 Commits

Author SHA1 Message Date
Timo Rothenpieler
416923346a compat/cuda: switch from powf to __powf intrinsic
The powf builtin causes crashes on older clang, so manually implement
the (faster) intrinsic.
The code it spawns is identical to that of nvcc.
2022-09-03 20:27:34 +02:00
Mohamed Khaled Mohamed
1a5cd79f51 avfilter: add bilateral_cuda filter
GSoC 2022

Signed-off-by: Mohamed Khaled <mohamed.elbassiony00@eng-st.cu.edu.eg>
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
2022-09-03 15:18:56 +02:00
Mohamed Khaled Mohamed
b1648150b2 avfilter: add chromakey_cuda filter
GSoC'22

libavfilter/vf_chromakey_cuda.cu: the CUDA kernel for the filter
libavfilter/vf_chromakey_cuda.c: the C side that calls the kernel and gets user input
libavfilter/allfilters.c: added the filter to it
libavfilter/Makefile: added the filter to it
cuda/cuda_runtime.h: added two math CUDA functions that are used in the filter

Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
2022-07-10 17:20:15 +02:00
Timo Rothenpieler
acd3c101ef compat/cuda: add __expf() implementation 2021-08-14 15:06:47 +02:00
Timo Rothenpieler
072788c46e avfilter: compress CUDA PTX code if possible 2021-06-22 14:05:44 +02:00
Matt Oliver
b57037d663 compat/cuda: correct ushort4 to use ushort
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
2021-02-22 17:03:52 +01:00
rcombs
eabf5e6d6b All: update names in copyright headers 2021-01-20 01:02:56 -06:00
Timo Rothenpieler
cfdddec0c8 avfilter/scale_cuda: add lanczos algorithm 2020-11-04 01:43:21 +01:00
Timo Rothenpieler
f1d0f83712 avfilter/scale_cuda: add bicubic interpolation 2020-11-03 19:58:13 +01:00
rcombs
fb17ba86a8 compat/cuda/ptx2c: remove shell loop; fix BSD sed compat
This fixes building on macOS, and improves build times dramatically there
2020-06-01 22:10:41 -05:00
Andreas Rheinhardt
b307d74fe6 compat/cuda: Change inclusion guards
cuda_runtime.h as well as dynlink_loader.h used nonstandard inclusion
guards with an AV_ prefix, although these files are not in a libav*/
path. So change the inclusion guards and adapt the ref file of the
source fate test accordingly.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
2019-08-05 12:07:09 +02:00
Rodger Combs
01994c93db build: add support for building CUDA files with clang
This avoids using the CUDA SDK at all; instead, we provide a minimal
reimplementation of the basic functionality that lavfi actually uses.
It generates very similar code to what NVCC produces.

The header contains no implementation code derived from the SDK.
The function and type declarations are derived from the SDK only to the
extent required to build a compatible implementation. This is generally
accepted to qualify as fair use.

Because this option does not require the proprietary SDK, it does not require
the "--enable-nonfree" flag in configure.

Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
2019-08-04 19:08:08 +02:00
Timo Rothenpieler
a6818d5bd0 compat/cuda/ptx2c: don't drop final newline 2019-05-24 19:23:39 +02:00
Timo Rothenpieler
27cbbbb33f compat: remove in-tree NVidia headers
External headers are no longer welcome in the ffmpeg codebase because they
increase the maintenance burden. However, in the NVidia case the vanilla
headers need some modifications to be usable in ffmpeg, so we still
provide them, but in a separate repository.

The external headers can be found at
https://git.videolan.org/?p=ffmpeg/nv-codec-headers.git

Fate-source is updated because of the deleted files, and dynlink_loader.h
license headers were updated with the standard FFmpeg headers.

Signed-off-by: Marton Balint <cus@passwd.hu>
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
2018-02-27 16:22:12 +01:00
Mark Thompson
1dc483a6f2 compat/cuda: Pass a logging context to load functions
Reviewed-by: Timo Rothenpieler <timo@rothenpieler.org>
2017-11-20 15:47:05 +00:00
Ricardo Constantino
7fbc082577 compat/cuda/ptx2c: strip CR from each line
Windows nvcc + cl.exe produce a .ptx file with CR+LF newlines which
need to be stripped to work with gcc.

Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
2017-08-30 11:20:34 +02:00
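The ptx2c helper is a shell script, and the commit message does not show its code. The C sketch below only illustrates the per-line normalization the message describes, which is dropping the CR of a CR+LF line ending. The function name `strip_line_ending` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: strip a trailing LF and then a trailing CR from a
 * NUL-terminated line, in place. Returns the new length of the line.
 * A CR that is not at the end of the line is left untouched.
 */
static size_t strip_line_ending(char *line)
{
    assert(line != NULL);
    size_t len = strlen(line);

    if (len > 0 && line[len - 1] == '\n')
        line[--len] = '\0';
    if (len > 0 && line[len - 1] == '\r')
        line[--len] = '\0';
    return len;
}
```

A line read as `"ret;\r\n"` becomes `"ret;"`, while a line that already uses a bare LF is only shortened by one character.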
Timo Rothenpieler
f890a6d712 compat/cuda: make cuvidGetDecoderCaps optional 2017-06-01 12:39:06 +02:00
Timo Rothenpieler
88896c4619 compat/cuda/ptx2c: remove bashism and harden against arbitrary input 2017-05-15 18:54:38 +02:00
Timo Rothenpieler
f1ab71b046 build: add support for building .cu files via nvcc
Original work by Yogender Gupta <ygupta@nvidia.com>
2017-05-15 11:46:50 +02:00
Timo Rothenpieler
f15129a44b compat/cuda: fix cast warnings on windows 2017-05-09 18:38:30 +02:00
Timo Rothenpieler
17f63d98e6 compat/cuda: update cuvid/nvdec headers to Video Codec SDK 8.0.14
This raises the required minimum NVIDIA display driver versions:
NVIDIA Linux display driver 378.13 or newer
NVIDIA Windows display driver 378.66 or newer
2017-05-09 18:38:30 +02:00
Timo Rothenpieler
b27be563a8 compat/cuda: fix ulong size on cygwin 2017-03-01 12:08:34 +01:00
Philip Langdale
81147b5596 avcodec/cuvid: Add support for P010/P016 as an output surface format
The nvidia 375.xx driver introduces support for P016 output surfaces,
for 10bit and 12bit HEVC content (it's also the first driver to support
hardware decoding of 12bit content).

The cuvid api, as far as I can tell, only declares one output format
that they appear to refer to as P016 in the driver strings. Of course,
10bit content in P016 is identical to P010, and it is useful for
compatibility purposes to declare the format to be P010 to work with
other components that only know how to consume P010 (and to avoid
triggering swscale conversions that are lossy when they shouldn't be).

For simplicity, this change does not maintain the previous ability
to output dithered NV12 for 10/12 bit input video - the user will need
to update their driver to decode such videos.
2016-11-22 10:09:30 -08:00
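To illustrate why 10bit content in P016 can be declared as P010: P010 keeps each 10-bit sample in the most significant bits of a 16-bit word and leaves the low 6 bits zero, so a 10-bit sample stored this way is also a valid 16-bit P016 value. The sketch below assumes that layout. The helper names are hypothetical and are not part of the cuvid API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: place a 10-bit sample in the top bits of a 16-bit
 * word, as the P010 layout does. The low 6 bits of the result are zero.
 */
static uint16_t pack_10bit_msb(uint16_t sample10)
{
    assert(sample10 <= 0x3FF); /* must fit in 10 bits */
    return (uint16_t)(sample10 << 6);
}

/* Hypothetical helper: recover the 10-bit sample from such a word. */
static uint16_t unpack_10bit_msb(uint16_t word)
{
    return (uint16_t)(word >> 6);
}
```

With this layout the maximum 10-bit value 1023 is stored as 0xFFC0, and a consumer that only understands P010 reads the same word without any conversion.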
Timo Rothenpieler
d9ad18f3b4 avcodec/cuvid: use dynamically loaded CUDA/CUVID
And remove the now obsolete compat headers.
2016-11-22 10:34:27 +01:00
Timo Rothenpieler
5c02d2827b compat/cuda: add dynamic loader 2016-11-22 10:34:27 +01:00
Timo Rothenpieler
7904859fd8 compat/cuda: convert to unix line endings 2016-09-23 11:43:00 +02:00
Philip Langdale
843aff3cf7 cuvid: Use bundled headers
We need to remove the dynlink fanciness and replace it with normal
function prototypes and update the include paths and configure logic.

We don't need to explicitly check for PICPARMS now - they're going
to be there.
2016-09-22 18:38:51 -07:00
Philip Langdale
f59e10b0f4 cuvid: Add MIT licenced nvcuvid headers from Video SDK 7.0
For unknown reasons, the only accurately descriptive version of
cuviddec.h is in the Video SDK - the one in CUDA 7.5 lacks vp8
PICPARAMS and the vp9 struct definition is inaccurate. The CUDA 8 RC
includes an ancient version of this file from many many years ago.

However, the one in the Video SDK is modified to work through a
dynamic link mechanism which we don't really want to use, so the
next change will modify the files to just declare functions in
the normal way.

I've split the changes so it's clear to see what changed between
the original files and ones that work for us.
2016-09-22 18:38:36 -07:00