Previously image attributes were passed as implicit parameters located at the end of the kernel input vector: clover appended the image metadata regardless of the target on which the kernel would run. It would be beneficial to allow more flexibility to clover, since some kind of targets may already have means of querying image dimensions or format, and uploading the attributes as implicit parameters is unnecessary for them. To this end the implicit arguments holding the image attributes are now added to the function signature at IR level immediately after the image argument to which they belong. Calls to attribute getter functions are replaced by reads from these new arguments. This transformation is implemented as an IR pass in the AMDGPU target. Since llvm function signatures can't be changed directly, the kernel functions are recreated with the new signature. The pass also handles image and (partially) sampler resource IDs.
Let's look at the implementation of the pass in a bit more detail. The pass is based on the opencl.kernels
named metadata node; no changes are made to modules without proper metadata, and only the kernels listed under the opencl.kernels
node are transformed.
For each function found in the node, the kernel_arg_type
metadata is scanned for image2d_t
and image3d_t
argument types. For each of those kernel arguments, two new arguments are added immediately following the image argument: one of type <3 x i32>
for the image size (width, height and depth), and one of type <2 x i32>
for the image format (channel data type and channel order). The kernel function is recreated with the new signature and the body is copied over. Metadata is added to the new arguments; the implicit args are marked by the types __llvm_image_size
and __llvm_image_format
.
After the function with implicit arguments is in place, the uses of image and sampler arguments are scanned for the following function calls:
llvm.OpenCL.image.get.size*
llvm.OpenCL.image.get.format*
llvm.OpenCL.image.get.resource.id*
llvm.OpenCL.sampler.get.resource.id
The stars in the list above indicate that different image getter function has to be used for 2d and 3d images because of the type difference.
The resource IDs are determined by the index of the argument value within the kernel signature. Image arguments are grouped by access qualifier to read_only and write_only groups; the resource ID is the argument index within the group. E.g. in case of
__kernel void foo(read_only image2d_t a, read_only image3d_t b, write_only image2d_t c, write_only image3d_t d)
the resource ID of a
and c
is 0, and for b
and d
is 1. The resource IDs of sampler_t
arguments are calculated similarly. Samplers declared as module globals or kernel locals are not handled yet.
Now, that the implicit arguments are present at IR level with proper metadata, clover is able to look for them, and upload the image attributes only if necessary. See this commit.
Note: because of the fact that kernels are recreated during the pass described above, this change is necessary in clover.
The implementation of the OpenCL builtins in libclc looks like this.