Monday 10 August 2015

Week 11 - 2D Image reading and resource management

Previous week I've modified the existing r600 sampler state setup code so that compute shaders can use it, and fixed some resource management issues. Image reading now works using CL_INTENSITY with CL_FLOAT and CL_RGBA with CL_UNSIGNED_INT8 formats (these are ones I've tried). The former was tested using both nearest and linear filtering mode.

About sampler state setup: clover now uploads the sampler bitfield as the sampler argument (commit) to allow the libclc implementation to branch on sampler fields, particularly whether it uses normalized coordinates (read_image* builtins). OpenCL C constants have been added to the clc headers (image_defines.h).

About the resource management issues: I've noticed valgrind errors when running my test kernel [1]. Valgrind detected reads from already freed memory, like this one:

==13999== Invalid read of size 2
==13999==    at 0x4C2ED06: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==13999==    by 0xB3A74B0: radeon_emit_array (radeon_winsys.h:680)
==13999==    by 0xB3AD368: evergreen_emit_sampler_views (evergreen_state.c:2047)
==13999==    by 0xB3AD4F8: evergreen_emit_cs_sampler_views (evergreen_state.c:2085)
[...]
==13999==  Address 0xa8f382c is 76 bytes inside a block of size 112 free'd
==13999==    at 0x4C2B200: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==13999==    by 0xB3DF2A1: r600_sampler_view_destroy (r600_state_common.c:368)
==13999==    by 0x504A8DA: clover::resource::unbind_sampler_view(clover::command_queue&, pipe_sampler_view*) (resource.cpp:96)
==13999==    by 0x5029A7D: clover::kernel::image_rd_argument::unbind(clover::kernel::exec_context&) (kernel.cpp:529)
==13999==    by 0x5028808: clover::kernel::exec_context::unbind() (kernel.cpp:235)
[...]

Furthermore, there were other valgrind errors regarding RAT setup of write-only images. After asking around on mesa-dev and messing with the code using gdb it became clear that the memory used by compute resources are managed by clover, and shouldn't be freed inside the driver. The lifetime of textures are bound to the lifetime of the mem object; the driver-side data describing surfaces and sampler views are created before and destroyed after kernel launch. These actions are all initiated by clover.

To avoid code duplication, it is beneficial to use already existing graphics code when possible. However, graphics code contains a reference counting resource management scheme, which interferes with clover if used during compute setup: this was the problem causing the valgrind errors.

The following changes contain the read-only and write-only image resource setup which avoids the errors mentioned above:

Piglit tests are still on my TODO list, now both for image reading and writing. The OpenCL test runner itself has to be modified to be able to accept image input.

Furthermore, currently the only way to supply samplers to the kernels is to pass them as an argument. Implementing global and kernel local constant samplers is another TODO.

UPDATE: Actually one may still use the reference counting mechanism of the driver as long as care is taken to prevent the refcount ever reaching 0 inside the driver.


[1] Test kernel. The kernel is supplied with image arguments such that a[4] evaluates to 0x3f.


// img1: CLK_INTENSITY, CLK_FLOAT
// img2: CLK_RGBA, CLK_UNSIGNED_INT8
// img3: CLK_RGBA, CLK_UNSIGNED_INT8
__kernel void imgtest(read_only image2d_t img1,
                      read_only image2d_t img2,
                      write_only image2d_t img3,
                      sampler_t s1, sampler_t s2,
                      __global int * a, __global float * b)
{
    int i = get_global_id(0);
    int j = get_global_id(1);

    // Test read_imagef
    if (i == j && i < 10) {
        float x = (0.5f + i*0.1f) / get_image_width(img1);
        b[i] = read_imagef(img1, s1, (float2)(x, 0.f)).x;
    }

    // Test read_imageui
    if (i == 0 && j == 0) {
        uint4 c = read_imageui(img2, s2, (int2)(1, 2));
        a[5] = c.x;
        a[6] = c.y;
        a[7] = c.z;
        a[8] = c.w;
    }

    // Test write_imageui
    int k = 100 * i + j;
    write_imageui(img3, (int2)(i, j), (uint4)(k & 0xff, k >> 8, 0, 0));

    // Test attribute getters
    a[0] = get_image_width(img1);
    a[1] = get_image_height(img1);
    a[2] = get_image_width(img2);
    a[3] = get_image_height(img2);
    // Should evaluate to 63
    a[4] = (get_image_channel_order(img1) == CLK_INTENSITY) |
           (get_image_channel_data_type(img1) == CLK_FLOAT) << 1 |
           (get_image_channel_order(img2) == CLK_RGBA) << 2 |
           (get_image_channel_data_type(img2) == CLK_UNSIGNED_INT8) << 3 |
           (get_image_channel_order(img3) == CLK_RGBA) << 4 |
           (get_image_channel_data_type(img3) == CLK_UNSIGNED_INT8) << 5;
}
}