Memory transfer between host and device in OpenCL?


Consider the following code, which creates a buffer memory object from an array of doubles of size size:

coef_mem = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR, (sizeof(double) * size), arr, &err);

Suppose it is passed as an argument to a kernel. There are two possibilities, depending on the device on which the kernel is running:

  1. The device is the same as the host device
  2. The device is other than the host device

Here are my questions for both possibilities:

  • At what step is the memory transferred to the device from the host?
  • How do I measure the time required for transferring the memory from the host to the device?
  • How do I measure the time required for transferring the memory from the device's global memory to private memory?
  • Is the memory still transferred if the device is the same as the host device?
  • Will the time required to transfer from host to device be greater than the time required to transfer from the device's global memory to private memory?

At what step is the memory transferred to the device from the host?

The only guarantee you have is that the data will be on the device by the time the kernel begins execution. The OpenCL specification deliberately doesn't mandate when these data transfers should happen, in order to allow different OpenCL implementations to make decisions that are suitable for their own hardware. If you only have a single device in the context, the transfer could be performed as soon as you create the buffer. In my experience, these transfers usually happen when the kernel is enqueued (or soon after), because that is when the implementation knows that it really needs the buffer on a particular device. But it really is up to the implementation.


How do I measure the time required for transferring the memory from the host to the device?

Use a profiler, which usually shows when these transfers happen and how long they take. If you transfer the data with clEnqueueWriteBuffer instead, you can use the OpenCL event profiling system.
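
As a rough illustration of the event-profiling route, the two timestamps that clGetEventProfilingInfo reports for a write command (CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END, both in nanoseconds) can be turned into a transfer time and an achieved bandwidth. The sketch below is plain C and only does the arithmetic. The helper names transfer_ms and transfer_gb_per_s are made up for this example, and it assumes the command queue was created with CL_QUEUE_PROFILING_ENABLE so that the timestamps are available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * In a real host program the two timestamps would be read from the event
 * returned by clEnqueueWriteBuffer, for example:
 *   clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_START,
 *                           sizeof(cl_ulong), &start_ns, NULL);
 *   clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_END,
 *                           sizeof(cl_ulong), &end_ns, NULL);
 * Here they are passed in as plain 64-bit integers (cl_ulong is 64 bits).
 */

/* Hypothetical helper: elapsed time in milliseconds between two
 * profiling timestamps given in nanoseconds. */
double transfer_ms(uint64_t start_ns, uint64_t end_ns)
{
    assert(end_ns >= start_ns);
    return (double)(end_ns - start_ns) / 1e6;
}

/* Hypothetical helper: achieved bandwidth in GB/s (1 GB = 1e9 bytes).
 * Bytes per nanosecond is numerically the same as GB per second. */
double transfer_gb_per_s(size_t bytes, uint64_t start_ns, uint64_t end_ns)
{
    assert(end_ns >= start_ns);
    uint64_t elapsed_ns = end_ns - start_ns;
    if (elapsed_ns == 0)
        return 0.0; /* degenerate interval: avoid dividing by zero */
    return (double)bytes / (double)elapsed_ns;
}
```

For the buffer in the question, bytes would be sizeof(double) * size. Note that this only times the command itself on the device timeline; it does not include the time the command spent waiting in the queue.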


How do I measure the time required for transferring the memory from the device's global memory to private memory?

Again, use a profiler. Most profilers will have a metric for the achieved bandwidth when reading from global memory, or something similar. It's not really an explicit transfer from global to private memory, though.


Is the memory still transferred if the device is the same as the host device?

With CL_MEM_COPY_HOST_PTR, yes. If you don't want a transfer to happen, use CL_MEM_USE_HOST_PTR instead. With unified memory architectures (e.g. an integrated GPU), the typical recommendation is to use CL_MEM_ALLOC_HOST_PTR to allocate a device buffer in host-accessible memory (usually pinned), and to access it with clEnqueueMapBuffer.


Will the time required to transfer from host to device be greater than the time required to transfer from the device's global memory to private memory?

Probably, but this will depend on the architecture, whether you have a unified memory system, and how you actually access the data in the kernel (memory access patterns and caches will have a big effect).

