
I have a single fragment shader that performs processing on an imageBuffer using image load/store operations. I am exclusively concerned about the following scenario:

  • I have a single fragment shader (no multistage considerations, e.g. vertex then fragment shaders, and no multipass rendering)
  • imageBuffer variables are declared as coherent. I am exclusively interested in coherent imageBuffers.

To make things perfectly clear, my scenario is the following:

// Source code of my sole and unique fragment shader:
layout(r32ui) coherent uniform uimageBuffer data;

void main()
{
  // ...
  // various calls to imageLoad(data, ..., ...);
  // ...
  // various calls to imageStore(data, ..., ...);
  // ...
}

I have looked extensively at the spec

ARB_shader_image_load_store

especially this very paragraph:

"Using variables declared as "coherent" guarantees that the results of stores will be immediately visible to shader invocations using similarly-declared variables; calling MemoryBarrier is required to ensure that the stores are visible to other operations."

Note: my "coherent uniform imageBuffer data;" declaration precisely is a "similarly-declared" variable. My scenario is single-pass, single-stage (fragment shader).

Now, I have looked at various web sites and stumbled (like most people I think) upon this thread on stackoverflow.com:

How exactly is GLSL's "coherent" memory qualifier interpreted by GPU drivers for multi-pass rendering?

and more specifically, this paragraph:

"Your shaders cannot even make the assumption that issuing a load right after a store will get the memory that was just stored in this very shader (yes really. You have to put a memoryBarrier in to pull that one off)."

My question is the following:

With the coherent qualifier specified, in my single-shader, single-pass processing scenario, can I be sure, yes or no, that imageStore() calls will be immediately visible to ALL invocations of my fragment shader (e.g. the current invocation as well as other concurrent invocations)?

By reading the ARB_shader_image_load_store spec, it seems to me that:

  • the answer to this question is yes,
  • I don't need any kind of memoryBarrier(),
  • the sentence quoted from the Stack Overflow thread referenced above may be misleading or wrong.

Thanks for your insight.

fred54
  • You are in for some major headaches if you do not use a memory barrier in this shader that ***both*** loads and later stores. You ***might*** be able to pull this off safely if your draw only results in a single fragment. But for anything larger you need synchronization between concurrent fragment shader invocations, otherwise one invocation may load before another stores. Fragment shaders can be scheduled in any order the driver thinks will produce sane results; imageLoad/Store is effectively ignored in this consideration because it is assumed that you know what you are doing. – Andon M. Coleman May 18 '14 at 02:24
  • @AndonM.Coleman you are not answering my question. This question is: **can I be sure, yes or no, that imageStore() calls will be immediately visible to ALL invocations of my fragment shader (e.g. the current invocation as well as other concurrent invocations)?** In other words, are loads/stores on coherent AND similarly-declared variables in a SINGLE shader stage **guaranteed** to be executed in order? The spec seems to say yes, some forum members say no. You are just mentioning general issues with concurrent programming. I do know I have no control over the scheduling of shader invocations. – fred54 May 18 '14 at 06:26
  • If that is your question, then no. – Andon M. Coleman May 18 '14 at 12:33
  • What part of the ARB_shader_image_load_store spec allows you to answer 'no' here? I have cited the exact part of the spec allowing me to say 'yes'. I don't want to be rude in any way, I would just like you to back up what you say, or explain to me precisely why I'm wrong. – fred54 May 18 '14 at 14:27
  • There are numerous places, but the most pertinent to this question is under ***Section 2.14.X***: *"The relative order of invocations of the same shader type are undefined. A store issued by a shader when working on primitive B might complete prior to a store for primitive A, even if primitive A is specified prior to primitive B. This applies even to fragment shaders; while fragment shader outputs are written to the framebuffer in primitive order, stores executed by fragment shader invocations are not."* You need a `memoryBarrier` to ensure writes are ordered correctly. – Andon M. Coleman May 18 '14 at 16:00
  • Somebody pointed me to the location in the GL specification where things are said, it is in **Section 2.20.X** _"shader memory reads and writes complete in a largely undefined order. The built-in function memoryBarrier() can be used if needed to guarantee the completion and relative ordering of memory accesses performed by a single shader invocation."_ Thanks for your insight Andon. I feel the sentence I referenced in the ARB_shader_image_load_store spec is very strongly misleading. – fred54 May 19 '14 at 12:55

2 Answers


Use that memory barrier.

For one thing, the GPU may optimize by fetching whole blocks of memory to read from, while keeping separate memory to write to.

In other words, if your shader always modifies a single location just once, then it's OK; but if it relies on neighbors' values after some computation has been applied to them, then you need a memory barrier.

przemo_li

With the coherent qualifier specified, in my single-shader, single-pass processing scenario, can I be sure, yes or no, that imageStore() calls will be immediately visible to ALL invocations of my fragment shader (e.g. the current invocation as well as other concurrent invocations)?

If each fragment shader writes to separate locations in the image, and each fragment shader only reads the locations that it wrote, then you don't even need coherent. However, if a fragment shader instance wants to read data written by other fragment shader instances, you're SOL. There's nothing you can do for that one.
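As a rough illustration of the "separate locations" case described above, here is a minimal sketch in which every invocation reads and writes only its own element. The uniform `viewportWidth`, the output `fragColor`, and the `r32ui` format qualifier are assumptions for the example and do not come from the question; it also assumes no overlapping primitives and no multisampling, so that each pixel is shaded once. The `memoryBarrier()` call is a conservative choice based on the spec passage quoted in the comments on the question; whether it is strictly required for a same-invocation read-back is exactly what this thread disputes.

```glsl
#version 420 core

// Assumption: one 32-bit unsigned channel per element (ARB-style format qualifier).
layout(r32ui) coherent uniform uimageBuffer data;

// Hypothetical uniform: render-target width, used to give each fragment its own slot.
uniform int viewportWidth;

out vec4 fragColor;

void main()
{
    // Each invocation touches only its own element, so no other
    // invocation's store is ever read here.
    int index = int(gl_FragCoord.y) * viewportWidth + int(gl_FragCoord.x);

    uint previous = imageLoad(data, index).x;
    imageStore(data, index, uvec4(previous + 1u));

    // Conservative: orders this invocation's store before its later load.
    memoryBarrier();

    uint current = imageLoad(data, index).x;
    fragColor = vec4(float(current & 0xFFu) / 255.0);
}
```

Nothing in this sketch lets one fragment safely observe another fragment's store; it only covers the self-contained case.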

If it were a compute shader, you could issue the barrier call to synchronize operations within a work group. That would ensure that the writes you want to read happen (you still need the memoryBarrier call to make them visible). But that would only ensure that writes from instances within this work group have happened. Writes from other instances are still undefined.
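The compute-shader pattern described above might look roughly like the following sketch. The work-group size of 64, the second image `result`, and the "read the next invocation's slot" access pattern are all hypothetical choices made for illustration, and the buffers are assumed to be large enough for the dispatch. Only stores made by invocations of the same work group are guaranteed to be readable after the barrier.

```glsl
#version 430 core

// Hypothetical work-group size.
layout(local_size_x = 64) in;

layout(r32ui) coherent uniform uimageBuffer data;
// Hypothetical second buffer, so that nobody overwrites a slot another
// invocation of the group may still be reading.
layout(r32ui) writeonly uniform uimageBuffer result;

void main()
{
    int self = int(gl_GlobalInvocationID.x);

    // Pick a neighbour that lives in the SAME work group.
    uint neighbourLocal = (gl_LocalInvocationID.x + 1u) % gl_WorkGroupSize.x;
    int neighbour = int(gl_WorkGroupID.x * gl_WorkGroupSize.x + neighbourLocal);

    imageStore(data, self, uvec4(gl_LocalInvocationID.x));

    memoryBarrier(); // make this invocation's store visible
    barrier();       // wait until every invocation in the group has reached this point

    // Defined only because the neighbour belongs to this work group.
    uint fromNeighbour = imageLoad(data, neighbour).x;
    imageStore(result, self, uvec4(fromNeighbour));
}
```

Reading a slot written by an invocation in a different work group would still give an undefined result within the same dispatch.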

and more specifically, this paragraph:

BTW, that paragraph was wrong. Very very wrong. Too bad the person who wrote that paragraph will never ever be identified ;)

Nicol Bolas