Add translation scheme for HLSL-style input/outputs
Add design doc describing how to translate from SPIR-V Vulkan style inputs/outputs to WGSL-style inputs as shader entry points, and outputs as shader return value.

Change-Id: If0dc5f07542ccf2db95ed648a34f73a06e20a5a4
Reviewed-on: https://dawn-review.googlesource.com/c/tint/+/38961
Reviewed-by: Alan Baker <alanbaker@google.com>
Commit-Queue: David Neto <dneto@google.com>
Commit 222fae1c8c (parent 2edb8d4064)

# SPIR-V translation of shader input and output variables

WGSL [MR 1315](https://github.com/gpuweb/gpuweb/issues/1315) changed WGSL so
that pipeline inputs and outputs are handled similarly to HLSL:

- Shader pipeline inputs are the WGSL entry point function arguments.
- Shader pipeline outputs are the WGSL entry point return value.

Note: In both cases, a struct may be used to pack multiple values together.
In that case, I/O-specific attributes appear on struct members at the struct declaration.

Resource variables, e.g. buffers, samplers, and textures, are still declared
as variables at module scope.

## Vulkan SPIR-V today

SPIR-V for Vulkan models inputs and outputs as module-scope variables in
the Input and Output storage classes, respectively.

The `OpEntryPoint` instruction has a list of module-scope variables that must
be a superset of all the input and output variables that are statically
accessed in the shader call tree.
From SPIR-V 1.4 onward, all interface variables that might be statically accessed
must appear on that list.
This includes all resource variables that might be statically accessed
by the shader call tree.

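To make the interface-list rule concrete, here is a small Python sketch of the check a validator might perform. It is an illustration only: `Var` and `missing_interface_vars` are hypothetical names, not part of SPIRV-Tools or Tint, and the sketch assumes the set of module-scope variables statically accessed from the entry point's call tree has already been computed. Before SPIR-V 1.4, only Input and Output variables are required on the list.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Var:
    """Hypothetical record for a SPIR-V module-scope variable."""
    result_id: int
    storage_class: str  # e.g. "Input", "Output", "Uniform", "Private"


def missing_interface_vars(
    spirv_version: Tuple[int, int],
    interface: Iterable[int],
    accessed: Iterable[Var],
) -> List[Var]:
    """Return statically accessed variables that are absent from OpEntryPoint.

    Before SPIR-V 1.4 only Input and Output variables must be listed.
    From SPIR-V 1.4 onward every statically accessed module-scope
    variable, including resource variables, must be listed.
    """
    listed: Set[int] = set(interface)
    if spirv_version >= (1, 4):
        required = list(accessed)
    else:
        required = [v for v in accessed if v.storage_class in ("Input", "Output")]
    return [v for v in required if v.result_id not in listed]


# Usage: a fragment input (id 10) is listed, a uniform buffer (id 11) is not.
frag_in = Var(10, "Input")
ubo = Var(11, "Uniform")
print(missing_interface_vars((1, 3), [10], [frag_in, ubo]))  # → []
print(missing_interface_vars((1, 4), [10], [frag_in, ubo]))  # → [Var(result_id=11, storage_class='Uniform')]
```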
## Translation scheme for SPIR-V to WGSL

A translation scheme from SPIR-V to WGSL is as follows:

Each SPIR-V entry point maps to a set of Private variables proxying the
inputs and outputs, and two functions:

- An inner function with no arguments or return values, and whose body
  is the same as the original SPIR-V entry point.
- Original input variables are mapped to pseudo-in Private variables
  with the same store types, but no other attributes or properties are copied.
- Original output variables are mapped to pseudo-out Private variables
  with the same store types, but no other attributes or properties are copied.
- A wrapper entry point function whose parameters correspond in type, location,
  and builtin attributes to the original input variables, and whose return type is
  a structure whose members correspond in type, location, and builtin
  attributes to the original output variables.
  The body of the wrapper function has the following phases:
  - Copy formal parameter values into pseudo-in variables.
  - Use stores to initialize pseudo-out variables:
    - If the original variable had an initializer, store that value.
    - Otherwise, store a zero value for the store type.
  - Execute the inner function.
  - Copy pseudo-out variables into the return structure.
  - Return the return structure.
- Replace uses of the original input/output variables with the pseudo-in and
  pseudo-out variables, respectively.
  - Remap pointer-to-Input types to pointer-to-Private.
  - Remap pointer-to-Output types to pointer-to-Private.

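The wrapper construction is mechanical enough to sketch as a small text generator. The following Python is illustrative only and is not Tint's SPIR-V reader: `IOVar`, `emit_wrapper`, the `_arg` parameter suffix, and the `<entry>_result_type` / `<entry>_inner` names are assumptions chosen to resemble the example that follows, and the output uses that example's WGSL attribute syntax. It covers only the struct and wrapper; rewriting uses of the original variables and remapping pointer types are out of scope.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IOVar:
    """Hypothetical description of one original Input or Output variable."""
    name: str                          # name reused for the pseudo-in/out Private variable
    wgsl_type: str                     # store type, e.g. "vec4<f32>"
    attr: str                          # IO attribute, e.g. "location(0)"
    initializer: Optional[str] = None  # WGSL initializer expression (outputs only)


def emit_wrapper(entry: str, stage: str,
                 inputs: List[IOVar], outputs: List[IOVar]) -> str:
    """Emit the result struct and the wrapper entry point as WGSL text."""
    result_type = f"{entry}_result_type"
    lines: List[str] = [f"struct {result_type} {{"]
    for v in outputs:
        # One struct member per original output variable.
        lines.append(f"  [[{v.attr}]] {v.name} : {v.wgsl_type};")
    lines += ["};", "", f"[[stage({stage})]]"]
    # One entry point parameter per original input variable.
    params = ",\n".join(f"  [[{v.attr}]] {v.name}_arg : {v.wgsl_type}" for v in inputs)
    lines.append(f"fn {entry}(\n{params}\n) -> {result_type} {{")
    for v in inputs:
        # Phase 1: copy formal parameters into pseudo-in variables.
        lines.append(f"  {v.name} = {v.name}_arg;")
    for v in outputs:
        # Phase 2: initialize pseudo-out variables; fall back to the zero value.
        init = v.initializer if v.initializer is not None else f"{v.wgsl_type}()"
        lines.append(f"  {v.name} = {init};")
    # Phase 3: run the original entry point body.
    lines.append(f"  {entry}_inner();")
    # Phases 4 and 5: copy pseudo-out variables into the result and return it.
    lines.append(f"  var result : {result_type};")
    for v in outputs:
        lines.append(f"  result.{v.name} = {v.name};")
    lines += ["  return result;", "}"]
    return "\n".join(lines)


print(emit_wrapper(
    "main", "fragment",
    inputs=[IOVar("the_colour", "vec4<f32>", "location(0)")],
    outputs=[IOVar("frag_colour", "vec4<f32>", "location(0)")]))
```

For the fragment shader in the example, this prints a struct and wrapper with the same structure as the proposed translation, without the explanatory comments.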
We are not concerned with the cost of the extra copying of input/output values.
First, pipeline inputs and outputs tend to be small.
Second, we expect the backend compiler in the driver to be able to see
through the copying and optimize the result.

### Example

```glsl
#version 450

layout(location = 0) out vec4 frag_colour;
layout(location = 0) in vec4 the_colour;

void bar() {
  frag_colour = the_colour;
}

void main() {
  bar();
}
```

Current translation, through SPIR-V, the SPIR-V reader, and the WGSL writer:

```groovy
[[location(0)]] var<out> frag_colour : vec4<f32>;
[[location(0)]] var<in> the_colour : vec4<f32>;

fn bar_() -> void {
  const x_14 : vec4<f32> = the_colour;
  frag_colour = x_14;
  return;
}

[[stage(fragment)]]
fn main() -> void {
  bar_();
  return;
}
```

Proposed translation, through SPIR-V, the SPIR-V reader, and the WGSL writer:

```groovy
// 'in' and 'out' variables are now 'private'.
var<private> frag_colour : vec4<f32>;
var<private> the_colour : vec4<f32>;

fn bar_() -> void {
  // Accesses to the module-scope variables do not change.
  // This is a big simplifying advantage.
  const x_14 : vec4<f32> = the_colour;
  frag_colour = x_14;
  return;
}

fn main_inner() -> void {
  bar_();
  return;
}

// Declare a structure type to collect the return values.
struct main_result_type {
  [[location(0)]] frag_colour : vec4<f32>;
};

[[stage(fragment)]]
fn main(
  // 'in' variables are entry point parameters.
  [[location(0)]] the_colour_arg : vec4<f32>
) -> main_result_type {
  // Save 'in' arguments to 'private' variables.
  the_colour = the_colour_arg;

  // Initialize 'out' variables.
  // Use the zero value, since no initializer was specified.
  frag_colour = vec4<f32>();

  // Invoke the original entry point.
  main_inner();

  // Collect outputs into a structure and return it.
  var result : main_result_type;
  result.frag_colour = frag_colour;
  return result;
}
```

Alternatively, we could emit the body of the original entry point at
the point of invocation.
However, that is more complex because the original entry point function
may return from multiple locations, and we would like to have only
a single exit path to construct and return the result value.

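The single-exit argument can be illustrated with a small Python model of the translated code. Python stands in for WGSL here: the `private` dictionary models the module-scope Private variables, and `x_in` / `y_out` are hypothetical variable names. The inner function returns from three places, yet the result is built at exactly one point in the wrapper.

```python
from typing import Dict

# Hypothetical model: this dict stands in for the module-scope Private variables.
private: Dict[str, float] = {}


def main_inner() -> None:
    """The original entry point body, which returns from three places."""
    if private["x_in"] < 0.0:
        private["y_out"] = 0.0
        return  # exit 1
    if private["x_in"] > 1.0:
        return  # exit 2: y_out keeps the value the wrapper initialized
    private["y_out"] = private["x_in"] * 2.0  # exit 3: fall off the end


def main(x_in_arg: float) -> Dict[str, float]:
    """The wrapper: the only place where the result is constructed."""
    private["x_in"] = x_in_arg          # copy the parameter into pseudo-in
    private["y_out"] = 0.0              # zero-initialize pseudo-out
    main_inner()                        # every exit of the body comes back here
    return {"y_out": private["y_out"]}  # single exit path


for x in (-1.0, 2.0, 0.25):
    print(x, main(x))
```

Inlining `main_inner` into `main` would instead require each of the three exits to be rewritten to build and return the result itself.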
### Handling fragment discard

In SPIR-V, `OpKill` causes immediate termination of the shader invocation.
Is the shader obligated to write its outputs when `OpKill` is executed?

The Vulkan fragment operations are as follows
(see [6. Fragment operations](https://www.khronos.org/registry/vulkan/specs/1.2/html/vkspec.html#fragops)):

* Scissor test
* Sample mask test
* Fragment shading
* Multisample coverage
* Depth bounds test
* Stencil test
* Depth test
* Sample counting
* Coverage reduction

After that, the fragment results are used to update output attachments, including
colour, depth, and stencil attachments.

Vulkan says:

> If a fragment operation results in all bits of the coverage mask being 0,
> the fragment is discarded, and no further operations are performed.
> Fragments can also be programmatically discarded in a fragment shader by executing one of
>
> - `OpKill`.

I interpret this to mean that the outputs of a discarded fragment are ignored.

Therefore, `OpKill` does not require us to modify the basic scheme from the previous
section.

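The same conclusion can be checked with a small Python model, assuming `OpKill` behaves like an exception that unwinds the whole invocation. `Discard`, `run_fragment`, `surviving_outputs`, and the single `frag_depth` output are hypothetical; this is not how Tint or a driver implements discard. Because the wrapper never reaches the point where it builds the result structure, anything the inner function already wrote to the pseudo-out variables is simply dropped.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical model: this dict stands in for the module-scope Private variables.
private: Dict[str, float] = {}


class Discard(Exception):
    """Models OpKill: the invocation terminates immediately."""


def wrapper(inner: Callable[[], None]) -> Dict[str, float]:
    """Wrapper entry point from the translation scheme."""
    private["frag_depth"] = 0.0        # initialize pseudo-out
    inner()                            # may raise Discard
    return {"frag_depth": private["frag_depth"]}


def inner_discards() -> None:
    private["frag_depth"] = 0.75       # output written...
    raise Discard()                    # ...then the fragment is killed


def inner_keeps() -> None:
    private["frag_depth"] = 0.25


def run_fragment(inner: Callable[[], None]) -> Optional[Dict[str, float]]:
    """Returns the shader outputs, or None if the fragment was discarded."""
    try:
        return wrapper(inner)
    except Discard:
        return None                    # "no further operations are performed"


def surviving_outputs(inners: List[Callable[[], None]]) -> List[Dict[str, float]]:
    """Only fragments that were not discarded update the attachments."""
    results = [run_fragment(f) for f in inners]
    return [r for r in results if r is not None]


print(surviving_outputs([inner_discards, inner_keeps]))  # → [{'frag_depth': 0.25}]
```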
The `OpDemoteToHelperInvocationEXT`
instruction is an alternative way to throw away a fragment, but one that
does not immediately terminate execution of the invocation.
It is introduced in the [`SPV_EXT_demote_to_helper_invocation`](http://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/EXT/SPV_EXT_demote_to_helper_invocation.html)
extension. WGSL does not have this feature, but we expect it will be introduced by a
future WGSL extension. The same analysis applies to demote-to-helper: when introduced,
it will not affect translation of pipeline outputs.

### Handling depth-replacing mode

A Vulkan fragment shader must write to the fragment depth builtin if and only if it
has a `DepthReplacing` execution mode. Otherwise, behaviour is undefined.

We will ignore the case where the SPIR-V shader writes to the `FragDepth` builtin
and then discards the fragment.
This is justified because "no further operations" are performed by the pipeline
after the fragment is discarded, and that includes writing to depth output attachments.

Assuming the shader is valid, no special translation is required.

### Handling output sample mask

By the same reasoning as for depth-replacing, it is acceptable for the shader
not to write to the sample-mask builtin variable when the fragment is discarded.

### Handling clip distance and cull distance

Most builtin variables are scalars or vectors.
However, the `ClipDistance` and `CullDistance` builtin variables are arrays of 32-bit float values.
Each entry defines a clip half-plane (respectively, a cull half-plane).
A Vulkan implementation must support array sizes of up to 8 elements.

How prevalent are shaders that use these features?
These variables are available when the Vulkan features `shaderClipDistance` and `shaderCullDistance`
are supported.
According to gpuinfo.org as of this writing, those
Vulkan features appear to be nearly universally supported on Windows devices (>99%),
but only on 70% of Android devices.
It appears that Qualcomm devices support them, but Mali devices do not (e.g. Mali-G77).

The proposed translation scheme forces a copy of each array from private
variables into the return value of a vertex shader, or from an entry point
parameter into a private variable of a fragment shader.
In addition to increased register pressure, there may be a performance degradation
due to the bulk copying of data.

We think this is an acceptable tradeoff for the gain in usability and
consistency with other pipeline inputs and outputs.

## Translation scheme for WGSL AST to SPIR-V

To translate from the WGSL AST to SPIR-V, do the following:

- Each entry point formal parameter is mapped to a SPIR-V `Input` variable.
  - Struct and array inputs may have to be broken down into individual variables.
- The return value of the entry point is broken down into fields, with one
  `Output` variable per field.
- In the above, builtins must be separated from user attributes.
  - Builtin attributes are moved to the corresponding variable.
  - Location and interpolation attributes are moved to the corresponding
    variables.
- This translation relies on the fact that pipeline inputs and pipeline
  outputs are IO-shareable types. IO-shareable types are always storable,
  and can be the store type of input/output variables.
- Input function parameters will be automatically initialized by the system
  as part of setting up the pipeline inputs to the entry point.
- Replace each return statement in the entry point with a code sequence
  that writes the return value components to the synthesized output variables,
  and then executes an `OpReturn` (without value).

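The flattening and return-rewriting steps can be sketched in Python under several assumptions. `Member`, `Struct`, `SpirvVar`, `lower_entry_point`, and `lower_return` are hypothetical; type names stand in for SPIR-V type ids; the emitted instructions are schematic assembly text whose `%<name>_val` ids would need to be made unique per return statement; structs are assumed to hold only leaf members; and array inputs and interpolation decorations are not handled. Each leaf of a parameter or of the returned struct becomes one variable carrying exactly one `BuiltIn` or `Location` decoration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union


@dataclass
class Member:
    """A leaf IO value: a struct member, or a non-struct parameter/return value."""
    name: str
    type: str                       # stands in for a SPIR-V type id, e.g. "v4float"
    builtin: Optional[str] = None   # e.g. "FragCoord"
    location: Optional[int] = None  # user-defined IO location


@dataclass
class Struct:
    """An IO struct, assumed here to contain only leaf members."""
    name: str
    members: List[Member] = field(default_factory=list)


@dataclass
class SpirvVar:
    """A synthesized module-scope Input or Output variable."""
    name: str
    storage_class: str                       # "Input" or "Output"
    type: str
    decoration: Tuple[str, Union[str, int]]  # ("BuiltIn", name) or ("Location", n)


IOValue = Union[Member, Struct]


def leaf_var(m: Member, storage_class: str, prefix: str) -> SpirvVar:
    """Builtins and user locations are separated: each leaf gets exactly one."""
    if m.builtin is not None:
        decoration: Tuple[str, Union[str, int]] = ("BuiltIn", m.builtin)
    elif m.location is not None:
        decoration = ("Location", m.location)
    else:
        raise ValueError(f"{m.name} has neither a builtin nor a location")
    return SpirvVar(prefix + m.name, storage_class, m.type, decoration)


def flatten(value: IOValue, storage_class: str) -> List[SpirvVar]:
    """One variable per struct member, or one variable for a non-struct value."""
    if isinstance(value, Struct):
        return [leaf_var(m, storage_class, value.name + "_") for m in value.members]
    return [leaf_var(value, storage_class, "")]


def lower_entry_point(params: List[IOValue], ret: Optional[IOValue]
                      ) -> Tuple[List[SpirvVar], List[SpirvVar]]:
    """Map formal parameters to Input variables and the return value to Outputs."""
    inputs = [v for p in params for v in flatten(p, "Input")]
    outputs = flatten(ret, "Output") if ret is not None else []
    return inputs, outputs


def lower_return(ret_id: str, outputs: List[SpirvVar], is_struct: bool) -> List[str]:
    """Replace `return <ret_id>;` with stores to the Output variables, then OpReturn."""
    if not is_struct:
        # A non-struct return value maps to exactly one Output variable.
        return [f"OpStore %{outputs[0].name} %{ret_id}", "OpReturn"]
    code: List[str] = []
    for i, var in enumerate(outputs):
        # Member i of the returned struct goes to the i-th Output variable.
        code.append(f"%{var.name}_val = OpCompositeExtract %{var.type} %{ret_id} {i}")
        code.append(f"OpStore %{var.name} %{var.name}_val")
    code.append("OpReturn")
    return code


frag_in = Struct("frag_in", [Member("pos", "v4float", builtin="FragCoord"),
                             Member("uv", "v2float", location=0)])
frag_out = Struct("frag_out", [Member("colour", "v4float", location=0),
                               Member("depth", "float", builtin="FragDepth")])
inputs, outputs = lower_entry_point([frag_in], frag_out)
print([v.name for v in inputs])  # → ['frag_in_pos', 'frag_in_uv']
print("\n".join(lower_return("ret", outputs, is_struct=True)))
```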
This translation is sufficient even for fragment shaders with discard.
In that case, outputs will be ignored because downstream pipeline
operations will not be performed.
This is the same rationale as for translation from SPIR-V to WGSL AST.