Name

    MESA_shader_integer_functions

Name Strings

    GL_MESA_shader_integer_functions

Contact

    Ian Romanick <[email protected]>

Contributors

    All the contributors of GL_ARB_gpu_shader5

Status

    Supported by all GLSL 1.30 capable drivers in Mesa 12.1 and later

Version

    Version 3, March 31, 2017

Number

    OpenGL Extension #495

Dependencies

    This extension is written against the OpenGL 3.2 (Compatibility Profile)
    Specification.

    This extension is written against Version 1.50 (Revision 09) of the OpenGL
    Shading Language Specification.

    GLSL 1.30 (OpenGL) or GLSL ES 3.00 (OpenGL ES) is required.

    This extension interacts with ARB_gpu_shader5.

    This extension interacts with ARB_gpu_shader_fp64.

    This extension interacts with NV_gpu_shader5.

Overview

    GL_ARB_gpu_shader5 extends GLSL in a number of useful ways. Much of this
    added functionality requires significant hardware support. There are many
    aspects, however, that can be easily implemented on any GPU with "real"
    integer support (as opposed to simulating integers using floating point
    calculations).

    This extension provides a set of new features to the OpenGL Shading
    Language to support capabilities of these GPUs, extending the
    capabilities of version 1.30 of the OpenGL Shading Language and version
    3.00 of the OpenGL ES Shading Language.
    Shaders using the new
    functionality provided by this extension should enable this
    functionality via the construct

      #extension GL_MESA_shader_integer_functions : require     (or enable)

    This extension provides a variety of new features for all shader types,
    including:

      * support for implicitly converting signed integer types to unsigned
        types, as well as more general implicit conversion and function
        overloading infrastructure to support new data types introduced by
        other extensions;

      * new built-in functions supporting:

        * splitting a floating-point number into a significand and exponent
          (frexp), or building a floating-point number from a significand and
          exponent (ldexp);

        * integer bitfield manipulation, including functions to find the
          position of the most or least significant set bit, count the number
          of one bits, and bitfield insertion, extraction, and reversal;

        * extended integer precision math, including add with carry, subtract
          with borrow, and extended multiplication.

    The resulting extension is a strict subset of GL_ARB_gpu_shader5.

IP Status

    No known IP claims.

New Procedures and Functions

    None

New Tokens

    None

Additions to Chapter 2 of the OpenGL 3.2 (Compatibility Profile) Specification
(OpenGL Operation)

    None.

Additions to Chapter 3 of the OpenGL 3.2 (Compatibility Profile) Specification
(Rasterization)

    None.

Additions to Chapter 4 of the OpenGL 3.2 (Compatibility Profile) Specification
(Per-Fragment Operations and the Frame Buffer)

    None.

Additions to Chapter 5 of the OpenGL 3.2 (Compatibility Profile) Specification
(Special Functions)

    None.

Additions to Chapter 6 of the OpenGL 3.2 (Compatibility Profile) Specification
(State and State Requests)

    None.

Additions to Appendix A of the OpenGL 3.2 (Compatibility Profile)
Specification (Invariance)

    None.

Additions to the AGL/GLX/WGL Specifications

    None.

Modifications to The OpenGL Shading Language Specification, Version 1.50
(Revision 09)

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_MESA_shader_integer_functions : <behavior>

    where <behavior> is as specified in section 3.3.

    New preprocessor #defines are added to the OpenGL Shading Language:

      #define GL_MESA_shader_integer_functions        1


    Modify Section 4.1.10, Implicit Conversions, p. 27

    (modify table of implicit conversions)

                                Can be implicitly
        Type of expression        converted to
        ---------------------   -----------------
        int                     uint, float
        ivec2                   uvec2, vec2
        ivec3                   uvec3, vec3
        ivec4                   uvec4, vec4

        uint                    float
        uvec2                   vec2
        uvec3                   vec3
        uvec4                   vec4

    (modify second paragraph of the section) No implicit conversions are
    provided to convert from unsigned to signed integer types or from
    floating-point to integer types. There are no implicit array or structure
    conversions.

    (insert before the final paragraph of the section) When performing
    implicit conversion for binary operators, there may be multiple data types
    to which the two operands can be converted. For example, when adding an
    int value to a uint value, both values can be implicitly converted to uint
    or to float. In such cases, a floating-point type is chosen if either
    operand has a floating-point type. Otherwise, an unsigned integer type is
    chosen if either operand has an unsigned integer type. Otherwise, a
    signed integer type is chosen.


    Modify Section 5.9, Expressions, p. 57

    (modify bulleted list as follows, adding support for implicit conversion
    between signed and unsigned types)

    Expressions in the shading language are built from the following:

    * Constants of type bool, int, int64_t, uint, uint64_t, float, all vector
      types, and all matrix types.

    ...

    * The operator modulus (%) operates on signed or unsigned integer scalars
      or vectors.
      If the fundamental types of the operands do not match, the
      conversions from Section 4.1.10 "Implicit Conversions" are applied to
      produce matching types. ...


    Modify Section 6.1, Function Definitions, p. 63

    (modify description of overloading, beginning at the top of p. 64)

    Function names can be overloaded. The same function name can be used for
    multiple functions, as long as the parameter types differ. If a function
    name is declared twice with the same parameter types, then the return
    types and all qualifiers must also match, and it is the same function
    being declared. For example,

      vec4 f(in vec4 x, out vec4  y);   // (A)
      vec4 f(in vec4 x, out uvec4 y);   // (B) okay, different argument type
      vec4 f(in ivec4 x, out uvec4 y);  // (C) okay, different argument type

      int  f(in vec4 x, out ivec4 y);  // error, only return type differs
      vec4 f(in vec4 x, in  vec4  y);  // error, only qualifier differs
      vec4 f(const in vec4 x, out vec4 y);  // error, only qualifier differs

    When function calls are resolved, an exact type match for all the
    arguments is sought. If an exact match is found, all other functions are
    ignored, and the exact match is used. If no exact match is found, then
    the implicit conversions in Section 4.1.10 (Implicit Conversions) will be
    applied to find a match. Mismatched types on input parameters (in or
    inout or default) must have a conversion from the calling argument type
    to the formal parameter type. Mismatched types on output parameters (out
    or inout) must have a conversion from the formal parameter type to the
    calling argument type.

    If implicit conversions can be used to find more than one matching
    function, a single best-matching function is sought. To determine a best
    match, the conversions between calling argument and formal parameter
    types are compared for each function argument and pair of matching
    functions.
    After these comparisons are performed, each pair of matching
    functions is compared. A function definition A is considered a better
    match than function definition B if:

      * for at least one function argument, the conversion for that argument
        in A is better than the corresponding conversion in B; and

      * there is no function argument for which the conversion in B is better
        than the corresponding conversion in A.

    If a single function definition is considered a better match than every
    other matching function definition, it will be used. Otherwise, a
    semantic error occurs and the shader will fail to compile.

    To determine whether the conversion for a single argument in one match is
    better than that for another match, the following rules are applied, in
    order:

      1. An exact match is better than a match involving any implicit
         conversion.

      2. A match involving an implicit conversion from float to double is
         better than a match involving any other implicit conversion.

      3. A match involving an implicit conversion from either int or uint to
         float is better than a match involving an implicit conversion from
         either int or uint to double.

    If none of the rules above apply to a particular pair of conversions,
    neither conversion is considered better than the other.

    For the function prototypes (A), (B), and (C) above, the following
    examples show how the rules apply to different sets of calling argument
    types:

      f(vec4, vec4);        // exact match of vec4 f(in vec4 x, out vec4 y)
      f(vec4, uvec4);       // exact match of vec4 f(in vec4 x, out uvec4 y)
      f(vec4, ivec4);       // matched to vec4 f(in vec4 x, out vec4 y)
                            //   (C) not relevant, can't convert vec4 to
                            //   ivec4. (A) better than (B) for 2nd
                            //   argument (rule 2), same on first argument.
      f(ivec4, vec4);       // NOT matched. All three match by implicit
                            //   conversion. (C) is better than (A) and (B)
                            //   on the first argument. (A) is better than
                            //   (B) and (C).


    Modify Section 8.3, Common Functions, p. 84

    (add support for single-precision frexp and ldexp functions)

    Syntax:

      genType frexp(genType x, out genIType exp);
      genType ldexp(genType x, in genIType exp);

    The function frexp() splits each single-precision floating-point number in
    <x> into a binary significand, a floating-point number in the range [0.5,
    1.0), and an integral exponent of two, such that:

      x = significand * 2 ^ exponent

    The significand is returned by the function; the exponent is returned in
    the parameter <exp>. For a floating-point value of zero, the significand
    and exponent are both zero. For a floating-point value that is an
    infinity or is not a number, the results of frexp() are undefined.

    If the input <x> is a vector, this operation is performed in a
    component-wise manner; the value returned by the function and the value
    written to <exp> are vectors with the same number of components as <x>.

    The function ldexp() builds a single-precision floating-point number from
    each significand component in <x> and the corresponding integral exponent
    of two in <exp>, returning:

      significand * 2 ^ exponent

    If this product is too large to be represented as a single-precision
    floating-point value, the result is considered undefined.

    If the input <x> is a vector, this operation is performed in a
    component-wise manner; the value passed in <exp> and returned by the
    function are vectors with the same number of components as <x>.


    (add support for new integer built-in functions)

    Syntax:

      genIType bitfieldExtract(genIType value, int offset, int bits);
      genUType bitfieldExtract(genUType value, int offset, int bits);

      genIType bitfieldInsert(genIType base, genIType insert, int offset,
                              int bits);
      genUType bitfieldInsert(genUType base, genUType insert, int offset,
                              int bits);

      genIType bitfieldReverse(genIType value);
      genUType bitfieldReverse(genUType value);

      genIType bitCount(genIType value);
      genIType bitCount(genUType value);

      genIType findLSB(genIType value);
      genIType findLSB(genUType value);

      genIType findMSB(genIType value);
      genIType findMSB(genUType value);

    The function bitfieldExtract() extracts bits <offset> through
    <offset>+<bits>-1 from each component in <value>, returning them in the
    least significant bits of the corresponding component of the result. For
    unsigned data types, the most significant bits of the result will be set
    to zero. For signed data types, the most significant bits will be set to
    the value of bit <offset>+<bits>-1. If <bits> is zero, the result will be
    zero. The result will be undefined if <offset> or <bits> is negative, or
    if the sum of <offset> and <bits> is greater than the number of bits used
    to store the operand. Note that for vector versions of bitfieldExtract(),
    a single pair of <offset> and <bits> values is shared for all components.

    The function bitfieldInsert() inserts the <bits> least significant bits of
    each component of <insert> into the corresponding component of <base>.
    The result will have bits numbered <offset> through <offset>+<bits>-1
    taken from bits 0 through <bits>-1 of <insert>, and all other bits taken
    directly from the corresponding bits of <base>. If <bits> is zero, the
    result will simply be <base>. The result will be undefined if <offset> or
    <bits> is negative, or if the sum of <offset> and <bits> is greater than
    the number of bits used to store the operand. Note that for vector
    versions of bitfieldInsert(), a single pair of <offset> and <bits> values
    is shared for all components.

    The function bitfieldReverse() reverses the bits of <value>.
    The bit
    numbered <n> of the result will be taken from bit (<bits>-1)-<n> of
    <value>, where <bits> is the total number of bits used to represent
    <value>.

    The function bitCount() returns the number of one bits in the binary
    representation of <value>.

    The function findLSB() returns the bit number of the least significant one
    bit in the binary representation of <value>. If <value> is zero, -1 will
    be returned.

    The function findMSB() returns the bit number of the most significant bit
    in the binary representation of <value>. For positive integers, the
    result will be the bit number of the most significant one bit. For
    negative integers, the result will be the bit number of the most
    significant zero bit. For a <value> of zero or negative one, -1 will be
    returned.


    (support for unsigned integer add/subtract with carry-out)

    Syntax:

      genUType uaddCarry(genUType x, genUType y, out genUType carry);
      genUType usubBorrow(genUType x, genUType y, out genUType borrow);

    The function uaddCarry() adds 32-bit unsigned integers or vectors <x> and
    <y>, returning the sum modulo 2^32. The value <carry> is set to zero if
    the sum was less than 2^32, or one otherwise.

    The function usubBorrow() subtracts the 32-bit unsigned integer or vector
    <y> from <x>, returning the difference if non-negative or 2^32 plus the
    difference, otherwise. The value <borrow> is set to zero if x >= y, or
    one otherwise.


    (support for signed and unsigned multiplies, with 32-bit inputs and a
    64-bit result spanning two 32-bit outputs)

    Syntax:

      void umulExtended(genUType x, genUType y, out genUType msb,
                        out genUType lsb);
      void imulExtended(genIType x, genIType y, out genIType msb,
                        out genIType lsb);

    The functions umulExtended() and imulExtended() multiply 32-bit unsigned
    or signed integers or vectors <x> and <y>, producing a 64-bit result.
    The
    32 least significant bits are returned in <lsb>; the 32 most significant
    bits are returned in <msb>.


GLX Protocol

    None.

Dependencies on ARB_gpu_shader_fp64

    This extension, ARB_gpu_shader_fp64, and NV_gpu_shader5 all modify the set
    of implicit conversions supported in the OpenGL Shading Language. If more
    than one of these extensions is supported, an expression of one type may
    be converted to another type if that conversion is allowed by any of these
    specifications.

    If ARB_gpu_shader_fp64 or a similar extension introducing new data types
    is not supported, the function overloading rule in the GLSL specification
    preferring promotion of an input parameter from a smaller type to a larger
    type is never applicable, as all data types are of the same size. That
    rule and the example referring to "double" should be removed.


Dependencies on NV_gpu_shader5

    This extension, ARB_gpu_shader_fp64, and NV_gpu_shader5 all modify the set
    of implicit conversions supported in the OpenGL Shading Language. If more
    than one of these extensions is supported, an expression of one type may
    be converted to another type if that conversion is allowed by any of these
    specifications.

    If NV_gpu_shader5 is supported, integer data types are supported with four
    different precisions (8-, 16-, 32-, and 64-bit) and floating-point data
    types are supported with three different precisions (16-, 32-, and
    64-bit). The extension adds the following rule for output parameters,
    which is similar to the one present in this extension for input
    parameters:

      5. If the formal parameters in both matches are output parameters, a
         conversion from a type with a larger number of bits per component is
         better than a conversion from a type with a smaller number of bits
         per component.
         For example, a conversion from an "int16_t" formal
         parameter type to "int" is better than one from an "int8_t" formal
         parameter type to "int".

    Such a rule is not provided in this extension because there is no
    combination of types in this extension and ARB_gpu_shader_fp64 where this
    rule has any effect.


Errors

    None


New State

    None

New Implementation Dependent State

    None

Issues

    (1) What should this extension be called?

      UNRESOLVED. This extension borrows from GL_ARB_gpu_shader5, so creating
      some sort of a play on that name would be viable. However, nothing in
      this extension should require SM5 hardware, so such a name would be a
      little misleading and weird.

      Since the primary purpose is to add integer related functions from
      GL_ARB_gpu_shader5, call this extension GL_MESA_shader_integer_functions
      for now.

    (2) Why is some of the formatting in this extension weird?

      RESOLVED: This extension is formatted to minimize the differences (as
      reported by 'diff --side-by-side -W180') with the GL_ARB_gpu_shader5
      specification.

    (3) Should ldexp and frexp be included?

      RESOLVED: Yes. Few GPUs have native instructions to implement these
      functions. These are generally implemented using existing GLSL built-in
      functions and the other functions provided by this extension.

    (4) Should umulExtended and imulExtended be included?

      RESOLVED: Yes. These functions should be implementable on any GPU that
      can support the rest of this extension, but the implementation may be
      complex. The implementation on a GPU that only supports 32bit x 32bit =
      32bit multiplication would be quite expensive. However, many GPUs
      (including OpenGL 4.0 GPUs that already support this function) have a
      32bit x 16bit = 48bit multiplier.
      The implementation there is only
      trivially more expensive than regular 32bit multiplication.

    (5) Should the pack and unpack functions be included?

      RESOLVED: No. These functions are already available via
      GL_ARB_shading_language_packing.

    (6) Should the "BitsTo" functions be included?

      RESOLVED: No. These functions are already available via
      GL_ARB_shader_bit_encoding.

Revision History

    Rev.    Date      Author    Changes
    ----  ----------- --------  -----------------------------------------
     3    31-Mar-2017 Jon Leech Add ES support (OpenGL-Registry/issues/3)
     2     7-Jul-2016 idr       Fix typo in #extension line
     1    20-Jun-2016 idr       Initial version based on GL_ARB_gpu_shader5.
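
Sample Code (non-normative)

    The following is an illustrative sketch only. It is not part of the
    extension and is not Mesa's implementation. It models, in host-side C,
    the 32-bit scalar behavior that the text above describes for
    bitfieldExtract(), findMSB(), uaddCarry(), usubBorrow(), umulExtended(),
    and imulExtended(). All model_* names are hypothetical. Where the
    specification leaves a result undefined, the model arbitrarily returns
    zero. The signed cases assume a two's-complement C implementation.

```c
#include <stdint.h>

/* Hypothetical host-side models of the 32-bit scalar built-ins.  The
 * model_* names are illustrative only and appear nowhere in the spec. */

/* bitfieldExtract() on uint: bits offset..offset+bits-1, zero-extended. */
uint32_t model_bitfield_extract_u(uint32_t value, int offset, int bits)
{
    /* Undefined per the spec; this model arbitrarily returns 0. */
    if (offset < 0 || bits < 0 || bits > 32 || offset > 32 - bits)
        return 0u;
    if (bits == 0)
        return 0u;
    uint32_t mask = (bits == 32) ? 0xFFFFFFFFu : ((1u << bits) - 1u);
    return (value >> offset) & mask;
}

/* bitfieldExtract() on int: same field, sign-extended from its top bit. */
int32_t model_bitfield_extract_i(int32_t value, int offset, int bits)
{
    uint32_t field = model_bitfield_extract_u((uint32_t)value, offset, bits);
    if (offset < 0 || bits <= 0 || bits >= 32 || offset > 32 - bits)
        return (int32_t)field;  /* nothing to extend, or undefined input */
    uint32_t sign = 1u << (bits - 1);
    /* Replicate bit offset+bits-1 into the upper bits of the result. */
    return (int32_t)((field ^ sign) - sign);
}

/* findMSB() on uint: index of the highest one bit, or -1 for zero. */
int model_find_msb_u(uint32_t value)
{
    int n = -1;
    while (value != 0u) {
        value >>= 1;
        n++;
    }
    return n;
}

/* findMSB() on int: for negative inputs, the highest zero bit instead. */
int model_find_msb_i(int32_t value)
{
    uint32_t u = (uint32_t)value;
    return model_find_msb_u(value < 0 ? ~u : u);
}

/* uaddCarry(): sum modulo 2^32; carry is 1 if the true sum reached 2^32. */
uint32_t model_uadd_carry(uint32_t x, uint32_t y, uint32_t *carry)
{
    uint32_t sum = x + y;           /* unsigned arithmetic wraps mod 2^32 */
    *carry = (sum < x) ? 1u : 0u;
    return sum;
}

/* usubBorrow(): x - y modulo 2^32; borrow is 0 if x >= y, else 1. */
uint32_t model_usub_borrow(uint32_t x, uint32_t y, uint32_t *borrow)
{
    *borrow = (x >= y) ? 0u : 1u;
    return x - y;
}

/* umulExtended(): full 64-bit unsigned product split into two halves. */
void model_umul_extended(uint32_t x, uint32_t y, uint32_t *msb, uint32_t *lsb)
{
    uint64_t product = (uint64_t)x * (uint64_t)y;
    *msb = (uint32_t)(product >> 32);
    *lsb = (uint32_t)product;
}

/* imulExtended(): full 64-bit signed product split into two halves. */
void model_imul_extended(int32_t x, int32_t y, int32_t *msb, int32_t *lsb)
{
    uint64_t product = (uint64_t)((int64_t)x * (int64_t)y);
    *msb = (int32_t)(uint32_t)(product >> 32);
    *lsb = (int32_t)(uint32_t)product;
}
```

    As a usage example under these assumptions, model_uadd_carry(0xFFFFFFFFu,
    2u, &carry) returns 1 and sets carry to 1, and model_find_msb_i(-1)
    returns -1, in line with the descriptions of uaddCarry() and findMSB()
    above.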