This page gives an overview of the instruction set supported by nihstro. A similar reference list exists on 3dbrew, but it documents the actual hardware implementation. nihstro seeks to abstract away annoying details, such as the fact that there are 3 different CALL instructions, and instead provides convenience shortcuts where possible without giving up flexibility.
Most arithmetic instructions take a destination operand and one or more source operands. Source operands may use any kind of swizzle mask, while destination operands may not use reordering or duplicating swizzle masks. Below you will find a short operation description for each instruction, e.g. dest[i] = src[i], which means that the i-th source component (as specified by the swizzle mask) will be assigned to the i-th destination component (as specified by the swizzle mask), with i ranging from 1 to the number of swizzle mask components. Hence, components not listed in the destination swizzle mask will not be written.
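The component-wise notation can be illustrated with a small Python model. This is only a sketch under assumptions: registers are modelled as lists of four floats, the helper name mov and its parameters are hypothetical, and swizzle masks are passed as strings such as "xz".

```python
# Hypothetical model: a register is a list of four floats (x, y, z, w).
COMPONENTS = "xyzw"

def mov(dest, dest_mask, src, src_swizzle):
    """Model of 'mov dest.dest_mask, src.src_swizzle'."""
    dest_idx = [COMPONENTS.index(c) for c in dest_mask]
    src_idx = [COMPONENTS.index(c) for c in src_swizzle]
    if len(dest_idx) != len(src_idx):
        raise ValueError("src and dest must have the same number of components")
    # Destination masks may not reorder or duplicate components.
    if dest_idx != sorted(set(dest_idx)):
        raise ValueError("destination mask may not reorder or duplicate")
    result = list(dest)  # components not in the mask keep their old value
    for d, s in zip(dest_idx, src_idx):
        result[d] = src[s]  # i-th swizzled source -> i-th masked destination
    return result

# Model of 'mov r0.xz, c0.wy' with c0 = (1, 2, 3, 4)
r0 = mov([0.0, 0.0, 0.0, 0.0], "xz", [1.0, 2.0, 3.0, 4.0], "wy")
print(r0)  # [4.0, 0.0, 2.0, 0.0]
```

Note that r0.y and r0.w are left untouched because they do not appear in the destination mask.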
Static indexing (i.e. indexing with a constant, not to be confused with the above notation) may be done for both operand types. Source operands additionally support dynamic indexing, where the index depends on one of the address registers a0/a1 or on the loop counter lcnt. Examples:
- static indexing: c0[20]
- dynamic indexing: c0[2+a0]
Syntax: mov dest_operand, src_operand
Operation: dest[i] = src[i]
Restrictions:
- src and dest must have the same number of components
Syntax: add dest_operand, src1_operand, src2_operand
Operation: dest[i] = src1[i] + src2[i]
Restrictions:
- src1, src2, and dest must have the same number of components
- not more than one of the source operands may be a float uniform register and/or use dynamic indexing
Notes:
- subtraction can be performed using negation: add r0, c0, -c1
- when chaining an addition and a multiplication, consider using mad instead
Syntax: mul dest_operand, src1_operand, src2_operand
Operation: dest[i] = src1[i] * src2[i]
Restrictions:
- src1, src2, and dest must have the same number of components
- not more than one of the source operands may be a float uniform register and/or use dynamic indexing
Notes:
- division can be performed by computing the reciprocal of src2 and multiplying the result: rcp r0, c1; mul r0, c0, r0
- when chaining an addition and a multiplication, consider using mad instead
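The division idiom from the notes can be modelled in Python. This is a sketch, not nihstro code: rcp and mul are hypothetical helpers that mirror the operation descriptions, and the register contents are made-up values.

```python
def rcp(src):
    """Model of rcp: dest[i] = 1 / src[i]."""
    return [1.0 / s for s in src]

def mul(src1, src2):
    """Model of mul: dest[i] = src1[i] * src2[i]."""
    return [a * b for a, b in zip(src1, src2)]

# Illustrative register contents (assumed values).
c0 = [1.0, 2.0, 4.0, 3.0]
c1 = [2.0, 4.0, 8.0, 0.5]

# rcp r0, c1; mul r0, c0, r0  ->  r0 = c0 / c1
r0 = rcp(c1)
r0 = mul(c0, r0)
print(r0)  # [0.5, 0.5, 0.5, 6.0]
```

Since the reciprocal is rounded before the multiplication, the result may differ slightly from a true division for values that are not exactly representable.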
Syntax: mad dest_operand, src1_operand, src2_operand, src3_operand
Operation: dest[i] = src1[i] * src2[i] + src3[i]
Restrictions:
- src1, src2, src3, and dest must have the same number of components
- not more than two source operands may be float uniform registers
- no dynamic indexing may be performed on any of the source operands
Notes:
- when dynamic indexing is not avoidable, use add and mul instead
- not supported currently
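As a sketch of why mad can be replaced by mul followed by add, the following Python model compares both forms. The helper names are hypothetical and only mirror the operation descriptions; the model says nothing about rounding differences on real hardware.

```python
def mul(src1, src2):
    """Model of mul: dest[i] = src1[i] * src2[i]."""
    return [a * b for a, b in zip(src1, src2)]

def add(src1, src2):
    """Model of add: dest[i] = src1[i] + src2[i]."""
    return [a + b for a, b in zip(src1, src2)]

def mad(src1, src2, src3):
    """Model of mad: dest[i] = src1[i] * src2[i] + src3[i]."""
    return [a * b + c for a, b, c in zip(src1, src2, src3)]

# Illustrative operand values (assumed).
a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 2.0, 2.0, 2.0]
c = [0.5, 0.5, 0.5, 0.5]

fused = mad(a, b, c)
chained = add(mul(a, b), c)  # fallback when mad cannot be used
print(fused, chained)
```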
Syntax: max dest_operand, src1_operand, src2_operand
Operation: dest[i] = max(src1[i], src2[i])
Restrictions:
- src1, src2, and dest must have the same number of components
- not more than one of the source operands may be a float uniform register and/or use dynamic indexing
Syntax: min dest_operand, src1_operand, src2_operand
Operation: dest[i] = min(src1[i], src2[i])
Restrictions:
- src1, src2, and dest must have the same number of components
- not more than one of the source operands may be a float uniform register and/or use dynamic indexing
Syntax: flr dest_operand, src_operand
Operation: dest[i] = floor(src[i])
Restrictions:
- src and dest must have the same number of components
Syntax: rcp dest_operand, src_operand
Operation: dest[i] = 1 / src[i]
Restrictions:
- src and dest must have the same number of components
Syntax: rsq dest_operand, src_operand
Operation: dest[i] = 1 / sqrt(src[i])
Restrictions:
- src and dest must have the same number of components
Syntax: exp dest_operand, src_operand
Operation: dest[i] = exp(src[i])
Restrictions:
- src and dest must have the same number of components
Syntax: log dest_operand, src_operand
Operation: dest[i] = log(src[i])
Restrictions:
- src and dest must have the same number of components
Syntax: dp3 dest_operand, src1_operand, src2_operand
Operation: dest[i] = src1[0]*src2[0]+src1[1]*src2[1]+src1[2]*src2[2]
Restrictions:
- src1, src2, and dest must have the same number of components
- not more than one of the source operands may be a float uniform register and/or use dynamic indexing
Syntax: dp4 dest_operand, src1_operand, src2_operand
Operation: dest[i] = src1[0]*src2[0]+src1[1]*src2[1]+src1[2]*src2[2]+src1[3]*src2[3]
Restrictions:
- src1, src2, and dest must have the same number of components
- not more than one of the source operands may be a float uniform register and/or use dynamic indexing
Syntax: dph dest_operand, src1_operand, src2_operand
Operation: dest[i] = src1[0]*src2[0]+src1[1]*src2[1]+src1[2]*src2[2]+src2[3]
Restrictions:
- src1, src2, and dest must have the same number of components
- not more than one of the source operands may be a float uniform register and/or use dynamic indexing
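The three dot product variants differ only in how the fourth component is treated. A minimal Python model, with hypothetical helper names that follow the operation descriptions, could look like this. Each helper returns the scalar that is written to every enabled destination component.

```python
def dp3(src1, src2):
    """Three-component dot product."""
    return src1[0] * src2[0] + src1[1] * src2[1] + src1[2] * src2[2]

def dp4(src1, src2):
    """Four-component dot product."""
    return dp3(src1, src2) + src1[3] * src2[3]

def dph(src1, src2):
    """Homogeneous dot product: src1 is treated as if src1[3] were 1."""
    return dp3(src1, src2) + src2[3]

# Illustrative operand values (assumed).
a = [1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0]

print(dp3(a, b))  # 38.0
print(dp4(a, b))  # 70.0
print(dph(a, b))  # 46.0
```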
Syntax: sge dest_operand, src1_operand, src2_operand
Operation: dest[i] = (src1[i] >= src2[i]) ? 1.0 : 0.0
Restrictions:
- src1, src2, and dest must have the same number of components
- not more than one of the source operands may be a float uniform register and/or use dynamic indexing
Syntax: slt dest_operand, src1_operand, src2_operand
Operation: dest[i] = (src1[i] < src2[i]) ? 1.0 : 0.0
Restrictions:
- src1, src2, and dest must have the same number of components
- not more than one of the source operands may be a float uniform register and/or use dynamic indexing
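sge and slt produce 1.0 or 0.0 per component, so their results can be used as multiplicative masks. The following Python sketch models both instructions; the helper names and operand values are assumptions made for illustration.

```python
def sge(src1, src2):
    """Model of sge: dest[i] = 1.0 if src1[i] >= src2[i] else 0.0."""
    return [1.0 if a >= b else 0.0 for a, b in zip(src1, src2)]

def slt(src1, src2):
    """Model of slt: dest[i] = 1.0 if src1[i] < src2[i] else 0.0."""
    return [1.0 if a < b else 0.0 for a, b in zip(src1, src2)]

# Illustrative operand values (assumed).
a = [1.0, 5.0, 3.0, 0.0]
b = [2.0, 5.0, 1.0, 0.0]

ge_mask = sge(a, b)
lt_mask = slt(a, b)
print(ge_mask)  # [0.0, 1.0, 1.0, 1.0]
print(lt_mask)  # [1.0, 0.0, 0.0, 0.0]
```

For ordinary (non-NaN) inputs, the two masks are complementary in every component.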
Syntax: mova src_operand
Operation:
a0 = src.x
a1 = src.y
Restrictions:
- src_operand must be a two-component vector.
Notes:
- not supported currently
Flow control instructions allow for non-linear code execution, e.g. by conditionally or repeatedly running code.
Some flow control instructions take a "condition" parameter. A condition is either
- a boolean uniform or
- an expression consisting of one or two conditional code components, combined via && ("and") or || ("or"), and optionally negated. Examples: cc.x, cc.y && !cc.x
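To make the shape of a condition concrete, here is a small Python evaluator. It is a sketch under assumptions: the function names are hypothetical, negation is applied per component as in the example cc.y && !cc.x, and the condition is passed as a string rather than parsed by the real assembler.

```python
def eval_component(token, cc):
    """Evaluate 'cc.x', 'cc.y', '!cc.x' or '!cc.y' against the cc state."""
    token = token.strip()
    negated = token.startswith("!")
    value = cc[token.lstrip("!").strip()[-1]]  # "cc.x" -> key "x"
    return (not value) if negated else value

def eval_condition(expr, cc, bool_uniforms):
    """Evaluate a condition: a boolean uniform or a cc expression."""
    expr = expr.strip()
    if expr in bool_uniforms:
        return bool_uniforms[expr]
    if "&&" in expr:
        return all(eval_component(t, cc) for t in expr.split("&&"))
    if "||" in expr:
        return any(eval_component(t, cc) for t in expr.split("||"))
    return eval_component(expr, cc)

# Illustrative state (assumed values).
cc = {"x": True, "y": False}
uniforms = {"b0": True}

print(eval_condition("cc.x", cc, uniforms))           # True
print(eval_condition("cc.y && !cc.x", cc, uniforms))  # False
print(eval_condition("b0", cc, uniforms))             # True
```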
Syntax: cmp src1_operand, src2_operand, op1, op2
op1 and op2 may be any of the strings == (equal), != (not equal), < (less than), <= (less than or equal to), > (greater than), and >= (greater than or equal to).
Operation:
cc.x = (src1[0] op1 src2[0])
cc.y = (src1[1] op2 src2[1])
Restrictions:
- src1 and src2 must be two-component vectors
- it is not possible to set cc.x without also setting cc.y
- not more than one of the source operands may be a float uniform register and/or use dynamic indexing
Notes:
- this instruction is used to set conditional codes, which can be used as conditions for if/jmp/call/break.
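The effect of cmp on the conditional codes can be modelled as follows. This Python sketch uses the standard operator module; the function name cmp, the dictionary representation of cc, and the operand values are assumptions for illustration.

```python
import operator

# Comparison strings accepted for op1 and op2.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}

def cmp(src1, src2, op1, op2):
    """Model of cmp: always sets both cc.x and cc.y."""
    return {
        "x": OPS[op1](src1[0], src2[0]),
        "y": OPS[op2](src1[1], src2[1]),
    }

# Model of 'cmp r0.xy, c0.xy, <, >=' with assumed register contents.
cc = cmp([1.0, 4.0], [2.0, 4.0], "<", ">=")
print(cc)  # {'x': True, 'y': True}
```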
Syntax: if condition
Operation:
If condition is true, executes the code between itself and the corresponding else or endif pseudo-instruction. Otherwise, executes the code in the else branch, if one is given (otherwise, skips the branch body and continues after the endif statement).
Restrictions:
- not more than one else branch may be specified (else if syntax is not supported)
Notes:
- all if branches must be closed explicitly using endif
- jumping out of a branch body may result in undefined behavior
Example:
if cc.x && !cc.y
    // do stuff
else
    if b0
        // do other stuff
    endif
endif
Syntax: loop int_uniform
Operation:
Initialize lcnt to int_uniform.y, then process code between loop and endloop for int_uniform.x+1 iterations in total. After each iteration, lcnt is incremented by int_uniform.z.
Restrictions:
- no swizzle mask may be applied on the given uniform
- there is no direct way of looping zero times (the easiest way is to use break with an extra boolean uniform)
Notes:
- lcnt can be used to dynamically index arrays, e.g. to implement vertex lighting with multiple light sources
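The iteration count and the lcnt sequence follow directly from the three components of the integer uniform. A small Python model (the function name and the tuple representation of the uniform are assumptions) lists the value of lcnt seen by each iteration:

```python
def loop_counter_values(int_uniform):
    """Return the lcnt value observed in each iteration of 'loop int_uniform'."""
    x, y, z = int_uniform[:3]
    lcnt = y                  # lcnt starts at int_uniform.y
    values = []
    for _ in range(x + 1):    # int_uniform.x + 1 iterations in total
        values.append(lcnt)
        lcnt += z             # incremented by int_uniform.z after each iteration
    return values

print(loop_counter_values((3, 0, 2)))  # [0, 2, 4, 6]
print(loop_counter_values((0, 5, 1)))  # [5]
```

The second call shows why a loop can never run zero times: even with int_uniform.x set to 0, the body is executed once.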
Syntax: break condition
Operation:
If condition is true, break out of the current loop.
Restrictions:
- jumping out of a branch body may result in undefined behavior
Syntax: jmp target_label if condition
Restrictions:
- jumping out of or into branch bodies or loops may result in undefined behavior
- there is no way to force a jump without specifying a condition
Notes:
- if you need to automatically return from a function, use call instead
Example:
main:
    jmp my_helper_code if b0
    // if not b0, do other stuff here
    nop
    end
my_helper_code:
    // do stuff
    nop
    end
Possible syntaxes:
call target_label until return_label if condition
call target_label until return_label
Operation:
If condition is true (or none is given), jumps to target_label and processes shader code there until return_label is hit, at which point code execution jumps back to the caller.
Restrictions:
- jumping out of or into branch bodies or loops may result in undefined behavior
Notes:
- if you don't need to automatically return from a function, use jmp instead
Example:
main:
    call my_helper_code until end_helper_code
    nop
    end
my_helper_code:
    // do stuff here
    nop
end_helper_code:
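The control flow of the example above can be traced with a tiny Python interpreter. This is a sketch under assumptions: the program is a list of tuples, labels map to the index of the instruction that follows them, "work" is a placeholder for the "do stuff here" comment, and only unconditional call, nop and end are modelled.

```python
def run(program, labels):
    """Execute the program and return the names of the executed instructions."""
    trace = []
    pc = 0
    return_stack = []  # entries: (index of return_label, index to resume at)
    while True:
        # Reaching the return label of the innermost call jumps back to the caller.
        if return_stack and pc == return_stack[-1][0]:
            pc = return_stack.pop()[1]
            continue
        op = program[pc]
        if op[0] == "end":
            return trace
        if op[0] == "call":
            _, target, until = op
            return_stack.append((labels[until], pc + 1))
            pc = labels[target]
            continue
        trace.append(op[0])
        pc += 1

program = [
    ("call", "my_helper_code", "end_helper_code"),  # main:
    ("nop",),
    ("end",),
    ("work",),                                      # my_helper_code:
    ("nop",),
]                                                   # end_helper_code:
labels = {"main": 0, "my_helper_code": 3, "end_helper_code": 5}

print(run(program, labels))  # ['work', 'nop', 'nop']
```

The helper body runs first, and execution then resumes at the nop following the call before end stops the shader.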
Syntax: nop
Notes:
- This may be necessary before using end to make sure all pending write operations have been completed
Syntax: end
Operation: Stops shader execution.