GPU

External Registers

This page describes the address range accessible from the ARM11, used to configure the basic GPU functionality. For information about the internal registers used for 3D rendering, see GPU/Internal Registers.

Map #

Address mappings for the external registers. GSPGPU:WriteHWRegs takes these addresses relative to 0x1EB00000.

| User VA | PA | Length | Name | Comments |
|---|---|---|---|---|
| 0x1EF00000 | 0x10400000 | 4 | Hardware ID | Bit2: new model |
| 0x1EF00004 | 0x10400004 | 4 | ? | |
| 0x1EF00010 | 0x10400010 | 16 | Memory Fill1 “PSC0” | GX command 2 |
| 0x1EF00020 | 0x10400020 | 16 | Memory Fill2 “PSC1” | GX command 2 |
| 0x1EF00030 | 0x10400030 | 4 | VRAM bank control | Bits 8-11 = bank[i] disabled; other bits are unused. |
| 0x1EF00034 | 0x10400034 | 4 | GPU Busy | Bit26 = PSC0, Bit27 = PSC1, Bit30 = PPF, Bit31 = P3D |
| 0x1EF00050 | 0x10400050 | 4 | ? | Writes 0x22221200 on GPU init. |
| 0x1EF00054 | 0x10400054 | 4 | ? | Writes 0xFF2 on GPU init. |
| 0x1EF000C0 | 0x104000C0 | 4 | Backlight control | Writes 0x0 to allow backlights to turn off, 0x20000000 to force them always on. |
| 0x1EF00400 | 0x10400400 | 0x100 | Framebuffer Setup “PDC0” (top screen) | |
| 0x1EF00500 | 0x10400500 | 0x100 | Framebuffer Setup “PDC1” (bottom) | |
| 0x1EF00C00 | 0x10400C00 | ? | Transfer Engine “DMA” | |

0x1EF01000/0x10401000 - 0x1EF01C00/0x10401C00 maps to the GPU internal registers. These registers are usually not read or written directly here; instead they are written through the command list interface below (corresponding to the GPUREG_CMDBUF_* internal registers).

| User VA | PA | Length | Name | Comments |
|---|---|---|---|---|
| 0x1EF01000 | 0x10401000 | 0x4 | ? | Writes 0 on GPU init and before the Command List is used |
| 0x1EF01080 | 0x10401080 | 0x4 | ? | Writes 0x12345678 on GPU init. |
| 0x1EF010C0 | 0x104010C0 | 0x4 | ? | Writes 0xFFFFFFF0 on GPU init. |
| 0x1EF010D0 | 0x104010D0 | 0x4 | ? | Writes 1 on GPU init. |
| 0x1EF014?? | 0x104014?? | 0x14 | “PPF” ? | |
| 0x1EF018E0 | 0x104018E0 | 0x14 | Command List “P3D” | |
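
The address relationships in the map can be checked with a little arithmetic. The following is a minimal sketch, not part of any official tooling: the helper names (`va_to_pa`, `va_to_hwregs_offset`, `decode_busy`) are hypothetical, and the only facts used are the base addresses and the GPU Busy bit positions listed above.

```python
# Base addresses taken from the register map above.
USER_VA_BASE = 0x1EF00000  # user virtual address of the first external register
PA_BASE = 0x10400000       # physical address of the same register
HWREGS_BASE = 0x1EB00000   # GSPGPU:WriteHWRegs addresses are relative to this

# Bit positions of the GPU Busy register (0x1EF00034), as listed in the map.
BUSY_BITS = {"PSC0": 26, "PSC1": 27, "PPF": 30, "P3D": 31}


def va_to_pa(va: int) -> int:
    """Translate a user VA from the map into the matching physical address."""
    return va - USER_VA_BASE + PA_BASE


def va_to_hwregs_offset(va: int) -> int:
    """Offset to pass to GSPGPU:WriteHWRegs for a given user VA (hypothetical helper)."""
    return va - HWREGS_BASE


def decode_busy(value: int) -> dict:
    """Split a GPU Busy register value into one flag per unit."""
    return {name: bool((value >> bit) & 1) for name, bit in BUSY_BITS.items()}


print(hex(va_to_pa(0x1EF00034)))             # 0x10400034
print(hex(va_to_hwregs_offset(0x1EF00010)))  # 0x400010
print(decode_busy(0x84000000))               # PSC0 and P3D busy
```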

Memory Fill #

| User VA | Description |
|---|---|
| 0x1EF000X0 | Buffer start physaddr >> 3 |
| 0x1EF000X4 | Buffer end physaddr >> 3 |
| 0x1EF000X8 | Fill value |
| 0x1EF000XC | Control. bit0: start/busy, bit1: finished, bit8-9: fill-width (0 = 16-bit, 1 or 3 = 24-bit, 2 = 32-bit) |

Memory fills initialize buffers in memory with a given value, similar to memset. A memory fill is triggered by setting bit0 in the control register; doing so also aborts any memory fill already running on that filling unit. Upon completion, the hardware clears bit0, sets bit1, and fires interrupt PSC0.
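
As an illustration of the register layout, the sketch below computes the four words for one fill unit. It only builds the values and does not touch hardware. The function names and the example addresses are hypothetical, and the 8-byte alignment check is an assumption derived from the `>> 3` in the register description.

```python
# Fill-width field (bits 8-9 of the control register), from the table above.
FILL_WIDTH = {16: 0, 24: 1, 32: 2}

CTRL_START = 1 << 0     # bit0: start/busy
CTRL_FINISHED = 1 << 1  # bit1: finished


def memory_fill_regs(start_pa: int, end_pa: int, value: int, width_bits: int) -> list:
    """Return [start, end, fill value, control] for one memory fill unit."""
    # Assumption: addresses are 8-byte aligned, since the registers store physaddr >> 3.
    if start_pa % 8 or end_pa % 8:
        raise ValueError("start and end must be 8-byte aligned")
    control = CTRL_START | (FILL_WIDTH[width_bits] << 8)
    return [start_pa >> 3, end_pa >> 3, value & 0xFFFFFFFF, control]


def fill_finished(control: int) -> bool:
    """True once the hardware has cleared bit0 and set bit1."""
    return not (control & CTRL_START) and bool(control & CTRL_FINISHED)


# Hypothetical example: 32-bit fill of a 0x100-byte buffer with zeros.
regs = memory_fill_regs(0x18000000, 0x18000100, 0x00000000, 32)
print([hex(r) for r in regs])  # ['0x3000000', '0x3000020', '0x0', '0x201']
```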

Fragment Lighting

Fragment lighting is a DMP extension to the standard OpenGL pipeline with which applications can calculate object lighting for each rendered pixel instead of just per vertex. The fragment lighting algorithm furthermore supports the Blinn-Phong, Cook-Torrance, and Ward shading models, as well as microfacet-based BRDF models. While the lighting calculations take place at a very localized position in the pixel processing pipeline, the feature interacts with several other pipeline stages.

Overview #

In general, lighting is calculated at a particular point in space X by determining the angles (i.e. dot products) between different vectors:

Internal Registers

Overview #

GPU internal registers are written to through GPU commands. They control the GPU’s behavior, that is, they tell it what to draw and how to draw it.

Each command is at least 8 bytes wide. The first word is the command parameter and the second word constitutes the command header. Optionally, more parameter words may follow (potentially including a padding word to align commands to multiples of 8 bytes).
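
The word ordering described above can be sketched as follows. This is an illustration only: the header is treated as an opaque 32-bit word because its bit layout is not covered here, `encode_command` is a hypothetical helper, and the padding value of 0 is an assumption.

```python
import struct


def encode_command(header: int, params: list) -> bytes:
    """Lay out one command: first parameter, header, then any further parameters."""
    if not params:
        raise ValueError("a command carries at least one parameter word")
    words = [params[0], header, *params[1:]]
    # Keep the command a multiple of 8 bytes, i.e. an even number of 32-bit words.
    if len(words) % 2:
        words.append(0)  # padding word; the value 0 is an assumption
    # Words are packed little endian, matching the ARM11.
    return struct.pack(f"<{len(words)}I", *words)


single = encode_command(0xAAAAAAAA, [1])     # hypothetical header value
multi = encode_command(0xAAAAAAAA, [1, 2])   # one extra parameter, so padding is added
print(len(single), len(multi))  # 8 16
```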

Pitfalls

This page collects some oddities and pitfalls of the PICA GPU which is used in the 3DS.

Internal Registers #

Vertex attribute alignment #

Vertex components which are defined through GPUREG_ATTRIBBUFFERi_CONFIG1 are accessed at aligned offsets by the GPU.

  • Vertex attributes are aligned to their component element size.
  • Padding attributes (Component type > 11) are always aligned to 4-byte offsets into the buffer.
  • The stride which is passed to the GPU should be passed unaligned.
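
The alignment rules above can be modelled in software to predict where the GPU will look for each component. This is a rough sketch under assumptions: the tuple-based component description is hypothetical, padding sizes are supplied by the caller, and returning the end offset without rounding it up is one reading of "unaligned" stride.

```python
def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def layout_vertex(components: list) -> tuple:
    """Return (offsets, end_offset) for a list of components.

    Each component is either ("attr", element_size, element_count)
    or ("pad", size_in_bytes). This description format is hypothetical.
    """
    offsets = []
    offset = 0
    for comp in components:
        if comp[0] == "pad":
            _, size = comp
            alignment = 4              # padding always sits on a 4-byte offset
        else:
            _, elem_size, count = comp
            size = elem_size * count
            alignment = elem_size      # aligned to the component element size
        offset = align_up(offset, alignment)
        offsets.append(offset)
        offset += size
    # The end offset is deliberately not rounded up (stride is passed unaligned).
    return offsets, offset


# 3 bytes, then 3 floats, then 2 shorts.
print(layout_vertex([("attr", 1, 3), ("attr", 4, 3), ("attr", 2, 2)]))  # ([0, 4, 16], 20)
```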

Vertex stride in GPUREG_ATTRIBBUFFERi_CONFIG2 #

The vertex stride set in GPUREG_ATTRIBBUFFERi_CONFIG2 must match the actual size of the vertex contained in the buffer; otherwise the PICA will either freeze or draw nothing.

Primitive Engine

Primitive Engine (PE) is one of the PICA200’s four vertex processor units and provides some unique features which are used to implement a geometry shader stage and variable-size primitive rendering.

The full functionality of PE is not yet understood and remains to be reverse-engineered.

Variable-size Primitives #

Variable-size primitives are implemented by prefixing each per-primitive sequence of indices in an index array with a primitive size. This is used for various effects, for example Catmull-Clark subdivision and Loop subdivision. It is unknown how this feature is enabled specifically.
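
To make the index layout concrete, the sketch below splits a size-prefixed index array into primitives. It models only the layout described above; how the hardware actually consumes such an array is not known, and `split_primitives` is a hypothetical name.

```python
def split_primitives(indices: list) -> list:
    """Split [size, i0, i1, ..., size, j0, ...] into one index list per primitive."""
    primitives = []
    pos = 0
    while pos < len(indices):
        size = indices[pos]            # prefix: number of indices in this primitive
        pos += 1
        if pos + size > len(indices):
            raise ValueError("index array ends in the middle of a primitive")
        primitives.append(indices[pos:pos + size])
        pos += size
    return primitives


# A 3-index primitive followed by a 4-index primitive.
print(split_primitives([3, 0, 1, 2, 4, 2, 1, 3, 5]))  # [[0, 1, 2], [2, 1, 3, 5]]
```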

Procedural Texture Generation

The 3DS GPU supports procedural generation of texture data using texture unit 3. Little is known about this feature, although a few public hints have been dropped. The contents of this page are based solely on reports on a presentation given by DMP.

The related GPU registers can be found starting here.

Overview #

Procedural texture generation has four stages:

  • Noise Module (outputs u′,v′)
  • Repeat Module (outputs u′′,v′′)
  • Base Shape (also notated as G(u′′,v′′), output g)
  • F(g) and Lookup Table

Noise Module #

This stage applies noise on the input coordinates. Little is known about this other than that there are three noise parameters:

Programming Guide

This page is intended to contain higher-level explanations of concepts and features provided by the 3DS GPU. For more detailed register-level information, see GPU/Internal Registers.

Geometry Pipeline #

Fixed Vertex Attributes #

If a certain vertex attribute is constant for the duration of a draw call, instead of specifying a vertex array with repeated contents or changing the shader to use a uniform, fixed vertex attributes can be used. They let you specify a fixed value, which will be assumed by the attribute for all vertices of the batch.
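
The effect of a fixed attribute can be pictured with a small software model. This is purely conceptual and says nothing about how the hardware is programmed; `fetch_vertex` and the attribute names are hypothetical.

```python
def fetch_vertex(index: int, arrays: dict, fixed: dict) -> dict:
    """Assemble the attributes of one vertex from arrays and fixed values."""
    # Array-backed attributes are read per vertex.
    vertex = {name: data[index] for name, data in arrays.items()}
    # Fixed attributes take the same value for every vertex of the batch.
    vertex.update(fixed)
    return vertex


arrays = {"position": [(0, 0, 0), (1, 0, 0), (0, 1, 0)]}
fixed = {"color": (255, 0, 0, 255)}  # constant for the whole draw call

for i in range(3):
    print(fetch_vertex(i, arrays, fixed))
```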

Shader Instruction Set

Overview #

A compiled shader binary consists of two parts: the main instruction sequence and the operand descriptor table. Both are sent to the GPU around the same time, but using separate GPU Commands. Instructions (such as format 1 instructions) may reference operand descriptors. In that case, the operand descriptor ID is the offset, in words, of the descriptor within the table. Both instructions and descriptors are coded in little endian. Basic implementations of the following specification can be found at [1] and [2]. The instruction set seems to have been heavily inspired by Microsoft's vs_3_0 [3] and the Direct3D shader code [4]. Please note that this page is being written as the instruction set is reverse engineered; as such it may very well contain mistakes.
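
The descriptor lookup described above amounts to an indexed little-endian read. The sketch below assumes 32-bit words, which is not stated on this page, and `operand_descriptor` is a hypothetical helper name.

```python
import struct

WORD_SIZE = 4  # assumption: descriptors are 32-bit words


def operand_descriptor(table: bytes, desc_id: int) -> int:
    """Read the descriptor at word offset desc_id from a little-endian table."""
    return struct.unpack_from("<I", table, desc_id * WORD_SIZE)[0]


# A made-up table with three descriptors.
table = struct.pack("<3I", 0x00000011, 0x00000022, 0x12345678)
print(hex(operand_descriptor(table, 2)))  # 0x12345678
```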