Friday, April 13, 2012

StarCraft 2. Technology secrets. Part 1

This reverse engineering was done in October 2010. I decided to translate my work in reverse chronological order, because older games use somewhat outdated 3D technology by now, and I think they are less interesting than newer ones.

This part is about in-game graphics rendering. Movies (rendered by the engine) will be described in part 2.



So, StarCraft 2: Wings of Liberty from Blizzard. There are many games with DirectX 11 support, and there are games with native DX10 renderers that dropped DX9 support. But Blizzard games always run on a wide variety of computer hardware (World of Warcraft, for example, can run even on fixed-function HW as far as I know). StarCraft 2 is no exception. I think the developers decided that moving from DX9 to DX10 after several years of development would be a painful process, and left their engine on the DX9 API. DX10 is not supported (on Mac it is OpenGL, for obvious reasons).

The first problem I faced when I started looking inside the SC2 renderer was a tools problem. My main reverse engineering tool, nVidia PerfHUD, refused to work because of the convoluted game launching scheme. The game uses a single launcher exe, which runs SC2Switcher.exe from the support folder, which in turn runs the actual game exe from the versions directory. PerfHUD is a tool intended for the developers of an application (the device creation C++ code must be altered), and I have a special launcher application (written by a good friend of mine) that forces the PerfHUD device instead of the ordinary D3D device. I could have changed my launcher to walk through those several processes before injection, but I was too lazy to do that. ;) So PerfHUD was out. There is a new profiler/debugger from nVidia called Parallel Nsight. It is a really good tool, with a much more comfortable interface than PerfHUD. But at the time of writing its graphics API support was limited to D3D10 and D3D11, which is not applicable to SC2 (Nsight 2.2, which is in release candidate state right now, has limited D3D9 support). ATI GPU PerfStudio doesn't support D3D9 either. I was not familiar with Intel GPA at that time, so only one candidate was left: Microsoft PIX for Windows.

The second problem arose when I looked at the shader assembly code. Usually, when a shader is compiled from *.fx files using the DirectX HLSL compiler, some debug information is left in the compiled shader binary. This debug info is shown in the PIX shader pane above the asm code, and contains preshader info (if compiled with preshaders) and, more importantly, constant names with their constant register mapping.

Here is an example of the kind of shader listing I expected to see: http://www.everfall.com/paste/id.php?fgnlr6ffcx0y
And here is an example of the shaders from SC2 in PIX: http://www.everfall.com/paste/id.php?njt5do17c4bi

In both examples, figuring out what this wall of code is doing is really problematic, but in the first snippet we have constant names, which really help to understand the meaning of the code. I was looking at those huge shaders and understood that I needed a more reliable source of information. I dug into several MPQ files and found all the shader sources. But to use this source code, I needed to match draw calls with the appropriate shaders. Most of the time it is obvious what a particular render call is doing, but sometimes it is not so simple. Meditating upon the input textures, vertex format and constant register contents sometimes helps. For example, if I see a subtract operation whose operand is a constant with a value of -1.5, I search all shader sources for the constant -1.5, and often find the necessary shader right away. If an asm listing is not big, I study it in place, without the HLSL counterpart.

Some aspects of the SC2 renderer are described in the paper http://developer.amd.com/documentation/presentations/legacy/Chapter05-Filion-StarCraftII.pdf. I recommend reading it.

The SC2 material system is based on an ubershader approach with deep C++ integration. Here is an example of the main pixel shader function with the material color calculation: http://www.everfall.com/paste/id.php?bjpc19uks7he. As can be seen, the source code contains many boolean constants that select the shading path branches. The pixel processing subsystem is based on so-called "layers". Each layer consists of a single texture (a sampler, which is the more correct term) and the properties of this layer. Here is an example of how it is implemented: http://www.everfall.com/paste/id.php?5ekjmezzcllx. The shader header uses the following layer declarations:

DECLARE_LAYER(Diffuse);
DECLARE_LAYER(Decal);
DECLARE_LAYER(Specular);
DECLARE_LAYER(Emissive);
DECLARE_LAYER(Emissive2);
DECLARE_LAYER(Envio);
DECLARE_LAYER(EnvioMask);
DECLARE_LAYER(AlphaMask);
DECLARE_LAYER(AlphaMask2);
DECLARE_LAYER(Normal);
DECLARE_LAYER(Heightmap);
DECLARE_LAYER(Lightmap);
DECLARE_LAYER(AmbientOcclusion);

Layer setup is performed in the shader body using the SETUP_LAYER macro (you can see an example of this in the code at the previous link). The DX effect system is not used. The Chapter05-Filion-StarCraftII.pdf paper says that this material system is designed to be as comfortable as possible for developers. I cannot see the C++ part (I don't have the SC2 source code :) ), but the shader part is very interesting.
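The DECLARE_LAYER definition itself lives in the shader headers, which I won't reproduce here. Purely as a guess at its shape (every name below is my assumption, not actual SC2 code), such a macro could expand to a sampler plus per-layer properties:

#define DECLARE_LAYER(name)                 \
    sampler2D Layer_##name##_Sampler;       \
    float4    Layer_##name##_Tint;          \
    bool      b_Layer_##name##_Enabled;

// The boolean would be one of those shading-path constants mentioned above,
// letting the C++ side switch a layer on or off when compiling the ubershader.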

Scene rendering starts with shadow map preparation. The shadow map is 2048x2048 pixels in size. The NULL render target hack is not used. Instead, the color buffer is used to render semitransparent objects from the light's point of view, to produce colored shadows later. Semitransparent objects also have their own shadow map of the same size. Perspective distortion (PSM, TSM, etc.) and cascade techniques are not used here, for obvious reasons: the camera never looks along the ground plane and is angled at 60-70 degrees to the terrain all the time. Shadow texel density improvements are not needed for such a point of view.


Shadow map example
Colored shadows buffer
Next, if there are zerg on the map, creep (that brown stuff that surrounds a zerg base) texture preparation is performed. If you have played SC2, you may have noticed that the surface of the creep is moving. This is a very simple effect. Using a precalculated noise texture (which is actually a heightmap), the shader selects several bright spots. Then the renderer converts the resulting heightmap into a tangent-space normal map, which is used to light the creep on the ground (a sketch of this conversion follows below the images).
Original heightmap

Resulting heightmap

Normal map from heightmap
Note that the original heightmap tiles seamlessly in every direction, so we get a tileable normal map.
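Here is a minimal sketch of such a heightmap-to-normal-map conversion, as I understand it; the constant names and the bump scale factor are my assumptions, not SC2's actual code:

sampler2D g_HeightMap;   // the resulting creep heightmap
float2    g_TexelSize;   // 1.0 / heightmap dimensions
float     g_BumpScale;   // controls how strong the bumps appear

float4 HeightToNormalPS(float2 uv : TEXCOORD0) : COLOR0
{
    // Central differences; wrap addressing keeps the output tileable.
    float hL = tex2D(g_HeightMap, uv - float2(g_TexelSize.x, 0)).r;
    float hR = tex2D(g_HeightMap, uv + float2(g_TexelSize.x, 0)).r;
    float hD = tex2D(g_HeightMap, uv - float2(0, g_TexelSize.y)).r;
    float hU = tex2D(g_HeightMap, uv + float2(0, g_TexelSize.y)).r;

    float3 n = normalize(float3((hL - hR) * g_BumpScale,
                                (hD - hU) * g_BumpScale,
                                1.0f));
    // Pack from [-1, 1] into [0, 1] for storage in the normal map.
    return float4(n * 0.5f + 0.5f, 1.0f);
}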
Next comes the main scene renderer. An MRT configuration with 3 render targets is used; the RT format is ARGB16F. The first render target holds the scene image with global light sources (the sun) applied to it.


The second render target contains scene normals in camera space.


And the diffuse color goes into the third RT (a sketch of this MRT layout follows below).


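To make the layout concrete, here is a rough sketch of a pixel shader filling these three targets. It is my reconstruction of the structure, not SC2's code, and all names are assumptions:

sampler2D g_DiffuseMap;
float3    g_SunDirCS;    // sun direction in camera space
float3    g_SunColor;

struct GBufferOutput
{
    float4 LitColor : COLOR0;   // RT0: scene with global (forward) lighting
    float4 Normal   : COLOR1;   // RT1: camera-space normal
    float4 Diffuse  : COLOR2;   // RT2: material diffuse color
};

GBufferOutput MainScenePS(float3 normalCS : TEXCOORD0,
                          float2 uv       : TEXCOORD1)
{
    GBufferOutput o;
    float4 albedo = tex2D(g_DiffuseMap, uv);
    float3 n      = normalize(normalCS);

    // The forward part: the global light (sun) is applied right here.
    float ndotl = saturate(dot(n, -g_SunDirCS));
    o.LitColor  = float4(albedo.rgb * g_SunColor * ndotl, albedo.a);
    o.Normal    = float4(n, 0.0f);   // ARGB16F, so no range packing is needed
    o.Diffuse   = albedo;
    return o;
}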
The SC2 renderer has a combined lighting subsystem. Global light sources (such as the sun) and sources that cover a big portion of the screen are applied in the G-buffer fill pass (i.e. forward shading). Small light sources are applied later, using additive blending (i.e. deferred shading). Deferred light sources use stencil masking to cut down fill rate. The stencil buffer is cleared, and before each deferred light source is rendered, the corresponding footprint is drawn (a sphere for omni lights, a cone for spotlights). First the whole light source volume is marked (using the stencil increment operation), then the stencil is decremented for pixels where the back faces of the footprint are not occluded by scene geometry (z-pass). After that we have the marked area that will be influenced by the particular light source. The number of forward light sources is limited (because the number of shader constants is limited):
#define MAX_DIRECTIONAL_LIGHTS 3
#define MAX_POINT_LIGHTS 5
#define MAX_SPOTLIGHTS 5
But the number of deferred light sources is unlimited.
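Here is a minimal sketch of what one such deferred light pass might look like, assuming the G-buffer described above. The constant names, the attenuation model, and the view-ray position reconstruction are my assumptions, not SC2's actual code; the quad is drawn over the stencil-marked footprint with additive blending:

sampler2D g_NormalRT;    // camera-space normals (RT1)
sampler2D g_DiffuseRT;   // diffuse color (RT2)
sampler2D g_DepthTex;    // linear depth, for position reconstruction
float3    g_LightPosCS;  // light position in camera space
float3    g_LightColor;
float     g_LightRadius;

float4 DeferredPointLightPS(float2 uv      : TEXCOORD0,
                            float3 viewRay : TEXCOORD1) : COLOR0
{
    float  depth  = tex2D(g_DepthTex, uv).r;
    float3 posCS  = viewRay * depth;             // camera-space position
    float3 n      = tex2D(g_NormalRT, uv).xyz;
    float3 albedo = tex2D(g_DiffuseRT, uv).rgb;

    float3 toLight = g_LightPosCS - posCS;
    float  dist    = length(toLight);
    float  atten   = saturate(1.0f - dist / g_LightRadius);
    float  ndotl   = saturate(dot(n, toLight / dist));

    // Additive blending accumulates this on top of the forward-lit RT0.
    return float4(albedo * g_LightColor * ndotl * atten, 0.0f);
}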

Before the G-buffer fill pass, the SC2 renderer performs a depth prepass. In this pass the depth buffer is filled with the ground patches only. Why only the ground? The irregular terrain surface can occlude relatively big areas, while other objects are small. A depth buffer pre-filled with terrain data can save plenty of pixel shader work later, at a small DIP overhead.

The main pass shaders are very long. They apply shadows, parallax mapping (clearly visible as craters under destroyed protoss buildings; both parallax occlusion mapping and relief mapping are implemented), fog of war, creep rendering, and forward lighting. A huge number of textures is used. Terrain rendering shaders often cannot fit into the instruction count limit, so the developers decided to blend all terrain texture layers into a separate render target first, and then use the resulting texture as the terrain diffuse texture in the main draw call. All model motion is performed by hardware skinning.
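For flavor, here is a heavily simplified steep-parallax loop of the kind these techniques are built on; the step count, scale constant and names are my assumptions, not SC2's implementation:

sampler2D g_HeightTex;
float     g_ParallaxScale;

// viewTS is the surface-to-eye vector in tangent space.
float2 ParallaxOffset(float2 uv, float3 viewTS)
{
    const int steps     = 16;
    float     layerStep = 1.0f / steps;
    float2    uvStep    = viewTS.xy / viewTS.z * g_ParallaxScale * layerStep;

    float rayHeight = 1.0f;
    float height    = tex2Dlod(g_HeightTex, float4(uv, 0, 0)).r;

    // March the view ray down until it dips below the heightfield.
    // tex2Dlod avoids gradient problems inside the dynamic loop (ps_3_0).
    [loop]
    for (int i = 0; i < steps && height < rayHeight; ++i)
    {
        uv        -= uvStep;
        rayHeight -= layerStep;
        height     = tex2Dlod(g_HeightTex, float4(uv, 0, 0)).r;
    }
    return uv;
}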

Shadows are blurred with the percentage closer filtering (PCF) technique. For sample randomization purposes a two-channel texture with random UV vectors is used (a sketch follows after the channel images):
A-channel

L-channel
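A minimal sketch of randomized PCF built around such a rotation texture. This is my reconstruction; the kernel, sample count, and names are assumptions, and the random texture is assumed to store unit vectors:

sampler2D g_ShadowMap;
sampler2D g_RandomRot;    // the two-channel random vector texture above
float     g_ShadowTexel;  // 1.0 / shadow map size

float ShadowPCF(float4 shadowPos, float2 screenUV)
{
    // Build a per-pixel 2x2 rotation from the random unit vector.
    float2   r   = tex2D(g_RandomRot, screenUV * 64.0f).rg * 2.0f - 1.0f;
    float2x2 rot = float2x2(r.x, -r.y,
                            r.y,  r.x);

    const float2 kernel[4] =
    {
        float2(-1.5f, -0.5f), float2(0.5f, -1.5f),
        float2( 1.5f,  0.5f), float2(-0.5f, 1.5f)
    };

    float lit = 0.0f;
    [unroll]
    for (int i = 0; i < 4; ++i)
    {
        float2 offs  = mul(kernel[i], rot) * g_ShadowTexel;
        float  depth = tex2D(g_ShadowMap, shadowPos.xy + offs).r;
        lit += (shadowPos.z <= depth) ? 1.0f : 0.0f;
    }
    return lit * 0.25f;   // fraction of samples that are lit
}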
Fog of war is computed on the CPU and stored in a small texture (a usage sketch follows the image):
Fog of war
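Applying it in a shader is then almost a one-liner; how exactly SC2 maps world position to this texture is my assumption:

sampler2D g_FogOfWar;     // small CPU-updated texture covering the map
float2    g_MapInvSize;   // 1.0 / playable map dimensions

float3 ApplyFogOfWar(float3 color, float2 worldXY)
{
    // Darken hidden and unexplored areas by the stored visibility value.
    float visibility = tex2D(g_FogOfWar, worldXY * g_MapInvSize).r;
    return color * visibility;
}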
Semitransparent shadows are completely described in the developers' paper, so I will give only a brief explanation of how they work. We have an opaque objects shadow map, a semitransparent objects shadow map, and a semitransparent objects color map rendered from the light's view (the same view as the shadow maps). When we perform ordinary shadowing, we test whether a pixel is in shadow using the opaque objects SM. If it is not in shadow, we perform another test, now using the semitransparent objects SM. If this test is positive (the pixel is not in shadow from opaque objects, but is in shadow from transparent ones), we pick a color from the light view color map and tint the shadow with it. The resulting effect is easily observed under semitransparent units or near mineral fields (see the sketch after the image):
Color shadows
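In shader form the test could look like this (a sketch of the paper's technique; the names are my assumptions):

sampler2D g_OpaqueSM;     // opaque objects shadow map
sampler2D g_TransSM;      // semitransparent objects shadow map
sampler2D g_TransColor;   // semitransparent objects rendered in color

// lightPos is the pixel position projected into light space ([0,1] uv + depth).
float3 ShadowTint(float3 lightPos)
{
    float2 uv = lightPos.xy;
    if (lightPos.z > tex2D(g_OpaqueSM, uv).r)
        return float3(0, 0, 0);               // shadowed by an opaque caster
    if (lightPos.z > tex2D(g_TransSM, uv).r)
        return tex2D(g_TransColor, uv).rgb;   // tinted by a transparent caster
    return float3(1, 1, 1);                   // fully lit
}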
If the level contains water, a refraction texture is rendered. Semitransparent objects and particles are rendered last.
After the geometry pass comes the deferred lighting pass, described above.

Next comes UI frame rendering, plus the selected unit's portrait. It seems this portrait is rendered using the same path as the in-engine movies (covered in part 2).
UI frame
A post-process chain is applied next: bright pass + blur = bloom, with tone mapping applied last.
Then comes the minimap with unit markers (one draw call, nice), creep and fog of war:
Minimap
Element by element, the whole interface is drawn (not very efficient). Text is rendered using the Scaleform GFx engine, which packs all captions and the appropriate fonts into a single texture to optimize the draw call count.
Text dynamic texture
Resulting image
The vertex format is very good: it is almost always 32 bytes, which is a cache-friendly size. The frame draw call count is not very high either. An average scene has 800-1200 DIPs; a scene with the zergling limit (400 units) on screen produces 1800-2000 draw calls. All normal maps are stored in DXT5 format with a channel swizzling technique for better quality.
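The usual form of that swizzle is "DXT5nm" (whether SC2 uses exactly this layout is my assumption): X is moved into the alpha channel, which DXT5 compresses separately and with better precision, Y stays in green, and Z is reconstructed in the shader:

sampler2D g_NormalMap;

float3 UnpackDXT5Normal(float2 uv)
{
    float4 t  = tex2D(g_NormalMap, uv);
    float2 xy = t.ag * 2.0f - 1.0f;                 // X from alpha, Y from green
    float  z  = sqrt(saturate(1.0f - dot(xy, xy))); // reconstruct Z
    return float3(xy, z);
}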

This research was performed on a PC with a Core 2 Duo 6550 and a GeForce GTX 460. All game settings except texture quality were set to ultra. With texture quality on ultra I got PIX crashes with out-of-memory errors ;)




