Windows 10 launches within two months and brings DirectX 12 with it. But what exactly is DirectX 12, and why should you be excited about it?
Throughout this article, I will detail what DirectX is, how D3D 12 will be a massive improvement over D3D 11, and what the multiple feature levels in D3D 12 mean.
WHAT IS DIRECTX / DIRECT3D?
Microsoft DirectX is a collection of application programming interfaces (APIs) for handling tasks related to multimedia, especially game programming and video, on Microsoft platforms. Originally, the names of these APIs all began with Direct, such as Direct3D, DirectDraw, DirectMusic, DirectPlay, DirectSound, and so forth. The name DirectX was coined as a shorthand term for all of these APIs (the X standing in for the particular API names) and soon became the name of the collection. When Microsoft later set out to develop a gaming console, the X was used as the basis of the name Xbox to indicate that the console was based on DirectX technology.
Direct3D (the 3D graphics API within DirectX) is widely used in the development of video games for Microsoft Windows, Sega Dreamcast, Microsoft Xbox, Microsoft Xbox 360, and Microsoft Xbox One. Direct3D is also used by other software applications for visualization and graphics tasks such as CAD/CAM engineering. As Direct3D is the most widely publicized component of DirectX, it is common to see the names “DirectX” and “Direct3D” used interchangeably.
At GDC 2014, Microsoft announced what can be considered the most exciting news for PC gaming in 2015: the next iteration of Direct3D, version 12. D3D12 is a return to low-level programming; it will give more control to game developers and introduce many exciting new features. The D3D12 development team is focused on reducing CPU overhead and increasing scalability across CPU cores. The goal with D3D 12 is “Console API efficiency and performance”. Modern console games are able to use all the CPU and GPU cores available to them more effectively, with great results. In the world of PC gaming, thread 0 often does most, if not all, of the work, while the other threads only handle OS or system tasks. Few truly multithreaded PC games exist, and Microsoft wants to change that with D3D 12.
D3D 12 is a superset of D3D 11 rendering functionality. This means that modern GPUs can run D3D 12, and it will make more effective use of the multi-core CPUs and GPUs that exist today. There is no need to spend money on a new GPU to get the advantages of D3D 12.
What makes Direct3D 12 better? First and foremost, it provides a lower level of hardware abstraction than ever before, allowing games to significantly improve multithread scaling and CPU utilization. In addition, games will benefit from reduced GPU overhead via features such as descriptor tables and concise pipeline state objects. And that’s not all – Direct3D 12 also introduces a set of new rendering pipeline features that will dramatically improve the efficiency of algorithms such as order-independent transparency, collision detection, and geometry culling.
3DMark – Multi-thread scaling + 50% better CPU utilization
If you’re a gamer, you know what 3DMark is: a great way to benchmark game performance on all your hardware and devices. This makes it an excellent choice for verifying the performance improvements that Direct3D 12 will bring to games. 3DMark on Direct3D 11 uses multi-threading extensively; however, due to a combination of runtime and driver overhead, there is still significant idle time on each core. After porting the benchmark to Direct3D 12, we see two major improvements: a 50% improvement in CPU utilization and a better distribution of work among threads.
Tested on GIGABYTE BRIX Pro (Intel Core i7-4770R + Iris Pro Graphics 5200)
Where does this performance come from?
Direct3D 12 represents a significant departure from the Direct3D 11 programming model, allowing apps to go closer to the metal than ever before. This was accomplished by overhauling numerous areas of the API. I am providing an overview of three key areas: pipeline state representation, work submission, and resource access.
Pipeline state objects
Direct3D 11 allows pipeline state manipulation through a large set of orthogonal objects. For example, input assembler state, pixel shader state, rasterizer state, and output merger state are all independently modifiable. This provides a convenient, relatively high-level representation of the graphics pipeline; however, it doesn’t map very well to modern hardware, primarily because there are often interdependencies between the various states. For example, many GPUs combine pixel shader and output merger state into a single hardware representation, but because the Direct3D 11 API allows these to be set separately, the driver cannot resolve things until it knows the state is finalized, which isn’t until draw time. This delays hardware state setup, which means extra overhead and a lower maximum number of draw calls per frame.
Direct3D 12 addresses this issue by unifying much of the pipeline state into immutable pipeline state objects (PSOs), which are finalized on creation. This allows hardware and drivers to immediately convert the PSO into whatever hardware native instructions and state are required to execute GPU work. Which PSO is in use can still be changed dynamically, but to do so the hardware only needs to copy the minimal amount of pre-computed state directly to the hardware registers, rather than computing the hardware state on the fly. This means significantly reduced draw call overhead, and many more draw calls per frame.
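To make the difference concrete, here is a minimal toy model in Python. It is not Direct3D code: `ToyDriver`, `create_pipeline_state` and the state names are hypothetical stand-ins, and the "compile" step simply counts how often the expensive translation to hardware state would run.

```python
from dataclasses import dataclass


class ToyDriver:
    """Hypothetical driver that counts expensive state translations."""

    def __init__(self):
        self.compile_calls = 0

    def compile(self, pixel_shader, blend_mode, raster_mode):
        # Stand-in for converting API state into hardware-native state.
        self.compile_calls += 1
        return (pixel_shader, blend_mode, raster_mode)


@dataclass(frozen=True)
class PipelineState:
    """Immutable state object; its hardware form is fixed at creation."""
    hardware_state: tuple


def create_pipeline_state(driver, pixel_shader, blend_mode, raster_mode):
    # D3D12-style: all state is known up front, so translate once here.
    return PipelineState(driver.compile(pixel_shader, blend_mode, raster_mode))


def draw_with_pso(pso):
    # Drawing only copies the pre-computed state; nothing is recomputed.
    return pso.hardware_state


def draw_with_separate_state(driver, pixel_shader, blend_mode, raster_mode):
    # D3D11-style: state is only final at draw time, so translate on every draw.
    return driver.compile(pixel_shader, blend_mode, raster_mode)
```

In this sketch, a hundred draws with one PSO cost a single translation, while a hundred draws with separately set state cost a hundred.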
Command lists and bundles
In Direct3D 11, all work submission is done via the immediate context, which represents a single stream of commands that go to the GPU. To achieve multithreaded scaling, games also have deferred contexts available to them, but like Direct3D 11’s separate pipeline state, deferred contexts do not map perfectly to hardware, and so relatively little work can be done in them.
Direct3D 12 introduces a new model for work submission based on command lists that contain the entirety of information needed to execute a particular workload on the GPU. Each new command list contains information such as which PSO to use, what texture and buffer resources are needed, and the arguments to all draw calls. Because each command list is self-contained and inherits no state, the driver can pre-compute all necessary GPU commands up-front and in a free-threaded manner. The only serial process necessary is the final submission of command lists to the GPU via the command queue, which is a highly efficient process.
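The shape of this model can be sketched in a few lines of Python. This is only an illustration under assumed names (`record_command_list`, `CommandQueue` and `render_frame` are hypothetical, not Direct3D calls): several threads record independent command lists, and the only serial step is the final submission.

```python
from concurrent.futures import ThreadPoolExecutor


def record_command_list(pso_name, draws):
    """Record a self-contained list: it names its own PSO and inherits no state."""
    commands = [("set_pso", pso_name)]
    for mesh, texture in draws:
        commands.append(("bind_texture", texture))
        commands.append(("draw", mesh))
    return commands


class CommandQueue:
    """Toy queue; submission is the only serial step in this sketch."""

    def __init__(self):
        self.submitted = []

    def execute(self, command_lists):
        for command_list in command_lists:
            self.submitted.extend(command_list)


def render_frame(work_items, queue, workers=4):
    # Recording happens in parallel because no list depends on another.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        lists = list(pool.map(lambda item: record_command_list(*item), work_items))
    # Submission happens once, in a well-defined order.
    queue.execute(lists)
    return lists
```

Because each list carries everything it needs, the recording threads never have to coordinate with each other.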
In addition to command lists, Direct3D 12 also introduces a second level of work pre-computation, bundles. Unlike command lists which are completely self-contained and typically constructed, submitted once, and discarded, bundles provide a form of state inheritance which permits reuse. For example, if a game wants to draw two character models with different textures, one approach is to record a command list with two sets of identical draw calls. But another approach is to “record” one bundle that draws a single character model, then “play back” the bundle twice on the command list using different resources. In the latter case, the driver only has to compute the appropriate instructions once, and creating the command list essentially amounts to two low-cost function calls.
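The two-character example can be modelled roughly as follows. The function names and the tuple-based command format are assumptions made for illustration; the point is only that the draw commands are recorded once and replayed with different resources bound.

```python
def record_bundle(mesh, part_count):
    """Record the draw calls for one model once; the texture is inherited at playback."""
    return tuple(("draw", mesh, part) for part in range(part_count))


def record_command_list_with_bundle(bundle, textures):
    """Play the same bundle back once per texture."""
    commands = []
    for texture in textures:
        commands.append(("bind_texture", texture))
        commands.append(("execute_bundle", bundle))
    return commands


def expand(commands):
    """Flatten the command list into the work that would actually be executed."""
    flat = []
    for command in commands:
        if command[0] == "execute_bundle":
            flat.extend(command[1])
        else:
            flat.append(command)
    return flat
```

Recording the command list here takes four cheap entries, yet it expands to the full set of draws for both characters.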
Descriptor heaps and tables
Resource binding in Direct3D 11 is highly abstracted and convenient, but leaves many modern hardware capabilities underutilized. In Direct3D 11, games create “view” objects of resources, then bind those views to several “slots” at various shader stages in the pipeline. Shaders in turn read data from those explicit bind slots which are fixed at draw time. This model means that whenever a game wants to draw using different resources, it must re-bind different views to different slots, and call draw again. This is yet another case of overhead that can be eliminated by fully utilizing modern hardware capabilities.
Direct3D 12 changes the binding model to match modern hardware and significantly improve performance. Instead of requiring standalone resource views and explicit mapping to slots, Direct3D 12 provides a descriptor heap into which games create their various resource views. This provides a mechanism for the GPU to directly write the hardware-native resource description (descriptor) to memory up-front. To declare which resources are to be used by the pipeline for a particular draw call, games specify one or more descriptor tables which represent sub-ranges of the full descriptor heap. As the descriptor heap has already been populated with the appropriate hardware-specific descriptor data, changing descriptor tables is an extremely low-cost operation.
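As a rough mental model, a descriptor table is just a (start, count) window into a heap that was filled in advance. The Python sketch below uses hypothetical class names and string placeholders instead of real hardware descriptors.

```python
from typing import NamedTuple


class DescriptorTable(NamedTuple):
    """A sub-range of the heap: nothing is copied, only referenced."""
    start: int
    count: int


class DescriptorHeap:
    """Toy heap whose slots are written once, up front."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.used = 0

    def create_view(self, resource_name):
        if self.used >= len(self.slots):
            raise IndexError("descriptor heap is full")
        index = self.used
        # Stand-in for writing the hardware-native descriptor to memory.
        self.slots[index] = "descriptor:" + resource_name
        self.used += 1
        return index

    def resolve(self, table):
        # Switching tables only changes which window of the heap is read.
        return self.slots[table.start:table.start + table.count]
```

Switching from one table to another in this model means changing two integers, which is why the real operation is so cheap compared with re-binding individual views.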
In addition to the improved performance offered by descriptor heaps and tables, Direct3D 12 also allows resources to be dynamically indexed in shaders, providing unprecedented flexibility and unlocking new rendering techniques. As an example, modern deferred rendering engines typically encode a material or object identifier of some kind to the intermediate g-buffer. In Direct3D 11, these engines must be careful to avoid using too many materials, as including too many in one g-buffer can significantly slow down the final render pass. With dynamically indexable resources, a scene with a thousand materials can be finalized just as quickly as one with only ten.
Since its announcement, graphics manufacturers such as Intel, AMD, NVIDIA and even Qualcomm have claimed that the hardware they build will fully support DirectX 12 capabilities. We have come a long way since the reveal, and bit by bit we are getting more information about the different feature levels and advantages of the API. The first thing we knew about the DirectX 12 API was that it was built for low CPU overhead: it reduces the CPU bottleneck faced by the graphics card and uses the processor to its full potential. The other features include more control over hardware for developers and fully asynchronous compute, which is faster and more efficient.
The reference DirectX 12 API (Feature Level 11_0) has performance-targeted features, while the other two levels deliver graphical improvements, and this is what really matters for improving games visually. Feature Level 12_0 comes with Tiled Resources, Typed UAV Access and Bindless Textures support. Feature Level 12_1 enables Rasterizer Ordered Views, Conservative Rasterization and Volume Tiled Resources on the API. We have talked about these features in previous articles detailing the performance improvements, explicit multi-adapter technology and graphical updates. But to enable these new technologies, the hardware built by these companies needs to be fully compliant with them.
Rasterizer Ordered Views
First and foremost of the new features is Rasterizer Ordered Views (ROVs). As hinted at by the name, ROVs are focused on giving the developer control over the order in which elements are rasterized in a scene, so that elements are drawn in the correct order. This feature specifically applies to Unordered Access Views (UAVs) being generated by pixel shaders, which by their very definition are initially unordered. ROVs offer an alternative to the unordered nature of UAVs, which would otherwise result in elements being rasterized simply in the order they were finished. For most rendering tasks unordered rasterization is fine (deeper elements would be occluded anyhow), but for a certain category of tasks the ability to efficiently control the access order to a UAV is important for rendering a scene both correctly and quickly.
The textbook use case for ROVs is Order Independent Transparency, which allows for elements to be rendered in any order and still blended together correctly in the final result. OIT is not new – Direct3D 11 gave the API enough flexibility to accomplish this task – however these earlier OIT implementations would be very slow due to sorting, restricting their usefulness outside of CAD/CAM. The ROV implementation however could accomplish the same task much more quickly by getting the order correct from the start, as opposed to having to sort results after the fact.
Along these lines, since OIT is just a specialized case of a pixel blending operation, ROVs will also be usable for other tasks that require controlled pixel blending, including certain cases of anti-aliasing.
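Why the access order matters for blending can be shown with a small single-channel example. This is a simplified sketch, not shader code: the fragment format `(submission_index, color, alpha)` and the `ordered` flag are assumptions that stand in for the ordering guarantee ROVs provide.

```python
def blend_over(dst, src_color, src_alpha):
    """Standard 'over' blend of one fragment onto the current pixel value."""
    return src_color * src_alpha + dst * (1.0 - src_alpha)


def resolve_pixel(fragments, background, ordered):
    """Blend fragments given in completion order.

    Each fragment is (submission_index, color, alpha). With ordered=True the
    fragments are applied in submission order, which is the guarantee this
    sketch assumes an ordered view gives; otherwise completion order is used.
    """
    if ordered:
        fragments = sorted(fragments, key=lambda fragment: fragment[0])
    result = background
    for _, color, alpha in fragments:
        result = blend_over(result, color, alpha)
    return result
```

With two half-transparent fragments (white submitted first, black second) over a black background, applying them in submission order gives 0.25, while applying them in the reverse completion order gives 0.5. The unordered result therefore depends on which fragment happened to finish first.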
Typed UAV Load
The second feature coming to Direct3D is Typed UAV Load. Unordered Access Views (UAVs) are a special type of buffer that allows multiple GPU threads to access the same buffer simultaneously without generating memory conflicts. Because of this disorganized nature of UAVs, certain restrictions are in place that Typed UAV Load will address. As implied by the name, Typed UAV Load deals with cases where UAVs are data typed, and how to better handle their use.
Volume Tiled Resources
The third feature coming to Direct3D is Volume Tiled Resources. VTR builds off of the work Microsoft and partners have already done for tiled resources (AKA sparse allocation, AKA hardware megatexture) by extending it into the 3rd dimension.
VTRs are primarily meant to be used with volumetric pixels (voxels), with the idea being that with sparse allocation, volume tiles that do not contain any useful information can avoid being allocated, avoiding tying up memory in tiles that will never be used or accessed. This kind of sparse allocation is necessary to make certain kinds of voxel techniques viable.
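The idea of sparse allocation can be sketched with a dictionary of tiles. The `SparseVolume` class and its 4x4x4 tile size are hypothetical choices for illustration; real tile dimensions depend on the hardware and texture format.

```python
class SparseVolume:
    """Toy voxel volume that allocates a tile only when it is first written."""

    def __init__(self, tile_size=4):
        self.tile_size = tile_size
        self.tiles = {}  # maps (tile_x, tile_y, tile_z) to a flat list of voxels

    def _locate(self, x, y, z):
        t = self.tile_size
        tile_key = (x // t, y // t, z // t)
        offset = ((z % t) * t + (y % t)) * t + (x % t)
        return tile_key, offset

    def write(self, x, y, z, value):
        tile_key, offset = self._locate(x, y, z)
        if tile_key not in self.tiles:
            # Memory is committed only for tiles that hold useful data.
            self.tiles[tile_key] = [0] * (self.tile_size ** 3)
        self.tiles[tile_key][offset] = value

    def read(self, x, y, z):
        tile_key, offset = self._locate(x, y, z)
        tile = self.tiles.get(tile_key)
        # Unallocated tiles read back as empty space.
        return 0 if tile is None else tile[offset]

    def allocated_voxels(self):
        return len(self.tiles) * self.tile_size ** 3
```

Writing two voxels far apart commits only two tiles (128 voxels in this sketch), rather than the whole bounding volume between them.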
Conservative Rasterization
Last but certainly not least among Direct3D’s new features will be conservative rasterization. Conservative rasterization is essentially a more accurate but more performance-intensive way of figuring out whether a polygon covers part of a pixel. Instead of doing a quick and simple test to see if the center of the pixel is bounded by the lines of the polygon, conservative rasterization checks whether the polygon covers any part of the pixel by testing it against the corners of the pixel. This means that conservative rasterization will catch cases where a polygon is too small to cover the center of a pixel, which gives a more accurate outcome, be it better identifying the pixels a polygon resides in or finding polygons too small to cover the center of any pixel at all. This is where the “conservative” part of the name comes from: the rasterizer is conservative in that it includes every pixel touched by a triangle, as opposed to just the pixels where the triangle covers the center point.
Conservative rasterization is being added to Direct3D in order to allow new algorithms to be used which would fail under the imprecise nature of point sampling. Like VTR, voxels play a big part here as conservative rasterization can be used to build a voxel. However it also has use cases in more accurate tiling and even collision detection.
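The difference between the two coverage tests can be sketched in Python. This is a software illustration only, not how a GPU implements it: the center test uses edge functions, and the conservative test uses a separating-axis check between the triangle and the pixel square. In this sketch, a pixel that merely touches the triangle boundary counts as covered.

```python
def edge(a, b, p):
    """Signed area test: which side of the line a->b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def covers_center(tri, px, py):
    """Standard test: is the pixel center inside the triangle?"""
    center = (px + 0.5, py + 0.5)
    w = [edge(tri[i], tri[(i + 1) % 3], center) for i in range(3)]
    return all(v >= 0 for v in w) or all(v <= 0 for v in w)


def overlaps_pixel(tri, px, py):
    """Conservative test: does the triangle touch any part of the pixel square?"""
    corners = [(px, py), (px + 1, py), (px, py + 1), (px + 1, py + 1)]
    axes = [(1, 0), (0, 1)]
    for i in range(3):
        a, b = tri[i], tri[(i + 1) % 3]
        axes.append((a[1] - b[1], b[0] - a[0]))  # normal of this edge
    for ax in axes:
        tri_proj = [ax[0] * v[0] + ax[1] * v[1] for v in tri]
        sq_proj = [ax[0] * c[0] + ax[1] * c[1] for c in corners]
        if max(tri_proj) < min(sq_proj) or max(sq_proj) < min(tri_proj):
            return False  # found a separating axis, so no overlap
    return True


def rasterize(tri, width, height, conservative):
    """Return the set of (x, y) pixels covered under the chosen rule."""
    test = overlaps_pixel if conservative else covers_center
    return {(x, y) for y in range(height) for x in range(width) if test(tri, x, y)}
```

A tiny triangle such as `[(0.1, 0.1), (0.4, 0.1), (0.1, 0.4)]` covers no pixel center, so the standard rule produces nothing, while the conservative rule reports pixel (0, 0).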
Before continuing, I would like to remind you all of something.
Amidst all this hubbub regarding DX 12, most of us have forgotten about another important announcement Microsoft has made: D3D 11.3.
Microsoft has announced that there will be a new version of Direct3D 11 coinciding with Direct3D 12. Dubbed Direct3D 11.3, this new version is a continuation of the development and evolution of the Direct3D 11 API and, like the previous point updates, will add API support for features found in upcoming hardware.
At first glance the announcement of Direct3D 11.3 would appear to be at odds with Microsoft’s development work on Direct3D 12, but in reality there is a lot of sense in this announcement. Direct3D 12 is a low level API – very powerful, but difficult to master and very dangerous in the hands of inexperienced programmers. The development model for Direct3D 12 is that a limited number of coders will be the ones writing the engines and renderers that target the new API, while everyone else will build on top of these engines. This works well for the many organizations that are licensing engines such as UE4, or for the smaller number of organizations that can justify having such experienced programmers on staff.
However, for these reasons a low-level API is not suitable for everyone. High-level APIs such as Direct3D 11 exist for a reason: their abstraction not only hides the quirks of the underlying hardware, but also makes development easier and more accessible. For these reasons there is a need to offer both high-level and low-level APIs. Direct3D 12 will be the low-level API, and Direct3D 11 will continue to be developed to offer the same features through a high-level API.
The following table describes the functionality that has been added in Direct3D 11.3. Note that these features are also available in Direct3D 12.
|Feature|Description|
|Adaptive Scalable Texture Compression|ASTC provides developers with greater control over the size versus quality tradeoff with textures. ASTC is a lossy format, but one that is designed to provide an inexpensive route to greater quality textures. The idea is that a developer can choose the optimum format without having to support multiple compression schemes.|
|Conservative Rasterization|Conservative rasterization adds some certainty to pixel rendering, which is helpful in particular to collision detection algorithms.|
|Default Texture Mapping|The use of default texture mapping reduces copying and memory usage while sharing image data between the GPU and the CPU. However, it should only be used in specific situations. The standard swizzle layout avoids copying or swizzling data in multiple layouts.|
|Rasterizer Ordered Views|Rasterizer ordered views (ROVs) allow pixel shader code to mark UAV bindings with a declaration that alters the normal requirements for the order of graphics pipeline results for UAVs. This enables Order Independent Transparency (OIT) algorithms to work, which give much better rendering results when multiple transparent objects are in line with each other in a view.|
|Shader Specified Stencil Reference Value|Enabling pixel shaders to output the Stencil Reference Value, rather than using the API-specified one, enables very fine-grained control over stencil operations.|
|Typed Unordered Access View Loads|Unordered Access View (UAV) Typed Load is the ability for a shader to read from a UAV with a specific DXGI_FORMAT.|
|Unified Memory Architecture|Querying for whether Unified Memory Architecture (UMA) is supported can help determine how to handle some resources.|
|Volume Tiled Resources|Volume (3D) textures can be used as tiled resources, noting that tile resolution is three-dimensional.|
Now let’s head back to D3D12 and its feature levels.
The Direct3D 10.1 API introduced the concept of “feature levels”, which encapsulate the hardware features supported in a particular version of the API, with separate levels for 10.0 and 10.1 hardware. In previous releases of the Direct3D API, certain capabilities of the graphics hardware were synonymous with the main revision number of the API.
Direct3D 12 requires graphics hardware that conforms to feature level 11_0 or 11_1 and supports virtual memory address translation. It introduces a revamped resource binding model, allowing explicit control of memory using descriptor heaps and tables. This model is supported on the majority of existing desktop GPU architectures and requires WDDM 2.0 drivers. Supported hardware is divided into three Resource Binding tiers, which define the maximum numbers of descriptor heaps used for CBVs (constant buffer views), SRVs (shader resource views) and UAVs (unordered access views); CBVs and SRVs per pipeline stage; UAVs for all pipeline stages; samplers per stage; and SRV descriptor tables.
|Direct3D 12 feature levels|
|Level|Driver model|Mandatory features|Optional features|Conforming GPUs|
|11_0|WDDM 2.0 or later|All 11_0 features from Direct3D 11, resource binding Tier 1, UAV only rendering with force sample count, constant buffer offsetting and partial updates.|Logical blend operations|Nvidia GeForce GTX 400/500/600/700 series (Fermi/Kepler), GTX 745/750 series (Maxwell, 1st gen)|
|11_1|WDDM 2.0 or later|Logical blend operations, target-independent rasterization, UAVs at every stage with increased slot count.|Resource binding (three tiers), tiled resources (three tiers), conservative rasterization (three tiers), stencil reference in Pixel Shader, rasterizer ordered views, typed UAV loads for additional formats, UMA/hUMA support|AMD HD 7700-7900/8500-8900, Rx 240/250/265/270/280 series (GCN 1.0); Intel HD Graphics 4200-5200 (7.5 gen, Haswell), 5300-6300 (8 gen, Broadwell)|
|12_0|WDDM 2.0 or later|Resource Binding Tier 2, Tiled Resources Tier 2 (Texture2D), Typed UAV Loads (additional formats).|Same as 11_1|AMD HD 7790/8770, Rx 260/290 series, Xbox One (GCN 1.1), R9 285 (GCN 1.2)|
|12_1|WDDM 2.0 or later|Conservative Rasterization Tier 1, Rasterizer Ordered Views.|Same as 11_1|Nvidia GeForce 900 series (Maxwell, 2nd gen); Nvidia GeForce 1000 series (Pascal)|
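The mandatory-features column of the table can be read as a ladder: a GPU sits at the highest level for which it meets that level's requirements and all the ones below it. The Python sketch below encodes that reading. The capability keys are hypothetical names chosen for this example, and the function assumes the 11_0 baseline is already met; a real application would query the driver rather than build such a dictionary by hand.

```python
# Requirements per level, taken from the mandatory-features column above.
# The dictionary keys are hypothetical names used only in this sketch.
REQUIREMENTS = [
    ("11_1", lambda c: c.get("logical_blend_ops", False)
        and c.get("target_independent_rasterization", False)
        and c.get("uavs_at_every_stage", False)),
    ("12_0", lambda c: c.get("resource_binding_tier", 0) >= 2
        and c.get("tiled_resources_tier", 0) >= 2
        and c.get("typed_uav_loads", False)),
    ("12_1", lambda c: c.get("conservative_rasterization_tier", 0) >= 1
        and c.get("rasterizer_ordered_views", False)),
]


def highest_feature_level(caps):
    """Walk up the ladder until a level's mandatory features are missing."""
    level = "11_0"  # assumed baseline for any Direct3D 12 capable GPU
    for name, is_satisfied in REQUIREMENTS:
        if not is_satisfied(caps):
            break
        level = name
    return level
```

Note that under this reading a GPU with rasterizer ordered views but without Resource Binding Tier 2 would still stop at 11_1, since each level builds on the one before it.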
In simple terms, the MAJOR improvements of D3D12 show up in the following two levels:
12_0 -> CPU optimizations: much like Mantle, it will reduce CPU bottlenecks.
12_1 -> New “real” rendering features and GPU optimizations.
Now, as you can see from the table above, only second-generation Maxwell and Pascal GPUs will get 12_1. AMD’s HD 7790/8770, Rx 260/290 series, Xbox One (GCN 1.1) and R9 285 (GCN 1.2) will get 12_0, which second-generation Maxwell also covers. The rest (GCN 1.0, Fermi, Kepler, first-generation Maxwell, Haswell (7.5 gen) and Broadwell (8 gen)) will only get support up to 11_1.
So should you be worried if your card does not support 12_1?
Short answer: NO. The absence of feature level 12_1 is no problem in principle. The key features for games are all included in feature levels 11_1 and 12_0. In addition, most games are already geared toward consoles and thus will hardly rely on 12_1.
Not only that, games using even feature level 12_0 have yet to be announced, and I personally think that such games will not be released before at least Q3 2016. Developers will mostly stick to DX11 (D3D 11.3) because, as I said above, using a low-level API like D3D12 is risky and complex. Most developers will therefore tend to stay away from it for now, and support for 12 will mostly arrive via post-release patches (much like Witcher 3, Arkham Knight).
The release of DX12 and D3D 11.3 is great for the PC industry in general, as the potential for improved performance and visuals is quite clear. With Vulkan (a low-level, cross-platform graphics API by the Khronos Group) and DX12, one thing is certain: low-level APIs have finally reached a position where they are being utilized in a manner never seen before, and this might be the first true step into the next generation.