Parallelization in Games: Codemasters' Colin McRae: DiRT 2
I am always on the hunt for parallelization approaches that the game industry deploys in computer and video games. I usually jot down a few bullet points about whatever I find to use as a reference later on, so why not share them on the web? Here we go:
Today's issue of the Intel Visual Adrenaline Magazine (issue 4, 2009, web, pdf) contains the article The Muddy Beauty of DiRT* 2*: Where rally realism meets the raw road by Jon Jordan. It covers Codemasters' rally driving game Colin McRae: DiRT 2 and includes a few technical details provided by Codemasters' Gareth Thomas (senior graphics programmer).
DIRT2_morocco_04a picture by Codemasters, on Flickr
Following is a short overview of the parallelization info revealed in the article:
- The physics engine runs at a minimum of 60 Hz - critical parts of it are simulated at 1,000 Hz
- PlayStation 3 and Xbox 360 as lead platforms, single- and multi-core PCs are also supported
- A tweaked version of Codemasters' EGO game engine provides the main multi-threading and parallelization framework
- Task parallelism: rendering happens on one core, game logic and physics on another/other core(s)
- Tasks can be split off to other cores if available (it is unclear to me whether this refers to the rendering, logic, and physics tasks themselves, or to smaller tasks within these systems that would let performance or eye candy scale with the number of cores)
- Processes are parallelized depending on the number of cores and threads of the host platform
- Worker maps specify how many threads the different subsystems use and the mapping between threads and actual cores
- XML files describe the worker maps for different platforms and parallel hardware configurations based on profiling
- In combination with different graphics resolutions for different systems, and enabling and disabling certain features like real-time environment maps on cars or dynamic shadows, worker maps enable hand-tuned scaling from lower-end to top-of-the-line machines
- Occlusion culling is done via offline pre-computed potentially-visible-set tables and real-time lookup into these tables
- At the beginning of a rendering frame, visibility queries for the main scene, shadows, and reflections are invoked on multiple threads. The results determine the resolution/level of detail of terrain and graphical objects, and which hidden objects to cull
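The last two points can be pictured with a small Python sketch. It is not taken from the article or from the EGO engine: the table layout, the cell and object names, and the functions `query_visible` and `run_frame_queries` are all hypothetical, and a real engine would use compact bit sets in native code rather than Python sets. It only shows the general idea of looking up a precomputed potentially-visible-set table and issuing one query per view on a pool of threads.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical PVS table, computed offline: for every cell the camera can be in,
# the set of object ids that are potentially visible from that cell.
PVS_TABLE = {
    "cell_a": frozenset({"car", "rock", "tree_1"}),
    "cell_b": frozenset({"car", "tree_2"}),
}

def query_visible(cell, candidates):
    """Return the candidates that the PVS table lists for the given cell."""
    visible = PVS_TABLE.get(cell, frozenset())
    return sorted(obj for obj in candidates if obj in visible)

def run_frame_queries(pool, views):
    """Submit one visibility query per view, then wait for all results."""
    futures = {name: pool.submit(query_visible, cell, objs)
               for name, (cell, objs) in views.items()}
    return {name: fut.result() for name, fut in futures.items()}

scene = ["car", "rock", "tree_1", "tree_2"]
views = {
    "main": ("cell_a", scene),
    "shadows": ("cell_b", scene),
    "reflections": ("cell_a", ["car", "tree_2"]),
}

# The pool is created once and would be reused for every frame.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = run_frame_queries(pool, views)

print(results)
```

Everything that is missing from a view's result can be culled for that view; how the engine derives level of detail from the query results is not described in the article.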
Summary and comments
Intel Visual Adrenaline Magazine articles typically don't delve deeply into technical details - and this article is no exception.
From the information that is revealed, it looks like Codemasters' EGO engine is a classical task-parallel solution that models the functional components of a game (e.g. graphics, physics, game logic) as tasks. These tasks are statically mapped to specific threads/cores while satisfying inter-task (data) dependencies. The mapping is controlled via XML files called worker maps.
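The article does not show what a worker map looks like, so the following Python sketch is purely an assumption about how such a file could be structured and read. The element and attribute names (`workermap`, `subsystem`, `threads`, `cores`), the platform names, and the function `load_worker_map` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented worker map format; the real EGO file layout is not public.
WORKER_MAP_XML = """
<workermaps>
  <workermap platform="pc_dual_core">
    <subsystem name="render" threads="1" cores="0"/>
    <subsystem name="logic_physics" threads="1" cores="1"/>
  </workermap>
  <workermap platform="pc_quad_core">
    <subsystem name="render" threads="1" cores="0"/>
    <subsystem name="logic" threads="1" cores="1"/>
    <subsystem name="physics" threads="2" cores="2,3"/>
  </workermap>
</workermaps>
"""

def load_worker_map(xml_text, platform):
    """Return {subsystem: {"threads": n, "cores": [...]}} for one platform."""
    root = ET.fromstring(xml_text.strip())
    for worker_map in root.findall("workermap"):
        if worker_map.get("platform") == platform:
            return {
                sub.get("name"): {
                    "threads": int(sub.get("threads")),
                    "cores": [int(c) for c in sub.get("cores").split(",")],
                }
                for sub in worker_map.findall("subsystem")
            }
    raise KeyError(f"no worker map for platform {platform!r}")

quad = load_worker_map(WORKER_MAP_XML, "pc_quad_core")
print(quad["physics"])
```

A file like this keeps the tuning decisions out of the code: the same binary can be shipped with one map per profiled hardware configuration.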
I would have liked more information on how the different tasks are distributed to cores and threads, to learn whether the parallelization approach described could scale to increasing core counts and where the scaling limits lie. The worker maps and the description of the task-parallel system indicate a strong thinking in threads and a coarse-grained approach to parallelism (whole functional components/tasks are the unit of threading). While this gives tight control over hardware usage (which task to map to which core/thread) on today's multi-core machines, it often also severely limits scalability with increasing core counts, because there cannot be more cores kept busy than there are such coarse tasks.
In addition to the platform-static threading, other operations (e.g. visibility queries) can be invoked on available service threads. Depending on the number of operations/queries, this can easily scale with the core/thread count. The article doesn't explain whether the queries are handled as jobs that are entered into a job pool, which then schedules them onto worker threads created at game start and destroyed at game end, or whether threads are created on the fly per query (which isn't advisable, as thread creation and destruction is quite expensive when done many times per main loop cycle).
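To make the first alternative concrete, here is a minimal Python sketch of a job pool with long-lived worker threads. It is a generic illustration and not EGO code; the class `JobPool` and its methods are hypothetical names, and error handling inside jobs is left out to keep it short.

```python
import queue
import threading

class JobPool:
    """Worker threads are created once and reused for every batch of jobs."""

    def __init__(self, num_workers):
        self._jobs = queue.Queue()
        self._workers = [threading.Thread(target=self._run, daemon=True)
                         for _ in range(num_workers)]
        for worker in self._workers:
            worker.start()

    def _run(self):
        while True:
            job = self._jobs.get()
            if job is None:            # sentinel: stop this worker
                self._jobs.task_done()
                return
            func, args, results, index = job
            results[index] = func(*args)
            self._jobs.task_done()

    def run_batch(self, func, arg_list):
        """Run func once per argument tuple and block until all jobs finish."""
        results = [None] * len(arg_list)
        for index, args in enumerate(arg_list):
            self._jobs.put((func, args, results, index))
        self._jobs.join()
        return results

    def shutdown(self):
        for _ in self._workers:
            self._jobs.put(None)
        for worker in self._workers:
            worker.join()

pool = JobPool(num_workers=4)      # "game start"
for frame in range(3):             # every frame reuses the same threads
    squares = pool.run_batch(lambda x: x * x, [(n,) for n in range(8)])
pool.shutdown()                    # "game end"
print(squares)
```

The per-frame cost is then only the queue operations; no thread is created or destroyed inside the main loop.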
I would be delighted if anyone with legally shareable insight into the parallelization of Codemasters' EGO engine could comment on the open questions, and perhaps provide more detail on the technology behind the physics-heavy and visually spectacular DiRT 2.
Bjoern