Parallelization of Game AI - Presentation from the Paris Game AI Conference '09
In June 2009 I gave an overview and motivation for Parallelization of Game AI at the Paris Game AI Conference '09. The conference was a blast - great discussions with many interesting people and inspiring presentations - look out for the next installment, I will do so for sure!
Alex and Petra Champandard from AiGameDev.com, the main organizers and master-minds behind the Paris Game AI Conferences, just published the video and audio recordings of my presentation and put it online for AiGameDev.com premium members. Petra did a great job editing the recordings and put in a lot of hard work to synchronize my talk with the downloadable slides.
Video and slides at AiGameDev.com:
- public presentation website with slides (w/o notes)
- premium members only website with video and audio recordings
Slides with speaker notes can also be found at bjoernknafla.com:
- public slides (with notes)
Slides-movie and brief presentation recap
I find it extremely awful to watch and listen to myself / my presentation. I worked very hard on the talk and created slides that were full of animations to explain the presented parallelization concepts - making the slides a kind of script to follow during speaking. A script that needs accurate transitions and explanations under time pressure to stay inside the presentation time slot. This rigid structure to follow in combination with my nervousnes lead to lots of "ahs" and "errs" and let me struggle with my English speaking skills. To amend for th is, I created a quicktime movie from the slides showing the actual animations and added a coarse talk outline below. This in combination with the speaker note extended slides and the video recording at AiGameDev.com should give you much of the useful information of my talk.
Introduction
- Why we need parallel programming: parallel hardware, performance.
- Parallel programming isn't easy - but no rocket-science either:
- parallel programming errors: race conditions, deadlocks,
- performance problems: contention,
- be aware that order of parallel instruction execution isn't deterministic.
- Parallel hardware dictates how to program for performance:
- maximize throughput - to fight memory latency,
- maximize memory locality - to save bandwidth,
- minimize (false) sharing - to prevent contention,
- combined: balance maximization of locality vs. minimization of sharing.
- Artificial intelligence (AI) model referenced in the talk: agent-based AI.
References:
- David Callahan, Design Considerations For Parallel Programming, MSDN Magazine, 2008
- Vincent Scheib, Multi-Platform Multi-Core Architecture Comparison, Beautiful Pixels blog, 2008
Parallelization approaches
Asynchronous calls for pathfinding and other service calculations
- Run computations asynchronously.
- Better memory locality as async computations work on their own data.
- Control iterations and when to merge result if determinism is important.
- Uncontrollable scaling with raw threads (over-, under-subscription of hardware and performance penalties for numerous thread creation and destruction per main loop cycle, and context switching).
- Take away: don't use raw threads but task pools or other constructs to express concurrency but not manage it by hand.
References:
- James Reinders, Intel Threading Building Blocks, O'Reilly, 2007 (cross-platform C++ task pool by Intel)
Parallel agents
- Unorganized parallel execution of agents needs synchronization of shared data.
- Unstructured memory accesses and unorganized syncing:
- saturates memory bandwidth,
- is bad for data locality,
- is not deterministic, and
- potentially leads to contention and performance degradation.
- Game entity management is ownership management (think netword and distributed programming).
- Defer all non-read accesses: collect write requests and deterministically resolve conflicts (e.g. based on entity ids).
- Identify data:
- dependencies,
- reading and writing access,
- public data readable by all entities,
- private data only accessible by entity owning it.
- Execute entity updates in two stages:
- collection stage: all entities read public data (from last game world state) in parallel (without side effects),
- barrier synchronization waits on all agents to finish their collection stage, followed by the
- instance update stage: all entities update their private entity data in parallel.
- Think about double buffering for public game/entity states.
- Hint: for determinism and if needed use a random number generator per entity instance.
- Stage internal: unstructured (read-)access to data can result in bad locality and eventually saturates the memory bus (bandwidth and latency problems).
Reference:
- Bjoern Knafla and Claudia Leopold, Parallelizing a Real-Time Steering Simulation for Computer Games with OpenMP, ParCo2007, 2007
Data parallel systems
- Each collection stage of parallel agents is different because of different agent state and surrounding game world.
- Each agent works on different aspects of its behavior in individual order and duration: gameplay logic, sensing, pathfinding, terrain analysis, etc.
- As already shown - this is bad for data locality and memory bandwidth usage.
- Slice agents into aspects and group all aspects of the same type into systems.
- Result: no single update function per agent but an update function per aspect/system.
- Independent systems can run in parallel to each other.
- Integration of agents aspects computations is defered based on the order of system execution per main loop cycle.
- Systems can internally optimize their data: enables efficient parallel processing.
- Systems run through stages:
- Collect data and update requests.
- Prepare parallel computation on data and update requests.
- Compute and transform aspect data in parallel (data-parallel computation).
- Publish results (pull, or push, or message passing, double buffering).
- Systems enable lean data structures with good locality.
- Data-parallel computations typically scale well with increasing processor core counts.
- Order-irrelevant data processing enables determinism.
- Systems might need internal low-level parallelism and synchronization.
References:
- Joe Valenzuela, GDC09 - Gameplay Systems on the SPU, Insomniac Games R&D, 2009 (slides from GDC 2009)
- Michiel van der Leeuw, The PlayStation 3's SPUs in the Real World - A Killzone 2 Case Study, Guerilla Games, 2009 (slides from GDC 2009)
- Jeff Andrews, Designing the Framework of a Parallel Game Engine, Intel Software Network, 2008
- Mike Acton, Performance and Good Data Design, CellPerformance website, 2008
Conclusion
Know your parallel hardware, program for throughput, organize memory access, balance maximization of locality vs. minimization of sharing. Use the parallelization approaches as guides, and always remember: it is easy to introduce hard to find parallel errors - therefore: KISS - keep it short and simple.
References and further reading:
- Julien Hamaide, Parallelism in AI: Multithreading Strategies and Opportunities for Multi-core Architectures, Fishing Cactus, 2009 (slides from AI Summit GDC 2009)
- Jon Olick, Current and Next Generation Parallelism in Games, id Software, 2008 (slides from Siggraph 2008: Beyond Programmable Shading)
Outlook
I am planning a number of blog articles to delve more deeply into the parallelization of games and game AI, and to take a look at parallelization approaches I had to drop from the talk because of time constraints, like:
- Pipelined main loop (see Christer Ericson's and Nick Porcino's articles).
- Actor model for agents (see Wikipedia actor model).
- Stream processing (take a look at Emergent's Floodgate technology).
- GPGPU - General-Purpose Computation on Graphics Processing Units (see Khronos group OpenCL standard).
- Combining all approaches
If you have got questions, feedback, criticism, then feel free to leave a comment or contact me via email or twitter. I am looking forward to hearing from you!
Bjoern


Comments [4]