amp - a cross-platform thread library

I am happy and proud to present amp, my open source, BSD-licensed, cross-platform thread library.

amp is written in C and developed in a test-driven way. Its interface is inspired by POSIX threads, and all functions and data structures are documented via Doxygen comments.

You can find, download, or fork the code, or read more about amp, at http://github.com/bjoernknafla/amp.

amp came into existence because I wanted to be shielded from the subtle, and sometimes not so subtle, differences between the threading solutions of different operating systems, namely Windows threads (from Windows XP to Windows 7) and POSIX threads, also called Pthreads, as offered by Linux and Mac OS X. For example, Windows XP has no native support for so-called condition variables, while Mac OS X seems to be missing the POSIX-standardized “sem” semaphores, which predate Pthreads. amp supports condition variables even on Windows XP and implements semaphores for Mac OS X 10.6 via Apple’s libdispatch library.

Screenshot of amp's main header

I have built and tested amp with Xcode 3.2.x on Mac OS X 10.6 and with Microsoft Visual Studio 2008 (MSVC 2008 for short) on Windows XP, but I currently have no access to Windows 7, MSVC 2010, or a Linux machine to test on. An Xcode project and a simple MSVC 2008 solution are included; a build system for Linux (e.g. a makefile) is still missing. The Xcode project creates an embeddable framework or a static library. On Windows, the MSVC 2008 solution builds a static library for now.

In the following, I’ll give a short overview of amp’s threading primitives, the standard and the raw usage models, the allocator and error handling concepts, and finally an outlook on what is planned next.

Supported Threading Primitives

The following threading primitives are included in version 0.1.0 of amp:

  • amp_platform – query the system, if it supports it, for the number of installed and/or active processor cores or hardware threads per core (see simultaneous multithreading (SMT)).
  • amp_thread – launch and join threads (independent streams of instructions executing concurrently or in parallel) that run functions taking a user-supplied context pointer.
  • amp_thread_array – configure, launch, and finally join a whole set of threads. This primitive comes in handy when you need a bunch of threads for testing.
  • amp_thread_local_store – manage storage that is specific to each thread (see thread-local storage).
  • amp_semaphore – a semaphore is a kind of “guard” that allows a certain number of threads to pass while others have to wait until at least one of the threads that passed tells the semaphore to let another thread through.
  • amp_mutex – a mutex protects a section of code, the so-called critical section, from being executed by more than one thread at the same time. This prevents uncontrolled and uncoordinated changes of data by multiple threads at once. Such unsynchronized changes can, and often will, result in invalid data and broken state invariants or pre-/post-conditions; this kind of parallel programming bug is called a race condition.
  • amp_condition_variable – a condition variable enables a thread that successfully locked a mutex, but finds that the protected data doesn’t satisfy a certain condition, to go to sleep and release the mutex until another thread (re-)establishes the condition. That other thread locks the mutex itself, changes the data, and then signals one waiting thread, or broadcasts to all waiting threads, to wake up. The woken threads try to re-lock the mutex (once the signalling thread has unlocked it) and check again whether the condition they are waiting for is met.
  • amp_barrier – threads wait on a barrier until a pre-defined number of waiting threads is reached. On that event the barrier opens for the waiting threads to pass it. Barriers help groups of threads to go through certain steps of a computation in a coordinated fashion. For example during one step all threads only read from shared memory and during the next step each thread writes independent and disjoint changes to memory based on the read-step data it observed before.
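
To make the interplay of mutexes, condition variables, and semaphores more concrete, here is a minimal sketch of a counting semaphore built from a POSIX mutex and condition variable. All sketch_ names are hypothetical, this is not amp's implementation (amp wraps each platform's native primitives), and a POSIX system with Pthreads is assumed.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical counting semaphore built from a mutex and a condition
 * variable (a sketch, not amp's implementation). */
struct sketch_semaphore {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    unsigned int    count; /* how many more threads may pass right now */
};

static int sketch_semaphore_init(struct sketch_semaphore *s, unsigned int count)
{
    int rc = pthread_mutex_init(&s->mutex, NULL);
    if (rc != 0) {
        return rc;
    }
    rc = pthread_cond_init(&s->cond, NULL);
    if (rc != 0) {
        pthread_mutex_destroy(&s->mutex);
        return rc;
    }
    s->count = count;
    return 0;
}

static void sketch_semaphore_finalize(struct sketch_semaphore *s)
{
    pthread_cond_destroy(&s->cond);
    pthread_mutex_destroy(&s->mutex);
}

/* Block until the semaphore lets the calling thread pass. */
static void sketch_semaphore_wait(struct sketch_semaphore *s)
{
    pthread_mutex_lock(&s->mutex);
    while (s->count == 0) {
        /* Releases the mutex while sleeping and re-locks it on wake-up. */
        pthread_cond_wait(&s->cond, &s->mutex);
    }
    --s->count;
    pthread_mutex_unlock(&s->mutex);
}

/* Let one more thread pass and wake one waiting thread, if any. */
static void sketch_semaphore_signal(struct sketch_semaphore *s)
{
    pthread_mutex_lock(&s->mutex);
    ++s->count;
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->mutex);
}

/* Demo: workers update a mutex-protected counter, then report via the semaphore. */
enum { SKETCH_WORKERS = 4 };
static struct sketch_semaphore done;
static pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;
static int counter = 0;

static void *worker(void *context)
{
    (void)context;
    pthread_mutex_lock(&counter_mutex); /* enter the critical section */
    ++counter;
    pthread_mutex_unlock(&counter_mutex);
    sketch_semaphore_signal(&done);
    return NULL;
}

static int sketch_semaphore_demo(void)
{
    pthread_t threads[SKETCH_WORKERS];
    if (sketch_semaphore_init(&done, 0) != 0) {
        return -1;
    }
    for (int i = 0; i < SKETCH_WORKERS; ++i) {
        pthread_create(&threads[i], NULL, worker, NULL);
    }
    for (int i = 0; i < SKETCH_WORKERS; ++i) {
        sketch_semaphore_wait(&done); /* one pass per finished worker */
    }
    for (int i = 0; i < SKETCH_WORKERS; ++i) {
        pthread_join(threads[i], NULL);
    }
    sketch_semaphore_finalize(&done);
    return counter;
}
```

Note the while loop around pthread_cond_wait: a woken thread re-checks the condition, as described for amp_condition_variable, because wake-ups can be spurious and another thread may have taken the count first.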

To get an impression of how to use these threading primitives, take a look at the tests that come with amp.
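
In the spirit of those tests, the following sketch builds a reusable barrier from a mutex and a condition variable and lets four threads go through two coordinated steps: each thread writes its own slot, waits at the barrier, then reads every slot. The names are hypothetical, amp's own barrier may be implemented differently, and a POSIX system is assumed.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical reusable barrier (a sketch, not amp's implementation). */
struct sketch_barrier {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    unsigned int    threshold;  /* number of threads that opens the barrier */
    unsigned int    waiting;    /* threads that arrived in this generation */
    unsigned long   generation; /* incremented every time the barrier opens */
};

static void sketch_barrier_init(struct sketch_barrier *b, unsigned int threshold)
{
    pthread_mutex_init(&b->mutex, NULL);
    pthread_cond_init(&b->cond, NULL);
    b->threshold = threshold;
    b->waiting = 0;
    b->generation = 0;
}

static void sketch_barrier_finalize(struct sketch_barrier *b)
{
    pthread_cond_destroy(&b->cond);
    pthread_mutex_destroy(&b->mutex);
}

static void sketch_barrier_wait(struct sketch_barrier *b)
{
    pthread_mutex_lock(&b->mutex);
    unsigned long my_generation = b->generation;
    if (++b->waiting == b->threshold) {
        /* Last thread to arrive: open the barrier and reset it for reuse. */
        b->waiting = 0;
        ++b->generation;
        pthread_cond_broadcast(&b->cond);
    } else {
        /* Sleep until this generation's barrier opens; ignore spurious wake-ups. */
        while (my_generation == b->generation) {
            pthread_cond_wait(&b->cond, &b->mutex);
        }
    }
    pthread_mutex_unlock(&b->mutex);
}

enum { BARRIER_THREADS = 4 };
static struct sketch_barrier step_barrier;
static int slots[BARRIER_THREADS]; /* written in step 1, read in step 2 */
static int sums[BARRIER_THREADS];

static void *step_worker(void *context)
{
    int id = *(const int *)context;
    slots[id] = id + 1;                 /* step 1: disjoint writes */
    sketch_barrier_wait(&step_barrier); /* nobody reads before all wrote */
    int sum = 0;                        /* step 2: read everybody's data */
    for (int i = 0; i < BARRIER_THREADS; ++i) {
        sum += slots[i];
    }
    sums[id] = sum;
    return NULL;
}

/* Returns 1 if every thread observed all step-1 writes (1 + 2 + 3 + 4 = 10). */
static int sketch_barrier_demo(void)
{
    pthread_t threads[BARRIER_THREADS];
    int ids[BARRIER_THREADS];
    sketch_barrier_init(&step_barrier, BARRIER_THREADS);
    for (int i = 0; i < BARRIER_THREADS; ++i) {
        ids[i] = i;
        pthread_create(&threads[i], NULL, step_worker, &ids[i]);
    }
    for (int i = 0; i < BARRIER_THREADS; ++i) {
        pthread_join(threads[i], NULL);
    }
    int all_ok = 1;
    for (int i = 0; i < BARRIER_THREADS; ++i) {
        all_ok = all_ok && sums[i] == 10;
    }
    sketch_barrier_finalize(&step_barrier);
    return all_ok;
}
```

The generation counter is what makes the barrier reusable for the next step: a woken thread only leaves once the barrier has opened for its own generation, not merely because it was woken.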

These are low-level primitives targeted at implementing higher-level parallel abstractions, and I advise beginners in parallel programming against using them, or other platform-native threading solutions, directly for anything other than learning.

Standard and Raw Usage

amp offers two usage models: standard and raw. Select the one that fits your needs best.

In the standard model, you call primitive-specific create and destroy functions to, well, create and destroy primitives dynamically on the heap, and you are completely shielded from the platform backend that implements the functionality. This means that including the standard amp/amp.h header doesn’t pull in any platform dependencies.

amp_mutex_t mutex = AMP_MUTEX_UNINITIALIZED;
int return_code = amp_mutex_create(&mutex, allocator);
assert(AMP_SUCCESS == return_code);

/* ... some code ... */

return_code = amp_mutex_lock(mutex);
assert(AMP_SUCCESS == return_code);
{
    /* ... some more code ... */
}
return_code = amp_mutex_unlock(mutex);
assert(AMP_SUCCESS == return_code);

/* ... code, code, code ... */

return_code = amp_mutex_destroy(&mutex, allocator);
assert(AMP_SUCCESS == return_code);

If you go the raw way, you include amp/amp_raw.h, which, based on your target build platform, automatically includes the platform headers and dependencies necessary to provide amp’s functionality. In return, you can store raw amp data structures on the stack and don’t need any dynamic memory handling at all. These raw data structures are initialized via ‘init’ and finalized via ‘finalize’ function calls.

struct amp_raw_mutex_s mutex;
int return_code = amp_raw_mutex_init(&mutex);
assert(AMP_SUCCESS == return_code);

/* ... some code ... */

return_code = amp_mutex_lock(&mutex); /* non-raw functions take a pointer to the raw struct */
assert(AMP_SUCCESS == return_code);
{
    /* ... some more code ... */
}
return_code = amp_mutex_unlock(&mutex);
assert(AMP_SUCCESS == return_code);

/* ... code, code, code ... */

return_code = amp_raw_mutex_finalize(&mutex);
assert(AMP_SUCCESS == return_code);

Hint: only the init and finalize functions of the amp primitives contain “raw” in their names; all other functions, such as amp_mutex_lock, are shared by both usage models.
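
One way to picture how the two models relate is a standard create/destroy pair layered on top of raw init/finalize. The sketch below assumes a Pthreads backend and uses hypothetical sketch_ names and plain malloc/free instead of an amp allocator, so amp's actual headers may well differ. In this sketch the standard handle is just a pointer to the raw struct, which is why lock and unlock take the address of a stack-allocated raw mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical return codes, only mirroring the idea of AMP_SUCCESS etc. */
enum { SKETCH_SUCCESS = 0, SKETCH_NOMEM = 1, SKETCH_ERROR = 2 };

/* Raw type: its layout, and thus the platform header, is visible. */
struct sketch_raw_mutex_s {
    pthread_mutex_t native;
};

/* Standard handle: an opaque pointer to the raw struct. */
typedef struct sketch_raw_mutex_s *sketch_mutex_t;

static int sketch_raw_mutex_init(struct sketch_raw_mutex_s *m)
{
    return pthread_mutex_init(&m->native, NULL) == 0 ? SKETCH_SUCCESS : SKETCH_ERROR;
}

static int sketch_raw_mutex_finalize(struct sketch_raw_mutex_s *m)
{
    return pthread_mutex_destroy(&m->native) == 0 ? SKETCH_SUCCESS : SKETCH_ERROR;
}

/* Shared by both models, hence no "raw" in the name. */
static int sketch_mutex_lock(sketch_mutex_t m)
{
    return pthread_mutex_lock(&m->native) == 0 ? SKETCH_SUCCESS : SKETCH_ERROR;
}

static int sketch_mutex_unlock(sketch_mutex_t m)
{
    return pthread_mutex_unlock(&m->native) == 0 ? SKETCH_SUCCESS : SKETCH_ERROR;
}

/* Standard model: allocate on the heap, then reuse the raw init. */
static int sketch_mutex_create(sketch_mutex_t *m)
{
    struct sketch_raw_mutex_s *raw = malloc(sizeof *raw);
    if (raw == NULL) {
        return SKETCH_NOMEM;
    }
    int rc = sketch_raw_mutex_init(raw);
    if (rc != SKETCH_SUCCESS) {
        free(raw);
        return rc;
    }
    *m = raw;
    return SKETCH_SUCCESS;
}

static int sketch_mutex_destroy(sketch_mutex_t *m)
{
    int rc = sketch_raw_mutex_finalize(*m);
    if (rc == SKETCH_SUCCESS) {
        free(*m);
        *m = NULL;
    }
    return rc;
}

/* Exercises both models; returns 1 if every call succeeded. */
static int sketch_usage_demo(void)
{
    struct sketch_raw_mutex_s stack_mutex; /* raw: lives on the stack */
    int ok = sketch_raw_mutex_init(&stack_mutex) == SKETCH_SUCCESS;
    if (ok) {
        ok = sketch_mutex_lock(&stack_mutex) == SKETCH_SUCCESS
            && sketch_mutex_unlock(&stack_mutex) == SKETCH_SUCCESS;
        ok = (sketch_raw_mutex_finalize(&stack_mutex) == SKETCH_SUCCESS) && ok;
    }

    sketch_mutex_t heap_mutex = NULL; /* standard: lives on the heap */
    if (ok && sketch_mutex_create(&heap_mutex) == SKETCH_SUCCESS) {
        ok = sketch_mutex_lock(heap_mutex) == SKETCH_SUCCESS
            && sketch_mutex_unlock(heap_mutex) == SKETCH_SUCCESS;
        ok = (sketch_mutex_destroy(&heap_mutex) == SKETCH_SUCCESS) && ok;
    } else {
        ok = 0;
    }
    return ok && heap_mutex == NULL;
}
```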

Allocator Concept

Apart from calls into the platform's own threading API, all of amp’s dynamic memory allocation and deallocation happens via user-configurable allocators. All create and destroy functions take an allocator argument. By using your own allocator instead of the default one, you can channel all memory allocations and deallocations through your own functions to gain maximum control and the ability to monitor memory use. The only catch: amp internally calls backend-specific threading functionality, which might itself call the system's malloc or free and thereby bypass the amp allocator.

amp_allocator_t my_allocator = AMP_ALLOCATOR_UNINITIALIZED;
int return_code = amp_allocator_create(&my_allocator, /* target allocator */
    AMP_DEFAULT_ALLOCATOR,  /* source allocator to get memory for target allocator */
    my_allocator_context, /* user supplied context, e.g. a memory pool */
    my_alloc_func,
    my_calloc_func,
    my_dealloc_func
    );

/* ... some code ... */

struct my_struct_s *data = (struct my_struct_s *)AMP_ALLOC(my_allocator, sizeof(struct my_struct_s));

amp_thread_t my_thread = AMP_THREAD_UNINITIALIZED;
return_code = amp_thread_create(&my_thread, my_allocator);
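
The allocator concept boils down to a user context plus a set of function pointers that the library calls instead of malloc and free. The following sketch uses that hook to count live allocations, i.e. to monitor memory use. The sketch_allocator struct and its function signatures are assumptions made for illustration, not amp's real definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator: user context plus allocation hooks. The function
 * signatures are assumptions, not amp's real ones. */
struct sketch_allocator {
    void *context;
    void *(*alloc_func)(void *context, size_t size);
    void *(*calloc_func)(void *context, size_t count, size_t size);
    void (*dealloc_func)(void *context, void *pointer);
};

/* User context used to monitor memory use. */
struct alloc_stats {
    size_t live;  /* allocations not yet freed */
    size_t total; /* allocations ever made */
};

static void count_allocation(void *context, const void *pointer)
{
    if (pointer != NULL) {
        struct alloc_stats *stats = context;
        ++stats->live;
        ++stats->total;
    }
}

static void *counting_alloc(void *context, size_t size)
{
    void *pointer = malloc(size);
    count_allocation(context, pointer);
    return pointer;
}

static void *counting_calloc(void *context, size_t count, size_t size)
{
    void *pointer = calloc(count, size);
    count_allocation(context, pointer);
    return pointer;
}

static void counting_dealloc(void *context, void *pointer)
{
    if (pointer != NULL) {
        struct alloc_stats *stats = context;
        --stats->live;
        free(pointer);
    }
}

/* A library "create" function that never calls malloc directly. */
struct sketch_object {
    int value;
};

static struct sketch_object *sketch_object_create(const struct sketch_allocator *a)
{
    return a->calloc_func(a->context, 1, sizeof(struct sketch_object));
}

static void sketch_object_destroy(struct sketch_object *o, const struct sketch_allocator *a)
{
    a->dealloc_func(a->context, o);
}

/* Returns 1 if all memory went through, and came back to, the allocator. */
static int sketch_allocator_demo(void)
{
    struct alloc_stats stats = {0, 0};
    struct sketch_allocator allocator = {
        &stats, counting_alloc, counting_calloc, counting_dealloc
    };
    struct sketch_object *object = sketch_object_create(&allocator);
    void *scratch = allocator.alloc_func(allocator.context, 64);
    int ok = object != NULL && scratch != NULL && object->value == 0 && stats.live == 2;
    allocator.dealloc_func(allocator.context, scratch);
    sketch_object_destroy(object, &allocator);
    return ok && stats.live == 0 && stats.total == 2;
}
```

Because the context pointer is opaque, the same hook could just as well point to a memory pool instead of a statistics struct.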

Error Handling

Every threading platform has its own approach to error handling. amp follows the Pthreads (POSIX) way of reporting errors: most functions return a return code, while queried or returned values are passed back via function arguments. amp defines its own return and error codes to unify error handling and to shield the user from platform-specific details and dependencies. Unfortunately, the fine-grained error differentiation of some platforms is lost because amp simplifies the list of error codes. For example, the POSIX threads error codes EINVAL and EAGAIN are both mapped to AMP_ERROR.

amp_semaphore_t sema = AMP_SEMAPHORE_UNINITIALIZED;
int return_code = amp_semaphore_create(&sema, AMP_DEFAULT_ALLOCATOR, 1);
if (AMP_SUCCESS != return_code) {
    switch (return_code) {
    case AMP_NOMEM:
        /* Out of memory ... */
        break;
    case AMP_ERROR:
        /* Possibly the platform's maximum semaphore count has been reached ... */
        break;
    default:
        /* Unexpected error code indicating a programming error ... */
        break;
    }
}
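
Internally, such a unified scheme needs a translation step from native codes to the library's own codes. Here is a minimal sketch, assuming a Pthreads backend and hypothetical SKETCH_ constants that only mirror the roles of AMP_SUCCESS, AMP_NOMEM, and AMP_ERROR; it shows how EINVAL and EAGAIN collapse into one generic error.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical codes mirroring the roles of AMP_SUCCESS, AMP_NOMEM, AMP_ERROR. */
enum sketch_return_code {
    SKETCH_SUCCESS = 0,
    SKETCH_NOMEM,
    SKETCH_ERROR
};

/* Translate a Pthreads return code into the reduced, platform-neutral set. */
static int sketch_from_pthread_code(int pthread_code)
{
    switch (pthread_code) {
    case 0:
        return SKETCH_SUCCESS;
    case ENOMEM:
        return SKETCH_NOMEM;
    case EINVAL: /* invalid argument and ...                        */
    case EAGAIN: /* ... temporary resource shortage are merged here, */
    default:     /* so the caller can no longer tell them apart.    */
        return SKETCH_ERROR;
    }
}
```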

Next Steps

In the future I want to extend amp with the following features:

  1. Atomic operations and memory fences to support lock-free parallel programming for Mac OS X, Windows, the gcc compiler, and x86 32bit and 64bit architectures.
  2. Implementation of an additional backend that uses the atomic operations to provide all amp primitives while minimizing thread context switches, which happen in many cases when using the native thread synchronization facilities.
  3. Software thread to processor core or hardware thread affinity to hint to the operating system that a certain group of software threads should stay together on a certain tile of processor cores which share a cache or local memory. This also involves extending the platform detection to find out more about the memory hierarchy and organization of hardware to guide the affinity creation.
  4. A clean implementation of thread ids to replace the current internal ones, which rely on a non-portable hack when using the Pthreads backend.
  5. Add a makefile or a CMake build configuration to finally port amp to Linux and BSD, etc.
  6. Additional mutex types that support recursive locking and locking functions with timeouts (unblock the thread waiting to lock a mutex after a certain amount of time elapsed and indicate this via a return code).
  7. Trywait (return with a special return code if waiting on a semaphore would block the calling thread) and timeout support for the semaphore wait function.
  8. Add a native Pthreads backend for the amp barrier primitive.
  9. Add a solution/project file for MSVC 2010.
  10. Offer C++ wrappers or at least C++ RAII lock guards for mutexes and semaphores.
  11. Rewrite the documentation to be more concise and consistent.
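
To illustrate items 1 and 2, here is a minimal spin lock built purely from an atomic test-and-set with acquire/release ordering. It uses C11 stdatomic.h, which is an assumption of this sketch (amp's planned atomics would wrap the compiler and platform facilities listed above), and hypothetical names. A waiting thread spins instead of sleeping, so no context switch occurs, at the cost of burning CPU time while the lock is held; such locks therefore only pay off for very short critical sections.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical spin lock built on a single atomic test-and-set flag. */
typedef struct {
    atomic_flag locked;
} sketch_spinlock;

static void sketch_spin_lock(sketch_spinlock *lock)
{
    /* Acquire ordering: accesses in the critical section cannot move
     * before taking the lock. The thread spins instead of sleeping. */
    while (atomic_flag_test_and_set_explicit(&lock->locked, memory_order_acquire)) {
        /* busy-wait */
    }
}

static void sketch_spin_unlock(sketch_spinlock *lock)
{
    /* Release ordering: writes in the critical section become visible
     * to the next thread that acquires the lock. */
    atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}

enum { SPIN_THREADS = 4, SPIN_ITERATIONS = 10000 };
static sketch_spinlock spin = { ATOMIC_FLAG_INIT };
static long spin_counter = 0;

static void *spin_worker(void *context)
{
    (void)context;
    for (int i = 0; i < SPIN_ITERATIONS; ++i) {
        sketch_spin_lock(&spin);
        ++spin_counter; /* protected by the spin lock: no lost updates */
        sketch_spin_unlock(&spin);
    }
    return NULL;
}

static long sketch_spin_demo(void)
{
    pthread_t threads[SPIN_THREADS];
    for (int i = 0; i < SPIN_THREADS; ++i) {
        pthread_create(&threads[i], NULL, spin_worker, NULL);
    }
    for (int i = 0; i < SPIN_THREADS; ++i) {
        pthread_join(threads[i], NULL);
    }
    return spin_counter;
}
```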

Get in Touch

That’s it for this first introduction to amp. Once more (advertising in progress): amp is open source, and you can get it here: http://github.com/bjoernknafla/amp.

If you have any questions, criticism, or feedback, or just feel like saying “hello”, please send an email to amp@bjoernknafla.com or contact me via Twitter @bjoernknafla.


Context, context, context - and locality - how not to lose your mind in parallel times

Alex Champandard from AiGameDev.com ranted about the negative sides of Singletons on Twitter:

Been thinking about the best argument against Singletons. It's down to assumptions, they make an ass out of your code now and bite me later.

There have been lots of discussions and battles regarding the pros and cons of Singletons, and I don't want to incite them again here. Singletons are a tool, and like every tool they have their positive and negative sides.
