Reading through a fine-tuned lock free single producer/consumer queue: back in November, Martin Thompson posted a very nice bit of code on GitHub to accompany his Lock Free presentation. It's a single producer/consumer queue that is very fast indeed. His repository includes the gradual improvements made, which is unusual in software, and offers us some insight into the optimization process. Working through the improvements made and their respective impact, in this post I break down the gradual steps even further, discuss the reasoning behind each step, and compare the impact on performance. My fork of his code is to be found here and contains the material for this post and the upcoming post.

Using the code provided with this article is simple. The C++ class declaration is `template <class T> class MSQueue`, and C++ queues are declared like this: `MSQueue<int> Q;`. Remove items from the queue with `bool bIsQEmpty = Q.dequeue(i);` - dequeue returns false if the queue is empty, in which case the value of `i` is undefined. The C# version is declared in namespace `Lockfreeq`; its `pointer_t` type pairs a node pointer with a modification count and formats itself with `String.Format("({0}, {1})", count, ptr == null ? "null" : ptr.ToString())`.

The C# implementation builds on an Interlocked compare and exchange operation:

```csharp
// Interlocked Compare and Exchange operation
private bool CAS(ref pointer_t destination, pointer_t compared, pointer_t exchange)
{
    if (compared.ptr == Interlocked.CompareExchange(ref destination.ptr, exchange.ptr, compared.ptr))
        return true;
    return false;
}
```

The heart of enqueue allocates a new node from the free list, then tries to link it after the current tail:

```csharp
// inside the enqueue retry loop:
if (CAS(ref tail.ptr.next, next, new pointer_t(node, next.count + 1)))
    break;                                                        // the new node is linked
else                                                              // tail was not pointing to last node
    CAS(ref Tail, tail, new pointer_t(next.ptr, tail.count + 1)); // try to advance it
```

Dequeue is the mirror image:

```csharp
// inside the dequeue retry loop: when the tail lags behind, try to advance it
CAS(ref Tail, tail, new pointer_t(next.ptr, tail.count + 1));

// read value before CAS, otherwise another dequeue might try to free the next node
value = next.ptr.value;
// try to swing the head to the next node
if (CAS(ref Head, head, new pointer_t(next.ptr, head.count + 1)))
    break;
```

Michael and Scott also present a "two lock" queue data structure. If CAS or similar instructions are not available, I suggest using the STL queue or a similar queue with traditional synchronization primitives.
Recent research in algorithms and changes in computer architecture have led to the introduction of "wait free", "lock free", or "non-blocking" data structures. The most popular, and possibly the most important, is the queue, a First In First Out (FIFO) data structure.

The key to the majority of lock free data structures is an instruction known as Compare and Swap (CAS). The flow chart below describes what Compare and Swap does. For the assembler coders out there, the instruction is named CMPXCHG on x86 and Itanium architectures. The special thing about this instruction is that it is atomic, meaning that other threads/processes cannot interrupt it until it is finished. Operating systems use atomic operations to implement synchronization primitives: locks, mutexes, semaphores, and critical sections.

The queue presented here follows the work of Michael and Scott on non-blocking and blocking concurrent queue algorithms. Their paper demonstrates why the queue is "linearizable" and "lock free"; in fact, an implementation of their queue is now part of the Java concurrency library. An implementation of the code in C is available here in tar.gz format.

The idea is that pointers are reference counted and checked for consistency in a loop. The reference count is meant to prevent what is referred to as the "ABA problem": if a process or thread reads a value 'A' and then attempts a CAS operation, the CAS might succeed incorrectly if a second thread or process changes the value 'A' to 'B' and then back again to 'A'. If the "ABA" problem never occurs, the code is safe because:

- Nodes are only inserted at the end of the linked list, and only after a node that has a NULL next pointer.
- Nodes are deleted only from the beginning of the list, because they are only removed through the head pointer, and head always points to the first element of the list.
- Head always points to the first element of the list and only changes its value atomically.
This article demonstrates the implementation of a "lock free" queue in C# and C++. Lock free queues are typically used in multi-threaded architectures to communicate between threads without fear of data corruption or performance loss due to threads waiting to use shared memory. The goal of the article is to familiarize readers with lock free queues and to provide a starting point for writing wait-free production architectures. It should be noted that using lock-free queues is only the beginning: a true lock free architecture uses a lock free memory allocator as well. Implementing lock free memory allocators is, however, beyond the scope of this article.

Recent developments in CPU architecture have necessitated a change in thinking about high performance software architecture: multithreaded software. Communication between threads in a multithreaded architecture has traditionally been accomplished using mutexes, critical sections, and locks.