Feature request: Low-level atomics support

The biggest issue I have with Swift today is the lack of support for atomics. This is a particularly important issue because there's no workaround here, short of writing my code in Objective-C, which (for the most part) can't interoperate with Swift value types or Swift generics. The only partial workaround I have is OSAtomic.h, for those cases where it actually provides the operation I need, but even that is risky because it's not actually guaranteed to be valid. Because Swift operates on a writeback semantic model, there's no guarantee that a call like `OSAtomicCompareAndSwap32(value, newValue, &value)` will actually behave as desired; Swift is free to make a local copy of the value, pass a pointer to that copy to the function, and then non-atomically write the copy back to the original location. And OSAtomic often doesn't provide the operation I need anyway.
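To make the writeback hazard concrete, here is the pattern people reach for (a sketch, not a guarantee: `withUnsafeMutablePointer` makes the pointer explicit instead of relying on `&value`, but Swift still doesn't formally promise that it hands OSAtomic the stored property's real address):

```swift
import Darwin

// Sketch only: wrap the value in a class so it has stable storage,
// and pass the pointer to OSAtomic explicitly. This is exactly the
// "works in practice but isn't guaranteed" code this post complains
// about.
final class Flag {
    private var raw: Int32 = 0

    // Returns true the first time it flips the flag, false afterward.
    func trySet() -> Bool {
        return withUnsafeMutablePointer(&raw) { ptr in
            OSAtomicCompareAndSwap32Barrier(0, 1, ptr)
        }
    }
}
```

The class wrapper matters: a struct's storage can be copied freely by the compiler, while a class instance's ivar at least has one canonical location on the heap.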


Every single big Swift project I've done has run into this limitation. I usually end up working around it by giving up on the idea of atomic operations and using a spinlock (although `OSSpinLockLock(&spinlock)` is not actually guaranteed to work, for the same reason as above). But this is disheartening, because I'm throwing away performance for no reason. And when I'm writing extremely performance-critical code (such as code that needs to handle the data coming from an AVCaptureDevice in real time), this especially worries me.
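The spinlock fallback I mean looks something like this (again a sketch, with the same caveat that `&lock` is not formally guaranteed to be the stored property's real address):

```swift
import Darwin

// Spinlock-protected counter: correctness hinges on &lock actually
// referring to the stored property rather than a writeback buffer,
// which Swift does not promise.
final class Counter {
    private var lock = OS_SPINLOCK_INIT
    private var count = 0

    // Increments under the lock and returns the new value.
    func increment() -> Int {
        OSSpinLockLock(&lock)
        count += 1
        let result = count
        OSSpinLockUnlock(&lock)
        return result
    }
}
```

This is the performance I'm throwing away: every read and write pays for a lock acquisition where a single atomic instruction would do.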


I get that atomics are hard, and that Swift may want to come up with some high-level model of threadsafe operations that doesn't match C++11's std::atomic. But at a low level LLVM operates on the same model as C++11's std::atomic, and having these low-level operations exposed now would be extremely helpful. At such time as Swift comes up with a high-level threadsafe model, it will presumably be built on top of these same low-level operations anyway.


Of course, C++11's `std::atomic` doesn't actually guarantee lock-free behavior; it uses template magic to fall back to locks when necessary. That is not what I'm asking for. What I'm really asking for is something more like Rust's atomic model, which exposes distinct types for the four cases it can guarantee are lock-free (Bool, pointer-sized signed integer, pointer-sized unsigned integer, and raw pointer), with methods on those types for the various valid atomic operations. A slightly higher-level `Atomic<T>` matching C++11's `std::atomic` could be implemented later on top of this, if Swift ever grows the ability to specialize the memory representation of an atomic type, or perhaps some other high-level atomic model. The important point here is that the atomic capabilities of LLVM IR should be exposed to the language, so that the Swift stdlib, or we third-party devs, can write proper thread-safe implementations on top of them.
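To make the shape of the request concrete, here is a hypothetical sketch of what one such distinct type could look like. The names are my invention, not a proposed API, and the bodies are faked on top of OSAtomic purely for illustration; a real implementation would lower directly to LLVM atomic instructions:

```swift
import Darwin

// Hypothetical AtomicInt32, mirroring the spirit of Rust's atomic
// integer types. Method bodies are an OSAtomic-based stand-in.
final class AtomicInt32 {
    private var raw: Int32
    init(_ value: Int32) { raw = value }

    // Barriered read: adding 0 returns the current value atomically.
    func load() -> Int32 {
        return withUnsafeMutablePointer(&raw) {
            OSAtomicAdd32Barrier(0, $0)
        }
    }

    // Atomically adds n and returns the OLD value (OSAtomicAdd32Barrier
    // returns the new value, so we subtract n back out).
    func fetchAdd(n: Int32) -> Int32 {
        return withUnsafeMutablePointer(&raw) {
            OSAtomicAdd32Barrier(n, $0) - n
        }
    }

    // Atomic compare-and-swap; true if the swap happened.
    func compareAndSwap(old old: Int32, new: Int32) -> Bool {
        return withUnsafeMutablePointer(&raw) {
            OSAtomicCompareAndSwap32Barrier(old, new, $0)
        }
    }
}
```

A builtin version of this would also take an explicit memory-ordering parameter (relaxed, acquire, release, sequentially consistent), which OSAtomic cannot express.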


As it stands today, because of the lack of atomics, I worry that a lot of the Swift code being written isn't actually thread-safe, and that the code that is thread-safe is unnecessarily slow. I've personally seen a lot of code, written by coworkers or by third parties, that accidentally performs concurrent access to a shared value from multiple threads without meeting LLVM IR's narrow requirements for that access to be defined (e.g. that in the happens-before partial order there is only a single write to that location visible to the non-atomic read). This sort of thing often seems to work in practice, but because it technically returns an undefined value, optimization may cause it to break; or, more generally, the code may not actually race most of the time, but when it does race it behaves incorrectly. Granted, proper atomic support won't prevent anyone from writing thread-unsafe code, but it will allow correct thread-safe code to be written, and its existence will hopefully remind people that data races are a legitimate issue.

>> The biggest issue I have with Swift today is the lack of support for atomics.

Did you file a formal enhancement for this already? If so, what’s the bug number?

Share and Enjoy

Quinn "The Eskimo!"
Apple Developer Relations, Developer Technical Support, Core OS/Hardware

let myEmail = "eskimo" + "1" + "@apple.com"

It's not exactly what you want, but:


I have an OS X app that is heavily multithreaded and needs to do some real-time processing of video frames (to output) and other kinds of work, using IOKit, etc. So atomic operations are used heavily in this app.


And I get very good performance just using the GCD tips from WWDC 2012 Session 712, "Asynchronous Design Patterns with Blocks, GCD, and XPC" (slide: Improve Performance with Reader-Writer Access).


Basically:


• Use a concurrent subsystem queue: `DISPATCH_QUEUE_CONCURRENT`

• Use synchronous concurrent "reads": `dispatch_sync()`

• Use asynchronous serialized "writes": `dispatch_barrier_async()`


Example:

var _importantArray = [AnyObject]()
let _someManagerQueue = dispatch_queue_create("SomeManager", DISPATCH_QUEUE_CONCURRENT)

// Reads run concurrently with each other; dispatch_sync makes the
// caller wait for the result.
func getSomeArrayItem(index: Int) -> AnyObject? {
    var importantObj: AnyObject?
    dispatch_sync(_someManagerQueue) {
        importantObj = _importantArray[index]
    }
    return importantObj
}

// Writes are serialized: the barrier waits for in-flight reads to
// drain, runs alone, then lets reads resume. The caller doesn't block.
func addSomeArrayItem(object: AnyObject) {
    dispatch_barrier_async(_someManagerQueue) {
        _importantArray.append(object)
    }
}

func removeSomeArrayItem(object: AnyObject) {
    dispatch_barrier_async(_someManagerQueue) {
        if let i = _importantArray.indexOf({ $0 === object }) {
            _importantArray.removeAtIndex(i)
        }
    }
}


That way, whenever you read a piece of information (e.g. an array), all the "writes" have either been applied or are still waiting. And every time you write, your program does not block waiting for the operation to complete.

You could use a serial queue instead, but that is a huge waste of processing time because it does not parallelize the reads. With the concurrent queue, if you use several threads, none has to wait for another unless there is a "write", which is as it should be.


This just works for me, and I do a lot of work with each frame (among other things). It's also more performant than a previous version of the same app that used locks (now I can keep all 8 cores proportionally busy).

My understanding of Obj-C is that all property accessors are inherently atomic for values that are 4 bytes or less, except for pointers to reference-counted objects, and (IIRC) also atomic for values that are 8 bytes or less on Intel hardware, but not on ARM hardware.


I would assume the same is true of Swift, though I haven't seen any API contract about it, or seen any official statement about it. If it's true in practice, that might explain why apps aren't crashing right and left as you're afraid they might. It's possible that 8-byte accessors are already atomic in Swift even on ARM hardware, though I'd guess not.


>> As it stands today, because of the lack of atomics, I worry that a lot of Swift code that's being written isn't actually thread-safe


But atomicity isn't thread safety. In fact, in general, atomicity isn't anywhere close to thread safety, and real thread safety usually doesn't need atomicity. That's the other reason, I think, why apps aren't crashing — atomicity as such is needed only in rare cases. (That may not be true for the classes of apps you write, of course.)


My guess is that an API contract regarding atomicity is something that the Swift implementors simply haven't gotten around to yet. If so, the Eskimoesque (Eskimotive? Eskimalian?) advice of submitting a bug report sounds like an excellent idea.

I think I have seen a related problem: memory consistency.


I have some tasks that populate an array in the background using a concurrent global queue, with dispatch groups used to wait for completion. In the foreground I wait on the dispatch group for index 0, which signals that the optional array element at index 0 has been populated (i.e. is no longer nil). However, if I simply wait on the task group I occasionally get a nil exception, unless I add a nil check, in which case the code works as expected; see below:


while containersRef.value[0] == nil { // Added manual nil check
    usleep(1)
}
dispatch_group_wait(dispatchGroups[0], DISPATCH_TIME_FOREVER) // Does not ensure that containersRef.value[0] != nil, but should!
someFunction(containersRef.value[0]!) // Without manual nil check the ! sometimes fails when testing with XCTest


This is very odd because when the background task has completed the array element cannot be nil. I can think of two alternatives:


  1. Is this a memory consistency problem? Is there a way in Swift to wait for memory consistency of an array element?
  2. Alternatively, is the problem that the background task hasn't started and dispatch_group_wait does not check for this? The background tasks are scheduled using dispatch_group_async.

I did. I meant to put this in the original post but forgot. I filed rdar://problem/21305694 which was duped to rdar://problem/16883819.

GCD is great, and I use it a lot. But it's not a replacement for actual support for atomics.

The semantics of loads/stores on various architectures are not the same thing as the semantics of LLVM IR when it comes to atomicity. Code written with non-atomic loads/stores in assembly that just happens to work on a particular architecture may fail horribly when written in LLVM IR, because the optimizations done on LLVM IR may change things in a way that causes the code to behave differently from the hand-coded assembly. Notably, LLVM IR loads may return `undef` when a data race occurs, and optimizations can take advantage of that to transform the code in various ways.


And you're right, atomicity and thread safety aren't the same thing, but thread safety always involves atomicity at some level (even if it's just atomic compare-and-swap operations on a spinlock that protects the non-atomic property access). There are many ways to write thread-safe code in Swift that leverage pre-existing concurrency primitives (GCD, locks, etc.), but similarly there are many thread-safe constructs that cannot be written in Swift because they require atomics.


Also, it's not true that all property accessors in obj-c are inherently atomic. Notably, properties with the `nonatomic` attribute (which is extremely common) are not atomic. Properties that don't specify atomicity, or that specify `atomic`, use the `monotonic` atomic memory ordering for primitives (which guarantees nothing except that any value you read must be a value written previously, without specifying which one), and use a spinlock for obj-c objects.

If you're accessing the same array from two threads concurrently (with at least one writer), you have a problem. If you write to the array, then signal the dispatch group, and the other thread waits on the group, then reads from the array, that's fine. But I note your group itself is stored in an array; any chance that groups array is being read from and written to concurrently? Also, your while loop there is absolutely violating concurrency, because you're reading from the array while another thread is writing to it.


It's worth noting that what I'm asking for (low-level atomics support) really has no bearing on your problem. You are trying to use existing concurrency primitives (which is correct, as atomics don't solve your problem unless you use them to recreate a dispatch group), which is perfectly fine in Swift.
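For what it's worth, one GCD-only way to make that kind of handoff race-free (a sketch; the names here are mine, not from your code) is to funnel every read and write of the shared slots through a single queue, and use the group purely as a completion signal:

```swift
import Dispatch

// All access to the shared slot array goes through one serial queue,
// so a reader can never observe a half-written element. A dispatch
// group then only answers "is the work finished?", never
// "is the memory visible?".
let slotQueue = dispatch_queue_create("slots", DISPATCH_QUEUE_SERIAL)
var slots = [Int?](count: 4, repeatedValue: nil)

func setSlot(index: Int, _ value: Int) {
    dispatch_sync(slotQueue) { slots[index] = value }
}

func getSlot(index: Int) -> Int? {
    var result: Int?
    dispatch_sync(slotQueue) { result = slots[index] }
    return result
}
```

The write could also be `dispatch_async` if the writer doesn't need to wait; the important part is that no thread ever touches `slots` except on `slotQueue`.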

Rust is a very interesting programming language. What could Swift learn from Rust? I would like to read a comparative discussion of what Rust offers that Swift does not.

Rust is actually one of the languages Swift has already drawn a lot of inspiration from (and in turn Rust has drawn a bit of inspiration from Swift as well, such as with `if let`). You can see this on Chris Lattner's website.

@Eridius


Thanks for the comments. Yes, memory consistency, or whatever the issue I am seeing is, is not the same as atomics; your discussion of atomics just made me think of the strange problem I had seen and solved by trial and error.


Back to the problem. A bit more context will probably be helpful:


First, the MutableReference class:


public final class MutableReference<T>: Hashable, CustomStringConvertible, CustomDebugStringConvertible {
    public var value: T
    ...
}


The purpose of this class is to hold a struct so that the struct isn't copied; only the pointer to the MutableReference is copied. This achieves two things: it is efficient, and it is simpler than using inout parameters.


Then the main method, where the problem occurs:


        let numContainers = ... // The number of threads is numContainers - 1; container 0 is processed in the foreground
        let containersRef = MutableReference([MutableReference<C>?](count: numContainers, repeatedValue: nil))
        let dispatchQueue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_HIGH, 0)
        let dispatchGroups = (0 ..< numContainers - 1).map { _ in
            dispatch_group_create()
        }
        func processContainer(cN: Int) { // Populate a container, normally in the background but 0 in the foreground
            let containerRef = ... // a MutableReference containing the container
            containersRef.value[cN] = containerRef // Make container available to other threads, which wait until it is populated
            ... // partially populate the container (via containerRef)
            for var cO = 1; ; cO <<= 1 { // Select the other container to include in the results (tree reduction)
                let cNO = cN + cO
                guard cN & cO == 0 && cNO < numContainers else {
                    return // No more writing to the container, therefore exit
                }
                while containersRef.value[cNO] == nil { // I needed to add this to prevent a nil exception
                    usleep(1)
                }
                dispatch_group_wait(dispatchGroups[cNO - 1], DISPATCH_TIME_FOREVER) // Doesn't prevent the nil exception
                ... // use containersRef.value[cNO]! to continue populating containerRef - this is where a nil exception can occur without the while loop
                containersRef.value[cNO] = nil // Other container no longer required
            }
        }
        for cN in (1 ..< numContainers).reverse() { // All except 0 processed in the background
            dispatch_group_async(dispatchGroups[cN - 1], dispatchQueue) {
                processContainer(cN)
            }
        }
        processContainer(0) // In the foreground; only completes when all the tasks have finished
        while containersRef.value[0] == nil { // Again a nil check is required
            usleep(1)
        }
        ... // use containersRef.value[0]! - this is where a nil exception can occur without the while loop


Some notes:


  1. Containers can be anything, they are generic, depending on the application, but for the sake of understanding the code think of arrays.
  2. A call to processContainer, whether foreground or background, waits on tasks to complete via dispatchGroups before reading from another container.
  3. Therefore processContainer, whether foreground or background, only ever writes to the container it created and only reads from finished containers.


It is weird that the while-nil test is required; you would have thought it would do nothing, since the code waits anyway for task completion.


Any idea what is going on?

>> Also, it's not true that all property accessors in obj-c are inherently atomic. Notably, properties with the `nonatomic` attribute (which is extremely common) are not atomic.


Under the stated restrictions (4 bytes or less on ARM, not a pointer to reference-counted memory), the property accessor code is (reportedly) identical whether the property is declared atomic or nonatomic (or defaults to atomic).

I didn't check ARM, but I did check x86_64, and it is true that for an `int32_t` property, `nonatomic` and `atomic` generate the same assembly for the accessors. But they generate different LLVM IR. Specifically, the `nonatomic` property uses a regular non-atomic load/store:


  %5 = load i32* %4, align 4, !tbaa !21
  store i32 %nonatomicInt, i32* %4, align 4, !tbaa !21


But the `atomic` property uses an atomic load/store:


  %5 = load atomic i32* %4 unordered, align 4
  store atomic i32 %atomicInt, i32* %4 unordered, align 4


Given that the resulting assembly is the same, you may wonder what the difference is. The difference is that optimization passes handle atomic loads/stores differently than non-atomic ones. With Obj-C, since this is a property, and properties are accessed via dynamic dispatch (so no inlining), the body of the getter/setter is so short and simple that the optimization passes aren't really going to do anything with it. But Swift definitely inlines things, and it makes fewer guarantees about the in-memory representation of things like properties (Obj-C ivar behavior is well understood, and synthesized properties are backed by ivars), so we definitely cannot get away with assuming that accesses of 4-byte primitives are atomic.

Just curious, may I ask for examples of use cases where atomic properties are preferred over dispatch_barrier_sync/async?

I had an exchange with Greg Parker (renowned Apple runtime engineer) on the older listserv; you can find the thread on objc-language@lists.apple.com, or perhaps just by googling. His most direct post is attached, which basically says "Don't do that". In fact he posted several responses, all of which said "Don't do that".


I was heartbroken because I loved OSAtomics, but have since switched to GCD as others have suggested.

-----


Subject Re: Best practice for using OSAtomic to provide threaded access to an ivar?


On Apr 19, 2013, at 9:44 AM, David Hoerl <my email> wrote:

> Fine - so I thought I'd provide setters and getters that use a serial dispatch queue, but I find one deadlock condition, and I really think it's overkill to construct yet another queue just for this one ivar.

>

> So I turned to OSAtomic, which I've used with success before, and have been fiddling around for a bit on how best to code what I need: a single BOOL ivar/property that lets me set and read it from multiple threads.


Short answer: don't.


OSAtomic is playing with fire. Architecture-specific, impossible-to-test fire. You can't use OSAtomic for anything more advanced than a global int for printing usage counts until you understand how modern CPUs deal with memory coherency, what memory barriers are, how they work, why you need different barriers on x86 versus ARM, and why nobody will ever again design a CPU that works like the DEC Alpha.



> My thoughts are that in the setter, if I use a barrier operation, then the moment that completes the getter should see the proper value in any thread.


And that is incorrect. Depending on the architecture and the ordering guarantees you need, both the setter and the getter need a barrier in order to achieve the correct synchronization.


The problem is that you almost certainly need to think bigger than just the boolean ivar. If there is any code that reads that ivar and makes decisions based on its value then you almost certainly need a multithreading design that is bigger than just the implementations of the getter and setter methods.
