Questions about `dispatch_sync` vs `dispatch_async_and_wait` and DispatchWorkloops

In the header for workloop.h there is this note:

A dispatch workloop is a "subclass" of dispatch_queue_t which can be passed to all APIs accepting a dispatch queue, except for functions from the dispatch_sync() family. dispatch_async_and_wait() must be used for workloop objects. Functions from the dispatch_sync() family on queues targeting a workloop are still permitted but discouraged for performance reasons.

I have a couple of questions related to this. First, I'd like to better understand what the alluded-to 'performance reasons' are that cause this pattern to be discouraged in the 'queues targeting a workloop' scenario. From further interrogation of the headers, I've found these explicit callouts regarding differences between the dispatch_sync and dispatch_async_and_wait APIs:

dispatch_sync:

Work items submitted to a queue with dispatch_sync() do not observe certain queue attributes of that queue when invoked (such as autorelease frequency and QOS class).

dispatch_async_and_wait:

Work items submitted to a queue with dispatch_async_and_wait() observe all queue attributes of that queue when invoked (inluding [sic] autorelease frequency or QOS class).

Additionally, dispatch_async_and_wait has a section of the headers devoted to 'Differences with dispatch_sync()', though I can't say I entirely follow the distinctions it attempts to draw.

Based on that, my best guess is that the 'performance reasons' are something about either QoS not being properly respected/observed or some thread context switching differences that can degrade performance, but I would appreciate insight from someone with more domain knowledge.

My second question is a bit more general – taking a step back, why exactly do these two APIs exist? It's not clear to me from the existing documentation I've found why I would/should prefer dispatch_sync over dispatch_async_and_wait (other than the aforementioned callout noting the former is unsupported on workloops). What is the motivation for preserving both of these APIs vs deprecating dispatch_sync in favor of dispatch_async_and_wait (or functionally subsuming one with the other)?

Credit to Luna for originally posing/inspiring these questions.

dispatch_sync always* runs the submitted block on the thread that calls dispatch_sync, while dispatch_async_and_wait doesn't. If a worker thread is already processing a given queue, calling dispatch_sync on that queue guarantees two context switches: the worker thread has to stop processing the queue, signal the dispatch_sync thread, and park; the dispatch_sync thread then has to wake up, execute the block, and wake a new thread to handle the rest of the items on the queue. With dispatch_async_and_wait, the existing worker can simply execute the block, signal the waiting thread, and continue processing the queue. In theory dispatch_async_and_wait is also more cache efficient, since the data the queue protects should already be in the caches of the thread processing the queue, but that depends on what the queue is doing and is hard to measure empirically. Replacing dispatch_sync with dispatch_async_and_wait should almost** always be a drop-in replacement with equal or better performance.

*Note that dispatch_sync to the main queue behaves like dispatch_async_and_wait, in that it runs the block on the main thread, not the calling thread.

**There are times when you need a block to run on the calling thread, not a drainer, though they're rare. If the block modifies thread-specific data (TSD)/pthread keys, and the code relies on being able to read those values back after the block has run, then dispatch_sync is required. This is why dispatch_sync can't switch to the preferred async_and_wait behaviour: there's a lot of preexisting code in the world that would break if the TSDs were set on a different worker thread instead of the calling thread.
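
To make the thread-specific data point concrete, here is a minimal sketch in plain pthreads. It does not use libdispatch at all; it only models the two observable behaviours described above. All names here (demo_key, set_tsd, run_on_caller, run_on_worker_and_wait, caller_sees_tsd) are made up for illustration, and run_on_worker_and_wait assumes the case where a different thread ends up running the work, which dispatch_async_and_wait permits but which this sketch forces every time.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical key standing in for any thread-specific data the block touches. */
static pthread_key_t demo_key;
static pthread_once_t demo_key_once = PTHREAD_ONCE_INIT;

static void make_key(void) {
    int rc = pthread_key_create(&demo_key, NULL);
    assert(rc == 0);
    (void)rc;
}

/* The "block": stores a value in the TSD slot of whichever thread runs it. */
static void *set_tsd(void *value) {
    int rc = pthread_setspecific(demo_key, value);
    assert(rc == 0);
    (void)rc;
    return NULL;
}

/* Models dispatch_sync on a non-main queue: the work runs on the caller. */
static void run_on_caller(void *(*work)(void *), void *ctx) {
    work(ctx);
}

/* Models dispatch_async_and_wait when a drainer runs the work: another
   thread executes it while the caller blocks until it has finished. */
static void run_on_worker_and_wait(void *(*work)(void *), void *ctx) {
    pthread_t worker;
    int rc = pthread_create(&worker, NULL, work, ctx);
    assert(rc == 0);
    rc = pthread_join(worker, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Returns true if the calling thread can read back the value the work stored. */
bool caller_sees_tsd(bool use_worker) {
    static int marker = 42;
    pthread_once(&demo_key_once, make_key);
    pthread_setspecific(demo_key, NULL); /* start from an empty slot */
    if (use_worker) {
        run_on_worker_and_wait(set_tsd, &marker);
    } else {
        run_on_caller(set_tsd, &marker);
    }
    return pthread_getspecific(demo_key) == &marker;
}
```

Calling caller_sees_tsd(false) finds the value in the caller's slot, while caller_sees_tsd(true) does not, because the value was stored in the worker's slot and that thread has since exited. Code that depends on the first behaviour is the kind that would break if dispatch_sync silently started running blocks on a drainer thread.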
