Thanks for posting the descriptor.
Here's the endpoint descriptor from alternate setting 1 of interface 3, the audio streaming interface, when set to 12 channels of 8-bit-per-sample audio:
Endpoint 0x03 - Isochronous Output
Address: 0x03 (OUT)
Attributes: 0x09 (Isochronous adaptive data endpoint)
Max Packet Size: 0x024c (588 x 1 transactions opportunities per microframe)
Polling Interval: 4 (8 microframes (1 msecs))
Your sampling rate is 48 kHz, so there are 48 sample frames every millisecond. The polling interval is 1 millisecond (the actual number in the descriptor is 4; we're on USB 2.0 high speed here, so that means every 2^(4-1), or 8, microframes). Each sample is 1 byte, and there are 12 channels, so in each interval there are 12 x 48 = 576 bytes to transfer (nominally). Up to one additional sample frame (12 bytes) may be transferred in each interval to synchronize the device rate with the host rate, which is where the 588 bytes comes from.
In the 16-bits-per-sample case, the polling interval is the same: 1 ms. But now each sample frame is 24 bytes, so you need to transfer up to 1152 + 24 = 1176 bytes per interval. However, the maxPacketSize here is 1024, which is the maximum USB 2.0 allows for a high-speed isochronous endpoint, so the worst-case packet does not fit.
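The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name `max_bytes_per_interval` and its parameters are made up for illustration, and it assumes one extra sample frame of headroom per interval, as described above.

```python
# Largest payload of a single high-speed isochronous transaction
# (USB 2.0 Specification, section 5.6.3).
HS_ISO_MAX_PAYLOAD = 1024

def max_bytes_per_interval(sample_rate_hz, channels, bytes_per_sample, interval_us=1000):
    """Worst-case bytes per polling interval: nominal sample frames plus one extra."""
    frame_bytes = channels * bytes_per_sample
    nominal_frames = sample_rate_hz * interval_us // 1_000_000
    return (nominal_frames + 1) * frame_bytes

for bytes_per_sample in (1, 2):
    need = max_bytes_per_interval(48_000, 12, bytes_per_sample)
    verdict = "fits" if need <= HS_ISO_MAX_PAYLOAD else "does not fit"
    print(f"{8 * bytes_per_sample}-bit: {need} bytes per 1 ms interval, "
          f"{verdict} in one transaction")
```

The 8-bit format needs 588 bytes and fits; the 16-bit format needs 1176 bytes and does not.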
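One way to fix this is to change the descriptor to shorten the polling interval from once every millisecond to twice every millisecond (the number '3' in the bInterval field, meaning every 2^(3-1), or 4, microframes), and change wMaxPacketSize to 600. That is 24 sample frames of 24 bytes each (576 bytes) plus one additional sample frame (24 bytes); simply halving 1176 to 588 would not leave room for a whole extra sample frame.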
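To see what a given bInterval implies, here is a small Python sketch with hypothetical helper names. It converts the high-speed bInterval encoding to a period and works out the worst-case packet for 12 channels of 16-bit audio at 48 kHz. It assumes every packet carries whole sample frames plus at most one extra, so at bInterval 3 the worst case comes out as 25 frames x 24 bytes = 600 bytes rather than exactly half of 1176.

```python
MICROFRAME_US = 125  # one high-speed microframe lasts 125 microseconds

def interval_us(b_interval):
    """High-speed isochronous period: 2^(bInterval-1) microframes, in microseconds."""
    return (1 << (b_interval - 1)) * MICROFRAME_US

def worst_case_packet(sample_rate_hz, channels, bytes_per_sample, b_interval):
    """Nominal whole sample frames per interval, plus one extra frame, in bytes."""
    nominal_frames = sample_rate_hz * interval_us(b_interval) // 1_000_000
    return (nominal_frames + 1) * channels * bytes_per_sample

for b_interval in (4, 3):
    period = interval_us(b_interval)
    size = worst_case_packet(48_000, 12, 2, b_interval)
    print(f"bInterval {b_interval}: every {period} us, up to {size} bytes per packet")
```

With bInterval 4 the worst case is 1176 bytes, which exceeds 1024; with bInterval 3 it is 600 bytes, which fits in a single transaction.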
Another possibility would be to ask for up to two transactions per microframe - this is squeezed into bits 12 and 11 of wMaxPacketSize. However, the (now ancient) tech note TN2274 says "At this time, AppleUSBAudio supports at most one transaction per microframe".
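For reference, this is roughly how the wMaxPacketSize field splits up for a high-speed isochronous endpoint: bits 10..0 hold the payload size and bits 12..11 hold the number of additional transaction opportunities per microframe. The Python below is an illustrative sketch with made-up function names, not code from any driver.

```python
def decode_w_max_packet_size(value):
    """Split wMaxPacketSize into (payload bytes, transactions per microframe)."""
    payload = value & 0x07FF          # bits 10..0: maximum payload size
    additional = (value >> 11) & 0x3  # bits 12..11: additional transactions
    return payload, additional + 1

def encode_w_max_packet_size(payload, transactions):
    """Pack a payload size and a transaction count (1 to 3) into wMaxPacketSize."""
    return ((transactions - 1) << 11) | (payload & 0x07FF)

# The value from the descriptor above: 588 bytes, one transaction per microframe.
print(decode_w_max_packet_size(0x024C))

# What two transactions of up to 588 bytes each would look like in the descriptor.
print(hex(encode_w_max_packet_size(588, 2)))
```

Decoding 0x024c gives 588 bytes and one transaction, which matches the "588 x 1" in the dump; two transactions of 588 bytes would be encoded as 0x0a4c.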
See Table 9-13 and section 5.6.3 of the USB 2.0 Specification, and Table 4-33 of the USB Audio Class Specification v2.0.
You might be asking yourself why it "works on Linux", but does it really?