MIDI Network Driver Protocol

This document describes implementation details, particularly the network protocols, of the MIDI Network driver in macOS.

The MIDI data packet format is a mostly conforming implementation of RFC 6295, RTP Payload Format for MIDI. That RFC delegates issues such as session setup, accurate synchronization, and receiver feedback to other protocols. The Apple driver, by contrast, uses a small set of custom commands to perform these functions.

Control and MIDI Data Ports

A participant creates two UDP ports, using an arbitrary pair of consecutive port numbers. The Apple driver attempts to reuse ports from the last time it ran; otherwise, it finds and uses a pair of unused ports.

The Apple driver advertises the control port via Bonjour using the service name "_apple-midi._udp".
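As a rough illustration of the port allocation described above, the following Python sketch binds two UDP sockets on consecutive port numbers. The function name, the starting port 5004, the loopback address, and the choice of the lower port as the control port are all assumptions made for this example; the text only requires that the two ports be consecutive. Reusing ports from a previous run and advertising via Bonjour are not shown.

```python
import socket


def bind_consecutive_udp_pair(host="127.0.0.1", start=5004, attempts=200):
    """Bind two UDP sockets on ports n and n + 1 and return (control, midi).

    Assumption for this sketch: the lower port is used as the control port.
    """
    for port in range(start, start + 2 * attempts, 2):
        control = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        midi = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            control.bind((host, port))
            midi.bind((host, port + 1))
            return control, midi
        except OSError:
            # One of the two ports is in use; release both and try the next pair.
            control.close()
            midi.close()
    raise OSError("no free pair of consecutive UDP ports found")
```

A caller would typically keep both sockets open for the lifetime of the session and advertise only the control socket's port number.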

Session Initiation and Termination

To set up a session, the participants exchange several types of command packets. Figure 1-1 (Exchange packet formats) shows the common structure of these packets. Each row in this figure (and other figures in this document) describes a 32-bit value stored in the packet. All numbers are stored in network byte order unless otherwise stated.

Figure 1-1  Exchange packet formats

Item              Value
command           16-bit command identifier (two ASCII characters, first in high 8 bits, second in low 8 bits)
protocol version  2 (stored in network byte order)
initiator token   A random number generated by the session’s initiator.
SSRC              The sender's synchronization source identifier.
name              A NULL-terminated C string, UTF-8 encoded, indicating the user-visible name of the initiator's session. May be omitted.
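The packet layout above can be sketched in Python with the struct module. This is a minimal sketch, not the driver's implementation: the function names are hypothetical, and the 0xFFFF signature placed in the 16 bits before the command identifier is an assumption that is not listed in the table (it is what fills the first 32-bit row alongside the 16-bit command in commonly observed traffic).

```python
import struct

SIGNATURE = 0xFFFF        # assumption: not listed in the table above
PROTOCOL_VERSION = 2

# signature (16), command (2 ASCII chars), version, initiator token, SSRC
_HEADER = struct.Struct("!H2sIII")


def pack_exchange(command, initiator_token, ssrc, name=None):
    """Build an exchange packet; command is two ASCII bytes such as b"IN"."""
    packet = _HEADER.pack(SIGNATURE, command, PROTOCOL_VERSION,
                          initiator_token, ssrc)
    if name is not None:
        # The optional name is a NUL-terminated UTF-8 string.
        packet += name.encode("utf-8") + b"\x00"
    return packet


def unpack_exchange(packet):
    """Return (command, version, initiator_token, ssrc, name or None)."""
    signature, command, version, token, ssrc = _HEADER.unpack_from(packet)
    if signature != SIGNATURE:
        raise ValueError("not an exchange packet")
    raw_name = packet[_HEADER.size:]
    name = raw_name.split(b"\x00", 1)[0].decode("utf-8") if raw_name else None
    return command, version, token, ssrc, name
```

The "!" prefix selects network byte order without padding, so the fixed part of the packet is always 16 bytes.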

The full sequence of packets sent to establish a session is:

  1. Initiator sends an Invitation ('IN') packet on control port

    This is a request to create a connection. The sender is the "initiator"; the receiver is the "responder." The name field should be included. Invitation requests are resent every second, up to 12 times, until an Invitation Accepted or Invitation Rejected response is received.

  2. Responder sends an Invitation Accepted ('OK') or Invitation Rejected ('NO') packet on control port

    A participant has received an invitation and is responding to it. In its response, the responder should copy the initiator token from the received invitation, but use its own SSRC identifier. The name field should be included in an Invitation Accepted packet and excluded from an Invitation Rejected packet. If the invitation was rejected, no further packets are exchanged.

  3. Initiator sends an Invitation ('IN') packet on MIDI port

  4. Responder sends an Invitation Accepted ('OK') or Invitation Rejected ('NO') packet on MIDI port

    A participant has received an invitation and is responding to it. In its response, the responder should copy the initiator token from the received invitation, but send its own SSRC identifier. The name field should be included in an Invitation Accepted packet and excluded from an Invitation Rejected packet. If the invitation was rejected, no further packets are exchanged.

  5. Initiator initiates clock synchronization (see Timestamp Synchronization)
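The initiator's side of steps 1 through 4 can be sketched as a retry loop run once per port. The function names and the callback style are hypothetical, chosen so the logic can be shown without real sockets. Two points are assumptions: that "up to 12 times" means 12 transmissions in total, and that the same resend rule applies to the invitation on the MIDI port.

```python
def invite(send_invitation, wait_for_reply, max_attempts=12, interval=1.0):
    """Send an invitation repeatedly until it is answered.

    send_invitation() transmits one 'IN' packet.
    wait_for_reply(timeout) returns b"OK", b"NO", or None on timeout.
    Returns b"OK", b"NO", or None if every attempt timed out.
    """
    for _ in range(max_attempts):
        send_invitation()
        reply = wait_for_reply(interval)
        if reply in (b"OK", b"NO"):
            return reply
    return None


def establish_session(invite_on_control, invite_on_midi):
    """Run the handshake on the control port, then on the MIDI port.

    Each argument is a callable returning the result of invite() for that
    port. The MIDI-port invitation is only attempted after the control-port
    invitation has been accepted.
    """
    if invite_on_control() != b"OK":
        return False
    return invite_on_midi() == b"OK"
```

When establish_session returns True, the initiator would proceed to step 5 and start clock synchronization.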

Ending a Session

When either participant is ready to leave the session, it sends an Exit ('BY') packet. The name field may be omitted.

Timestamp Synchronization

This feature of the protocol provides a common basis for the timestamps in the MIDI RTP payload data packets. The degree of accuracy possible with this method varies with network conditions, but on a local network, it is possible for a receiver to know when a MIDI event occurred (or is scheduled to occur) with a high degree of accuracy, within 1-2 ms.

The thread servicing a participant's control port may run at normal priority, whereas the thread servicing a participant’s MIDI port should run at a high priority in order to process incoming MIDI responsively.

Synchronization packets are exchanged between the participants' MIDI ports. The packet format is shown in Figure 1-2 (Timestamp packet structure).

Figure 1-2  Timestamp packet structure

Item       Value
SSRC       The sender's synchronization source identifier.
count      The count is the number of valid timestamps in the packet minus 1.
timestamp  A 64-bit number indicating a time, relative to an arbitrary and unknown point in the past, in units of 100 microseconds.
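A possible encoding of the synchronization packet is sketched below. Several details are assumptions not stated in the table: the leading 0xFFFF signature and 'CK' command identifier, the count occupying the first byte of its 32-bit row with three padding bytes after it, and the packet always carrying three timestamp slots with unused slots set to zero. The function names are hypothetical.

```python
import struct
import time

# signature, command, SSRC, count (1 byte) + 3 padding bytes, 3 timestamps
_SYNC = struct.Struct("!H2sIB3x3Q")


def now_ticks():
    """Current time in units of 100 microseconds from an arbitrary origin."""
    return time.monotonic_ns() // 100_000


def pack_sync(ssrc, timestamps):
    """Build a sync packet carrying one to three valid timestamps."""
    if not 1 <= len(timestamps) <= 3:
        raise ValueError("a sync packet carries 1 to 3 timestamps")
    count = len(timestamps) - 1          # count is "valid timestamps minus 1"
    slots = list(timestamps) + [0] * (3 - len(timestamps))
    return _SYNC.pack(0xFFFF, b"CK", ssrc, count, *slots)


def unpack_sync(packet):
    """Return (ssrc, list of valid timestamps)."""
    signature, command, ssrc, count, t1, t2, t3 = _SYNC.unpack(packet)
    if signature != 0xFFFF or command != b"CK" or count > 2:
        raise ValueError("not a valid sync packet")
    return ssrc, [t1, t2, t3][: count + 1]
```

Under these assumptions every sync packet is 36 bytes long regardless of how many timestamps are valid.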

The session's initiator starts clock synchronization once the invitation handshake is complete. A full clock synchronization exchange is as follows:

  1. Initiator sends sync packet with count = 0, current time in timestamp 1

  2. Responder sends sync packet with count = 1, current time in timestamp 2, timestamp 1 copied from received packet

  3. Initiator sends sync packet with count = 2, current time in timestamp 3, timestamps 1 and 2 copied from received packet

At the end of this exchange, each party can estimate the offset between the two clocks using the following formula:

offset_estimate = ((timestamp3 + timestamp1) / 2) - timestamp2
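The formula translates directly into code. The sketch below uses a hypothetical function name and integer division, which discards at most half a tick (50 microseconds); the timestamp values are made up for illustration.

```python
def estimate_offset(timestamp1, timestamp2, timestamp3):
    """Estimate the clock offset, in 100-microsecond ticks.

    timestamp1 and timestamp3 come from the initiator's clock, and
    timestamp2 from the responder's clock.
    """
    return (timestamp3 + timestamp1) // 2 - timestamp2


# Initiator sent at tick 1000 and received the reply at tick 1040;
# the responder stamped its reply at tick 5020 on its own clock.
offset = estimate_offset(1000, 5020, 1040)
print(offset)  # -4000
```

Assuming the network delay is the same in both directions, the midpoint of timestamps 1 and 3 is the initiator's clock reading at the moment the responder read timestamp 2. In this example the initiator's clock is therefore about 4000 ticks (400 ms) behind the responder's.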

Furthermore, by maintaining a history of synchronization exchanges, each party can calculate a rate at which the clock offset is changing.
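The document does not say how this rate is computed. One possible approach, shown here only as an illustration, is a least-squares line fit through the recorded (local time, offset estimate) pairs; the function name and the history format are hypothetical.

```python
def offset_drift_rate(history):
    """Fit a line through (local_time, offset_estimate) pairs.

    Both values are in 100-microsecond ticks. The result is the change in
    offset per tick of local time (the slope of the fitted line).
    """
    n = len(history)
    if n < 2:
        raise ValueError("need at least two synchronization exchanges")
    mean_t = sum(t for t, _ in history) / n
    mean_o = sum(o for _, o in history) / n
    numerator = sum((t - mean_t) * (o - mean_o) for t, o in history)
    denominator = sum((t - mean_t) ** 2 for t, _ in history)
    if denominator == 0:
        raise ValueError("all exchanges share the same local time")
    return numerator / denominator
```

For example, offsets of -4000, -3999, and -3998 ticks measured 10 seconds (100000 ticks) apart give a rate of 0.00001, meaning the offset grows by one tick every 10 seconds.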

The initiator must initiate a new sync exchange at least once every 60 seconds; otherwise the responder may assume that the initiator has died and terminate the session.

Exchanging MIDI Packets

The participants in a session exchange MIDI packets between their MIDI data ports. A MIDI packet consists of a header, a MIDI command section, and a journal section. The format of the packet header is shown in Figure 1-3 (MIDI packet header format), with a few limitations and assumptions made.

Figure 1-3  MIDI packet header format

Item  Value
V     2
P     0
X     0
CC    0
M     1
PT    0x61

The timestamp is in the same units as described in Timestamp Synchronization (units of 100 microseconds since an arbitrary time in the past). Only the lower 32 bits of this value are encoded in the packet. The Apple driver may transmit packets with timestamps in the future. Such messages should not be played until the scheduled time. (A future version of the driver may have an option to not transmit messages with future timestamps, to accommodate hardware not prepared to defer rendering the messages until the proper time.)
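The header can be sketched as follows, using the fixed field values from the table. The sequence number and SSRC fields and their positions come from the standard 12-byte RTP header (RFC 3550) rather than from the table above, and the function name is hypothetical.

```python
import struct


def pack_midi_header(sequence_number, timestamp, ssrc):
    """Build the 12-byte RTP header that precedes the MIDI command section.

    timestamp is the full 64-bit time in 100-microsecond ticks; only its
    lower 32 bits are written into the header.
    """
    byte0 = (2 << 6) | (0 << 5) | (0 << 4) | 0   # V=2, P=0, X=0, CC=0
    byte1 = (1 << 7) | 0x61                      # M=1, PT=0x61
    return struct.pack("!BBHII", byte0, byte1,
                       sequence_number & 0xFFFF,
                       timestamp & 0xFFFFFFFF,
                       ssrc)


header = pack_midi_header(1, 0x100000005, 0xAABBCCDD)
print(header.hex())  # 80e1000100000005aabbccdd
```

With the values in the table, the first two bytes of every MIDI packet are 0x80 and 0xE1.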

The MIDI command section is shown in MIDI command section format and matches the format defined in RFC 6295.

Figure 1-4  MIDI command section format

Here are some specific details about how Apple’s driver uses MIDI packets:

Recovery Journal Format

Apple's driver implements the following chapters of the recovery journal:

  • P (Program Change)

  • C (Control Change)

    • A bit for toggle tool is not implemented

  • W (Pitch Wheel)

  • N (Note On/Off)

  • T (Channel Aftertouch)

  • A (Poly Aftertouch)

  • Q (Sequencer state, i.e. beat clock)

  • F (MIDI Time Code)

The following recovery journal chapters are not implemented:

  • M (MIDI Parameter System)

  • E (Note Command Extras)

  • D (Song Select, Tune Request, Reset, undefined system commands)

  • V (Active Sense)

  • X (System Exclusive)

Receiver Feedback Transmitted to the Sender

The recovery journal mechanism requires that the receiver periodically inform the sender of the sequence number of the most recently received packet. This allows the sender to reduce the size of the recovery journal, to encapsulate only those changes to the MIDI stream state occurring after the specified packet number.

This message is sent on the control port. Its format is described in Figure 1-5 (Journal feedback packet format).

Figure 1-5  Journal feedback packet format

Item             Value
SSRC             The sender's synchronization source identifier.
Sequence number  The sequence number of the most recently received packet.
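A minimal sketch of the feedback packet follows. The leading 0xFFFF signature and the 'RS' command identifier are assumptions not given in the table, and the function names are hypothetical. The sequence number is written as a plain 32-bit value, following the convention that each row is a 32-bit value; the document does not say how a 16-bit RTP sequence number is positioned within that field, so check against real traffic before relying on this layout.

```python
import struct

# signature, command, SSRC, sequence number
_FEEDBACK = struct.Struct("!H2sII")


def pack_feedback(ssrc, sequence_number):
    """Build a receiver feedback packet for the control port."""
    return _FEEDBACK.pack(0xFFFF, b"RS", ssrc, sequence_number)


def unpack_feedback(packet):
    """Return (ssrc, sequence_number) from a receiver feedback packet."""
    signature, command, ssrc, sequence_number = _FEEDBACK.unpack(packet)
    if signature != 0xFFFF or command != b"RS":
        raise ValueError("not a receiver feedback packet")
    return ssrc, sequence_number
```

On receiving this packet, a sender can drop recovery journal entries for changes made at or before the reported sequence number.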