Are libusb control transfers slow on Apple M2 Macs?

Why are control transfers through libusb so much slower on an M2 Mac than on other platforms?

Below is my test code. I ran it against the same USB device on an M2 Mac, an x64 Mac, and Ubuntu x64.

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <stdlib.h>
#include <libusb.h>
#include <sys/time.h>

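// current wall-clock time in microseconds
uint64_t get_us(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    // widen before multiplying so the product cannot overflow a 32-bit time_t
    return (uint64_t)tv.tv_sec * 1000 * 1000 + (uint64_t)tv.tv_usec;
}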

int main()
{
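    // open usb handle
    int ret;
    ret = libusb_init(NULL);
    if (ret < 0) {
        fprintf(stderr, "libusb_init failed: %s\n", libusb_error_name(ret));
        exit(1);
    }
    libusb_device_handle *usb_handle;
    usb_handle = libusb_open_device_with_vid_pid(NULL, 0x248a, 0x8266);
    if (usb_handle == NULL) {
        fprintf(stderr, "could not open device 248a:8266\n");
        libusb_exit(NULL);
        exit(1);
    }
    // auto-detach is not supported on every platform, so its result is not treated as fatal
    ret = libusb_set_auto_detach_kernel_driver(usb_handle, 1);
    ret = libusb_claim_interface(usb_handle, 0);
    if (ret < 0) {
        fprintf(stderr, "claim interface failed: %s\n", libusb_error_name(ret));
    }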

    uint64_t start = get_us();
    unsigned char buf[256];

    // 0xC1: device-to-host, vendor request, interface recipient (read 12 bytes)
    ret = libusb_control_transfer(usb_handle, 0xC1, 0x02, 0x8014, 0, buf, 12, 1000);
    ret = libusb_control_transfer(usb_handle, 0xC1, 0x02, 0x8014, 0, buf, 12, 1000);
    ret = libusb_control_transfer(usb_handle, 0xC1, 0x02, 0x8014, 0, buf, 12, 1000);
    ret = libusb_control_transfer(usb_handle, 0xC1, 0x02, 0x8014, 0, buf, 12, 1000);

    memset(buf, 0, sizeof(buf));

    // 0x41: host-to-device, vendor request, interface recipient (write 16 zero bytes)
    ret = libusb_control_transfer(usb_handle, 0x41, 0x02, 0x9548, 0, buf, 16, 1000);
    ret = libusb_control_transfer(usb_handle, 0x41, 0x02, 0x9548, 0, buf, 16, 1000);
    ret = libusb_control_transfer(usb_handle, 0x41, 0x02, 0x9548, 0, buf, 16, 1000);
    ret = libusb_control_transfer(usb_handle, 0x41, 0x02, 0x9548, 0, buf, 16, 1000);
    uint64_t end = get_us();

    printf("spent time %llu(us)\n", (unsigned long long)(end - start));

    libusb_release_interface(usb_handle, 0);
    libusb_close(usb_handle);
    libusb_exit(NULL);

    return 0;
}

My conclusion is that a call to libusb_control_transfer() on the M2 Mac takes about 10 times longer than it does on average on the x64 machines.
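The test program above only reports one total for all eight transfers. To see whether every call is slow or only some of them, each call can be timed on its own. The following is a minimal sketch of that idea, not part of my original test: monotonic_us(), transfer_fn, time_each_call() and fake_transfer() are hypothetical names introduced for illustration, and fake_transfer() is only a stand-in so the sketch runs without a device.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime() in strict C modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Monotonic timestamp in microseconds. Unlike gettimeofday(), it is not
 * affected by wall-clock adjustments made while the test is running. */
static uint64_t monotonic_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Hypothetical callback type: performs one transfer, returns < 0 on error. */
typedef int (*transfer_fn)(void *ctx);

/* Calls fn up to count times and stores each call's duration in out_us[].
 * Returns the number of calls that succeeded before the first error. */
static int time_each_call(transfer_fn fn, void *ctx, int count, uint64_t *out_us)
{
    assert(count > 0 && out_us != NULL);
    for (int i = 0; i < count; i++) {
        uint64_t t0 = monotonic_us();
        int ret = fn(ctx);
        out_us[i] = monotonic_us() - t0;
        if (ret < 0)
            return i;
    }
    return count;
}

/* Stand-in for one libusb_control_transfer() call (assumption, for
 * illustration only): counts invocations and always succeeds. */
static int fake_transfer(void *ctx)
{
    int *calls = ctx;
    (*calls)++;
    return 0;
}
```

In a real run the callback would wrap a single libusb_control_transfer() call with the same arguments as in the test program and return its result, so that a failed or stalled transfer is not silently counted as a fast one.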

Program output and Wireshark packet capture on the M2 Mac:

MacBook-Pro % ./libusb_test
spent time 22949(us)

Program output and Wireshark packet capture on Ubuntu 20 x64:

ubuntu20:~$ ./libusb_test 
spent time 1246(us)

The Wireshark captures show that on the M2 Mac a single control transfer takes about 3 ms from submission to reply, while on x64 it takes only around 200 us.
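As a sanity check, the program totals can be divided by the eight transfers each run performs. This is a small sketch of that arithmetic only; avg_per_transfer_us() and slowdown() are hypothetical helper names, and the numbers in the comments are the two totals printed above.

```c
#include <assert.h>
#include <stdint.h>

/* Average duration of one transfer, given the total for count transfers. */
static double avg_per_transfer_us(uint64_t total_us, int count)
{
    assert(count > 0);
    return (double)total_us / count;
}

/* How many times longer slow_us is than fast_us. */
static double slowdown(uint64_t slow_us, uint64_t fast_us)
{
    assert(fast_us > 0);
    return (double)slow_us / (double)fast_us;
}

/* M2 Mac:  avg_per_transfer_us(22949, 8) = 2868.625 us, close to the ~3 ms seen in Wireshark
 * Ubuntu:  avg_per_transfer_us(1246, 8)  = 155.75 us, close to the ~200 us seen in Wireshark
 * Ratio:   slowdown(22949, 1246) is roughly 18.4 for these two runs */
```

So the per-transfer averages from the program agree with the capture, and for these two particular runs the gap is closer to 18 times than 10 times.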

I also tested bulk transfers, and their performance on the M2 Mac matches what I see on x64.

Is this caused by the USB driver on the M2 Mac? Is there a fix or workaround?