What is MTLIOAccelCommandQueue and how do I write code optimal for it?

Hey, so I'm currently testing our code on two different machines. One machine has an AMD Radeon Pro Vega II 32 GB card and uses GFX9AMDMtlCmdQueue for submitting command buffers. The other machine has an Intel HD Graphics 4000 1536 MB, an old integrated graphics chip, and uses MTLIOAccelCommandQueue to submit command buffers. According to my profiles, MTLIOAccelCommandQueue takes a lot more time in the dispatch loop than GFX9AMD_MtlCmdQueue. Is this because MTLIOAccelCommandQueue emulates functionality that the older hardware doesn't natively support? Our code uses a lot of compute shaders, so I'm wondering if that's what it's trying to emulate. I have both profiles and would attach them, but the forum won't accept my Google Drive URLs for some reason.

If that's the case, how do I optimize for machines with this type of old hardware? I've been looking at argument buffers, but I'm not sure how much of a difference implementing them would make. We would also have to revamp our entire codebase to support argument buffers in the first place, which we would rather not do.

Any help would be appreciated. Thanks in advance!