Now my problem is:
Is it even possible for an AVAudioEngine to have different sample rates for its inputNode and outputNode (and also mainMixerNode)?
My first attempt at doing my own resampling in the callback function fails with OSStatus error codes -10863 and -50. I know it is extremely hard to tell what the problem is from this information alone, but what could be the cause? (It is definitely caused by the sample rate mismatch, but is it because I'm pulling too many samples?)
If resampling ourselves is that tricky, what is the native way to meet the need (processing samples at 44.1 kHz)? Should I also care about sample rate conversion on the output side? (In other words, would the speaker sample rate also change to match the new microphone sample rate after a headphone insertion or other hardware change?)
Yes, input and output can have different rates, except when the configuration requires them to be linked. (For example, if you use voice processing on the input, the input and output rates will be forced to match.)
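To see whether the two sides currently differ on your device, you can read the hardware-facing formats of the engine's I/O nodes. This is only a minimal sketch: `hardwareSampleRates` is a hypothetical helper name, and it assumes an input device is available (and, on iOS, that the audio session is already configured for recording).

```swift
import AVFoundation

/// Hypothetical helper: reports the hardware-side sample rates of an engine.
/// Assumes an input device exists; accessing inputNode without one may fail.
func hardwareSampleRates(of engine: AVAudioEngine) -> (input: Double, output: Double) {
    // The input node's *input* format on bus 0 reflects the capture hardware.
    let inputRate = engine.inputNode.inputFormat(forBus: 0).sampleRate
    // The output node's *output* format on bus 0 reflects the playback hardware.
    let outputRate = engine.outputNode.outputFormat(forBus: 0).sampleRate
    return (inputRate, outputRate)
}
```

Calling this again after a route change (for example, plugging in headphones) should show whether one side changed without the other.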
Error -50 (kAudio_ParamError) is an invalid parameter error, and error -10863 (kAudioUnitErr_CannotDoInCurrentContext) may indicate you're doing something at an inappropriate time. Try looking at your console log for additional information that might give clues about what went wrong.
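When logging these failures, it can help to print the symbolic name next to the raw number. The following is a small sketch, not an exhaustive table: `describeAudioStatus` is a made-up helper, and the numeric values are hardcoded from the Core Audio and Audio Toolbox headers so the snippet needs no imports.

```swift
/// Hypothetical helper: maps a few common audio OSStatus codes to their names.
/// OSStatus is a typealias for Int32, so Int32 is used directly here.
func describeAudioStatus(_ status: Int32) -> String {
    switch status {
    case 0:
        return "noErr"
    case -50:
        return "kAudio_ParamError (invalid parameter)"
    case -10863:
        return "kAudioUnitErr_CannotDoInCurrentContext"
    case -10868:
        return "kAudioUnitErr_FormatNotSupported"
    case -10874:
        return "kAudioUnitErr_TooManyFramesToProcess"
    default:
        return "unrecognized OSStatus \(status)"
    }
}
```

Note that asking for too many frames has its own code (-10874), so seeing -10863 instead suggests looking at when the call is made rather than only at how many frames are requested.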
Mixer nodes are documented as handling rate conversions automatically (see https://developer.apple.com/documentation/avfoundation/avaudiomixernode). You will need to provide custom conversions if your audio has mismatched formats (such as int vs. float samples, or interleaved vs. non-interleaved frames). AVAudioPlayerNode can also convert sample rates when playing from a file; when playing buffers, it assumes they are already at the sample rate of its output format.
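If you do end up converting buffers yourself (for example, inside a tap), AVAudioConverter is one native option that avoids hand-written resampling. Below is a minimal one-shot sketch; `resample` and the error domain are hypothetical names, and it keeps the channel count and sample type of the input and changes only the rate.

```swift
import AVFoundation

/// Hypothetical helper: resamples one buffer to `targetRate`.
func resample(_ input: AVAudioPCMBuffer, to targetRate: Double) throws -> AVAudioPCMBuffer {
    let inFormat = input.format
    guard let outFormat = AVAudioFormat(commonFormat: inFormat.commonFormat,
                                        sampleRate: targetRate,
                                        channels: inFormat.channelCount,
                                        interleaved: inFormat.isInterleaved),
          let converter = AVAudioConverter(from: inFormat, to: outFormat) else {
        throw NSError(domain: "ResampleSketch", code: -1)
    }

    // Size the output for the rate ratio, with one spare frame for rounding.
    let ratio = targetRate / inFormat.sampleRate
    let capacity = AVAudioFrameCount((Double(input.frameLength) * ratio).rounded(.up)) + 1
    guard let output = AVAudioPCMBuffer(pcmFormat: outFormat, frameCapacity: capacity) else {
        throw NSError(domain: "ResampleSketch", code: -2)
    }

    var supplied = false
    var conversionError: NSError?
    let status = converter.convert(to: output, error: &conversionError) { _, outStatus in
        if supplied {
            // The single input buffer was already handed over; end the stream.
            outStatus.pointee = .endOfStream
            return nil
        }
        supplied = true
        outStatus.pointee = .haveData
        return input
    }
    if status == .error {
        throw conversionError ?? NSError(domain: "ResampleSketch", code: -3)
    }
    return output
}
```

For a continuous microphone stream you would likely keep a single converter alive across callbacks and report `.noDataNow` instead of `.endOfStream`, so that the converter's internal filter state carries over between buffers.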