Hi @Mir46 , thanks for your question!
When building a visionOS app in Unity, the vast majority of your development will happen in the Unity Editor, not Xcode, and you will write code in C#, not Swift. You will not have access to RealityKit APIs to create volumes or windows directly, but you can create them in Unity with Unity's APIs. When you build your app, Unity generates an Xcode project that should build and run without any further development, though you may need to configure things like entitlements and developer teams in Xcode.
You do have the ability to create native plugins that can be used in Unity to communicate with native Apple APIs. I recommend reading Unity's official documentation on this topic, to see if this is relevant to you.
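As a rough illustration, a native plugin is typically exposed to C# through P/Invoke. The sketch below is hypothetical: the function name GetNativeValue and the Swift-side implementation shown in the comment are assumptions for illustration, not part of any existing plugin, and the exact setup is described in Unity's plugin documentation.

```csharp
using System.Runtime.InteropServices;
using UnityEngine;

public class NativeBridgeExample : MonoBehaviour
{
#if UNITY_VISIONOS && !UNITY_EDITOR
    // "GetNativeValue" is a hypothetical function exported from a Swift plugin
    // compiled into the generated Xcode project, e.g.:
    //   @_cdecl("GetNativeValue") public func GetNativeValue() -> Int32 { return 42 }
    [DllImport("__Internal")]
    private static extern int GetNativeValue();
#else
    // Stub so the script still compiles and runs in the Editor and on other platforms.
    private static int GetNativeValue() => 0;
#endif

    void Start()
    {
        Debug.Log($"Value from native plugin: {GetNativeValue()}");
    }
}
```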
Pinch and grab interactions will be handled by Unity's APIs. You will need to use Unity's Input System package. You can use the familiar EnhancedTouch API provided by Unity, as it works the same way as it would on a 2D screen, just with a third spatial dimension.
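Here is a minimal sketch of polling pinch input through EnhancedTouch. The script and class name are illustrative; on visionOS each pinch arrives as a touch much like a tap on a touchscreen, and PolySpatial additionally exposes richer 3D spatial pointer data for those touches, though those helper APIs depend on your package version.

```csharp
using UnityEngine;
using UnityEngine.InputSystem.EnhancedTouch;
using Touch = UnityEngine.InputSystem.EnhancedTouch.Touch;

public class PinchLogger : MonoBehaviour
{
    void OnEnable()
    {
        // EnhancedTouch is opt-in; Touch.activeTouches stays empty until this is called.
        EnhancedTouchSupport.Enable();
    }

    void OnDisable()
    {
        EnhancedTouchSupport.Disable();
    }

    void Update()
    {
        // On visionOS, a pinch aimed at an object with a collider surfaces as a touch,
        // much like a finger touch would on a 2D screen.
        foreach (var touch in Touch.activeTouches)
        {
            if (touch.phase == UnityEngine.InputSystem.TouchPhase.Began)
            {
                Debug.Log($"Pinch began at {touch.screenPosition}");
            }
        }
    }
}
```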
You mentioned PolySpatial: this will be necessary for mixed reality apps. You can create apps for visionOS in Unity without PolySpatial, but these apps will be "Full VR" without any passthrough.
Because most of the technologies you'll be using are made by Unity, you will likely find more specific help on Unity's developer forums, so I recommend asking there as well. Good luck!