Recognize speech from nearby people on visionOS

On visionOS, is it possible to recognize what people nearby are saying (and then display it in the app, for example as subtitles)?