Visual Intelligence API SemanticContentDescriptor labels are empty

I'm trying to use Apple's new Visual Intelligence API to recommend content through screenshot image search. The problem is that the SemanticContentDescriptor labels are either completely empty or badly misleading, which makes it impossible to query for similar content in my app. Even the closest match was inaccurate: it returned a single label, ["cardigan"], for a Supreme T-shirt.

I see other apps, such as Etsy, using this API, and I'm wondering whether they use the input pixel buffer to query for similar content rather than the labels.

If anyone has had a similar experience, or knows of something that wasn't called out in the documentation, please let me know! Thanks.
