On Protocol Extensibility & Multi-Modal Data

The Foundation Models framework is adding built-in OCR and barcode reader tools this year. If we implement a custom backend using the Language Model Protocol, can we return complex multi-modal objects (like bounding boxes or segmentation masks) back to the agentic flow, or is the protocol currently limited to text-based responses? For the 'Phone a Friend' pattern, is there a standard way to pass 'privacy-preserving embeddings' instead of raw text when calling a third-party model, to maintain a higher level of user data protection?

Answered by Frameworks Engineer in 892983022

Yes, absolutely! You can use a CustomSegment to return anything that the framework does not yet fully define.

Additionally, there is a SKILL.md file in the Foundation Models Utilities that can help you build a LanguageModel implementation.
