Model Guardrails Too Restrictive?

I'm experimenting with the Foundation Models framework to do news summarization in an RSS app, but I'm finding that a lot of articles get kicked back with a vague message about guardrails.

This seems especially common with political news, but we're talking mainstream outlets, e.g. Politico.

If the models are this restrictive, this will be tough to use. Is this intended?

FB17904424

Thanks for filing the feedback report. Just to let you know, your report is now under investigation by the Foundation Models framework team.

Best,
——
Ziqiao Chen
Worldwide Developer Relations

I updated the report with more examples. It's really, really sensitive.

Any news article about someone dying, for instance, gets rejected.

They're insanely restrictive. I've filed multiple reports with examples that aren't even in the same universe as unsafe content.

If Apple doesn't fix them, the entire FoundationModels framework is essentially useless for a production app. You just can't ship something that fails 50% of the time with spurious "safety" violations.

Chiming in here as well - I've been playing around with some use cases around camping (where an offline assistant can be useful), but I'm getting guardrail violations on more than 50% of prompts, even for things like purifying water or where to position my tent. The model is basically unusable with the current level of guardrails.

It feels like the model has become overly restrictive since Beta 3. Even simple tasks, like generating a title based on content from a book I'm reading or generating a summary from a longer citation, are now blocked by guardrails.

It’s becoming nearly unusable for basic use cases. :(

I have experienced something similar. I appreciate the focus on safety, but could we have an option to set the guardrail level? That would work much better.

I had a similar experience in Beta 3: even questions like "What is the capital of France?" were hitting guardrails. I tried the same question with a number of real countries and it was always guardrailed. Then I tried Gondor and Westeros, and for those fictional countries the model returned a response.

I'm assuming that mentioning real country names must have triggered guardrails against political topics.

As of Beta 4 my test questions for capitals work for both real and fictional countries.
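
For reference, here's roughly how I ran the test. This is a minimal sketch, assuming the basic LanguageModelSession.respond(to:) API:

```swift
import FoundationModels

// Sketch of the capital-city test described above: send the same question for a
// real and a fictional country and print either the answer or the error.
func askCapital(of country: String) async {
    let session = LanguageModelSession()
    do {
        let response = try await session.respond(to: "What is the capital of \(country)?")
        print("\(country): \(response.content)")
    } catch {
        // In Beta 3 this is where the real-country prompts failed with a guardrail error.
        print("\(country): \(error)")
    }
}

// await askCapital(of: "France")  // real country
// await askCapital(of: "Gondor")  // fictional country
```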

My app is going to use more sophisticated models locally, but I'm trying to use Foundation Models for relatively less demanding tasks, including some categorization and summarization. My users will be importing a wide variety of content and data types, though, and in my own testing with my personal journal content, a discussion of a violent crime against a friend threw a guardrailViolation (unsafe content) error. I'm not sure whether I'll exclude the model from activities where this might happen, or catch the error and fall back to a downloaded model.

I appreciate the importance of safety, but this isn't safety, it's blatant censorship (however well intentioned). People discuss unsafe things, and it's critically important that we do so, for reasons related to safety itself, both personal and social. At least provide an option to configure how this is handled, and include content-specific information in the error so we aren't left guessing how we crossed the guardrail.
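
For anyone weighing the same fallback approach, here's a rough sketch of catching the guardrail error; summarizeWithDownloadedModel is a hypothetical placeholder for whatever local model integration you'd use:

```swift
import FoundationModels

// Try the system model first; if the guardrails reject the content, fall back
// to a locally downloaded model instead of failing outright.
func summarize(_ text: String) async throws -> String {
    let session = LanguageModelSession()
    do {
        let response = try await session.respond(to: "Summarize the following:\n\(text)")
        return response.content
    } catch LanguageModelSession.GenerationError.guardrailViolation {
        // Unsafe-content refusal from the system model; hand off to the fallback.
        return try await summarizeWithDownloadedModel(text)
    }
}

// Hypothetical stand-in for your own downloaded-model call.
func summarizeWithDownloadedModel(_ text: String) async throws -> String {
    // ... run the local model here ...
    return ""
}
```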

Hi all! We're actively working to improve the guardrails and reduce false positives. Apologies for all the headaches.

Workaround

The #1 workaround I can offer: for summarizing content like news articles, you can use .permissiveContentTransformations to turn the guardrails off.

Please check out this article I wrote on Improving the safety of generative model output, because there are caveats about when it's appropriate to turn the guardrails off, and cases where the model may refuse to answer anyway.
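
A minimal sketch of what that looks like, assuming the permissive option is the permissiveContentTransformations value on SystemLanguageModel.Guardrails passed when creating the model (articleText is a placeholder for your own content):

```swift
import FoundationModels

// Create a model instance with the permissive guardrails option, then summarize
// a news article with it. Appropriate only for the content-transformation use
// cases described in the linked article.
func summarizeArticle(_ articleText: String) async throws -> String {
    let model = SystemLanguageModel(guardrails: .permissiveContentTransformations)
    let session = LanguageModelSession(model: model)
    let response = try await session.respond(to: "Summarize this news article:\n\(articleText)")
    return response.content
}
```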

Please send us your false-refusal prompts

If you feel comfortable doing so, send us any prompts that falsely trigger the guardrails.

While we're actively working to reduce guardrail false refusals, it's incredibly helpful to see prompts from real developers (like you) so we can identify blind spots we might have in our guardrail evaluations.

Check out this post on sending us feedback. It shows a handy way to send feedback from Xcode. Make sure to file your Feedback Assistant bug reports against the Foundation Models framework so your issue reaches us.

Include your prompt in your report. If you feel comfortable, please also include your Siri language/locale setting, e.g. "Spanish (Mexico)", since the guardrails are influenced by locale. Thanks!
