Plenty of LanguageModelSession.GenerationError.refusal errors after 26.4 update

Hello!

After the 26.4 update, I get a huge number of LanguageModelSession.GenerationError.refusal errors for no apparent reason when using guided generation with Generable types. The errors also occur when I request a boolean response via 'generating: Bool.self'. The response attached to the error always looks like this:

Response<String>(userPrompt: "", duration: 0.230917542, promptTokenCount: Optional(66), responseTokenCount: Optional(11), feedbackAttachment: nil, content: "I apologize, but I cannot fulfill this request.", rawContent: "I apologize, but I cannot fulfill this request.", transcriptEntries: ArraySlice([]))

All the prompts and Generables I use are definitely not profane. Before 26.4, these errors never occurred on the same prompts and Generables. The 26.4 update has rendered those features unusable for me.
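For reference, here is a minimal sketch of the kind of call that now fails for me. The prompt wording and function name are illustrative, not my actual code:

```swift
import FoundationModels

// Hypothetical example: asking for a plain boolean via guided generation.
// After 26.4 this frequently throws .refusal even on innocuous inputs.
func mentionsDate(_ text: String) async {
    let session = LanguageModelSession()
    do {
        let response = try await session.respond(
            to: "Does the following text mention a date? \(text)",
            generating: Bool.self
        )
        print("Answer: \(response.content)")
    } catch LanguageModelSession.GenerationError.refusal {
        // A yes/no question about benign text should never land here.
        print("Model refused the request.")
    } catch {
        print("Other error: \(error)")
    }
}
```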

Is this a known bug or what am I doing wrong?

Thank you for your feedback! Can you submit a LanguageModelFeedback through Feedback Assistant for this specific issue?

Just in case, the following post provides details about filing feedback for the Foundation Models framework:

Best,
——
Ziqiao Chen
Worldwide Developer Relations.

I've been hitting the same refusal regression after 26.4 on guided generation. In my case I'm using LanguageModelSession with custom Generables for structured output from transcribed text, and the refusal rate jumped from near-zero to roughly 30% of requests after the update. Two workarounds that helped reduce it:

  1. Provide session instructions that frame the task as data transformation rather than content generation. Something like: "You are a structured data extractor. Convert the following input into the requested format." This seems to bypass whatever safety classifier is being overly aggressive.
  2. When you get a refusal, retry the same prompt with a slightly different temperature (0.1 increments). In my testing, about 80% of refusals succeed on retry, suggesting the classifier is borderline on these inputs rather than fundamentally objecting to them.
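The two workarounds above can be combined into one helper. This is a sketch under my own assumptions — the function name, prompt framing, starting temperature, and retry count are all choices I made, not anything from the framework docs:

```swift
import FoundationModels

// Sketch: instructions that frame the task as data extraction, plus a
// retry loop that nudges temperature by 0.1 after each refusal.
func extractBool(from prompt: String, maxAttempts: Int = 3) async throws -> Bool {
    let session = LanguageModelSession(instructions: """
        You are a structured data extractor. \
        Convert the following input into the requested format.
        """)
    var temperature = 0.7  // arbitrary starting point
    for attempt in 1...maxAttempts {
        do {
            return try await session.respond(
                to: prompt,
                generating: Bool.self,
                options: GenerationOptions(temperature: temperature)
            ).content
        } catch let error as LanguageModelSession.GenerationError {
            // Only retry refusals; rethrow everything else, and rethrow
            // the refusal itself once we're out of attempts.
            guard case .refusal = error, attempt < maxAttempts else { throw error }
            temperature += 0.1  // borderline inputs often pass on retry
        }
    }
    fatalError("unreachable: the final failed attempt rethrows above")
}
```

In my testing the retry path alone recovers most refusals, which is what makes me think the classifier is scoring these inputs near a threshold rather than rejecting them outright.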

The Bool.self casting issue you mention is particularly telling — a boolean response should never trigger content safety. This looks like a regression in the on-device safety classifier that shipped with 26.4, not an intentional policy change. I'd recommend filing a Feedback with specific prompt examples that trigger refusals — the more concrete reproduction cases Apple gets, the faster they can tune the classifier threshold.
