Performance and customization of alternate options

Performance-wise, what are the trade-offs when running an MLX-backed model on-device compared to using the system's AFM Core model? Also, somewhat related: how do I use the 'model judge evaluator' to compare the accuracy of a custom LoRA adapter against the system's Private Cloud Compute models?

Answered by Engineer in 892914022

The ModelJudgeEvaluator is used to evaluate a response where the score is subjective, e.g. "is this a good explanation".

There are some good examples here:

https://developer.apple.com/documentation/evaluations/scoring-with-model-as-judge-evaluators

And in the sample code project:

https://developer.apple.com/documentation/evaluations/book-tracker-using-evaluations-to-evaluate-an-intelligent-feature

You can use it to evaluate responses from the PrivateCloudComputeLanguageModel. You just need to set up your LanguageModelSession correctly:

let session = LanguageModelSession(model: PrivateCloudComputeLanguageModel())
let response = try await session.respond(to: "Analyze this document...")
