I'm distributing an encrypted .mlpackage with my app and want to load it entirely in memory, without ever writing decrypted weights to disk. I tried MLModelAsset(specification:blobMapping:) for this, but inference is significantly slower than with the compiled model path.
What I'm trying to do
The encrypted .enc file is a serialized FileWrapper of the full .mlpackage, sealed with AES-GCM. At runtime I decrypt it in memory, deserialize the FileWrapper, extract the spec and weight blob, and load via MLModelAsset:
static func loadEncryptedPackage(url: URL, configuration: MLModelConfiguration) async throws -> MLModel {
    // AES-GCM decryption → decryptedData (full serialized .mlpackage)
    guard let wrapper = FileWrapper(serializedRepresentation: decryptedData) else { throw ... }
    guard let (specWrapper, specParent) = findSpecWrapper(in: wrapper),
          let spec = specWrapper.regularFileContents else { throw ... }

    var blobs: [URL: Data] = [:]
    collectBlobs(in: specParent, relativePath: "", excluding: specWrapper, into: &blobs)
    // keys built as URL(fileURLWithPath: rel), e.g. "weights/weight.bin"

    let asset = try MLModelAsset(specification: spec, blobMapping: blobs)
    let model = try await MLModel.load(asset: asset, configuration: configuration)

    // See observation #3 below: these must be retained for the model's lifetime
    objc_setAssociatedObject(model, &retentionKey, Retainer(spec: spec, blobs: blobs), .OBJC_ASSOCIATION_RETAIN)
    return model
}
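The snippet above leaves the decryption step, the Retainer type, and retentionKey undefined. A minimal sketch of what they might look like, assuming the .enc file uses CryptoKit's combined AES-GCM layout (nonce, ciphertext, tag) and that the key is already available as a SymmetricKey. The name decryptInMemory is hypothetical and this is not necessarily how the actual project does it:

```swift
import CryptoKit
import Foundation

// Hypothetical helper: opens an AES-GCM sealed box held entirely in memory.
// Assumes `sealed` is in CryptoKit's combined layout (nonce || ciphertext || tag).
func decryptInMemory(_ sealed: Data, using key: SymmetricKey) throws -> Data {
    let box = try AES.GCM.SealedBox(combined: sealed)
    // Throws if the authentication tag does not match (wrong key or tampered data).
    return try AES.GCM.open(box, using: key)
}

// Holds strong references to the buffers handed to MLModelAsset so they
// outlive the load call (see observation #3).
final class Retainer: NSObject {
    let spec: Data
    let blobs: [URL: Data]

    init(spec: Data, blobs: [URL: Data]) {
        self.spec = spec
        self.blobs = blobs
    }
}

// Only the address of this variable matters; it serves as the unique key
// for objc_setAssociatedObject. (Written for the Swift 5 language mode.)
var retentionKey: UInt8 = 0
```

Attaching the Retainer as an associated object ties the buffers' lifetime to the MLModel instance, so they are released when the model is.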
What I observed
1. Predictions are accurate. The blobs are found, weights are applied, and the model produces correct results.

2. Inference is drastically slower than the compiled code path. The same model loaded via MLModel.compileModel(at:) + MLModel.load(contentsOf:) runs inference much faster on the same device with the same MLModelConfiguration (computeUnits = .all). With MLModelAsset the slowdown is consistent across every prediction call, not just the first one.

3. The spec and blob Data objects must stay alive for the model's lifetime. Without retaining them via objc_setAssociatedObject, inference produces NaN outputs or crashes. This suggests Core ML holds a reference back into those Data buffers beyond the load() call, rather than copying them into its own memory during loading.

4. Using the exact blob URI from the spec as the blobMapping key triggers a compilation error. The spec (inspected by running strings on the .mlmodel protobuf) stores blob references as @model_path/weights/weight.bin. When I key the blobMapping with URL(string: "@model_path/weights/weight.bin"), MLModel.load(asset:) throws:

   compiler error: Encountered an error while compiling a model: validator error: The in-memory ML Program must not have a blob file reference but found a reference to mem://weights/weight.bin.

   With other key formats (e.g. URL(fileURLWithPath: "weights/weight.bin")), this error does not appear: the model loads and predictions are accurate, but inference is slow as in observation #2.
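The two key styles compared in the last observation are different kinds of URL as far as Foundation is concerned, which may or may not matter to Core ML. A small sketch for inspecting them; describeKey is a hypothetical helper, and nothing here is a claim about how Core ML matches keys internally:

```swift
import Foundation

// Hypothetical helper: summarizes the properties that differ between key styles.
func describeKey(_ url: URL) -> String {
    "scheme=\(url.scheme ?? "nil") isFileURL=\(url.isFileURL) absolute=\(url.absoluteString)"
}

// Key style that loads (slowly): a relative path turned into a file URL.
// Foundation resolves it against the current working directory, so the
// absolute form is a file:// URL ending in weights/weight.bin.
let fileKey = URL(fileURLWithPath: "weights/weight.bin")

// Key style that triggers the validator error: the spec's blob reference
// parsed as a plain URL string, with no file scheme attached.
let specKey = URL(string: "@model_path/weights/weight.bin")

print(describeKey(fileKey))
if let specKey = specKey {
    print(describeKey(specKey))
}
```

Printing both keys makes it easy to confirm exactly what is being passed in blobMapping when comparing the two behaviors.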
The working alternative (which I want to avoid)
Decrypting to a temporary directory, calling MLModel.compileModel(at:), loading from the compiled .mlmodelc, then deleting the temp files produces fast inference. Same model, same device, same configuration. The only difference is the compilation step — and the fact that decrypted weights touch disk, which I want to avoid for security reasons.
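For reference, the disk-based path described above can be sketched roughly as follows. This assumes the decrypted package is already available as a FileWrapper and uses the async compileModel(at:) variant; loadViaTemporaryCompile and the Model.mlpackage file name are hypothetical:

```swift
import CoreML
import Foundation

// Hypothetical helper sketching the disk-based fallback: write the decrypted
// package to a temporary directory, compile it, load the compiled model,
// then remove everything that was written.
@available(macOS 13.0, iOS 16.0, tvOS 16.0, watchOS 9.0, *)
func loadViaTemporaryCompile(package: FileWrapper,
                             configuration: MLModelConfiguration) async throws -> MLModel {
    let fileManager = FileManager.default
    let workDir = fileManager.temporaryDirectory
        .appendingPathComponent(UUID().uuidString, isDirectory: true)
    try fileManager.createDirectory(at: workDir, withIntermediateDirectories: true)
    // Remove the decrypted package on every exit path, including errors.
    defer { try? fileManager.removeItem(at: workDir) }

    // Decrypted weights touch disk here, which is the part to avoid.
    let packageURL = workDir.appendingPathComponent("Model.mlpackage", isDirectory: true)
    try package.write(to: packageURL, options: .atomic, originalContentsURL: nil)

    // compileModel(at:) returns the URL of a compiled .mlmodelc in a temporary location.
    let compiledURL = try await MLModel.compileModel(at: packageURL)
    defer { try? fileManager.removeItem(at: compiledURL) }

    return try await MLModel.load(contentsOf: compiledURL, configuration: configuration)
}
```

The defer blocks only narrow the window in which plaintext weights exist on disk; they do not remove the exposure.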
Questions
1. Is MLModelAsset(specification:blobMapping:) expected to produce inference performance equivalent to loading from a compiled .mlmodelc? If not, is the performance gap fundamental to the API or something that can be addressed?

2. Is there any supported way to load an mlprogram model with external weight blobs entirely in memory and achieve inference performance comparable to the compiled code path, i.e. without writing decrypted model data to disk at any point?

3. The validator error "in-memory ML Program must not have a blob file reference" is a hard block when Core ML successfully resolves the blobs and attempts mlprogram compilation. Is this an intended constraint, and does it mean MLModelAsset(specification:blobMapping:) is not the right API for this use case?
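For anyone trying to reproduce the gap, a minimal timing harness along these lines could put numbers on it. medianLatency is a hypothetical helper, and the warm-up and iteration counts are arbitrary assumptions:

```swift
import Dispatch
import Foundation

// Hypothetical helper: runs `body` `warmup` times untimed, then `iterations`
// times timed, and returns the (upper) median latency in seconds.
func medianLatency(warmup: Int = 5,
                   iterations: Int = 50,
                   _ body: () throws -> Void) rethrows -> Double {
    precondition(iterations > 0, "need at least one timed iteration")
    for _ in 0..<warmup { try body() }

    var samples: [Double] = []
    samples.reserveCapacity(iterations)
    for _ in 0..<iterations {
        let start = DispatchTime.now().uptimeNanoseconds
        try body()
        let end = DispatchTime.now().uptimeNanoseconds
        samples.append(Double(end - start) / 1e9)
    }
    samples.sort()
    return samples[samples.count / 2]
}

// Intended use, assuming `assetModel` and `compiledModel` are the two MLModel
// instances and `input` is an MLFeatureProvider for the model:
//
//   let assetSeconds = try medianLatency { _ = try assetModel.prediction(from: input) }
//   let compiledSeconds = try medianLatency { _ = try compiledModel.prediction(from: input) }
```

Reporting the median after a warm-up keeps one-time costs on the first calls from skewing the comparison, which matters here because the slowdown is described as persisting across every call.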