Data(contentsOf:) with huge file

I have a function that computes the MD5 hash of a file:

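import Foundation
import CommonCrypto

func ComputeMD5(ofFile path: String) -> [UInt8]? {
    // Reads the whole file into memory before hashing it.
    if let data = try? Data(contentsOf: URL(fileURLWithPath: path)) {
        var digest = [UInt8](repeating: 0, count: 16)
        data.withUnsafeBytes {
            // UInt32(data.count) traps at run time if the file is 4 GiB or larger.
            _ = CC_MD5($0.baseAddress, UInt32(data.count), &digest)
        }
        return digest
    }
    return nil
}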

Now I wonder/worry what happens if the file is huge. Does the runtime perform disk memory paging?

Answered by DTS Engineer in 772788022

Code like this is bad in all cases, but the exact pathology will vary based on the size of the file and the OS you’re running on. On iOS, you’ll likely run out of address space or be jetsammed. On macOS, you’ll either run out of swap or suffer pathologically bad performance.

Does the runtime perform disk memory paging?

You can opt in to memory mapping with init(contentsOf:options:) but that yields more problems. Specifically, if the file is on a volume that can fail — like a network volume or a USB stick — such a failure will trigger a memory access exception in your process.
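
For reference, opting in looks roughly like the sketch below. The helper name loadMapped is made up for illustration. The option shown, .mappedIfSafe, is a hint that lets Foundation map the file when it judges that safe and otherwise read it normally; .alwaysMapped is the variant that asks for mapping whenever possible, which is where the volume-failure risk described above applies most directly.

```swift
import Foundation

/// Hypothetical helper: loads a file and asks Foundation to memory-map it
/// when Foundation considers that safe. This is a hint, not a guarantee.
func loadMapped(path: String) throws -> Data {
    let url = URL(fileURLWithPath: path)
    // .mappedIfSafe requests mapping only if possible and safe;
    // .alwaysMapped would request mapping whenever possible.
    return try Data(contentsOf: url, options: .mappedIfSafe)
}
```

The bytes read back the same either way; the difference is in how the memory is backed and how a failure of the underlying volume surfaces.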

If you need to work with arbitrary files, do what ssmith_c suggests and calculate the digest piecewise.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

You asked this question before pointing your function at a file much larger than the memory in your machine to find out what happens?

If it does turn out to be a problem, you can use CC_MD5_Init, CC_MD5_Update, and CC_MD5_Final, breaking up your huge file into manageable chunks.
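
A minimal sketch of that chunked approach follows. The function name computeMD5Chunked and the chunkSize parameter (1 MiB by default) are assumptions for illustration, not something from this thread. It reads through FileHandle's read(upToCount:), which needs a reasonably recent OS, and recent SDKs flag the CommonCrypto MD5 calls as deprecated, so expect warnings.

```swift
import Foundation
import CommonCrypto

/// Computes the MD5 digest of a file by feeding it to CommonCrypto in chunks,
/// so only `chunkSize` bytes are held in memory at a time.
/// `chunkSize` is an illustrative default, not a tuned value.
func computeMD5Chunked(ofFile path: String, chunkSize: Int = 1 << 20) -> [UInt8]? {
    guard let handle = FileHandle(forReadingAtPath: path) else { return nil }
    defer { try? handle.close() }

    var context = CC_MD5_CTX()
    _ = CC_MD5_Init(&context)

    do {
        // read(upToCount:) returns nil or empty data at end of file.
        while let chunk = try handle.read(upToCount: chunkSize), !chunk.isEmpty {
            chunk.withUnsafeBytes { buffer in
                // Each chunk is far below the UInt32 limit of CC_LONG.
                _ = CC_MD5_Update(&context, buffer.baseAddress, CC_LONG(buffer.count))
            }
        }
    } catch {
        // A read error means the digest would be incomplete, so report failure.
        return nil
    }

    var digest = [UInt8](repeating: 0, count: Int(CC_MD5_DIGEST_LENGTH))
    _ = CC_MD5_Final(&digest, &context)
    return digest
}
```

Returning nil on a read error, rather than treating it as end of file, avoids handing back a digest of a truncated file.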

