Read binary file in Swift

Hi,

I can't find how to read my data file in Swift (I'm converting from Objective-C). The file consists of a mixture of bytes, UInt16s, UInt32s, SInt16s, strings (as Pascal strings), and PNG and MP3 data, all mixed together, because I use a 'byte' to indicate what sort of data comes next.

I tried:-

var data = (NSData)();fileHandle?.readData(ofLength: MemoryLayout.size(ofValue: UInt8()))
But I can't use getBytes etc., so I can't get the value.

I thought this would work for UInt16s but no-go
var num = UInt16(); fileHandle?.readData(ofLength: MemoryLayout.size(ofValue: UInt16()))

I'm about to give up and stay with Objective-C.

Any help appreciated.
Paul

Replies

Is the file too big to load into memory?
If not, you can read the whole file into a Data value and then read each element from that Data one by one.
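A minimal sketch of that suggestion, assuming the file fits in memory (in practice the Data would come from Data(contentsOf:)). ByteCursor is a hypothetical name. It assembles each integer from individual bytes rather than loading it through a pointer, so values at odd offsets are fine.

```swift
import Foundation

/// Hypothetical cursor that walks an in-memory Data value one element at a time.
struct ByteCursor {
    let data: Data
    private(set) var offset: Int

    init(data: Data) {
        self.data = data
        self.offset = data.startIndex
    }

    /// Returns the next `count` bytes, or nil if fewer than that remain.
    mutating func readBytes(_ count: Int) -> Data? {
        guard count >= 0, data.endIndex - offset >= count else { return nil }
        let bytes = data[offset ..< offset + count]
        offset += count
        return bytes
    }

    /// Reads a little-endian integer by combining bytes, so no alignment is required.
    mutating func readLittleEndian<T: FixedWidthInteger>(_: T.Type) -> T? {
        guard let bytes = readBytes(MemoryLayout<T>.size) else { return nil }
        // truncatingIfNeeded keeps signed types such as Int8 from trapping on bytes >= 0x80.
        return bytes.reversed().reduce(0) { ($0 << 8) | T(truncatingIfNeeded: $1) }
    }
}

// Usage with literal bytes standing in for the file contents.
var cursor = ByteCursor(data: Data([0x01, 0x34, 0x12, 0xFE, 0xFF]))
let tag = cursor.readLittleEndian(UInt8.self)     // 0x01
let value = cursor.readLittleEndian(UInt16.self)  // 0x1234
let signed = cursor.readLittleEndian(Int16.self)  // -2
```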

As OOPer says, if the file is small you can just read the whole thing into memory. However, once you’ve got it there you have to be very careful about how you read data from it. Based on your description of the file format, it sounds like it doesn’t enforce any alignment requirements. If so, it’s not safe to read data from the file using Swift’s pointer APIs, such as UnsafeRawPointer.load(fromByteOffset:as:), because those require each value to be properly aligned in memory.

I typically handle this sort of thing with an explicit parser. See below for an example. It takes a little work to set up, but I like this approach because it uses no unsafe APIs. This is an important consideration when you parse data coming from an untrusted source.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@apple.com"



import Foundation

struct Parser {
    private var data: Data

    init(data: Data) {
        self.data = data
    }

    private mutating func parseLEUIntX<Result>(_: Result.Type) -> Result?
        where Result: UnsignedInteger
    {
        let expected = MemoryLayout<Result>.size
        guard data.count >= expected else { return nil }
        defer { self.data = self.data.dropFirst(expected) }
        return data
            .prefix(expected)
            .reversed()
            .reduce(0, { soFar, new in
                (soFar << 8) | Result(new)
            })
    }

    mutating func parseLEUInt8() -> UInt8? {
        parseLEUIntX(UInt8.self)
    }

    mutating func parseLEUInt16() -> UInt16? {
        parseLEUIntX(UInt16.self)
    }

    mutating func parseLEUInt32() -> UInt32? {
        parseLEUIntX(UInt32.self)
    }

    mutating func parseLEUInt64() -> UInt64? {
        parseLEUIntX(UInt64.self)
    }
}

func main() {
    var parser = Parser(data: Data([0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77]))
    let u8 = parser.parseLEUInt8()!
    let u16 = parser.parseLEUInt16()!
    let u32 = parser.parseLEUInt32()!
    print(String(u8, radix: 16))  // 11
    print(String(u16, radix: 16)) // 3322
    print(String(u32, radix: 16)) // 77665544
}

main()

See below for my latest take on this idea.

Hi Quinn,

The data is too big to read all into memory. I usually read the header, which holds the version and the 'UInt64' offsets of the data in the file (as 'pages'), and then close the file. When required, I open the file again and jump to the required data using the offsets read earlier.
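A minimal sketch of that open, seek, read, close pattern with FileHandle, assuming macOS 10.15.4 or later for the throwing seek(toOffset:) and read(upToCount:) methods. readPage is a hypothetical helper name, and in the real app the offset would be one of the UInt64 values read from the header.

```swift
import Foundation

/// Hypothetical helper: reads `length` bytes starting at `offset`, opening and
/// closing the file on each call. Returns nil if the file ends too early.
func readPage(at url: URL, offset: UInt64, length: Int) throws -> Data? {
    let handle = try FileHandle(forReadingFrom: url)
    defer { try? handle.close() }
    try handle.seek(toOffset: offset)
    guard let page = try handle.read(upToCount: length), page.count == length else {
        return nil
    }
    return page
}

// Usage: a tiny demo file standing in for the real data file.
let url = FileManager.default.temporaryDirectory.appendingPathComponent("pages-demo.bin")
try Data([0xAA, 0xBB, 0x10, 0x20, 0x30]).write(to: url)
let page = try readPage(at: url, offset: 2, length: 3)  // bytes 10 20 30
```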

I really need to easily read chars, ints of various sizes, and big chunks of data, since those may be PNGs or MP3s. I read the char and then use a switch statement to read the data that the char identifies.
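Here is a sketch of that tag-then-switch approach in Swift, working on a Data value (which could be one page read from the file). The tag characters ('w', 's', 'p', and so on) and the length prefixes are hypothetical; substitute whatever your own format defines.

```swift
import Foundation

// Hypothetical record kinds; the real tags come from your own file format.
enum Record: Equatable {
    case byte(UInt8)
    case uint16(UInt16)
    case int16(Int16)
    case string(String)
    case blob(Data)  // e.g. PNG or MP3 bytes
}

/// Removes `count` bytes from the front of `data`, or returns nil if too few remain.
func take(_ count: Int, from data: inout Data) -> Data? {
    guard count >= 0, data.count >= count else { return nil }
    let bytes = data.prefix(count)
    data = data.dropFirst(count)
    return bytes
}

/// Reads a little-endian integer by combining bytes (no pointers, no alignment needs).
func takeLE<T: FixedWidthInteger>(_: T.Type, from data: inout Data) -> T? {
    take(MemoryLayout<T>.size, from: &data)?
        .reversed()
        .reduce(0) { ($0 << 8) | T(truncatingIfNeeded: $1) }
}

/// Reads one tag byte and switches on it to decide what follows.
func readRecord(from data: inout Data) -> Record? {
    guard let tag = takeLE(UInt8.self, from: &data) else { return nil }
    switch Character(Unicode.Scalar(tag)) {
    case "b":
        return takeLE(UInt8.self, from: &data).map(Record.byte)
    case "w":
        return takeLE(UInt16.self, from: &data).map(Record.uint16)
    case "i":
        return takeLE(Int16.self, from: &data).map(Record.int16)
    case "s":  // Pascal string: one length byte, then that many bytes
        guard let len = takeLE(UInt8.self, from: &data),
              let bytes = take(Int(len), from: &data),
              let s = String(data: bytes, encoding: .isoLatin1) else { return nil }
        return .string(s)
    case "p":  // blob: UInt32 length, then the raw bytes
        guard let len = takeLE(UInt32.self, from: &data),
              let bytes = take(Int(len), from: &data) else { return nil }
        return .blob(Data(bytes))
    default:
        return nil  // unknown tag: treat the data as corrupt
    }
}

// Usage
var input = Data([0x77, 0x34, 0x12,                     // 'w' + UInt16 0x1234
                  0x73, 0x02, 0x48, 0x69,               // 's' + Pascal string "Hi"
                  0x70, 0x01, 0x00, 0x00, 0x00, 0xFF])  // 'p' + 1-byte blob
var records: [Record] = []
while let record = readRecord(from: &input) {
    records.append(record)
}
```

Returning nil for an unknown tag stops the loop instead of guessing, which is the safer choice if the file has been damaged.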

The data is 'trusted?' as I wrote another app, on Windows in C#, to create and assemble it, and everything (PNG, MP3, etc.) is created by me. It's educational software for high schools, for Windows and Mac. I'm trying to convert my Objective-C app to Swift after converting my Windows version from C++ to C#. (I'm worried that Microsoft will kill C++, so I went to C#, and now I'm worried that Apple will kill Objective-C, so I'm thinking of going to Swift. Or maybe I should just retire and go to the beach?)

Thanks for your help.
Paul
Hi,
Managed to read a char... :-)
// char
var ch : UInt8 = 0
var data = fileHandle?.readData(ofLength: MemoryLayout.size(ofValue:ch))
data?.copyBytes(to: &ch, count: MemoryLayout.size(ofValue:ch))
var id = Character(UnicodeScalar(ch))
But this is no-go... :-(
// UInt16
var num : UInt16 = 0
data = fileHandle?.readData(ofLength: MemoryLayout.size(ofValue:UInt16()))
data?.copyBytes(to: &num, count: MemoryLayout.size(ofValue:num))
Cannot convert value of type 'UnsafeMutablePointer<UInt16>' to expected argument type 'UnsafeMutablePointer<UInt8>'
Paul

You can write an extension of FileHandle like this.
extension FileHandle {
    func readLittleEndian<T>() -> T?
        where T: FixedWidthInteger
    {
        let data = readData(ofLength: MemoryLayout<T>.size)
        if data.count < MemoryLayout<T>.size {
            return nil
        }
        var value: T = 0
        _ = withUnsafeMutableBytes(of: &value) {bufPtr in
            data.copyBytes(to: bufPtr)
        }
        return T(littleEndian: value)
    }
    func readString(ofSize size: Int, encoding: String.Encoding = .isoLatin1) -> String? {
        let strData = readData(ofLength: size)
        if strData.count == size {
            return String(data: strData, encoding: encoding)
        } else {
            return nil
        }
    }
    func readPascalString(encoding: String.Encoding = .isoLatin1) -> String? {
        if let len: UInt8 = readLittleEndian() {
            return readString(ofSize: Int(len), encoding: encoding)
        } else {
            return nil
        }
    }
}

And use it like this:
// char
if let ch: UInt8 = fileHandle?.readLittleEndian() {
    var id = Character(UnicodeScalar(ch))
    //...
}
// UInt16
if let num: UInt16 = fileHandle?.readLittleEndian() {
    //Use num here...
    print(String(format: "%04X", num))
    //..
}

(Sorry, not fully tested and you may need to fix some parts.)

Note, though, that readData(ofLength:) is marked as deprecated in the documentation (though not yet in the shipping SDKs), so you may need to switch to its replacement, read(upToCount:), in the near future.
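As a sketch of what that update might look like, here is a variant of the little-endian helper built on the throwing read(upToCount:) (macOS 10.15.4 or later). The method name readLittleEndianValue is hypothetical, and it assembles the value from bytes instead of copying into the integer's memory.

```swift
import Foundation

extension FileHandle {
    /// Hypothetical replacement for the readData(ofLength:)-based helper.
    /// I/O failures are thrown as Swift errors; nil means the file ended early.
    func readLittleEndianValue<T: FixedWidthInteger>(_: T.Type) throws -> T? {
        let size = MemoryLayout<T>.size
        guard let data = try self.read(upToCount: size), data.count == size else {
            return nil
        }
        return data.reversed().reduce(0) { ($0 << 8) | T(truncatingIfNeeded: $1) }
    }
}

// Usage with a small demo file.
let url = FileManager.default.temporaryDirectory.appendingPathComponent("le-demo.bin")
try Data([0x34, 0x12, 0x78]).write(to: url)
let handle = try FileHandle(forReadingFrom: url)
let first = try handle.readLittleEndianValue(UInt16.self)   // 0x1234
let second = try handle.readLittleEndianValue(UInt16.self)  // nil: only one byte left
try handle.close()
```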

The data is 'trusted?' as I wrote another app, on Windows in C#

Right. But unless the data is cryptographically signed, and you verify that signature before you read any of the data, it’s not trusted. Something could have messed with the data between the producer and the consumer.

Whenever you read data from the file system you must treat it as untrusted. Avoiding unsafe APIs is a good idea in general, but it’s particularly important when working with untrusted data.

We talked about this issue in depth at WWDC this year. See WWDC 2020 Session 10189 “Secure your app: threat modeling and anti-patterns”. I haven’t yet had a chance to watch it myself, but folks I trust (hey hey) gave it glowing reviews.



The data is too big to read all into memory.

Even with memory mapping? Just how big is this data?
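For reference, a memory-mapped read looks roughly like this. This is a sketch: with the .alwaysMapped option, Data refers to the file’s pages rather than copying the whole file into RAM, and the system can page bytes in as they are touched. It assumes the file is not modified or truncated while mapped. mappedPage is a hypothetical helper name.

```swift
import Foundation

/// Hypothetical helper: maps the file and copies out one page by offset and length.
func mappedPage(of url: URL, offset: Int, length: Int) throws -> Data? {
    let file = try Data(contentsOf: url, options: .alwaysMapped)
    guard offset >= 0, length >= 0, offset <= file.count - length else { return nil }
    let start = file.startIndex + offset
    // subdata(in:) copies just this page and gives it zero-based indices.
    return file.subdata(in: start ..< start + length)
}

// Usage with a small demo file.
let url = FileManager.default.temporaryDirectory.appendingPathComponent("mapped-demo.bin")
try Data([0x00, 0x01, 0x02, 0x03, 0x04]).write(to: url)
let page = try mappedPage(of: url, offset: 1, length: 3)  // bytes 01 02 03
```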

This matters because you have a stark choice here: memory access or file system APIs. There are definite drawbacks to using memory access, but there are also drawbacks to using the file system. An obvious way forward here is FileHandle, but that’s a really bad choice for this sort of thing because every readData(ofLength:) call translates to a read system call. Doing that for every tiny value in your file is going to annihilate your performance.

For the file system to work well for this task you need some sort of user-space buffering. None of the Foundation APIs do this for you. My go-to API for this is C’s standard I/O library. However, that API is chock full of unsafe pointers, and thus needs careful wrapping to prevent that unsafeness escaping into other parts of your code.
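One way to do that wrapping is sketched below, assuming macOS or Linux with Glibc (where fopen returns an UnsafeMutablePointer<FILE>). BufferedReader is a hypothetical name. The FILE pointer and the raw buffer never leave the class, and each fread is normally served from stdio’s user-space buffer rather than triggering its own system call.

```swift
import Foundation  // re-exports the C standard library (Darwin or Glibc)

/// Hypothetical wrapper that keeps stdio's unsafe pointers inside one type.
final class BufferedReader {
    private let file: UnsafeMutablePointer<FILE>

    init?(path: String) {
        guard let file = fopen(path, "rb") else { return nil }
        self.file = file
    }

    deinit { fclose(file) }

    /// Jumps to an absolute byte offset, e.g. a page offset read from the header.
    func seek(to offset: Int64) -> Bool {
        fseeko(file, off_t(offset), SEEK_SET) == 0
    }

    /// Reads exactly `count` bytes, or returns nil at end of file or on error.
    func readBytes(_ count: Int) -> [UInt8]? {
        guard count > 0 else { return count == 0 ? [] : nil }
        var buffer = [UInt8](repeating: 0, count: count)
        let got = buffer.withUnsafeMutableBytes { fread($0.baseAddress, 1, count, file) }
        return got == count ? buffer : nil
    }

    /// Reads a little-endian integer by combining bytes (no alignment assumptions).
    func readLittleEndian<T: FixedWidthInteger>(_: T.Type) -> T? {
        readBytes(MemoryLayout<T>.size)?
            .reversed()
            .reduce(0) { ($0 << 8) | T(truncatingIfNeeded: $1) }
    }
}

// Usage with a small demo file.
let path = FileManager.default.temporaryDirectory.appendingPathComponent("stdio-demo.bin").path
try Data([0xFF, 0xFF, 0x34, 0x12, 0x78, 0x56, 0x34, 0x12]).write(to: URL(fileURLWithPath: path))
let reader = BufferedReader(path: path)!
let seeked = reader.seek(to: 2)
let a = reader.readLittleEndian(UInt16.self)  // 0x1234
let b = reader.readLittleEndian(UInt32.self)  // 0x12345678
let c = reader.readLittleEndian(UInt8.self)   // nil: end of file
```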

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@apple.com"
Thanks very much, OOPer and Quinn.
So now I can't retire and can't go to the beach :-(
But I'm getting too old for this stuff now; I started in 1990 with DOS using C, and THINK C on the Mac.
Regards,
Paul

Since posting the above, I’ve evolved this idea into something that’s more convenient for the small test projects that I work on every day. Here’s what I now use:

extension Data {
    
    mutating func peekBytes(count: Int) -> Data? {
        guard self.count >= count else { return nil }
        return self.prefix(count)
    }
    
    mutating func parseBytes(count: Int) -> Data? {
        guard let result = self.peekBytes(count: count) else { return nil }
        self = self.dropFirst(count)
        return result
    }
    
    mutating func parseBigEndian<T>(_ x: T.Type) -> T? where T: FixedWidthInteger {
        guard let bytes = self.parseBytes(count: MemoryLayout<T>.size) else { return nil }
        return bytes.reduce(0, { $0 << 8 | T(truncatingIfNeeded: $1) })
    }
    
    mutating func parseLittleEndian<T>(_ x: T.Type) -> T? where T: FixedWidthInteger {
        guard let bytes = self.parseBytes(count: MemoryLayout<T>.size) else { return nil }
        return bytes.reversed().reduce(0, { $0 << 8 | T(truncatingIfNeeded: $1) })
    }
}

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"