I need to build an array with 22 columns and 6,000 rows, and building it takes about 500 seconds. I have traced the bottleneck directly to the line below. The data comes from a file that I read. Does anyone have ideas for speeding this up? I can provide more information on request.
ccDataStr_StrArr.append(itemHolder ?? [])
It is not clear where the tabs are in your newly posted text example, but it looks like tab-separated values (TSV) in which a field can contain line breaks when it is enclosed in double quotes.
If your data really is like that, counting 22 items is not a good way to handle it: a quoted field can span several lines, so line breaks alone do not tell you where a record ends.
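To illustrate, here is a small hypothetical record (made up for this sketch, modeled on the sample data) whose quoted second field contains a line break. Splitting on line breaks before splitting on tabs cuts the record in two, so the per-line field counts come out wrong.

```swift
// Hypothetical record: two tab-separated fields, the second one quoted
// and containing a line break, like the sample data.
let sampleRecord = "PE3\t\"COMPLIED WITH INSPECTION\n-03:47.\""

// Naive approach: split into lines first, then count tab-separated fields per line.
let lines = sampleRecord.split(separator: "\n")
let fieldCounts = lines.map { $0.split(separator: "\t", omittingEmptySubsequences: false).count }

print(lines.count)   // 2 "lines" for what is really one record
print(fieldCounts)   // [2, 1] instead of a single line with 2 fields
```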
You can try something like this:
import Foundation
let tsvText = """
VFA122_EF\tPE3\tFA-18E\tAMAH\t165897\t3BTVMMF\tPE3336110\t12/1/2016 11:01:42.71\t4/11/2017 09:00:43.63\t020\t84 DAY SPECIAL CORROSION COMPLY WITH 84 DAY SPECIAL CORROSION INSPECT(ATFLIR) INSPECTION ON [ATFLIR, E/F AN/ASQ-228(v)2 - FRP403] IAW AW-228AC-MRC-300 SC 0 000\t030000F\t09355\t12/1/2016 11:01:59.85\t"COMPLIED WITH 84 DAY SPECIAL INSPECTION: 120-PO2 C BLAIS-12/27/2016-17:53;210-AT2 J WAGNER-4/11/2017
-03:47.
"\t200
VFA122_EF\tPE3\tFA-18E\tAMAH\t166438\t3BTWWLG\tPE3129515\t5/9/2017 00:03:21.936\t6/12/2017 16:44:25.29\t020\tDD: 5/10/2017\t14 DAY SPE PERFORM 14 DAY SPECIAL INSP INSPECTION SC 0\t000\t030000A\t09355\t5/9/2017 00:04:31.606\t"PERFORMED 14 DAY SPECIAL INSPECTION: X51-CIV T MILLER-5/18/2017-09:32; 110-AD1 C WANG-6/12/2017-08:23;
230-AO2 C HALE-6/10/2017-08:58; 310-CIV T SMITH-6/10/2017-11:52; 120-AM1 R HOHMANN-6/10/2017-14:31;
13B-AME2 J MARALDO-6/12/2017-15:15; 220-AE2 J CA"\t" STILLO-6/10/2017-14:33
"\t215
"""
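// Each match is one field plus the separator that follows it:
//   group 1: a double-quoted field (may contain tabs and line breaks; "" is an escaped quote)
//   group 2: an unquoted field (no tab, quote, or line break)
//   group 3: the separator: a tab, a line break, or the end of the text
let pattern = #"[ ]*(?:"((?:[^"]|"")*)"|([^\t"\r\n]*))[ ]*(\t|\r\n?|\n|$)"#
let regex = try! NSRegularExpression(pattern: pattern)
var result: [[String]] = []   // all records
var record: [String] = []     // fields of the record being built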
// With .anchored, each match has to start where the previous one ended,
// so the fields are consumed back to back.
regex.enumerateMatches(in: tsvText, options: .anchored, range: NSRange(0..<tsvText.utf16.count)) { match, _, stop in
    guard let match = match else { fatalError() }
    if let quotedRange = Range(match.range(at: 1), in: tsvText) {
        // Quoted field: the surrounding quotes are outside group 1; unescape "".
        let field = tsvText[quotedRange].replacingOccurrences(of: "\"\"", with: "\"")
        record.append(field)
    } else if let range = Range(match.range(at: 2), in: tsvText) {
        // Unquoted field.
        let field = tsvText[range].trimmingCharacters(in: .whitespaces)
        record.append(field)
    }
    let separator = tsvText[Range(match.range(at: 3), in: tsvText)!]
    switch separator {
    case "": // end of text
        // Ignore an empty last line.
        if record.count > 1 || (record.count == 1 && !record[0].isEmpty) {
            result.append(record)
        }
        stop.pointee = true
    case "\t": // tab: the record continues
        break
    default: // line break: the record is complete
        result.append(record)
        record = []
    }
}
print(result)

This is not super-efficient, but it handles TSV data with quoted, multi-line fields more reliably than counting fields.
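If the regex turns out to be too slow for 6,000 rows, one alternative is a hand-written single-pass scanner. This is a sketch under my own assumptions, not code from the question: `parseTSV(_:)` is a hypothetical name, and it assumes the same quoting rules the regex encodes (a field may be wrapped in double quotes, quoted fields may contain tabs and line breaks, and `""` inside quotes is an escaped quote). Unlike the regex version, it does not trim spaces around unquoted fields.

```swift
/// Hypothetical single-pass TSV parser (sketch).
/// Assumed rules: a field starting with `"` is quoted until the next lone `"`,
/// `""` inside a quoted field means one `"`, and tabs and line breaks inside
/// quotes are kept as part of the field. Spaces are not trimmed.
func parseTSV(_ text: String) -> [[String]] {
    var records: [[String]] = []
    var record: [String] = []
    var field = ""
    var inQuotes = false
    var i = text.startIndex

    while i < text.endIndex {
        let ch = text[i]
        let next = text.index(after: i)
        if inQuotes {
            if ch == "\"" {
                if next < text.endIndex && text[next] == "\"" {
                    field.append("\"")          // "" is an escaped quote
                    i = text.index(after: next)
                    continue
                }
                inQuotes = false                // closing quote
            } else {
                field.append(ch)                // tabs and line breaks stay in the field
            }
        } else if ch == "\"" && field.isEmpty {
            inQuotes = true                     // opening quote at the start of a field
        } else if ch == "\t" {
            record.append(field)                // tab ends the field
            field = ""
        } else if ch == "\n" || ch == "\r\n" || ch == "\r" {
            record.append(field)                // line break ends the field and the record
            records.append(record)
            record = []
            field = ""
        } else {
            field.append(ch)
        }
        i = next
    }
    // Flush the last record unless the text ended with a line break.
    if !field.isEmpty || !record.isEmpty {
        record.append(field)
        records.append(record)
    }
    return records
}
```

On the sample text above, `parseTSV(tsvText)` should produce the same two records as the regex version. Iterating over `Character`s keeps the `\r\n` handling simple (Swift treats `"\r\n"` as a single `Character`); if it is still too slow, scanning `text.utf8` instead is a common next step, at the cost of more bookkeeping.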