I wrote a Python script to analyze a large dataset of chess games. I've been testing it with a small dataset (about 1 MB), and it works without problems. Running the same code on a larger dataset (about 10 GB) works for a few minutes but always crashes at around the same point, though never at exactly the same moment. It looks like the system is killing the process once it exceeds some resource threshold. I am on a brand new M2 MacBook Pro with 16 GB of RAM. I've tried restarting my computer and reinstalling Python from several different sources, all to no avail.
When it crashes, the shell prints nothing more than "zsh: killed". Here is the Python script in question:
print('opening file')
games = open("games.txt", "r")
gamesArray = []
print("reading file")
for textLine in games:
    gamesArray.append(textLine)
posArray = []
print("dissecting games")
ct=0
for game in gamesArray:
    ct+=1
    if (ct % 10000 == 0):
        print("dissecting games: " + str(ct / len(gamesArray) * 100) + "%")
    positions = game.split('.')
    pos = ''
    for state in positions:
        pos+=state
        if (len(pos) > 5):
            posArray.append(pos)
print("converting positions to set")
posArraySet = set(posArray)
print("removing duplicates")
posArrayUnique = (list(posArraySet))
posFreqArray = []
print("counting duplicates")
ct = 0
for i in posArrayUnique:
    ct+=1
    if (ct % 10000 == 0):
        print("counting duplicates " + str(100 * (ct / len(posArrayUnique))) + "%")
    posFreqArray.append({"position": i, "ct": posArray.count(i)})
# print("counting variations")
posFreqArray = sorted(posFreqArray, key=lambda x: x['ct'])
# the below is all unnecessary lol
# ct = 0
# for i in range(0,len(posFreqArray)):
#     ct += 1
#     if (ct % 100 == 0):
#         print(str(100*(ct/len(posFreqArray))))
#     for n in range(0, len(posFreqArray)):
#         if (i != n):
#             if (posFreqArray[i]["position"] in posFreqArray[n]["position"]):
#                 posFreqArray[i]["ct"] += posFreqArray[n]["ct"]
print('writing to data.txt')
outputFile = open("data.txt", "w")
output = ''
for i in posFreqArray:
    output += str(i) + '\n'
outputFile.write(output)
print("done")
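For what it's worth, the script keeps every line of the file in gamesArray and every accumulated prefix in posArray, and posArray.count(i) rescans that whole list once per unique position. Here is a minimal sketch of the same counting done in a single streaming pass with collections.Counter. The names count_prefixes, write_counts and min_len are hypothetical, the sketch assumes the length check belongs inside the inner loop, and it has not been run against the 10 GB file. The counter still holds every unique position in memory, so it may not be enough on its own.

```python
from collections import Counter


def count_prefixes(lines, min_len=5):
    """Count each accumulated prefix of a game that is longer than min_len characters."""
    counts = Counter()
    for line in lines:
        pos = ''
        # Same splitting and accumulation as the script above.
        for state in line.split('.'):
            pos += state
            if len(pos) > min_len:
                counts[pos] += 1
    return counts


def write_counts(counts, path):
    """Write one dict per line, least frequent first, in the script's output format."""
    with open(path, "w") as out:
        # Writing line by line avoids building one giant output string.
        for position, ct in sorted(counts.items(), key=lambda kv: kv[1]):
            out.write(str({"position": position, "ct": ct}) + "\n")
```

It would be called by passing the open file straight in, so lines are read one at a time instead of being stored first: with open("games.txt", "r") as games: write_counts(count_prefixes(games), "data.txt").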
I'm on macOS 13.2.1 and Python 3.11.
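To check the suspicion that memory is what triggers the kill, peak memory could be printed next to the existing progress messages. This is a minimal sketch using the standard-library resource module (Unix only). The helper names peak_memory_mb and progress_line are hypothetical, and the unit handling relies on ru_maxrss being reported in bytes on macOS and in kilobytes on Linux.

```python
import resource
import sys


def peak_memory_mb():
    """Return this process's peak resident set size in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is in bytes on macOS and in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def progress_line(label, done, total):
    """Build a progress message that also reports peak memory so far."""
    percent = 100 * done / total
    return f"{label}: {percent:.1f}% (peak memory {peak_memory_mb():.0f} MB)"
```

Replacing a progress print with print(progress_line("dissecting games", ct, len(gamesArray))) would show whether the reported peak is climbing toward the 16 GB of RAM shortly before the process is killed.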
Any help would be greatly appreciated. Thanks in advance!