I'm using Numbers to build a spreadsheet that I'm exporting as a CSV. I then import this file into Create ML to train a word tagger model. Everything has been working fine for all the models I've trained so far, but now I'm coming across a use case that has been breaking the import process: commas within the training data. This is a case that none of Apple's examples show. My project takes Navajo text that has been tokenized by syllables and labels the parts-of-speech. Case that works... Raw text: Naaltsoos yídéeshtah. Tokens column: Naal,tsoos, ,yí,déesh,tah,. Labels column: NObj,NObj,Space,Verb,Verb,VStem,Punct Case that breaks... Raw text: óola, béésh łigaii, tłʼoh naadą́ą́ʼ, wáin, akʼah, dóó á,shįįh Tokens column with tokenized text (commas quoted): óo,la,,, ,béésh, ,łi,gaii,,, ,tłʼoh, ,naa,dą́ą́ʼ,,, ,wáin,,, ,a,kʼah,,, ,dóó, ,á,shįįh (Create ML reports mismatched columns) Tokens column with tokenized text (commas escaped): óo,la,,, ,béésh, ,łi,gaii,,, ,tłʼoh,
Topic:
Machine Learning & AI
SubTopic:
Create ML
Tags:
Natural Language
Machine Learning
Create ML
TabularData