




Reply to Create ML Trouble Loading CSV to Train Word Tagger With Commas in Training Data
Here is a more recent case that shows what I'm trying to do; the earlier example with the punctuation was only a proof of concept for testing. This one includes a few commas within the text to be trained. Other examples include quotation marks.

Hodeeyáádą́ą́ʼ,Diyin,God,yótʼááh,hiníláii,índa,nahasdzáán,áyiilaa,.,Nahasdzáán,tʼáadoo,ánoolniní,da,",",índa,tʼáadoo,bikááʼ,siláhí,da,;,bikáaʼgi,tʼáá,átʼéé,nítʼééʼ,chahałheełgo,Diyin,God,biNíłchʼi,Diyinii,tó,yikááʼgóó,nahazleʼ,.,Áádóó,Diyin,God,ádííniid,",",Adinídíin,leʼ,.,Tʼáá,áko,adinídíín,hazlį́į́ʼ,.,Áko,Diyin,God,éí,adinídínígíí,yinééłʼį́įʼgo,bił,yáʼíítʼééh,",",áádóó,adinídínígíí,chahałheeł,yił,ałtsʼáyíínil,.

Adv,Adj,NSub,NObjPos,VPerf,Conj,NObj,VPerf,Punct,NSub,AdvNeg,VProg,PartNeg,Punct,Conj,AdvNeg,Adp,VImpf,PartNeg,Punct,Adp,Adv,VImpf,Adv,Adv,Adj,NSub,NSubPos,NSubPos,NAdp,AdpPos,VImpf,Punct,Conj,Adj,NSub,VPerf,Punct,NObj,VImp,Punct,Adv,Adv,NSub,VPerf,Punct,Adv,Adj,NSub,Pro,NObj,VPerfAdv,ProAdp,VPerf,Punct,Conj,NSub,NAdp,Adp,VPerf,Punct

The core problem is that Create ML does not seem to support several of the CSV escaping conventions that spreadsheet tools use (including Apple's own Numbers). It also does not support any of the other file formats that Numbers exports directly, as far as I could find. That makes commas and quotation marks difficult to include in any training data.

I've been able to get around this by writing my own tool that imports TSV files from Numbers and converts them to JSON files that Create ML accepts, which adds two more steps to the training process each time. However, this post was originally about how to get Create ML to accept a Numbers CSV file directly, without added steps every time. If this is not a bug and Create ML simply lacks the functionality, I will continue with my custom workaround, and we can consider this issue resolved.
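For anyone who wants to try the same kind of workaround, here is a rough sketch of the conversion step in Python. It is not my actual tool, and it rests on a few assumptions: the exported file alternates a row of tokens with a row of labels (as in the sample above), quoted fields follow the usual spreadsheet convention (a comma written as `","`, a quotation mark as `""""`), and the JSON keys `tokens` and `labels` match the column names you select in Create ML. The function name `rows_to_tagger_json` is made up for illustration. It reads comma-separated text by default; pass `delimiter="\t"` for a TSV export.

```python
import csv
import io
import json


def rows_to_tagger_json(text, delimiter=","):
    """Convert alternating token/label rows into a JSON list of records.

    Assumed layout: row 1 = tokens, row 2 = labels, row 3 = tokens, and so on.
    The "tokens"/"labels" key names are an assumption; use whatever column
    names you pick in Create ML.
    """
    # csv.reader undoes standard quoting, so a field written as "," comes
    # back as a bare comma and """" comes back as a single quotation mark.
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    if len(rows) % 2 != 0:
        raise ValueError("expected an even number of rows (tokens, then labels)")

    records = []
    for tokens, labels in zip(rows[0::2], rows[1::2]):
        if len(tokens) != len(labels):
            raise ValueError(f"{len(tokens)} tokens but {len(labels)} labels")
        records.append({"tokens": tokens, "labels": labels})

    # ensure_ascii=False keeps the diacritics readable in the output.
    return json.dumps(records, ensure_ascii=False, indent=2)


# Excerpt of the sample above: one sentence containing a quoted comma.
sample = (
    'Áádóó,Diyin,God,ádííniid,",",Adinídíin,leʼ,.\n'
    "Conj,Adj,NSub,VPerf,Punct,NObj,VImp,Punct\n"
)
print(rows_to_tagger_json(sample))
```

The length check is there because a single unescaped comma silently shifts every label after it, which is easy to miss in a long sentence.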
Jan ’25