MLRecommender training data error

I have training data like this:

X user_id item_id score
0 d1d1dad9-af15-4bc6-9066-5bc39a830eb0 10034e91-1698-4f16-9cc2-483aa2e84372 1
1 d1d1dad9-af15-4bc6-9066-5bc39a830eb0 22ca0dc8-1607-4f48-bef3-84a267607cf5 1
2 d1d1dad9-af15-4bc6-9066-5bc39a830eb0 396a1d47-ca8f-4189-8526-85e40875c363 35
3 d1d1dad9-af15-4bc6-9066-5bc39a830eb0 4827fb22-4bd1-47f8-aee4-fb45b8900cb6 1
4 f69df0fd-3b7e-489d-9197-28d94be3d281 53fb60b1-7d6c-473f-91bf-42fd670ae055 6
5 8730655f-b7b9-4d36-a4c2-f48e866e4533 53fb60b1-7d6c-473f-91bf-42fd670ae055 1
6 d1d1dad9-af15-4bc6-9066-5bc39a830eb0 53fb60b1-7d6c-473f-91bf-42fd670ae055 1
7 9155b83d-d443-46d8-a24d-cff329eb0d07 73bfd56b-b799-43cd-b17b-4ef259d18fcc 35
8 d1d1dad9-af15-4bc6-9066-5bc39a830eb0 be9d114d-2a16-4e24-b142-44fc97351cc6 1
9 d1d1dad9-af15-4bc6-9066-5bc39a830eb0 d8ab8d67-7efa-4370-b7d0-6d176c81901f 1

When I try to train an MLRecommender model on it, I get this error:

Item IDs in the recommender model must be numbered 0, 1, ..., numitems - 1
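The message says item IDs must be numbered contiguously from 0 to numitems - 1, whereas the IDs in this data are UUID strings. To show what that numbering means, here is a small Python sketch that maps each distinct ID to a dense integer index. `dense_index` is a hypothetical helper, not part of Create ML, and nothing in this thread confirms that pre-encoding IDs this way avoids the error.

```python
# Hypothetical helper (not part of Create ML): assign each distinct ID
# a dense integer index 0, 1, ..., n - 1, in first-seen order.
def dense_index(ids):
    mapping = {}
    for value in ids:
        if value not in mapping:
            mapping[value] = len(mapping)
    return mapping


item_ids = [
    "10034e91-1698-4f16-9cc2-483aa2e84372",
    "22ca0dc8-1607-4f48-bef3-84a267607cf5",
    "53fb60b1-7d6c-473f-91bf-42fd670ae055",
    "53fb60b1-7d6c-473f-91bf-42fd670ae055",  # a repeated item keeps its first index
]
item_index = dense_index(item_ids)
print(item_index)
```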

When I don't use the score column as the rating, the model trains, but the score data is very important to me.

I tried adding a "dummy" user ID that gives a score of 0 to every item ID, and a "dummy" item ID that is scored by all user IDs, but this did not fix the training error.

How can I train my model with the data above?

Accepted Reply

Hello, as I added to the Feedback report, I used normalised data with dummy ratings of 0.0, and that didn't work either.

I have been talking with another person who had similar problems, and according to this post:

https://stackoverflow.com/posts/comments/110994450?noredirect=1

I tested dummy ratings with values greater than 0 (I used 0.1), and the model trained with ratings enabled for the first time.

So the data has to be:
  • normalized (every user rates every item)

  • filled with dummy ratings greater than 0

I've created a pandas Python script for data normalisation, and I'm really happy to have it working.
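The pandas script itself isn't included in the thread. As a rough equivalent, here is a stdlib-only Python sketch of the normalisation described above: every user/item pair without a score gets a small positive dummy rating. The column names match the question; the function name `densify`, the placeholder IDs, and the 0.1 default (taken from the reply) are illustrative assumptions.

```python
import csv
import io
from itertools import product

# Assumption: columns are user_id, item_id, score, as in the question.
# The reply above reports that dummy ratings must be > 0 (0.0 failed, 0.1 worked).
DUMMY_RATING = 0.1


def densify(rows, dummy_rating=DUMMY_RATING):
    """Return one row per (user, item) pair, keeping observed scores.

    Pairs with no observed score get dummy_rating. If a pair appears more than
    once in the input, the last occurrence wins.
    """
    if dummy_rating <= 0:
        raise ValueError("dummy rating must be greater than 0")
    observed = {(r["user_id"], r["item_id"]): float(r["score"]) for r in rows}
    users = sorted({u for u, _ in observed})
    items = sorted({i for _, i in observed})
    return [
        {"user_id": u, "item_id": i, "score": observed.get((u, i), dummy_rating)}
        for u, i in product(users, items)
    ]


# Tiny illustrative input with placeholder IDs (not the data from the question).
sample = io.StringIO(
    "user_id,item_id,score\n"
    "u1,a,1\n"
    "u1,b,35\n"
    "u2,b,6\n"
)
dense = densify(list(csv.DictReader(sample)))
for row in dense:
    print(row)
```

The result can be written back out with `csv.DictWriter` using `fieldnames=["user_id", "item_id", "score"]`. Note that the output has users × items rows: the ten rows shown in the question (4 users, 8 items) would become 32, and for large catalogs this grows quickly.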

Please update the docs with this information if possible, so that people trying to use MLRecommender know directly what they need to do to make it work.

Replies

Here are the source files: the original data and the mocked, normalised data:

https://gist.github.com/nysander/4a37db1abde1bfa4b58706ca2d1ae27e

Thanks for supplying the data; that's really going to help us.

Looks like this feedback is yours: FB7854032

Yes, that is my feedback.


Update:

This error is the same in the Create ML app bundled with Xcode 11.5 and Xcode 12 beta.

Hi Paweł. Thanks for the sample data; it was very helpful, and I was able to confirm that the issue exists on macOS Catalina.

Using macOS Big Sur beta 2 and Xcode 12 beta 2, I was able to train on your data without an error.

Hi Pawel, thank you for taking the time to reach out. A potential workaround to try for this issue on macOS Catalina is to add a dummy user that has rated all items and a dummy item that has been rated by all users. If you are able to update to macOS 11, this issue should be addressed.
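The suggested workaround can be sketched as a Python preprocessing step. The dummy IDs, the function name, and the 0.1 default rating are assumptions; the default is positive because the accepted reply reports that dummy ratings of 0.0 did not work. Having the dummy user also rate the dummy item is a choice made here, not something stated in the reply.

```python
# Hypothetical IDs for the extra rows; pick values that cannot collide with real UUIDs.
DUMMY_USER = "dummy-user"
DUMMY_ITEM = "dummy-item"


def add_dummy_user_and_item(rows, rating=0.1):
    """Append a dummy user who rates every item and a dummy item rated by every user."""
    users = sorted({r["user_id"] for r in rows})
    items = sorted({r["item_id"] for r in rows})
    extra = [{"user_id": DUMMY_USER, "item_id": i, "score": rating} for i in items]
    extra += [{"user_id": u, "item_id": DUMMY_ITEM, "score": rating} for u in users]
    # Assumption: the dummy user also rates the dummy item, so that pair is covered too.
    extra.append({"user_id": DUMMY_USER, "item_id": DUMMY_ITEM, "score": rating})
    return rows + extra


rows = [
    {"user_id": "u1", "item_id": "a", "score": 1.0},
    {"user_id": "u2", "item_id": "b", "score": 6.0},
]
augmented = add_dummy_user_and_item(rows)
print(len(augmented))  # 7: 2 original + 2 dummy-user rows + 2 dummy-item rows + 1
```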

