Problems with EPF feed full dump 20211007

Hello friends, we recently ran into some problems with the full dump of the EPF feed dated 20211007.

I'm wondering if anyone has any suggestions or solutions for these.

itunes/collection file

The collection file seems to have "lost" its column types?

The collection_id column has changed from BIGINT in the previous file to VARCHAR(1000) in the latest one.

Similarly, media_type_id was INTEGER and is now also VARCHAR(1000), and both datetime fields (original_release_date and itunes_release_date) have changed to VARCHAR(1000). Even the longer varchar columns (3000 and 4000 chars) have been cut down to 1000 chars.

For reference the file header is now reporting the following:

#export_date collection_id name title_version search_terms parental_advisory_id artist_display_name view_url artwork_url original_release_date itunes_release_date label_studio content_provider_name copyright p_line media_type_id is_compilation collection_type_id
#primaryKey:collection_id
#dbTypes:BIGINT VARCHAR(1000) VARCHAR(1000) VARCHAR(1000) VARCHAR(1000) VARCHAR(1000) VARCHAR(1000) VARCHAR(1000) VARCHAR(1000) VARCHAR(1000) VARCHAR(1000) VARCHAR(1000) VARCHAR(1000) VARCHAR(1000) VARCHAR(1000) VARCHAR(1000) INTEGER VARCHAR(1000)
#exportMode:FULL

For comparison, this is the header from the previous full dump, 20210722:

#export_date collection_id name title_version search_terms parental_advisory_id artist_display_name view_url artwork_url original_release_date itunes_release_date label_studio content_provider_name copyright p_line media_type_id is_compilation collection_type_id
#primaryKey:collection_id
#dbTypes:BIGINT BIGINT VARCHAR(1000) VARCHAR(1000) VARCHAR(3000) INTEGER VARCHAR(1000) VARCHAR(1000) VARCHAR(1000) DATETIME DATETIME VARCHAR(1000) VARCHAR(1000) VARCHAR(4000) VARCHAR(4000) INTEGER BOOLEAN INTEGER
#exportMode:FULL
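
To see exactly which columns moved, the two #dbTypes lines can be compared position by position. This is only a rough sketch using the headers quoted above; the function name diff_types is my own, and I've written the fields as plain Python lists rather than parsing the real files, since the separator in the actual dump may differ from what is pasted here.

```python
# Column names and types copied from the two headers quoted above.
COLUMNS = (
    "export_date collection_id name title_version search_terms "
    "parental_advisory_id artist_display_name view_url artwork_url "
    "original_release_date itunes_release_date label_studio "
    "content_provider_name copyright p_line media_type_id "
    "is_compilation collection_type_id"
).split()

# 20210722 dump
OLD_TYPES = (
    "BIGINT BIGINT VARCHAR(1000) VARCHAR(1000) VARCHAR(3000) INTEGER "
    "VARCHAR(1000) VARCHAR(1000) VARCHAR(1000) DATETIME DATETIME "
    "VARCHAR(1000) VARCHAR(1000) VARCHAR(4000) VARCHAR(4000) INTEGER "
    "BOOLEAN INTEGER"
).split()

# 20211007 dump: BIGINT, then 15 x VARCHAR(1000), then INTEGER, VARCHAR(1000)
NEW_TYPES = ["BIGINT"] + ["VARCHAR(1000)"] * 15 + ["INTEGER", "VARCHAR(1000)"]


def diff_types(columns, old_types, new_types):
    """Return (column, old_type, new_type) for every column whose type changed."""
    if not (len(columns) == len(old_types) == len(new_types)):
        raise ValueError("header lines have different numbers of fields")
    return [
        (col, old, new)
        for col, old, new in zip(columns, old_types, new_types)
        if old != new
    ]


changes = diff_types(COLUMNS, OLD_TYPES, NEW_TYPES)
for column, old, new in changes:
    print(f"{column}: {old} -> {new}")
```

Running this on the two headers also shows a few changes I did not mention above, for example is_compilation going from BOOLEAN to INTEGER.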

itunes/artist_collection file

In this file the columns are unchanged, so it's a different error: the file contents are not consistent with the primaryKey constraint declared in the file.

E.g. the file reports the primary key as the tuple (artist_id,collection_id,role_id).

However, importing the latest data with the EPFImporter tool gives many errors because there are rows with duplicate keys.

For example the first error I see is for the following entries (which I extracted manually). The problem is the rows are identical except for the "is_primary_artist" value.

export_date artist_id collection_id is_primary_artist role_id
1633587189 36270 1461423948 1 1
1633587189 36270 1461423948 0 1
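
For anyone who wants to check their own copy of the file, here is a minimal sketch of how such key collisions can be found. It assumes the rows have already been split into lists of strings; the function name find_key_violations is hypothetical and not part of EPFImporter.

```python
from collections import defaultdict

HEADER = ["export_date", "artist_id", "collection_id", "is_primary_artist", "role_id"]
PRIMARY_KEY = ("artist_id", "collection_id", "role_id")

# The two rows quoted above, already split into fields.
ROWS = [
    ["1633587189", "36270", "1461423948", "1", "1"],
    ["1633587189", "36270", "1461423948", "0", "1"],
]


def find_key_violations(header, primary_key, rows):
    """Group rows by primary key and return only keys that occur more than once."""
    key_positions = [header.index(col) for col in primary_key]
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[i] for i in key_positions)
        groups[key].append(row)
    return {key: dupes for key, dupes in groups.items() if len(dupes) > 1}


violations = find_key_violations(HEADER, PRIMARY_KEY, ROWS)
for key, dupes in violations.items():
    print(key, "appears", len(dupes), "times")
```

For the full artist_collection file this keeps every row in memory, so it is only meant to illustrate the check, not as something tuned for a full dump.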

I think I can handle the first error by forcing the column types to be the same as before. For the second one, though, the data itself is the problem, so I'm not sure what to do about it.

Judging by some of the older posts on this forum I'm not sure there will be any reply, but thanks for looking all the same.

I managed to find a couple of answers. Hope it helps some people.

collection column types

I think the column type issue might be related to the new "v5" EPF format, which seems to have been added recently. The readme file is dated Oct 13th 2021, so I think these feed dumps are still being worked on.

The solution for us was to ensure that all <somename>_id columns are either INTEGER or BIGINT. Similarly, any is_<name> column can be treated as BOOLEAN, and any <name>_date column as DATETIME, except for export_date, which is always BIGINT.
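
As a rough illustration, the naming rules above could be written like this. The function name infer_type is made up for the example, and falling back to BIGINT for _id columns that are reported as something else is my own assumption (the older header used INTEGER for some of them, so adjust as needed).

```python
def infer_type(column, reported_type):
    """Override the reported dbType using the column-name rules described above."""
    # export_date is the exception to the *_date rule: it is always BIGINT.
    if column == "export_date":
        return "BIGINT"
    if column.endswith("_id"):
        # Keep INTEGER/BIGINT if the header already says so;
        # otherwise fall back to BIGINT (an assumption, the wider of the two).
        if reported_type in ("INTEGER", "BIGINT"):
            return reported_type
        return "BIGINT"
    if column.startswith("is_"):
        return "BOOLEAN"
    if column.endswith("_date"):
        return "DATETIME"
    # Anything else keeps whatever the header reported.
    return reported_type


print(infer_type("collection_id", "VARCHAR(1000)"))
print(infer_type("is_compilation", "INTEGER"))
print(infer_type("original_release_date", "VARCHAR(1000)"))
```

Note that this does not restore the longer VARCHAR lengths from the older dump; those would still need to be set by hand if the 1000-char limit is too short for your data.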

artist_collection inconsistencies

For the inconsistent data issue, the EPFImporter tool has an option --skipkeyviolators which will ignore the "extra" rows. So that part is handled too.
