sql server - SQL query to split a comma separated column into many-to-many relationships -
i given 3gb csv file need import in sql server 2012.
i have 5 million rows data in staging table looks (simplified).
staging table:
+-------------------+------------+---------------+------------+ | name | thumbnail | tags | categories | +-------------------+------------+---------------+------------+ | history | thumb1.jpg | history,essay | history | | nutricion lecture | thumb2.jpg | food,essay | health | +-------------------+------------+---------------+------------+ the question tags , categories column in staging table.
how can transfer information staging table actual table , create unique record each tag , category -- and create needed many-to-many relationships?
it need check each tag against existing tags either create new record -or- fetch id of existing tag.
programs:
+----+-----------+------------+ | id | program | thumbnail | +----+-----------+------------+ | 1 | history | thumb1.jpg | | 2 | nutricion | thumb2.jpg | +----+-----------+------------+ tags:
+----+---------+ | id | tag | +----+---------+ | 1 | history | | 2 | essay | | 3 | food | +----+---------+ (categories table omitted because looks same tags)
the many-to-many relationships:
programs_tags:
+---------+-----+ | program | tag | +---------+-----+ | 1 | 1 | | 1 | 2 | | 2 | 2 | +---------+-----+ programs_categories:
+---------+----------+ | program | category | +---------+----------+ | 1 | 1 | | 2 | 2 | +---------+----------+ i assume faster in pure sql write tool it.
i'm not sure if faster in sql. but, here approach.
first, create 5 table need this:
- programs
- tags
- categories
- programtags
- programcategories
with appropriate structures, including identity id columns.
then load data programs. easy, appropriate select.
then create tags , categories tables. here how load tags table:
with cte ( select (case when tags '%,%' left(tags, charindex(tags, ',')) else tags end) tag, (case when tags '%,%' substring(tags, charindex(tags, ',') + 1, len(tags)) end) resttags staging tags not null , tags <> '' union select (case when resttags '%,%' left(resttags, charindex(tags, ',')) else resttags end) tag, (case when tags '%,%' substring(resttags, charindex(resttags, ',') + 1, len(testtags)) end) resttags cte resttags not null , resttags <> '' ) select distinct tags cte; (obviously requires insert).
do same categories.
then load programtags using:
select p.programid, t.tagid staging s join programs p on s.<whatever> = p.<whatever> join tags t on ','+s.tags+',' '%,'+t.tag+',%'; the first join program id. second fetch appropriate tags. performance not great, might enough need do.
Comments
Post a Comment