sql server - SQL query to split a comma separated column into many-to-many relationships -
i given 3gb csv file need import in sql server 2012.
i have 5 million rows data in staging table looks (simplified).
staging
table:
+-------------------+------------+---------------+------------+ | name | thumbnail | tags | categories | +-------------------+------------+---------------+------------+ | history | thumb1.jpg | history,essay | history | | nutricion lecture | thumb2.jpg | food,essay | health | +-------------------+------------+---------------+------------+
the question tags
, categories
column in staging table.
how can transfer information staging table actual table , create unique record each tag , category -- and create needed many-to-many relationships?
it need check each tag against existing tags either create new record -or- fetch id
of existing tag.
programs
:
+----+-----------+------------+ | id | program | thumbnail | +----+-----------+------------+ | 1 | history | thumb1.jpg | | 2 | nutricion | thumb2.jpg | +----+-----------+------------+
tags
:
+----+---------+ | id | tag | +----+---------+ | 1 | history | | 2 | essay | | 3 | food | +----+---------+
(categories table omitted because looks same tags)
the many-to-many relationships:
programs_tags
:
+---------+-----+ | program | tag | +---------+-----+ | 1 | 1 | | 1 | 2 | | 2 | 2 | +---------+-----+
programs_categories
:
+---------+----------+ | program | category | +---------+----------+ | 1 | 1 | | 2 | 2 | +---------+----------+
i assume faster in pure sql write tool it.
i'm not sure if faster in sql. but, here approach.
first, create 5 table need this:
- programs
- tags
- categories
- programtags
- programcategories
with appropriate structures, including identity id columns.
then load data programs. easy, appropriate select.
then create tags
, categories
tables. here how load tags
table:
with cte ( select (case when tags '%,%' left(tags, charindex(tags, ',')) else tags end) tag, (case when tags '%,%' substring(tags, charindex(tags, ',') + 1, len(tags)) end) resttags staging tags not null , tags <> '' union select (case when resttags '%,%' left(resttags, charindex(tags, ',')) else resttags end) tag, (case when tags '%,%' substring(resttags, charindex(resttags, ',') + 1, len(testtags)) end) resttags cte resttags not null , resttags <> '' ) select distinct tags cte;
(obviously requires insert
).
do same categories
.
then load programtags
using:
select p.programid, t.tagid staging s join programs p on s.<whatever> = p.<whatever> join tags t on ','+s.tags+',' '%,'+t.tag+',%';
the first join program id. second fetch appropriate tags. performance not great, might enough need do.
Comments
Post a Comment