sql server - SQL query to split a comma separated column into many-to-many relationships -


i given 3gb csv file need import in sql server 2012.

i have 5 million rows data in staging table looks (simplified).

staging table:

+-------------------+------------+---------------+------------+ |       name        | thumbnail  |     tags      | categories | +-------------------+------------+---------------+------------+ | history           | thumb1.jpg | history,essay | history    | | nutricion lecture | thumb2.jpg | food,essay    | health     | +-------------------+------------+---------------+------------+ 

the question tags , categories column in staging table.

how can transfer information staging table actual table , create unique record each tag , category -- and create needed many-to-many relationships?

it need check each tag against existing tags either create new record -or- fetch id of existing tag.

programs:

+----+-----------+------------+ | id |  program  | thumbnail  | +----+-----------+------------+ |  1 | history   | thumb1.jpg | |  2 | nutricion | thumb2.jpg | +----+-----------+------------+ 

tags:

+----+---------+ | id |   tag   | +----+---------+ |  1 | history | |  2 | essay   | |  3 | food    | +----+---------+ 

(categories table omitted because looks same tags)

the many-to-many relationships:

programs_tags:

+---------+-----+ | program | tag | +---------+-----+ |       1 |   1 | |       1 |   2 | |       2 |   2 | +---------+-----+ 

programs_categories:

+---------+----------+ | program | category | +---------+----------+ |       1 |        1 | |       2 |        2 | +---------+----------+ 

i assume faster in pure sql write tool it.

i'm not sure if faster in sql. but, here approach.

first, create 5 table need this:

  • programs
  • tags
  • categories
  • programtags
  • programcategories

with appropriate structures, including identity id columns.

then load data programs. easy, appropriate select.

then create tags , categories tables. here how load tags table:

with cte (       select (case when tags '%,%'                    left(tags, charindex(tags, ','))                    else tags               end) tag,              (case when tags '%,%'                    substring(tags, charindex(tags, ',') + 1, len(tags))               end) resttags       staging       tags not null , tags <> ''       union       select (case when resttags '%,%' left(resttags, charindex(tags, ','))                    else resttags               end) tag,              (case when tags '%,%'                    substring(resttags, charindex(resttags, ',') + 1, len(testtags))               end) resttags       cte       resttags not null , resttags <> ''      ) select distinct tags cte; 

(obviously requires insert).

do same categories.

then load programtags using:

select p.programid, t.tagid staging s join      programs p      on s.<whatever> = p.<whatever> join      tags t      on ','+s.tags+',' '%,'+t.tag+',%'; 

the first join program id. second fetch appropriate tags. performance not great, might enough need do.


Comments

Popular posts from this blog

c++ - How to add Crypto++ library to Qt project -

jQuery Mobile app not scrolling in Firefox -

How to use vim as editor in Matlab GUI -