In some projects, we may have bulk import functionality to a table which is already exist with our system. SQL Server given lots of facilities for importing data into the database, some are –
- Using bcp Utility
- Using BULK INSERT statement
- Using OPENROWSET
- Etc.,
So the use case will be as below –
- The input for the bulk import would be either .csv file or an .xml file.
- The records from the input file will as per the importing table schema. So the values will be able to insert or update without any issue in the same order and data type.
- The records from the csv may already present in the table or may not.
- When no records present, all the records will be inserted (Ex: initial stage).
- If some records are already present, the records will be updated otherwise inserted.
Note: This implementation will work only from SQL Server 2008 and later versions (as MERGE statement introduced from SQL Server 2008).
The implementation as follows-
Step 1: I am taking Customer table for importing the records (Customer table created from Northwind database schema).
Below is the table schema –
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Customers]') AND type in (N'U')) DROP TABLE [dbo].[Customers] GO CREATE TABLE [dbo].[Customers]( [CustomerID] [nchar](5) NOT NULL, [CompanyName] [nvarchar](40) NOT NULL, [ContactName] [nvarchar](30) NULL, [ContactTitle] [nvarchar](30) NULL, [Address] [nvarchar](60) NULL, [City] [nvarchar](15) NULL, [Region] [nvarchar](15) NULL, [PostalCode] [nvarchar](10) NULL, [Country] [nvarchar](15) NULL, [Phone] [nvarchar](24) NULL, [Fax] [nvarchar](24) NULL, CONSTRAINT [PK_Customers1] PRIMARY KEY CLUSTERED ( [CustomerID] ASC ) ) GOImplementing with a csv file for importing the data. Some of the records in the csv file would be
ALFKI,Alfreds Futterkiste,Maria Anders1,Sales Representative,Obere Str. 57,Berlin,NULL,12209,Germany,030-0074321,030-0076545 ANATR,Ana Trujillo Emparedados y helados,Ana Trujillo,Owner,Avda. de la Constituci¢n 2222,M‚xico D.F.,NULL,5021,Mexico,(5) 555-4729,(5) 555-3745 ANTON,Antonio Moreno Taquer¡a,Antonio Moreno,Owner,Mataderos 2312,M‚xico D.F.,NULL,5023,Mexico,(5) 555-3932,NULL AROUT,Around the Horn,Thomas Hardy,Sales Representative,120 Hanover Sq.,London,NULL,WA1 1DP,UK,(171) 555-7788,(171) 555-6750 BERGS,Berglunds snabbk”p,Christina Berglund,Order Administrator,Berguvsv„gen 8,Lule†,NULL,S-958 22,Sweden,0921-12 34 65,0921-12 34 67 BLAUS,Blauer See Delikatessen,Hanna Moos,Sales Representative,Forsterstr. 57,Mannheim,NULL,68306,Germany,0621-08460,0621-08924 BLONP,Blondesddsl pŠre et fils,Fr‚d‚rique Citeaux,Marketing Manager,"24, place Kl‚ber",Strasbourg,NULL,67000,France,88.60.15.31,88.60.15.32 BOTTM,Bottom-Dollar Markets,Elizabeth Lincoln,Accounting Manager,23 Tsawassen Blvd.,Tsawassen,BC,T2F 8M4,Canada,(604) 555-4729,(604) 555-3745 BSBEV,B's Beverages,Victoria Ashworth,Sales Representative,Fauntleroy Circus,London,NULL,EC2 5NT,UK,(171) 555-1212,NULL CACTU,Cactus Comidas para llevar,Patricio Simpson,Sales Agent,Cerrito 333,Buenos Aires,NULL,1010,Argentina,(1) 135-5555,(1) 135-4892 CENTC,Centro comercial Moctezuma,Francisco Chang,Marketing Manager,Sierras de Granada 9993,M‚xico D.F.,NULL,5022,Mexico,(5) 555-3392,(5) 555-7293The following store procedure is used for getting the data from csv file and importing to the Customers table (both insert and update).
CREATE PROCEDURE SP_ImportCustomerData AS BEGIN -- Creating a Temproary Table for importing the data from csv file. CREATE TABLE #Customers( [CustomerID] [nchar](5) NOT NULL, [CompanyName] [nvarchar](40) NOT NULL, [ContactName] [nvarchar](30) NULL, [ContactTitle] [nvarchar](30) NULL, [Address] [nvarchar](60) NULL, [City] [nvarchar](15) NULL, [Region] [nvarchar](15) NULL, [PostalCode] [nvarchar](10) NULL, [Country] [nvarchar](15) NULL, [Phone] [nvarchar](24) NULL, [Fax] [nvarchar](24) NULL, CONSTRAINT [PK_Customers1] PRIMARY KEY CLUSTERED ( [CustomerID] ASC ) ); -- Inserting all the from csv to temproary table using BULK INSERT BULK INSERT #Customers FROM 'D:\Blog\ImportData\CustomerImport.csv' WITH ( FIELDTERMINATOR = ',', ROWTERMINATOR = '\n' ); -- Selecting the records from temproary table. This is just to know the records inserted or not. -- SELECT * FROM #Customers; -- By using MERGE statement, inserting the record if not present and updating if exist. MERGE Customers AS TargetTable -- Inserting or Updating the table. USING #Customers AS SourceTable -- Records from the temproary table (records from csv file). ON (TargetTable.CustomerID = SourceTable.CustomerID) -- Defining condition to decide which records are alredy present WHEN NOT MATCHED BY TARGET -- If the records in the Customer table is not matched? THEN INSERT (CustomerID, CompanyName, ContactName, ContactTitle, [Address], -- then INSERT the record City, Region, PostalCode, Country, Phone, Fax) VALUES(SourceTable.CustomerID, SourceTable.CompanyName, SourceTable.ContactName, SourceTable.ContactTitle, SourceTable.[Address], SourceTable.City, SourceTable.Region, SourceTable.PostalCode, SourceTable.Country, SourceTable.Phone, SourceTable.Fax) WHEN MATCHED -- If not matched then UPDATE THEN UPDATE SET TargetTable.CustomerID = SourceTable.CustomerID, TargetTable.CompanyName = SourceTable.CompanyName, TargetTable.ContactName = SourceTable.ContactName, TargetTable.ContactTitle = SourceTable.ContactTitle, TargetTable.[Address] = SourceTable.[Address], TargetTable.City = SourceTable.City, TargetTable.Region = SourceTable.Region, TargetTable.PostalCode = SourceTable.PostalCode, TargetTable.Country = SourceTable.Country, TargetTable.Phone = SourceTable.Phone, TargetTable.Fax = SourceTable.Fax; SELECT * FROM Customers; ENDThe store procedure does the following points
- Create a Temproary Table for importing the data from csv file.
- Insert all the records from csv to Temproary Table using BULK INSERT
- Insert the records from Temproary Table to the Customers table when not exist and Update when not exist.
Implementation for XML data:
Let us take the same example and explore using XML file as input.
The XML file looks as below:
<?xml version="1.0" encoding="utf-8" ?> <Customers> <Customer> <CustomerID>AEHM1</CustomerID> <CompanyName>Around the Horn test</CompanyName> <ContactName>Thomas Hardy</ContactName> <ContactTitle>Owner</ContactTitle> <Address>120 Hanover Sq.</Address> <City>London</City> <Region>SP</Region> <PostalCode>WA1 1DP</PostalCode> <Country>Mexico</Country> <Phone>(171) 555-7788</Phone> <Fax>(171) 555-6750</Fax> </Customer> <Customer> <CustomerID>AFGK</CustomerID> <CompanyName>Antonio Moreno Taquer-ía</CompanyName> <ContactName>Antonio Moreno</ContactName> <ContactTitle>Owner</ContactTitle> <Address>Mataderos 2312</Address> <City>MGÇÜxico D.F.</City> <Region>SP</Region> <PostalCode>5023</PostalCode> <Country>Mexico</Country> <Phone>(5) 555-3932</Phone> <Fax>(5) 555-3745</Fax> </Customer> <Customers>
The Store Procedure would be
CREATE PROCEDURE SP_ImportCustomerData_XML AS BEGIN -- Creating a Temproary Table for importing the data from csv file. CREATE TABLE #Customers( [CustomerID] [nchar](5) NOT NULL, [CompanyName] [nvarchar](40) NOT NULL, [ContactName] [nvarchar](30) NULL, [ContactTitle] [nvarchar](30) NULL, [Address] [nvarchar](60) NULL, [City] [nvarchar](15) NULL, [Region] [nvarchar](15) NULL, [PostalCode] [nvarchar](10) NULL, [Country] [nvarchar](15) NULL, [Phone] [nvarchar](24) NULL, [Fax] [nvarchar](24) NULL, CONSTRAINT [PK_Customers1] PRIMARY KEY CLUSTERED ( [CustomerID] ASC ) ); -- Inserting all the rows from xml to temproary table using OPENROWSET -- Thanks to http://pratchev.blogspot.com/2008/11/import-xml-file-to-sql-table.html INSERT INTO #Customers (CustomerID, CompanyName, ContactName, ContactTitle, [Address], City, Region, PostalCode, Country, Phone, Fax) SELECT X.Customer.query('CustomerID').value('.', 'nchar(5)'), X.Customer.query('CompanyName').value('.', 'nvarchar(40)'), X.Customer.query('ContactName').value('.', 'nvarchar(30)'), X.Customer.query('ContactTitle').value('.', 'nvarchar(30)'), X.Customer.query('Address').value('.', 'nvarchar(60)'), X.Customer.query('City').value('.', 'nvarchar(15)'), X.Customer.query('Region').value('.', 'nvarchar(15)'), X.Customer.query('PostalCode').value('.', 'nvarchar(10)'), X.Customer.query('Country').value('.', 'nvarchar(15)'), X.Customer.query('Phone').value('.', 'nvarchar(24)'), X.Customer.query('Fax').value('.', 'nvarchar(24)') FROM (SELECT CAST(x AS XML) FROM OPENROWSET(BULK 'D:\ImportData\XMLData1.xml',SINGLE_BLOB) AS T(x) ) AS T(x) CROSS APPLY x.nodes('Customers/Customer') AS X(Customer); -- Selecting the records from temproary table. This is just to know the records inserted or not. -- SELECT * FROM #Customers; -- By using MERGE statement, inserting the record if not present and updating if exist. MERGE Customers AS TargetTable -- Inserting or Updating the table. USING #Customers AS SourceTable -- Records from the temproary table (records from csv file). ON (TargetTable.CustomerID = SourceTable.CustomerID) -- Defining condition to decide which records are alredy present WHEN NOT MATCHED BY TARGET -- If the records in the Customer table is not matched? THEN INSERT (CustomerID, CompanyName, ContactName, ContactTitle, [Address], -- then INSERT the record City, Region, PostalCode, Country, Phone, Fax) VALUES(SourceTable.CustomerID, SourceTable.CompanyName, SourceTable.ContactName, SourceTable.ContactTitle, SourceTable.[Address], SourceTable.City, SourceTable.Region, SourceTable.PostalCode, SourceTable.Country, SourceTable.Phone, SourceTable.Fax) WHEN MATCHED -- If not matched then UPDATE THEN UPDATE SET TargetTable.CustomerID = SourceTable.CustomerID, TargetTable.CompanyName = SourceTable.CompanyName, TargetTable.ContactName = SourceTable.ContactName, TargetTable.ContactTitle = SourceTable.ContactTitle, TargetTable.[Address] = SourceTable.[Address], TargetTable.City = SourceTable.City, TargetTable.Region = SourceTable.Region, TargetTable.PostalCode = SourceTable.PostalCode, TargetTable.Country = SourceTable.Country, TargetTable.Phone = SourceTable.Phone, TargetTable.Fax = SourceTable.Fax; SELECT * FROM Customers; END
[Update 10-Sep-2012]
As Charles Rice comment, I am updating this post with an example for importing data into a table that has Auto increment column.
Requirement 1: There is a table which contains Country, State, City, Remarks columns. It also contains a column Id which is auto increment values. The requirement is to import these values from csv files.
The table script would be:
CREATE TABLE [dbo].[TempCity]( [Id] [int] IDENTITY(1,1) NOT NULL, [Country] [varchar](50) NULL, [State] [varchar](50) NULL, [City] [varchar](50) NULL, [Remarks] [varchar](200) NULL ) ON [PRIMARY] GOThe .csv file would be:
1,India,Tamil Nadu,Chennai, 2,India,Tamil Nadu,Coimbatore, 3,India,Tamil Nadu,Madurai,testremarks 4,India,Tamil Nadu,Vellore, 5,India,Karnataka,Bangalore, 6,India,Karnataka,Mangalore, 7,India,Gujarat,Ahmedabad,Note: Add an empty line at the end.
Here, the autoincrement id act as primary key. So to identity which record to update (if exist) or insert (if not exist) - we need to include that id also in csv file (included as first column). It is important to note is, the autoincrement id will be consider to compare with already inserted records id and not for new records. So, there is no need to include exact id in the csv file. It can contain any number which is not the existing record id would be fine.
The SQL SP would be:
CREATE PROCEDURE SP_ImportCustomerData AS BEGIN CREATE TABLE #TempCity( [Id] [int] IDENTITY(1,1) NOT NULL, [Country] [varchar](50) NULL, [State] [varchar](50) NULL, [City] [varchar](50) NULL, [Remarks] [varchar](200) NULL ) ON [PRIMARY]; BULK INSERT #TempCity FROM 'C:\Blog\test1.csv' WITH ( FIELDTERMINATOR = ',', ROWTERMINATOR = '\n' ); MERGE TempCity AS TargetTable USING #TempCity AS SourceTable ON (TargetTable.Id = SourceTable.Id) WHEN NOT MATCHED BY TARGET THEN INSERT (Country, State, City, Remarks) VALUES(SourceTable.Country, SourceTable.State, SourceTable.City, SourceTable.City) WHEN MATCHED THEN UPDATE SET TargetTable.Country = SourceTable.Country, TargetTable.State = SourceTable.State, TargetTable.City = SourceTable.City, TargetTable.Remarks = SourceTable.Remarks; SELECT * FROM TempCity; ENDIf we run this SP, this will insert/update the TempCity table from the input csv file.
Requirement 2: In some cases, the requirement is to not include the autoincrement id in the input csv/xml file. The comparision must be decided by other columns (may be more the one) to insert or update the records from csv file.
In our example, let us take the combination of Country, State, City columns are considered as identity column (primary column) for a single record. So the merging process needs to consider by combining these three column. So once a record inserted with these three values, the only possibilities is to update the Remarks column. It is important to note is the auto-increment id is not consider anywhere.
The csv file would be
India,Tamil Nadu,Chennai, India,Tamil Nadu,Coimbatore, India,Tamil Nadu,Madurai, India,Tamil Nadu,Vellore,testremarks India,Karnataka,Bangalore, India,Karnataka,Mangalore, India,Gujarat,Ahmedabad,Here, I am not passing the value of Id column in the csv file. Note: Add an empty line at the end.
The SQL SP would be :
CREATE PROCEDURE SP_ImportCustomerData AS BEGIN CREATE TABLE #TempCity( [Country] [varchar](50) NULL, [State] [varchar](50) NULL, [City] [varchar](50) NULL, [Remarks] [varchar](200) NULL ) ON [PRIMARY]; BULK INSERT #TempCity FROM 'C:\Blog\test2.csv' WITH ( FIELDTERMINATOR = ',', ROWTERMINATOR = '\n' ); MERGE TempCity AS TargetTable USING #TempCity AS SourceTable ON (TargetTable.Country = SourceTable.Country) AND (TargetTable.State = SourceTable.State) AND (TargetTable.City = SourceTable.City) WHEN NOT MATCHED BY TARGET THEN INSERT (Country, State, City, Remarks) VALUES(SourceTable.Country, SourceTable.State, SourceTable.City, SourceTable.Remarks) WHEN MATCHED THEN UPDATE SET TargetTable.Remarks = SourceTable.Remarks; SELECT * FROM TempCity; END
Below are the screen shot of execution job
Table Creation |
SP execution at first time (with 76 records in the csv file) - All records are inserted |
Adding some more records in the csv file and updating some existing records |
Inserted the additional records and updated that 2 records |
The updated records are pointed out |
The output for XML input will the same as csv input, only the difference is the input file.
When using MERGE statement, there are some points required to keep in mind.
- Create an index on the join columns in the source table that is unique and covering.
- Create a unique clustered index on the join columns in the target table.
How would this work with auto incrementing IDs rather than your alphanumeric unique IDs?
Hi Charles, I updated the post merging with Auto incremental Ids. Please review the update portion.
Many Thanks
great job
Hello,
How do I achieve this with Unique Identifier column ..which is a primary key. Please let me know
Thank you very much. The post is awesome. Step by step. The code is very clean and well commented. Great job. I'd like to see some benchmarks, though.
Thanks
Thanks for this SQL Query nicely explained how to insert and update in bulk using insert and merge statement