Fastest way to insert millions of records in SQL Server

Whenever there are some SQL related task with high performance requirements, it often takes a lot of research. Inserting data in SQL tables is a simple task, but when we have to insert millions of records with high performance, it might take a bit a research. Also we will discuss about inserting relational data too. We will be using C# in below Examples. So let’s get straight to the point.

There are 2 types of data that we might need to insert:

  1. Non Relational Data/ Single Table Data
  2. Relational data/ Hierarchical data

Non Relational Data

  • Approach 1 – Inserting using SQL command

Inserting rows one by one through code using SQLConnection and SQLCommand classes is not a good idea, since there will be any server-DB hits which is a very costly process

  • Approach 2 – Bulk Insert

Bulk insert is an efficient method for inserting millions of records with high performance. Bulk insert will collect all your data into a Data Table on the server side and will push the data into the DB in a single hit. Also, we can use batching in it. We will be doing it using .Net framework’s SqlBulkCopy class in C#.

We will be using below schema for example, in which we will be inserting the data:

CREATE TABLE [dbo].[products](
     [id] [bigint] IDENTITY(1,1) NOT NULL,
     [product_name] [varchar](1000) NULL,
     [product_code] [varchar](100) NULL,
     [base_price] [numeric](18, 0) NULL
) 

Remember that while creating data table, the sequence of the columns must be same as of the table in DB. Creating data table below, altough this data table may contain millions of records as per the requirement:

DataTable dt = new DataTable();  
dt.Columns.Add(new DataColumn("id", typeof(Int32)));   
dt.Columns.Add(new DataColumn("product_name", typeof(string)));  
 
dt.Columns.Add(new DataColumn("product_code", typeof(string)));
dt.Columns.Add(new DataColumn("base_price", typeof(Int32)));

DataRow row = tbl.NewRow();
row["id"] = 1;
row["product_name"] = "Test Product Name";
row["product_code"] = "Test";
row["base_price"] = 100;

tbl.Rows.Add(row);

At last, below is the code for insert through bulk insert command in a single hit:

string connection = "Data Source=localhost;Initial Catalog=XXXXXXX;User Id = {userId}; Password = {pass}";
SqlConnection con = new SqlConnection(connection);
SqlBulkCopy bulkCopy = new SqlBulkCopy(con);

bulkCopy.DestinationTableName = "tblTest";

There is a limitation related to transaction for bulk insert. If we are inserting data into multiple tables and need transaction for the same, then we will need to explicitly handle transaction using the SqlTransaction class.

Relational Data / Hierarchical data

Now here comes the relational part. The main problem with relational data insert is that we need to insert data into multiple tables containing relations with foreign keys. So after inserting data into the base table, we will need to set foreign keys into the related data before inserting into DB. Also while inserting millions of records, we most likely need to maintain transactions also, in case something fails.

  • Approach 1 – using SQL command class

SQL command will return the primary key of the inserted data, but setting that primary key into the foreign key of the related data, that too while maintaining transaction, is a very tedious task. So lets quickly move to the next solution.

  • Approach 2 – Using Entity Framework

Entity framework is very comfortable to use in case of relational / hierarchical data. We need to just create hierarchical object and push it to the configured EF on the Database. but since we are talking about WRITE operation of millions of records, entity framework becomes a bit slow and also, there isn’t much we can do about it. EF is the best solution when most of the times, there is no high performance requirement. So lets quickly move to the best solution below.

  • Approach 3 – Creating reference Ids for related data and inserting through Stored procedure

Lets take an example of below schema that we will be using to explain the same.

CREATE TABLE [dbo].[products](
	[id] [bigint] IDENTITY(1,1) NOT NULL,
	[product_name] [varchar](1000) NULL,
	[product_code] [varchar](100) NULL,
	[base_price] [numeric](18, 0) NULL
) 

CREATE TABLE [dbo].[products_variants](
	[id] [bigint] IDENTITY(1,1) NOT NULL,
	[product_id] [bigint] NULL,
	[variant_name] [varchar](1000) NULL
 )

CREATE TABLE [dbo].[products_packages](
	[id] [bigint] IDENTITY(1,1) NOT NULL,
	[product_id] [bigint] NULL,
	[variant_id] [bigint] NULL,
	[width] [numeric](18, 0) NULL,
	[height] [numeric](18, 0) NULL,
	[length] [numeric](18, 0) NULL,
	[weight] [numeric](18, 0) NULL
)

Now we have 3 tables, product_variants contains product_id as foreign key and product_packages contains 2 foreign keys – variant_id and product_id.

We will be creating 3 Data Tables and pass the same to a Stored procedure to insert. But since we do not have any relation between the rows of the 3 data tables (i.e. which product_variant belongs to which product) because the data is not inserted into the DB and there is no foreign key linkage yet. We will create a reference ID in the 3 tables and link the same using a unique GUID. This will make a relation between the entities on the server side. Below are the models:

 class ProductsTableEntity
 {
      public string product_name { get; set; }
      public long user_id { get; set; }
      public string product_code { get; set; }
      public decimal base_price { get; set; }
      public string reference_id { get; set; }
 }

class ProductsVariantsEntity
{
      public long? id{ get; set; }
      public long? product_id{ get; set; }
      public string variant_name{ get; set; }
      public string reference_id{ get; set; }
      public string variant_reference_id { get; set; }
}

class ProductsPackagesEntity
{
      public long? id { get; set; }
      public long? product_id { get; set; }
      public long? variant_id { get; set; }
      public decimal? width { get; set; }
      public decimal? height { get; set; }
      public decimal? length { get; set; }
      public decimal? weight { get; set; }
      public string reference_id { get; set; }
      public string variant_reference_id { get; set; }
}

In above models, we have created reference_id in product and product_variant for their linkage and variant_reference_id for linkage beween product_variant and product_package.

Below is an exmaple for the linkage so that out Store procedure will be able to identify between the same.


Dictionary<string, ProductsTableEntity> productsDictionary = new Dictionary<string, ProductsTableEntity>();

ProductsTableEntity productData = new ProductsTableEntity();
productData.product_code = "P_code1";
productData.product_name = "Lenovo Laptop";
productData.base_price = 100;
productData.reference_id = Guid.NewGuid().ToString();

productsDictionary.Add(product.product_code , productData);

Now will add the same reference id to the product_variant and similarly for product_packages

List<ProductsVariantsEntity> productVariantsList = new List< ProductsVariantsEntity>();
List<ProductsPackagesEntity> productPackagesList = new List<ProductsPackagesEntity>();

var prodVariant = new ProductsVariantsEntity();
prodVariant.variant_name = "8 GB RAM Variant";
prodVariant.product_id = productData?.product_id;
prodVariant.reference_id = productData.reference_id;
prodVariant.variant_reference_id = Guid.NewGuid().ToString();
productVariantsList.Add(prodVariant);

var prodPackage = new ProductsPackagesEntity();
prodPackage.length = 10;
prodPackage.height = 20;
prodPackage.width = 30;
prodPackage.refrence_id = productData.reference_id;
prodPackage.variant_reference_id = prodVariant.variant_reference_id;
productPackagesList.Add(prodPackage);

The above code could be in a loop to add millions of records as per the requirement.

Below is the data tables and Stored prodecure which will receive the dta tables and will insert the data:

Data Tables
CREATE TYPE [dbo].[ProductTable2] AS TABLE(
	[product_name] [varchar](100) NULL,
	[product_code] [varchar](100) NULL,
	[base_price] [numeric](18, 0) NULL,
	[reference_id] [nvarchar](100) NULL
)

CREATE TYPE [dbo].[ProductsVariantsTable] AS TABLE(
	[id] [bigint] NULL,
	[product_id] [bigint] NULL,
	[variant_name] [varchar](1000) NULL,
	[reference_id] [nvarchar](100) NULL,
	[variant_reference_id] [nvarchar](100) NULL
)

CREATE TYPE [dbo].[ProductsPackagesTable] AS TABLE(
	[id] [bigint] NULL,
	[product_id] [bigint] NULL,
	[variant_id] [bigint] NULL,
	[width] [numeric](18, 2) NULL,
	[height] [numeric](18, 2) NULL,
	[length] [numeric](18, 2) NULL,
	[weight] [numeric](18, 2) NULL,
	[reference_id] [nvarchar](100) NULL,
	[variant_reference_id] [nvarchar](100) NULL
)
Stored Procedure
CREATE PROCEDURE [dbo].[sp_insertProducts]  
 @ProductTable ProductTable READONLY  ,
 @ProductVariantTable [dbo].[ProductsVariantsTable] READONLY,    
 @ProductPackageTable [dbo].[ProductsPackagesTable] READONLY   
AS  
BEGIN  
 BEGIN TRY  
 BEGIN TRANSACTION  
  
    DECLARE @InsertedValuesProduct TABLE(ID BIGINT NOT NULL,  
    [UID] NVARCHAR(100) NULL ) 

    DECLARE @InsertedValuesProductVariant TABLE(ID BIGINT NOT NULL,  
    [UID] NVARCHAR(100) NULL ) 

    DECLARE @InsertedValuesProductPackage TABLE(ID BIGINT NOT NULL,  
    [UID] NVARCHAR(100) NULL ) 
     
    -- --products  
    INSERT INTO [dbo].[products]( product_name, product_code
    ,[base_price])   
    Output Inserted.id, Inserted.reference_id into
    @InsertedValuesProduct  
    SELECT product_name, diq_product_code,[base_price]
    from @ProductTable PT 

    -- --prod variants  
    INSERT INTO [dbo].[products_variants]([product_id],
                [variant_name], reference_id, variant_reference_id)   
    Output Inserted.id, Inserted.variant_reference_id into 
    @InsertedValuesProductVariant  
    SELECT ISNULL(product_id, IVP.ID),[variant_name] 
    reference_id,variant_reference_id 
    from @ProductVariantTable PVT left join @InsertedValuesProduct     
    IVP ON PVT.reference_id = IVP.[UID]  
     
    --prod package  
    INSERT INTO [dbo].[products_packages] ([product_id],[variant_id],
    [width],[height],[length],[weight])  
    SELECT ISNULL([product_id], IVP.ID),ISNULL([variant_id], IVP.ID),
    [width],[height],[length],[weight] FROM @ProductPackageTable PPT
    left join @InsertedValuesProduct IVP ON PPT.reference_id = IVP. 
    [UID] left join @InsertedValuesProductVariant IVPV ON 
    PPT.variant_reference_id = IVPV.[UID]
	 
  commit TRANSACTION   
 END TRY  
 BEGIN CATCH 
 ROLLBACK TRAN
 throw  
 END CATCH  
END  

The above stored procedure takes less than 1 second to insert a million of records. Please let us know in below comment box for suggestions.

Leave a Comment

Your email address will not be published.