Whenever there are some SQL related task with high performance requirements, it often takes a lot of research. Inserting data in SQL tables is a simple task, but when we have to insert millions of records with high performance, it might take a bit a research. Also we will discuss about inserting relational data too. We will be using C# in below Examples. So let’s get straight to the point.
There are 2 types of data that we might need to insert:
- Non Relational Data/ Single Table Data
- Relational data/ Hierarchical data
Non Relational Data
- Approach 1 – Inserting using SQL command
Inserting rows one by one through code using SQLConnection and SQLCommand classes is not a good idea, since there will be any server-DB hits which is a very costly process
- Approach 2 – Bulk Insert–
Bulk insert is an efficient method for inserting millions of records with high performance. Bulk insert will collect all your data into a Data Table on the server side and will push the data into the DB in a single hit. Also, we can use batching in it. We will be doing it using .Net framework’s SqlBulkCopy class in C#.
We will be using below schema for example, in which we will be inserting the data:
CREATE TABLE [dbo].[products](
[id] [bigint] IDENTITY(1,1) NOT NULL,
[product_name] [varchar](1000) NULL,
[product_code] [varchar](100) NULL,
[base_price] [numeric](18, 0) NULL
)
Remember that while creating data table, the sequence of the columns must be same as of the table in DB. Creating data table below, altough this data table may contain millions of records as per the requirement:
DataTable dt = new DataTable();
dt.Columns.Add(new DataColumn("id", typeof(Int32)));
dt.Columns.Add(new DataColumn("product_name", typeof(string)));
dt.Columns.Add(new DataColumn("product_code", typeof(string)));
dt.Columns.Add(new DataColumn("base_price", typeof(Int32)));
DataRow row = tbl.NewRow();
row["id"] = 1;
row["product_name"] = "Test Product Name";
row["product_code"] = "Test";
row["base_price"] = 100;
tbl.Rows.Add(row);
At last, below is the code for insert through bulk insert command in a single hit:
string connection = "Data Source=localhost;Initial Catalog=XXXXXXX;User Id = {userId}; Password = {pass}";
SqlConnection con = new SqlConnection(connection);
SqlBulkCopy bulkCopy = new SqlBulkCopy(con);
bulkCopy.DestinationTableName = "tblTest";
There is a limitation related to transaction for bulk insert. If we are inserting data into multiple tables and need transaction for the same, then we will need to explicitly handle transaction using the SqlTransaction class.
Relational Data / Hierarchical data
Now here comes the relational part. The main problem with relational data insert is that we need to insert data into multiple tables containing relations with foreign keys. So after inserting data into the base table, we will need to set foreign keys into the related data before inserting into DB. Also while inserting millions of records, we most likely need to maintain transactions also, in case something fails.
- Approach 1 – using SQL command class
SQL command will return the primary key of the inserted data, but setting that primary key into the foreign key of the related data, that too while maintaining transaction, is a very tedious task. So lets quickly move to the next solution.
- Approach 2 – Using Entity Framework
Entity framework is very comfortable to use in case of relational / hierarchical data. We need to just create hierarchical object and push it to the configured EF on the Database. but since we are talking about WRITE operation of millions of records, entity framework becomes a bit slow and also, there isn’t much we can do about it. EF is the best solution when most of the times, there is no high performance requirement. So lets quickly move to the best solution below.
- Approach 3 – Creating reference Ids for related data and inserting through Stored procedure
Lets take an example of below schema that we will be using to explain the same.
CREATE TABLE [dbo].[products](
[id] [bigint] IDENTITY(1,1) NOT NULL,
[product_name] [varchar](1000) NULL,
[product_code] [varchar](100) NULL,
[base_price] [numeric](18, 0) NULL
)
CREATE TABLE [dbo].[products_variants](
[id] [bigint] IDENTITY(1,1) NOT NULL,
[product_id] [bigint] NULL,
[variant_name] [varchar](1000) NULL
)
CREATE TABLE [dbo].[products_packages](
[id] [bigint] IDENTITY(1,1) NOT NULL,
[product_id] [bigint] NULL,
[variant_id] [bigint] NULL,
[width] [numeric](18, 0) NULL,
[height] [numeric](18, 0) NULL,
[length] [numeric](18, 0) NULL,
[weight] [numeric](18, 0) NULL
)
Now we have 3 tables, product_variants contains product_id as foreign key and product_packages contains 2 foreign keys – variant_id and product_id.
We will be creating 3 Data Tables and pass the same to a Stored procedure to insert. But since we do not have any relation between the rows of the 3 data tables (i.e. which product_variant belongs to which product) because the data is not inserted into the DB and there is no foreign key linkage yet. We will create a reference ID in the 3 tables and link the same using a unique GUID. This will make a relation between the entities on the server side. Below are the models:
class ProductsTableEntity
{
public string product_name { get; set; }
public long user_id { get; set; }
public string product_code { get; set; }
public decimal base_price { get; set; }
public string reference_id { get; set; }
}
class ProductsVariantsEntity
{
public long? id{ get; set; }
public long? product_id{ get; set; }
public string variant_name{ get; set; }
public string reference_id{ get; set; }
public string variant_reference_id { get; set; }
}
class ProductsPackagesEntity
{
public long? id { get; set; }
public long? product_id { get; set; }
public long? variant_id { get; set; }
public decimal? width { get; set; }
public decimal? height { get; set; }
public decimal? length { get; set; }
public decimal? weight { get; set; }
public string reference_id { get; set; }
public string variant_reference_id { get; set; }
}
In above models, we have created reference_id in product and product_variant for their linkage and variant_reference_id for linkage beween product_variant and product_package.
Below is an exmaple for the linkage so that out Store procedure will be able to identify between the same.
Dictionary<string, ProductsTableEntity> productsDictionary = new Dictionary<string, ProductsTableEntity>();
ProductsTableEntity productData = new ProductsTableEntity();
productData.product_code = "P_code1";
productData.product_name = "Lenovo Laptop";
productData.base_price = 100;
productData.reference_id = Guid.NewGuid().ToString();
productsDictionary.Add(product.product_code , productData);
Now will add the same reference id to the product_variant and similarly for product_packages
List<ProductsVariantsEntity> productVariantsList = new List< ProductsVariantsEntity>();
List<ProductsPackagesEntity> productPackagesList = new List<ProductsPackagesEntity>();
var prodVariant = new ProductsVariantsEntity();
prodVariant.variant_name = "8 GB RAM Variant";
prodVariant.product_id = productData?.product_id;
prodVariant.reference_id = productData.reference_id;
prodVariant.variant_reference_id = Guid.NewGuid().ToString();
productVariantsList.Add(prodVariant);
var prodPackage = new ProductsPackagesEntity();
prodPackage.length = 10;
prodPackage.height = 20;
prodPackage.width = 30;
prodPackage.refrence_id = productData.reference_id;
prodPackage.variant_reference_id = prodVariant.variant_reference_id;
productPackagesList.Add(prodPackage);
The above code could be in a loop to add millions of records as per the requirement.
Below is the data tables and Stored prodecure which will receive the dta tables and will insert the data:
Data Tables
CREATE TYPE [dbo].[ProductTable2] AS TABLE(
[product_name] [varchar](100) NULL,
[product_code] [varchar](100) NULL,
[base_price] [numeric](18, 0) NULL,
[reference_id] [nvarchar](100) NULL
)
CREATE TYPE [dbo].[ProductsVariantsTable] AS TABLE(
[id] [bigint] NULL,
[product_id] [bigint] NULL,
[variant_name] [varchar](1000) NULL,
[reference_id] [nvarchar](100) NULL,
[variant_reference_id] [nvarchar](100) NULL
)
CREATE TYPE [dbo].[ProductsPackagesTable] AS TABLE(
[id] [bigint] NULL,
[product_id] [bigint] NULL,
[variant_id] [bigint] NULL,
[width] [numeric](18, 2) NULL,
[height] [numeric](18, 2) NULL,
[length] [numeric](18, 2) NULL,
[weight] [numeric](18, 2) NULL,
[reference_id] [nvarchar](100) NULL,
[variant_reference_id] [nvarchar](100) NULL
)
Stored Procedure
CREATE PROCEDURE [dbo].[sp_insertProducts]
@ProductTable ProductTable READONLY ,
@ProductVariantTable [dbo].[ProductsVariantsTable] READONLY,
@ProductPackageTable [dbo].[ProductsPackagesTable] READONLY
AS
BEGIN
BEGIN TRY
BEGIN TRANSACTION
DECLARE @InsertedValuesProduct TABLE(ID BIGINT NOT NULL,
[UID] NVARCHAR(100) NULL )
DECLARE @InsertedValuesProductVariant TABLE(ID BIGINT NOT NULL,
[UID] NVARCHAR(100) NULL )
DECLARE @InsertedValuesProductPackage TABLE(ID BIGINT NOT NULL,
[UID] NVARCHAR(100) NULL )
-- --products
INSERT INTO [dbo].[products]( product_name, product_code
,[base_price])
Output Inserted.id, Inserted.reference_id into
@InsertedValuesProduct
SELECT product_name, diq_product_code,[base_price]
from @ProductTable PT
-- --prod variants
INSERT INTO [dbo].[products_variants]([product_id],
[variant_name], reference_id, variant_reference_id)
Output Inserted.id, Inserted.variant_reference_id into
@InsertedValuesProductVariant
SELECT ISNULL(product_id, IVP.ID),[variant_name]
reference_id,variant_reference_id
from @ProductVariantTable PVT left join @InsertedValuesProduct
IVP ON PVT.reference_id = IVP.[UID]
--prod package
INSERT INTO [dbo].[products_packages] ([product_id],[variant_id],
[width],[height],[length],[weight])
SELECT ISNULL([product_id], IVP.ID),ISNULL([variant_id], IVP.ID),
[width],[height],[length],[weight] FROM @ProductPackageTable PPT
left join @InsertedValuesProduct IVP ON PPT.reference_id = IVP.
[UID] left join @InsertedValuesProductVariant IVPV ON
PPT.variant_reference_id = IVPV.[UID]
commit TRANSACTION
END TRY
BEGIN CATCH
ROLLBACK TRAN
throw
END CATCH
END
The above stored procedure takes less than 1 second to insert a million of records. Please let us know in below comment box for suggestions.