I have a batch process that imports data uploaded by vendors into a table for comparison and reporting. I created three routines (two stored procedures and a table-valued function) to handle the task, but I could use some advice on making it more efficient. Here's the gist of the process (apologies in advance for the long post):
Vendor data can contain up to 10 fields, but only 2 are required (part # and price).
The import process reads a tab-delimited file through a linked server that uses the Jet 4.0 Text driver, with the file layout defined by a Schema.ini in the same folder (a sample Schema.ini follows the procedure below). The folder location is passed as a parameter to the SP:
Code:
CREATE PROCEDURE get_catalog_data_sp (@path varchar(2000)) AS
declare @srv_name varchar(50)
set @srv_name = 'text_catalog'
-- Create a temporary linked server pointing at the folder of text files
EXEC sp_addlinkedserver @srv_name,
    'Jet 4.0',                  -- product name
    'Microsoft.Jet.OLEDB.4.0',  -- provider
    @path,                      -- data source: the folder containing the file
    NULL,
    'Text'                      -- provider string: use the Text ISAM
-- Import the file, skipping 1 header row
exec import_text_data_sp @srv_name, 1
-- Drop the linked server now that the import is done
exec sp_dropserver @srv_name, 'droplogins'
-- Return the imported data
select * from data_import
GO
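For reference, here's a minimal Schema.ini for this setup, assuming the file is named Catalog.txt and carries one header row; the part_no/price column definitions are just illustrative (with ColNameHeader=False the header row comes through as data, which is why the import skips it via @skip_rows):
Code:
[Catalog.txt]
Format=TabDelimited
ColNameHeader=False
MaxScanRows=0
Col1=part_no Char Width 50
Col2=price Currency
The whole thing is then kicked off with the folder path, e.g. exec get_catalog_data_sp 'C:\vendor_uploads' (path hypothetical).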
Once the connection is made to the file, the second SP is called to import the data from the file, compare the columns, and insert the matching columns into a pre-existing table that is properly formatted for later processing. The linked server name and the number of header rows are passed to the SP:
Code:
-- Imports text catalog data into formatted table for processing
CREATE PROCEDURE import_text_data_sp @server nvarchar(255), @skip_rows int
AS
begin
declare @sql_string varchar(2000)
declare @col_list varchar(1000)
-- Query the text file and create a staging table
-- ([Catalog#txt] is the linked-server name for Catalog.txt)
set @sql_string = 'select rownum = identity (int, 1, 1), * '
+ 'into import_temp '
+ ' from ' + @server + '...[Catalog#txt]'
exec (@sql_string)
-- Create an index on the identity field
create unique clustered index import_IDX
on import_temp (rownum)
-- Roll the common fields up into a comma-delimited list
select @col_list = coalesce(@col_list + ', ', '') + column_name
from get_common_columns_fn('data_import', 'import_temp')
-- Build the sql command for the data insert,
-- skipping the header rows at the top of the file
set @sql_string = 'insert into data_import(' + @col_list
+ ') select ' + @col_list
+ ' from import_temp'
+ ' where rownum > ' + cast(@skip_rows as varchar(10))
-- Clear the import table and insert the new data
delete data_import
exec (@sql_string)
-- Drop the temporary objects
drop index import_temp.import_IDX
drop table import_temp
end
GO
GO
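For illustration, if the common columns turned out to be part_no and price (hypothetical names) and @skip_rows is 1, the dynamic statement built above would come out as:
Code:
insert into data_import(part_no, price)
select part_no, price
from import_temp
where rownum > 1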
During the import shown above, the third routine is called to compare the columns of the imported data to the columns of the formatted table; it is a table-valued function so its result can be queried directly. The matching names are then rolled up into a comma-delimited string using coalesce() for use in the dynamic insert statement. The column comparison uses the system tables to find the common names based on the two table names provided:
Code:
-- Returns the column names common to the two named tables
CREATE FUNCTION get_common_columns_fn (@Table1_name varchar(50), @Table2_name varchar(50))
RETURNS TABLE
AS
RETURN
(
select c1.name as column_name
from syscolumns c1
where c1.id = object_id(@Table1_name)
and c1.name in (select c2.name
from syscolumns c2
where c2.id = object_id(@Table2_name))
)
GO
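As a sanity check, the function can also be run on its own (a quick sketch, assuming both tables exist at the time it is called):
Code:
-- List the columns the two tables share
select column_name
from get_common_columns_fn('data_import', 'import_temp')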
All of this executes in a batch during "off hours", so it doesn't really affect system performance for the users. However, I would still appreciate any advice on optimization, as I am still a bit of a novice.
- Glen
Know thy data.