BULK LOAD & OPENROWSET
The OPENROWSET function is tempting because the text file can be viewed through the select clause. It is a more difficult function to use because of the need for a format file.
SELECT a.*
FROM OPENROWSET( BULK 'C:\_stuff\2007_08\sample_web_log.txt',
FORMATFILE = 'C:\_stuff\2007_08\sql_bcp_format.fmt'
) AS a;
The simplest -- and most robust-- way to read a file is to copy it into a table using BULK INSERT.
create table [dbo].[log_file]( data varchar(max) null)
BULK INSERT [dbo].[log_file]
from 'C:\_stuff\2007_08\sample_web_log.txt'
with
(codepage = 'acp'
,datafiletype = 'char'
--,fieldterminator = ','
,firstrow = 1
--,lastrow = 3
,rowterminator = '\n')
--This minimalist bulk insert works for most files
BULK INSERT [dbo].[log_file]
from 'C:\_stuff\2007_08\sample_web_log.txt'
with (datafiletype = 'char',rowterminator = '\n')
Ounce the text data is loaded it is useful to have a numeric column to uniquely identify each row. I use "alter table" to add an identity column named "line_nr".
Alter table is a "logged" operation and may take significant time and I/O resources to complete. Performance would have been better if we added the identity column when creating the table. Unfortunately since the table may be loaded by parallel streams identity values may not match the physical order in which the rows where loaded.
By adding the column after the data is loaded the identity value matches the physical order of rows.
Note that ounce the clustered index has been added "BULK LOAD" operations will no longer be "non-logged". I/O for non-logged operations are significantly higher.
alter table log_file
add line_nr int not null identity(1,1)
-- create a clustered index
create unique clustered index pk__log_file on log_file(line_nr)