Tuesday 10 May 2011

How to read a Flat File with data delimited by any wild card character like ‘#’, ‘$’ (SSIS)

Requirement:
Generally we deal with Flat Files which are comma (,) separated i.e. CSV files. And in SSIS by making use of ‘Flat File Source’ component we can read them very easily. These files can contain results/data from various sources like IBM, Mainframe, SAP etc. They may also be manually created or generated through some custom applications. And thus due to some specific requirements or limitations; they may have data which is not comma (,), bar (|), semi colon (;) etc delimited. These are among the few possible options available in Flat File connection manager used while reading the Flat Files in SSIS.
Apart from that there may be the requirement of having data containing comma (,) in their values itself. And thus we cannot just replace all the delimiters like dollar or # with a comma to have a comma separated file.
So to make it possible & read the records from a Flat File which can have any known separator for its column values, we need to do some kind of exercise explained below.
 Solution:
Since we don’t have any inbuilt option to select ‘#’ or ‘$’ etc as a delimiter for Flat File in connection manager, we would take the whole line/record into one column and then would split out all the values as per the present delimiter, using a Script Component.
Let’s take an example:
We have a .txt (Flat) file (say InputFile.txt) with three columns A, B and C. These headers are comma separated but the values under these columns are ‘#’ delimited.
Data into the file:
A,B,C
100#200#300
500#600#700
 Steps for the SSIS Package:
  1. Take a Data Flow Task (DFT).
  2. On the data flow tab, Drag one ‘Flat File Source’, edit it and click on ‘Flat File Connection Manager -> New’.
  • Name it as ‘SourceFileConnectionManager’.
  • Select the ‘Inputfile.txt’ in the File Name column by browsing the file.
  • Select Format as ‘delimited’.
  • Select ‘Header row delimiter’ as ‘Comma {,}’.
  • Select the check box for ‘Column names in the first row’.
1
  

  • Go to Column tab of this connection manager.
  • Select ‘{CR} {LF}’ as Row delimiter.
Screen would look like as:
2



  
As you can see all the data is now coming into a single column named ‘A,B,C’. 
  • Click OK.
  • Go to ‘Columns’ tab of the Flat File Source.
  • Rename the Output column as ‘A’.
3

 Click Ok.
 3. Drag a Script Component & select to use it as a Transformation while adding it.
  • Point the output of Flat File Source to this script component.
  • Edit its properties and select “A” as an input column there.
  • Go to ‘Input and Output’ tab and add three output columns (col1, col2 & col3) to the existing output ‘Output 0’. Select ‘four byte signed integer’ as their data type.
4



  • Modify the ‘SyncronizeInputId’ of Output0 to 0. This is to make the script component asyncronized.
5




  • Go to ‘Script’ tab and click on ‘Script Designer..’ to edit the script (VB.Net).
  • Select ‘Pre Compile’ as False.
  • Write the following script in the ‘Input0_ProcessInputRow’ event handler:
         Dim values() As String
        values = Row.A.Split(“#”.ToCharArray)
         Output0Buffer.AddRow()
        Output0Buffer.Col1 = Convert.ToInt32(values(0))
        Output0Buffer.Col2 = Convert.ToInt32(values(1))
        Output0Buffer.Col3 = Convert.ToInt32(values(2)) 
  • Close this and Click Ok.
  1. Take one Flat File Destination and with the following shown setting configure its connection manager (DestinationFlatFileConnection).
6


 
7


  • Go to Advanced tab and modify the name of the columns from col1, col2 & col3 to A, B & C.
  • Go to the ‘Mappings’ tab of the flat File Destination and do the below shown column mappings:
8


 
Now execute the package to have the following output in the ‘OutputFile.txt’ file.
 A,B,C
100,200,300
500,600,700

No comments:

Post a Comment