Monday 6 January 2014

How to Convert CSV to TSV files by Batch

A CSV (comma separated values) file is basically a text file containing structured data, where a comma is used to separate or delimit values in each row or record. A TSV (tab separated values) file is similar to a CSV file, except that it uses a tab to separate values.

Both CSVed and TablePad are free applications for manipulating delimitor separated values files. These programs let you open a CSV file and save it as a TSV file, thereby effectively converting from CSV to TSV. But the programs can only convert one file at a time, and currently cannot batch convert all the files contained in a folder in one go.

Until this feature becomes available to the programs, here is an alternative way of batch converting files from CSV to TSV:

Go to Jojoyee.myfil.es and double click CSV2TSV.exe to download the executable (230 KB), rename the file get_file to CSV2TSV.exe.

Or use the VBScript code following the steps below.

Copy the code (highlighted in blue) and paste it into Notepad.
Save the file with the vbs extension (e.g. csv2tsv.vbs)
Double-click on the vbs file to trigger the conversion process.

'This VBScript is for converting CSV files to TSV (tab delimited) files by batch.
'Copy and paste it in Notepad and save the file with vbs extension (e.g. CSV2TSV.vbs)
'Double click CSV2TSV.vbs to do the conversion.
'The converted files will be saved as *.tsv while the original files remain.

Option Explicit
Dim objWsShell, objFSO, objShellAp, objFolder, objFile, objFileTSV, objResult
Dim strPath, strLine, strNewLine, strNewFileName
Dim TotalFilesConverted, FileNameLength
Set objWsShell = CreateObject("WScript.Shell")
Set objFSO = CreateObject("scripting.filesystemobject")
Set objShellAp = CreateObject("Shell.Application")
Set objFolder = objShellAp.BrowseForFolder(0, "BATCH CONVERSION FROM CSV TO TSV FILES" _
& vbLf & vbLf & "Select a folder containing CSV files for the conversion.", 0, 17)
If objFolder is Nothing Then
msgBox "No folder is selected for conversion of CSV files."
WScript.Quit
Else
strPath = objFolder.Self.Path
objWsShell.CurrentDirectory = strPath
End if
TotalFilesConverted = 0
For Each objFile In objFSO.getfolder(strPath).Files
If UCase(Right(objFile.Name, 4)) = ".CSV" Then
objResult = objWsShell.Popup("Converting " & objFile.Name & " ...",3,"")
FileNameLength = Len(objFile.Name)-4
strNewFileName = Left(objFile.Name,FileNameLength) & ".tsv"
Set objFile = objFSO.OpenTextFile(objFile, 1)
Set objFileTSV = objFSO.CreateTextFile(strNewFileName)
Do Until objFile.AtEndOfStream
strLine = objFile.ReadLine
If instr(strLine,Chr(34)) =0 Then
strNewLine = Replace(strLine,",",vbTab)
Else
Call LineQuote(strNewLine)
End if
objFileTSV.WriteLine strNewLine
Loop
objFile.Close
TotalFilesConverted = TotalFilesConverted +1
objFileTSV.Close
End If
Next
If TotalFilesConverted =0 Then
MsgBox "No CSV files are found for conversion in the folder."
Else
MsgBox CStr(TotalFilesConverted) + " Files Converted from CSV to TSV."
End if

Sub LineQuote(strNewLine)
Dim LineLength, Linepos, blnQuote, Quotepos
LineLength = Len(strLine)
Linepos =1
strNewLine =""
blnQuote = False
Do While Linepos <= LineLength
Quotepos = instr(Mid(strLine,Linepos,LineLength-Linepos+1),Chr(34))
If Quotepos =1 Then
If Linepos < LineLength Then
If Mid(strLine,Linepos,2) = Chr(34) & Chr(34) and blnQuote Then
strNewLine = strNewLine & Chr(34)
Linepos = Linepos +2
Else 'one quote
If blnQuote Then
      blnQuote = False
Else
      blnQuote = True
End if
Linepos = Linepos +1
End if
Else 'last character
Linepos = Linepos +1
End if
Elseif Quotepos >1 Then
If blnQuote Then
strNewLine = strNewLine + Mid(strLine,Linepos,Quotepos-1)
Else 'not Quote
strNewLine = strNewLine + Replace(Mid(strLine,Linepos,Quotepos-1),",",vbTab)
End if
Linepos = Linepos +Quotepos -1
Else 'Quotepos =0
strNewLine = strNewLine + Replace(Mid(strLine,Linepos,LineLength-Linepos+1),",",vbTab)
Linepos = LineLength +1
End If
Loop
End Sub

What this code has considered for converting a csv file:

Fields are separated by commas in csv files. Each record is one line terminated by a linefeed or a carriage return. Fields with embeded commas are enclosed with double-quote characters. Fields with embeded double-quote characters are enclosed within double-quote characters, and each of the embedded double-quote characters is represented by a pair of double-quote characters. Fields with leading or trailing spaces are enclosed within double-quote characters. Fields with embeded line breaks have not been considered.

No comments:

Post a Comment