Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Shaun E on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Comparing excel files 1

Status
Not open for further replies.

snuv

Programmer
Oct 30, 2001
751
GB
I have got a function to compare two xls files
it skips cells if a third spreadsheet has a marker in that cell.

It works ok but I need to make it quicker.
It seems that the delay is caused by reading from cells one at a time.

Any suggestions gratefully received

Code:
	Private Function comparefiles(ByVal testfile As String, ByVal samplefile As String, ByVal exclusionsFile As String) As Integer

		Dim RetVal As Integer = 0

		Dim testBook As Excel.Workbook = m_ExcelApp.Workbooks.Open(testfile)
		Dim sampleBook As Excel.Workbook = m_ExcelApp.Workbooks.Open(samplefile)
		Dim exclusionBook As Excel.Workbook = m_ExcelApp.Workbooks.Open(exclusionsFile)

		Dim testSheet As Excel.Worksheet = testBook.Sheets.Item(1)
		Dim sampleSheet As Excel.Worksheet = sampleBook.Sheets.Item(1)
		Dim exclusionSheet As Excel.Worksheet = exclusionBook.Sheets.Item(1)


		'for each column in the work book
		For i As Integer = 1 To MAX_COL
			'for each row in the work book
			For j As Integer = 1 To MAX_ROW
				'check whether to ignore this string
				Dim exclusionString As String = exclusionSheet.Cells(j, i).Value

				If exclusionString Is Nothing Then
					'get the values to compare
					Dim testString As String = testSheet.Cells(j, i).Value
					Dim sampleString As String = sampleSheet.Cells(j, i).Value

					If IsNumeric(testString) Then
						If IsNumeric(sampleString) Then
							If Not numericComparison(testString, sampleString) Then
								RetVal += 1
							End If
						Else
							'They are different
							RetVal += 1
						End If
					ElseIf Not StringComparison(testString, sampleString) Then
						'Do a string comparison
						RetVal += 1
					End If
				End If
			Next
		Next

		If RetVal = 0 Then
			Log(resultsPath, samplefile & ":- Match")
		Else
			Log(resultsPath, samplefile & " did not match" & testfile & " : Error count = " & RetVal)
		End If
		testBook.Close(Savechanges:=False)
		sampleBook.Close(Savechanges:=False)
		exclusionBook.Close(Savechanges:=False)
	End Function

Thanks
Snuv

"If it could have gone wrong earlier and it didn't, it ultimately would have been beneficial for it to have." : Murphy's Ultimate Corollary
 
Reading Excel cell by cell is extremly slow. have you thought about reading the spreadsheets into datatables, then comparing with the results into a different datatable that you can then write to a spreadsheet? I am currently working on a project reading in multiple spreadsheets at a time and the actual import/export is really fast, and spinning through a couple of datatables is fast as well. There are all kinds of threads on the forum on reading Excel into a datatable (even multiple sheets into multiple tables)

Chris
 
Cheers SprintFlunky
Iterating through datatables is much quicker

Any ideas why this code doesn't bring back the first row from the spreadsheet?

Code:
Private CONNECT_SKElETON As String = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties=Excel 8.0;"
Private SQL_SKElETON As String = "select * from [{0}$]"

Private Function fillDataTable(ByVal Path As String) As DataTable
    Dim RetVal As New DataTable
    Dim cOLE As OleDbConnection
    Try
        Dim ConnectionString As String = String.Format(CONNECT_SKElETON, Path)
        Dim SQL As String = String.Format(SQL_SKElETON, getWorksheetName(Path))

        cOLE = New OleDbConnection(ConnectionString)
        Dim daOLE As New OleDbDataAdapter(SQL, cOLE)
        daOLE.Fill(RetVal)
    Catch ex As Exception
        Throw ex
    Finally
        cOLE.Close()
    End Try

    Return RetVal
End Function

Private Function getWorksheetName(ByVal filePath As String, Optional ByVal index As Integer = 1) As String
    Dim RetVal As String = ""
    Dim workbook As Excel.Workbook
    Try
        workbook = m_ExcelApp.Workbooks.Open(filePath)
        Dim worksheet As Excel.Worksheet = workbook.Worksheets(index)
        RetVal = worksheet.Name
    Catch ex As Exception
        Throw ex
    Finally
        workbook.Close()
    End Try

    Return RetVal
End Function

"If it could have gone wrong earlier and it didn't, it ultimately would have been beneficial for it to have." : Murphy's Ultimate Corollary
 
You need to tell the Connection that it has headers by using the HDR-YES text. Also notice the IMEX at the end - That forces Excel to treat everything as text. Since Excel only scans the first 8 rows to determine the column format, if you have numbers in the first 8 rows, then text later on, it will null the text values in your DataTable since it does not match the column format.

cn = New System.Data.OleDb.OleDbConnection("provider=Microsoft.Jet.OLEDB.4.0;" & _
"data source=" & path & "\" & file & ";Extended Properties=""Excel 8.0;HDR=YES;IMEX=1""")

That last little trick took me a long time to find.

Hope it helps.

Chris
 
Thanks Sprintflunky for your help

For anyone that's interested in the future - it still needs some work but the basics seem to be working well

Code:
#Region " error messages "
	Private DirectoryNotExist As String = "{0} folder ({1}) doesnt exist"
	Private FileNotExist As String = "file ({0}) doesnt exist"
	Private errorPath As String = "c:\arstest\Log.txt"
	Private resultsPath As String = "c:\arstest\results.txt"
	Private Const MISMATCH As String = "<{0}> did not match <{1}> : Error count = {2}"
#End Region

#Region " Constants "
	Private CONNECT_SKElETON As String = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties=""Excel 8.0;HDR=NO;IMEX=1"""
	Private SQL_SKElETON As String = "select * from [{0}$]"
	Private Const DATESTRING_LENGTH As Integer = 19
	Private Const MAX_ROW As Integer = 8000
	Private Const MAX_COL As Integer = 256
	Private m_MaxDif As Double = 0.0000005

	Private Const TEST_RANGE_NAME As String = "TestRangeName"
#End Region


	Private Function compareFilesFaster(ByVal testfile As String, ByVal samplefile As String, ByVal exclusionsFile As String, ByVal ResultsFile As String) As Integer

		Dim RetVal As Integer = 0

		Try
			Dim dtTest As DataTable = fillDataTable(testfile)
			Dim dtSample As DataTable = fillDataTable(samplefile)
			Dim dtExclude As DataTable = fillDataTable(exclusionsFile)

			Dim xTestLimit As Integer = dtTest.Columns.Count - 1
			Dim yTestLimit As Integer = dtTest.Rows.Count

			Dim xSampleLimit As Integer = dtSample.Columns.Count - 1
			Dim ySampleLimit As Integer = dtSample.Rows.Count

			Dim xExcludeLimit As Integer = dtExclude.Columns.Count - 1
			Dim yExcludeLimit As Integer = dtExclude.Rows.Count - 1

			Dim xLimit As Integer = Math.Max(xTestLimit, xSampleLimit)
			Dim yLimit As Integer = Math.Max(yTestLimit, ySampleLimit)

			Dim strResult(xLimit, yLimit) As String

			For i As Integer = 0 To xLimit
				For j As Integer = 0 To yLimit - 1

					Dim test As String = Nothing
					Dim exclude As String = Nothing
					Dim sample As String = Nothing
					Try
						If i <= xTestLimit And j <= yTestLimit Then
							test = dtTest.Rows(j).Item(i).ToString
						End If
						If i <= xSampleLimit And j <= ySampleLimit Then
							sample = dtSample.Rows(j).Item(i).ToString
						End If
						If i <= xExcludeLimit And j <= yExcludeLimit Then
							exclude = dtExclude.Rows(j).Item(i).ToString
						End If
					Catch ex As Exception
						Throw ex
					End Try

					Try
						If Not skipCell(exclude) Then
							If IsNumeric(test) Then
								If IsNumeric(sample) Then
									If Not numericComparison(test, sample) Then
										strResult(j, i) = 1
										RetVal += 1
									Else
										strResult(j, i) = 0
									End If
								Else
									strResult(j, i) = 1
									'They are different
									RetVal += 1
								End If
							ElseIf Not StringComparison(test, sample) Then
								strResult(j, i) = 1
								RetVal += 1
							Else
								strResult(j, i) = 0
							End If
						End If
					Catch ex As Exception
						Throw ex
					End Try
				Next
			Next
			If RetVal = 0 Then
				Log(resultsPath, samplefile & ":- Match")
			Else
				Dim strMismatch As String = String.Format(MISMATCH, samplefile, testfile, RetVal)
				saveResults(strResult, ResultsFile)
				Log(resultsPath, strMismatch)
			End If

		Catch ex As Exception
			Throw ex
		Finally

		End Try

		Return RetVal

	End Function

	Private Function fillDataTable(ByVal Path As String) As DataTable
		Dim RetVal As New DataTable
		Dim cOLE As OleDbConnection
		Try
			Dim ConnectionString As String = String.Format(CONNECT_SKElETON, Path)
			Dim SQL As String = String.Format(SQL_SKElETON, getWorksheetName(Path))

			cOLE = New OleDbConnection(ConnectionString)
			Dim daOLE As New OleDbDataAdapter(SQL, cOLE)
			daOLE.Fill(RetVal)

		Catch ex As Exception
			Throw ex
		Finally
			cOLE.Close()
		End Try

		Return RetVal
	End Function

	Private Function getWorksheetName(ByVal filePath As String, Optional ByVal index As Integer = 1) As String
		Dim RetVal As String = ""
		Dim workbook As Excel.Workbook
		Try
			workbook = m_ExcelApp.Workbooks.Open(filePath)
			Dim worksheet As Excel.Worksheet = workbook.Worksheets(index)
			RetVal = worksheet.Name
		Catch ex As Exception
			Throw ex
		Finally
			workbook.Close()
		End Try

		Return RetVal
	End Function

	Private Sub saveResults(ByRef OutputArray(,) As String, ByVal resultspath As String)

		Dim Book As Excel.Workbook = m_ExcelApp.Workbooks.Add
		Dim sheet As Excel.Worksheet = Book.Worksheets.Add()

		Try
			Dim x As Integer = OutputArray.GetUpperBound(0) + 1
			Dim y As Integer = OutputArray.GetUpperBound(y) + 1
			Dim Range As Excel.Range = sheet.Range("A1", sheet.Cells(x, y))
			Range.Value2 = OutputArray
		Catch ex As Exception
			Throw ex
		End Try

		Book.SaveAs(resultspath)
		Book.Close()

	End Sub

	Private Sub Log(ByVal path As String, ByVal output As String)

		Dim sw As StreamWriter = File.AppendText(path)
		sw.WriteLine(output)
		sw.Flush()
		sw.Close()
	End Sub

"If it could have gone wrong earlier and it didn't, it ultimately would have been beneficial for it to have." : Murphy's Ultimate Corollary
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top