Use Ole DB Providers to talk to Excel 97-2003 Workbook

施工中……

工事中

好久,好久,好久没有写东西了,见谅。

Outline

  1. Jet and ACE

    1.1. connectionString

  2. XLS query string

    2.1. <> is not null

    2.2. carriage

    2.3. AS

    2.4. SheetName$A1:F(0)

    2.5. select ‘’2022”

    2.6. F1,F2

    2.7. LIKE %

  3. 运行速度/性能问题?

  4. C# DataTable

    4.1. LINQ Cartesian Product

    4.2. Merge DataTable with same column number but different column name

Use Ole DB Providers to talk to Excel 97-2003 Workbook

最近需要处理Excel 97-2003 Workbook(.xls文件),之前一直使用的ClosedXML并不支持这个格式的文件,只能另请高明了。😂

面向搜索引擎编程,搜索到了这篇《C#: Read and Write Excel (.xls and .xlsx) Files Content without Excel Automation (using NPOI and ADO.NET)》[2],文章推荐了Ole DB和Third Party Library两种方式,NPOI看起来有很多Dependency,笔者不是很喜欢。笔者又搜索了一下其他的第三方库,基本上是要收费的,告辞……

COM Interop嘛,想想就觉得不仅写起来麻烦,跑起来还慢,拜拜。

那就只剩Ole DB了,之前倒也不是没用过,不过当时处理的是XLSX格式的,当时代码也写完了,但是最后发现需要把Platform Target设定为x64才能运行(小问题,当时好像找到解决方法,需要额外安装x86的Driver,好麻烦不想装),而且运行一次查询后程序需要很久才会退出(大问题,因为需要查询很多很多次,没找到好的解决方法),于是就放弃了.

而这次需要处理的是XLS文件,笔者印象中之前研究Ole DB方法时好像运行过,还挺快的,这次便决定用Ole DB来做.

Dev Environment

  • Windows 10 Enterprise Version 21H2
  • Microsoft Office Professional Plus 2016 (x64)
  • Microsoft Visual Studio 2022 (x64)

附加限制:不希望装其他软件

本次需要的Provider之一Microsoft.Jet.OleDb.4.0是随Windows系统安装的,因此没有问题.

It is pre-installed on Windows 2000 and later.[2]

ConnectionString

网上搜到的很多文章都会写:

Normal ConnectionString: (work for xls files)

1
Provider=Microsoft.Jet.OLEDB.4.0;Data Source={XLS_FILENAME};Extended Properties="Excel 8.0;HDR=YES;"

XLSX文件需要使用Microsoft.ACE.OLEDB.12.0.反正笔者读完的反应是“XLS只能用Jet,XLSX只能用XLSX”……其实,Microsoft.ACE.OLEDB.12.0完全具备处理XLS文件的能力:

When you you have ACE drivers, there is no reason to use the old deprecated JET drivers—even for older versions of Microsoft Access and Excel. A common mistake I see, even with seasoned developers, is to drop to JET for .mdb and .xls files when you don’t need to. I have made this mistake myself past.[1]

因此我们完全可以使用以下连接字符串:

1
2


但是这里有几点需要注意:

  1. Microsoft.Jet.OleDb.4.0只支持32位,64位程序使用该Provider会出现错误(如下图):

    The ‘Microsoft.Jet.OLEDB.4.0’ provider is not registered on the machine.

The 'Microsoft.Jet.OLEDB.4.0' provider is not registered on the machine.

  1. Microsoft.ACE.OLEDB.12.0由多种软件包提供【待补充】,因此使用时需要注意系统中是否已安装相关软件包,及注意软件包的Bit-ness,否则还是有可能会出现provider is not registered on the machine的问题,下图便是x86程序使用64位的provider触发的错误(这台电脑的Microsoft.ACE.OLEDB.12.0是Office 2016(x64)提供的):

‘ACE.OLEDB.12.0’ not registered

在Powershell中使用下面的命令可以检查电脑中OLEDB支持的providers:

1
2


‘ACE.OLEDB.12.0’ not registered

可以看到这台电脑在x86的Powershell中列出了’Microsoft.Jet.OLEDB.4.0’,x64的Powershell中列出了Microsoft.ACE.OLEDB.12.0

由于笔者需要尽可能使程序在64位和32位下都能运行,因此笔者在程序中进行了判断,根据Bit-ness使用不同的连接字符串:

1
2


Query String

下面是几个关于Query String的语法,以供参考。

“不等于”的写法

In OleDbCommand using the Microsoft.Jet.OLEDB.4.0 provider, there are several ways to express “not equal to” in a SQL query:

  1. “<>” - This is the most commonly used operator for “not equal to” in SQL queries, including OleDbCommand.
  2. “IS NOT” - This operator can be used to check for null values in a SQL query, and is sometimes used in place of “<>” to check for non-null values.

———————————–修改自ChatGPT的回答。

以下为范例:

1
oleDbCommand.CommandText = "SELECT [应收],[名称/_个体],[陈述/说明] From [" +"应收$"+ "] where 序号 <> null";

ChatGPT其实还提到了另外两种不等于的表达方式:“!=”和“NOT =”,但是笔者这边试下来会引起Exception。

需要选中列的列名竟然是多行的?

笔者这次遇上了一张Excel表格,其中几列的列名中居然有换行……因为ChatGPT开着,就直接问ChatGPT了。

You can replace the line break in the column name with a space or underscore, and then reference the modified column name in the SQL query:

1
2
SELECT [Column_1]
FROM [Sheet1$]

———————————–摘自ChatGPT的回答。

经过验证,如下的列名(Column<换行>1)确实可以通过使用下划线[Column_1]匹配到:

1
2
3
//列名Column\n1与列名Column2
|Column|Column2|
| 1 | |

(后来想再尝试Google下这种情况怎么办,结果我都带上答案了也没搜到解决方案……)

AS

The AS command is used to rename a column or table with an alias.
An alias only exists for the duration of the query.

The following SQL statement creates two aliases, one for the CustomerID column and one for the CustomerName column:

1
2
SELECT CustomerID AS ID, CustomerName AS Customer
FROM Customers;

SheetName$A1:AA0 or SheetName$A1:AA

“[SheetName$A1:F0]” is the name of the worksheet and the range of cells you want to retrieve data from. In this case, it specifies cells A1 through AA0 in the worksheet named SheetName.

There is no AA0 cell in Excel. However, when you specify the range “A1:AA0”, the OleDb provider will interpret it as “A1:AA”, which means all the cells from A1 to AA(the last row) in the worksheet.
———————————–修改自ChatGPT的回答。

想给一列全指定为某个值

如果想要给查询结果添加全部为固定值的一列的话。可以尝试

1
2
select '学生' as 人员类型, ID
from StudentsTable

以上语句便会返回形如下表的结果:
|人员类型|ID|
|-|-|
|学生|123|
|学生|234|

2.6. F1,F2

2.7. LIKE %




DataTable dt1 = new DataTable("dt1");

DataTable dt2 = new DataTable("dt2");

dt1.Columns.Add("a");

dt1.Columns.Add("b");

dt2.Columns.Add("c");

dt2.Columns.Add("d");

dt2.Columns.Add("e");



dt1.Rows.Add("1", "2");

dt2.Rows.Add("3", "4", "5");

dt2.Rows.Add("6", "7", "8");



dt1.AcceptChanges();

dt2.AcceptChanges();



var query3 = from x in dt1.AsEnumerable()

             from y in dt2.AsEnumerable()

             select new

             {

                 x,

                 y

             };

    DataTable dt3 = new DataTable("dt3");

    dt3.Columns.Add("a");

    dt3.Columns.Add("b");

    dt3.Columns.Add("c");

    dt3.Columns.Add("d");

    dt3.Columns.Add("e");

    dt3.AcceptChanges();

    foreach (var r3 in query3)

    {

        dt3.Rows.Add(r3.x.ItemArray.Concat(r3.y.ItemArray).ToArray());

    }

    dt3.AcceptChanges();



    stbQuery.Append("SELECT F1,F9,F5,F17,F18,F20,F8 FROM [" + "Sheet1" + "$A6:AC0] WHERE F9 LIKE '%" + "02222333" + "'");

参考资料:

  1. Use ACE Drivers and PowerShell to Talk to Access and Excel
  2. C#: Read and Write Excel (.xls and .xlsx) Files Content without Excel Automation (using NPOI and ADO.NET)
  3. SQL AS Keyword