Wednesday, September 21, 2011

Reading a MS Word 2007 Document in .docx format using C#

If you need to read (or write) an MS Word 2007 document that has been saved in the Open XML format (.docx), you can use the Open XML SDK 2.0 for Microsoft Office to do just that. The first thing you will need to do is download and install the SDK. In particular, you must download and install OpenXMLSDKv2.msi. You can also download OpenXMLSDKTool.msi if you want; it has some VERY nice features, like generating code from an existing .docx file.

Now that you have the files you need, open Visual Studio (2008 or 2010 works fine), open the project you want to use, and add references to DocumentFormat.OpenXml (I had to browse to it in the Add Reference dialog; it was in C:\Program Files (x86)\Open XML SDK\V2.0\lib) and WindowsBase (mine was listed on the .NET tab of the Add Reference dialog). Please note that this code does not require MS Word to be installed and is safe to run on a server, such as with ASP.NET.

Now that you have the API, the rest is just working with the document. To get a better understanding of how to work with the parts (structure) of a Word document, click here. For a list of “How do I…” code samples, click here.
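To make the idea of parts a little more concrete, here is a minimal sketch that lists the parts related to the main document part (word/document.xml), such as styles, settings, fonts, numbering and images. The `DocxParts` class and `ListMainDocumentParts` method are hypothetical names made up for this example; `MainDocumentPart.Parts`, `IdPartPair` and `OpenXmlPart.Uri` are Open XML SDK members. Treat it as a sketch rather than a finished implementation.

```csharp
using System.Collections.Generic;
using DocumentFormat.OpenXml.Packaging;

public static class DocxParts
{
    // Lists the parts that the main document part points to (styles,
    // settings, numbering, headers, images, ...). Each entry has the form
    // "relationship id -> part URI inside the .docx (zip) package".
    public static List<string> ListMainDocumentParts(WordprocessingDocument wordDocument)
    {
        var result = new List<string>();
        foreach (IdPartPair pair in wordDocument.MainDocumentPart.Parts)
        {
            result.Add(pair.RelationshipId + " -> " + pair.OpenXmlPart.Uri);
        }
        return result;
    }
}
```

Call it while the document is still open, i.e. inside the `using` block that opens the file.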

Here is example code that gets the body of the document.

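using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Filename is a string holding the path to the .docx; false opens it read-only
using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(Filename, false))
{
    // Body holds the document's paragraphs, tables, etc.
    Body body = wordDocument.MainDocumentPart.Document.Body;
}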
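Once you have the body, a common next step is to pull out its text. The sketch below returns the text of each top-level paragraph using LINQ over `Elements<Paragraph>()` and the SDK's `InnerText` property. The `DocxText` class and `GetParagraphTexts` method are hypothetical names, and the sketch assumes you only want paragraphs that sit directly in the body; paragraphs inside tables are not included.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Wordprocessing;

public static class DocxText
{
    // Returns the plain text of each paragraph that is a direct child of the
    // body, in document order. Elements<T>() only looks at direct children,
    // so paragraphs nested inside tables are skipped.
    public static List<string> GetParagraphTexts(Body body)
    {
        return body.Elements<Paragraph>()
                   .Select(p => p.InnerText) // concatenated text of all runs in the paragraph
                   .ToList();
    }
}
```

For example, inside the `using` block above you could write `foreach (string text in DocxText.GetParagraphTexts(body)) Console.WriteLine(text);`. It is safest to do this before the `using` block ends and the document is closed.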


Here is an example of a more complex line of code that uses LINQ to navigate the structure of the document. In this case, the document contains a table, and we get the first row of that table, the first cell of that row, and the second element in that cell.
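A minimal sketch of that kind of chain, assuming the table is a direct child of the body. `DocxTables` and `GetSecondElementOfFirstCell` are hypothetical names; `Elements<T>()` and `Elements()` are Open XML SDK methods, and `First()` and `ElementAt()` are standard LINQ. Note that the first child of a Word table cell is often its `TableCellProperties`, so "the second element" is frequently the cell's first paragraph.

```csharp
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class DocxTables
{
    // Body -> first Table -> first TableRow -> first TableCell -> second child.
    // First() throws InvalidOperationException if a table, row or cell is
    // missing, and ElementAt(1) throws ArgumentOutOfRangeException if the
    // cell has fewer than two child elements.
    public static OpenXmlElement GetSecondElementOfFirstCell(Body body)
    {
        return body.Elements<Table>().First()
                   .Elements<TableRow>().First()
                   .Elements<TableCell>().First()
                   .Elements()      // all direct children of the cell
                   .ElementAt(1);   // index 1 = the second element
    }
}
```

If your documents may not contain a table, `FirstOrDefault()` with null checks is a gentler alternative to `First()`.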


I hope this gives you an idea of how to get started. There are lots of good links, examples, etc. here.
