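Reading content from a Microsoft Word file in C#

Time: 2016-05-06 01:09:34

Tags: c#

I am trying to read .docx and .txt files in C#. The content of ABC.docx is:

Test1

Test2

My code does read ABC.docx, but there is a problem: once the data is stored in SQL Server, the output looks like this: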

[Image: screenshot of the data as stored in SQL Server]

Here is my code:

    void WalkDirectoryTree(System.IO.DirectoryInfo root)
    {
        //System.IO.FileInfo[] files = null;

        System.IO.DirectoryInfo[] subDirs = null;

        //need to add-in more extension file such as .doc,  .ppt, .xlsx
        //files = root.GetFiles("*.txt");


        var files = root.GetFiles().Where(a => a.Extension.Contains(".docx") || a.Extension.Contains(".txt"));

        //  files = new string[] { "*.txt", "*.docx" }
        //.SelectMany(i => root.GetFiles(i, SearchOption.AllDirectories))
        //.ToArray();

        //if file is not null, read filename & file extension
        if (files != null)
        {
            foreach (System.IO.FileInfo fi in files)
            {
                StringBuilder text = new StringBuilder();
                Microsoft.Office.Interop.Word.Application word = new Microsoft.Office.Interop.Word.Application();
                object miss = System.Reflection.Missing.Value;
                //object path = @"I:\def.docx";
                object path = fi.FullName;
                object readOnly = true;
                Microsoft.Office.Interop.Word.Document docs = word.Documents.Open(ref path, ref miss, ref readOnly, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss);

                for (int i = 0; i < docs.Paragraphs.Count; i++)
                {
                    text.Append(" \r\n " + docs.Paragraphs[i + 1].Range.Text.ToString());
                }


                //Read all lines of the file from its full path
                string[] lines = System.IO.File.ReadAllLines(fi.FullName);
                //TextReader reader = new FilterReader(fi.FullName);
                //StreamReader m = new StreamReader(fi.FullName);



                foreach (string line in lines)
                {

                    String[] substrings = fi.FullName.Split('\\');
                    string strFileName = string.Empty;
                    string strFileExtension = string.Empty;


                    if (substrings.Length > 0)
                    {
                        strFileName = substrings[ substrings.Length -1 ];
                        if( !string.IsNullOrEmpty(strFileName) )
                        {
                            string[] extensionSplit = strFileName.Split('.');
                            if (extensionSplit.Length > 0)
                            {
                                strFileExtension = extensionSplit[extensionSplit.Length - 1];
                            }
                        }
                    }
                    else
                    {
                        strFileName = fi.FullName;
                    }

                     InsertData(strFileName, line.Replace("'",""),  fi.FullName,strFileExtension);
                }
            }

            //After searched from root, continue search from subDirectories
            subDirs = root.GetDirectories();

            #region Exclude all the hidden files from drives
            foreach (System.IO.DirectoryInfo dirInfo in subDirs)
            {
                if ((dirInfo.Attributes & FileAttributes.Hidden) == 0)
                {
                    WalkDirectoryTree(dirInfo);
                }
            }
            #endregion
        }
    }

Please advise how to store this in SQL Server. Thanks.

1 answer:

Answer 0: (score: 1)

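Save the Word document data in the database as a Base64 string.

With the document's Base64 string you can not only store the document but also open it again at a later stage (by converting it back).

Save the result of this method to the database: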

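    // Requires: using System; using System.IO;
    public string GetDocumentBinary()
    {
        // Placeholder: replace with the real path of the document.
        string docPath = "DocumentPath";
        byte[] binarydata = File.ReadAllBytes(docPath);
        // Encode the raw file bytes as a Base64 string for storage.
        string base64 = System.Convert.ToBase64String(binarydata, 0, binarydata.Length);
        return base64;
    }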
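
Base64 encodes every group of 3 bytes as 4 characters, so the stored string is roughly a third larger than the file itself. That may matter when sizing the database column. The following is only a sketch of that arithmetic; `Base64Size` and `Base64Length` are hypothetical names used for illustration and are not part of the answer above.

```csharp
using System;

public static class Base64Size
{
    // Hypothetical helper: number of characters that Convert.ToBase64String
    // returns for a buffer of byteCount bytes. Each group of up to 3 bytes
    // becomes 4 characters; the last group is padded with '='.
    public static long Base64Length(long byteCount)
    {
        return 4 * ((byteCount + 2) / 3);
    }
}
```

For example, `Base64Size.Base64Length(new FileInfo(path).Length)` would give the string length to expect for a given file before it is read.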

Then, when you need to display the document, convert it back and (optionally) save it to disk:

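    // Requires: using System; using System.IO;
    public void SaveBinaryAsDocument(string filePath, string base64String)
    {
        // Decode the stored Base64 string back into the original file bytes.
        byte[] bytes = Convert.FromBase64String(base64String);
        File.WriteAllBytes(filePath, bytes);
    }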
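
To check that nothing is lost on the way through the database, the two steps can be combined into a round trip and the restored file compared with the original. This is a minimal sketch under assumptions: the class and method names (`Base64RoundTrip`, `Encode`, `Decode`, `RoundTripMatches`) are hypothetical, and the database step is left out, with the Base64 string simply passed in memory.

```csharp
using System;
using System.IO;
using System.Linq;

public static class Base64RoundTrip
{
    // Encode a file's bytes as Base64 (the value that would be stored).
    public static string Encode(string path)
    {
        return Convert.ToBase64String(File.ReadAllBytes(path));
    }

    // Decode a stored Base64 string and write the bytes back to disk.
    public static void Decode(string base64, string path)
    {
        File.WriteAllBytes(path, Convert.FromBase64String(base64));
    }

    // Returns true when the restored file is byte-for-byte identical
    // to the source file.
    public static bool RoundTripMatches(string sourcePath, string restoredPath)
    {
        Decode(Encode(sourcePath), restoredPath);
        return File.ReadAllBytes(sourcePath)
                   .SequenceEqual(File.ReadAllBytes(restoredPath));
    }
}
```

Because the file is treated as raw bytes, the same round trip works for .docx, .txt, or any other file type.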