How can I read the text of a Microsoft Word document into memory sentence by sentence?

Time: 2014-03-09 10:14:51

Tags: c# ms-word add-in

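I am working on a Microsoft Word add-in in C#. I read text from the Word document through the Selection object. However, I do not want to read all of the text into memory at once. I would like to read it sentence by sentence, because my Word documents are very large. I know there is the Range object, but it splits the text into words.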

2 answers:

Answer 0: (score: 0)

This code lets you read each paragraph from a Word document.

I made a few adjustments to the code provided here.

There is also this SO question, which uses an adaptation from the link mantascode posted.

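I don't really know whether this will help you, because Word.Documents.Open() already loads the whole file into memory (which is too slow for large files).

Reading the doc once and storing the result in a string seems to be the fastest approach.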

using System;
using System.Globalization;

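public class Program {
    private static void Main(string[] args) {
        var wordDocParagraphReader = new WordDocParagraphReader(@"E:\someDoc.docx");
        try {
            // Paragraph numbers are zero-based here: 0 is the first paragraph.
            Console.WriteLine(wordDocParagraphReader.GetParagraph(0));
            Console.ReadLine();
        } finally {
            // Close the document and quit Word so no WINWORD.EXE process is left running.
            wordDocParagraphReader.Docs.Close();
            wordDocParagraphReader.Word.Quit();
        }
    }
}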

public class WordDocParagraphReader {
    public int ParagraphsCount { get; private set; }
    public Microsoft.Office.Interop.Word.Document Docs { get; private set; }
    public Microsoft.Office.Interop.Word.Application Word { get; private set; }


    public WordDocParagraphReader(object @path) {
        Word = new Microsoft.Office.Interop.Word.Application();
        object miss = System.Reflection.Missing.Value;
        object readOnly = true;
        Docs = Word.Documents.Open(ref path,
                                   ref miss,
                                   ref readOnly,
                                   ref miss,
                                   ref miss,
                                   ref miss,
                                   ref miss,
                                   ref miss,
                                   ref miss,
                                   ref miss,
                                   ref miss,
                                   ref miss,
                                   ref miss,
                                   ref miss,
                                   ref miss,
                                   ref miss);

        ParagraphsCount = Docs.Paragraphs.Count;
    }

    public string GetParagraph(int paragraphNumber) {
        // paragraphNumber is zero-based, while Word's Paragraphs collection is one-based.
        if (paragraphNumber >= 0 && paragraphNumber < ParagraphsCount) {
            return Docs.Paragraphs[paragraphNumber + 1].Range.Text.ToString(CultureInfo.InvariantCulture);
        }

        Console.WriteLine("invalid paragraph request {0} \n(the total number of paragraphs in the file is {1})",
                          paragraphNumber,
                          ParagraphsCount);
        return string.Empty;
    }
}
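
The "read once and keep the result in a string" idea could look roughly like the sketch below. It assumes the whole text has already been fetched once (for example from Docs.Content.Text) and relies on Word ending every paragraph with a carriage return ('\r'). ParagraphSplitter and Split are hypothetical names for illustration, not part of the Word API.

```csharp
using System;
using System.Collections.Generic;

public static class ParagraphSplitter {
    // Hypothetical helper: lazily yields paragraphs from text that was read
    // from the document once. Word terminates each paragraph with '\r'.
    public static IEnumerable<string> Split(string documentText) {
        int start = 0;
        while (start < documentText.Length) {
            int end = documentText.IndexOf('\r', start);
            if (end < 0) {
                end = documentText.Length; // last paragraph without a trailing mark
            }
            yield return documentText.Substring(start, end - start);
            start = end + 1; // skip the paragraph mark
        }
    }
}
```

After the single COM call that fetches the text, every further paragraph lookup stays inside the managed string, so there are no more round trips to Word. The full text is still held in memory, though, so this trades memory for speed.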

Answer 1: (score: 0)

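using System.Windows.Forms;
using word = Microsoft.Office.Interop.Word;

// Placeholder: in practice, assign the document you opened through Word Interop.
word.Document worddoc = new word.Document();

// Word collections are one-based, so the last sentence has index Sentences.Count.
for (int abc = 1; abc <= worddoc.Sentences.Count; abc++)
{
    MessageBox.Show("Sentence value " + worddoc.Sentences[abc].Text);
}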

This code gives you all the sentences one by one.

This code works for me. You just need to create and open the Word document using Word Interop.
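
For the "create and open the Word document" step, a minimal sketch might look like this. It assumes C# 4 or later (so the optional COM arguments can be omitted), a local Word installation, and a placeholder path. SentenceReader and ReadSentences are hypothetical names, not part of the interop library.

```csharp
using System;
using System.Collections.Generic;
using Word = Microsoft.Office.Interop.Word;

public static class SentenceReader {
    // Hypothetical helper: yields one sentence at a time, so the caller only
    // holds the current sentence's text in managed memory.
    public static IEnumerable<string> ReadSentences(Word.Document doc) {
        int count = doc.Sentences.Count;
        // Word collections are one-based, so the last valid index equals Count.
        for (int i = 1; i <= count; i++) {
            yield return doc.Sentences[i].Text;
        }
    }

    public static void Main() {
        var app = new Word.Application();
        // Placeholder path; the document is opened read-only.
        Word.Document doc = app.Documents.Open(@"E:\someDoc.docx", ReadOnly: true);
        try {
            foreach (string sentence in ReadSentences(doc)) {
                Console.WriteLine(sentence);
            }
        } finally {
            // Close without saving and shut Word down again.
            doc.Close(SaveChanges: false);
            app.Quit();
        }
    }
}
```

Word itself still keeps the whole document loaded, as the other answer points out. The iterator only limits how much text is copied into your own process at any one time. Inside an add-in, you would pass the document that is already open instead of calling Documents.Open.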