Question

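I want to parse a PDF file that contains checkboxes, radio buttons, drop-down lists, and text boxes, and get the position and value of each control. I am using C# and iTextSharp.

Any suggestions or ideas would be helpful.

What I have done so far (ref: http://simpledotnetsolutions.wordpress.com/2012/04/08/itextsharp-few-c-examples/):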

    public void ReadPDFformDataPageWise(string inputFile)
    {
        PdfReader reader = new PdfReader(inputFile);
        AcroFields form = reader.AcroFields;
        try
        {
            // Note: 'page' is never used below, so every field is visited once per page.
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                foreach (string key in form.Fields.Keys)
                {
                    // The case labels are empty, so every type falls through to the shared block.
                    switch (form.GetFieldType(key))
                    {
                        case AcroFields.FIELD_TYPE_CHECKBOX:
                            // Create checkbox
                        case AcroFields.FIELD_TYPE_COMBO:
                            // Create combo box
                        case AcroFields.FIELD_TYPE_LIST:
                            // Create list
                        case AcroFields.FIELD_TYPE_RADIOBUTTON:
                            // Create radio button
                        case AcroFields.FIELD_TYPE_NONE:
                        case AcroFields.FIELD_TYPE_PUSHBUTTON:
                            // Create submit button
                        case AcroFields.FIELD_TYPE_SIGNATURE:
                            // Create signature
                        case AcroFields.FIELD_TYPE_TEXT:
                            // Create text box / question header
                            int fieldType = form.GetFieldType(key);
                            string fieldValue = form.GetField(key);
                            float[] positions = form.GetFieldPositions(key);
                            string translatedFieldName = form.GetTranslatedFieldName(key);
                            AcroFields.Item item = form.GetFieldItem(key);

                            break;
                    }
                }
            }
        }
        catch
        {
            // Exceptions are silently swallowed here.
        }
        finally
        {
            reader.Close();
        }
    }

Answer 1

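Why do you refer to unofficial examples instead of the official web site?

See http://itextpdf.com/examples/iia.php?id=121 for how to list all the fields in an AcroForm and get their names and types. If you have checkbox or radio fields, you also need to get the appearance states, as shown in the same example.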
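As a rough C# illustration of that listing step, here is a minimal sketch that assumes iTextSharp 5.x. The class name `FieldLister` and method name `ListFields` are hypothetical and do not come from the linked example.

```csharp
using System;
using iTextSharp.text.pdf;

public static class FieldLister
{
    // Prints the name, numeric type, and current value of every AcroForm field.
    // For checkboxes and radio buttons it also prints the available appearance states.
    public static void ListFields(string inputFile)
    {
        PdfReader reader = new PdfReader(inputFile);
        try
        {
            AcroFields form = reader.AcroFields;
            foreach (string name in form.Fields.Keys)
            {
                int type = form.GetFieldType(name);
                Console.WriteLine("{0}: type={1}, value={2}", name, type, form.GetField(name));

                if (type == AcroFields.FIELD_TYPE_CHECKBOX ||
                    type == AcroFields.FIELD_TYPE_RADIOBUTTON)
                {
                    // The appearance states are the values the field can take (for example "Off" and an "on" name).
                    string[] states = form.GetAppearanceStates(name);
                    Console.WriteLine("  possible states: " + string.Join(", ", states));
                }
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```

The numeric type can be compared against the `AcroFields.FIELD_TYPE_*` constants, as in the question's `switch` statement.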

If you want to know the page number and position of each field, you need this example: http://itextpdf.com/examples/iia.php?id=163

Look for the method that gets the FieldPosition instances.
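A minimal C# sketch of reading those positions follows. It assumes an iTextSharp 5.x release in which `GetFieldPositions` returns a list of `AcroFields.FieldPosition` objects (older releases returned a `float[]`, as in the question's code). `FieldLocator` and `PrintPositions` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class FieldLocator
{
    // Prints the page and bounding rectangle of every widget of every field.
    public static void PrintPositions(string inputFile)
    {
        PdfReader reader = new PdfReader(inputFile);
        try
        {
            AcroFields form = reader.AcroFields;
            foreach (string name in form.Fields.Keys)
            {
                IList<AcroFields.FieldPosition> positions = form.GetFieldPositions(name);
                if (positions == null) continue; // defensive: no position information

                // A field can have several widgets (e.g. one per radio button option).
                foreach (AcroFields.FieldPosition fp in positions)
                {
                    Rectangle r = fp.position;
                    Console.WriteLine("{0}: page {1}, llx={2}, lly={3}, urx={4}, ury={5}",
                        name, fp.page, r.Left, r.Bottom, r.Right, r.Top);
                }
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```

The coordinates are in PDF user space, with the origin normally at the lower-left corner of the page.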

If you want to know more about the flags that are set for a field (password field, multiline, ...), take a look at this example: http://itextpdf.com/examples/iia.php?id=237
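A short C# sketch of that flag check, assuming iTextSharp 5.x: it reads the `/Ff` entry from the merged dictionary of the field's first widget. `FieldFlags` and `Describe` are hypothetical names.

```csharp
using System;
using iTextSharp.text.pdf;

public static class FieldFlags
{
    // Reports whether the named field has the password or multiline flag set.
    public static void Describe(AcroFields form, string fieldName)
    {
        AcroFields.Item item = form.GetFieldItem(fieldName);
        if (item == null)
        {
            Console.WriteLine(fieldName + ": no such field");
            return;
        }

        // Merged dictionary of the first widget: field and widget entries combined.
        PdfDictionary merged = item.GetMerged(0);
        PdfNumber flags = merged.GetAsNumber(PdfName.FF);
        int ff = flags == null ? 0 : flags.IntValue; // a missing /Ff entry means no flags are set

        bool isPassword = (ff & BaseField.PASSWORD) != 0;
        bool isMultiline = (ff & BaseField.MULTILINE) != 0;
        Console.WriteLine("{0}: password={1}, multiline={2}", fieldName, isPassword, isMultiline);
    }
}
```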

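If you say: "Mr. Lowagie, I am going to down-vote your answer because you gave me Java examples, and I only accept C# answers!", then please go to: http://sourceforge.net/p/itext/code/HEAD/tree/book/

We have invested in porting all the book examples to C#. All you have to do is walk down the directory tree to find the corresponding example. For instance: http://sourceforge.net/p/itext/code/HEAD/tree/book/src/part2/chapter06/FormInformation.java

If you say: "None of your examples work", then your form is probably an XFA form rather than an AcroForm. In that case there is no such thing as "the position of a field": with XFA, the PDF file acts as a container for an XML template and a data set. This XML is rendered on the fly, and the position of each field depends on the data set. If your question is about XFA, please reconsider it.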
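One way to tell the two kinds of form apart in C# is sketched below, assuming iTextSharp 5.x and its `AcroFields.Xfa` property. `FormKind` and `IsXfa` are hypothetical names.

```csharp
using System;
using iTextSharp.text.pdf;

public static class FormKind
{
    // Returns true when the document carries an XFA stream.
    public static bool IsXfa(string inputFile)
    {
        PdfReader reader = new PdfReader(inputFile);
        try
        {
            XfaForm xfa = reader.AcroFields.Xfa;
            return xfa != null && xfa.XfaPresent;
        }
        finally
        {
            reader.Close();
        }
    }
}
```

If this returns true and the AcroForm examples find no usable fields, the form is likely a pure XFA form, and position lookups through `AcroFields` will not help.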

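Note that this answer costs me money, because I am sharing knowledge that you should have obtained by reading the book I wrote, instead of asking a question that shows you did not make much effort to find a solution yourself ;-)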

I did not down-vote this question, but I understand why other people would.

Get form elements from a PDF file