Question

For example, I want to parse a Python file for the text between triple double quotes and build an HTML table from that text.

A text block such as

"""
Replaces greater than operator ('>') with 'NOT BETWEEN 0 AND #'
Replaces equals operator ('=') with 'BETWEEN # AND #'

Tested against:
    * Microsoft SQL Server 2005
    * MySQL 4, 5.0 and 5.5
    * Oracle 10g
    * PostgreSQL 8.3, 8.4, 9.0

Requirement:
    * Microsoft Access

Notes:
    * Useful to bypass weak and bespoke web application firewalls that
      filter the greater than character
    * The BETWEEN clause is SQL standard. Hence, this tamper script
      should work against all (?) databases

>>> tamper('1 AND A > B--')
'1 AND A NOT BETWEEN 0 AND B--'
>>> tamper('1 AND A = B--')
'1 AND A BETWEEN B AND B--'
"""

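The HTML table must be a simple table with 5 columns:

Column 1: the text between """ and \n, if the next line is empty
Column 2: the text between Tested against: and \n, if the next line is empty, or between Requirement: and \n, if the next line is empty
Column 3: the text between Notes: and \n, if the next line is empty
Column 4: the text between >>> and \n
Column 5: the text between the end of column 4 and \n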

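The result must be:

Replaces greater than operator ('>') with 'NOT BETWEEN 0 AND #' Replaces equals operator ('=') with 'BETWEEN # AND #'
- Microsoft SQL Server 2005
  - MySQL 4, 5.0 and 5.5
  - Oracle 10g
  - PostgreSQL 8.3, 8.4, 9.0
  or
  - Microsoft Access
- Useful to bypass weak and bespoke web application firewalls that filter the greater than character
- The BETWEEN clause is SQL standard. Hence, this tamper script should work against all (?) databases
tamper('1 AND A > B--') tamper('1 AND A = B--')
'1 AND A NOT BETWEEN 0 AND B--' '1 AND A BETWEEN B AND B--'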

What syntax can I use to extract this? I will be using VBScript.RegExp.

Set fso = CreateObject("Scripting.FileSystemObject")
txt = fso.OpenTextFile("C:\path\to\your.py").ReadAll

Set re = New RegExp
re.Pattern = """([^""]*)"""
re.Global = True

For Each m In re.Execute(txt)
  WScript.Echo m.SubMatches(0)
Next

Answer 1

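Your question is very broad, so I will only outline an approach to solving it. Otherwise I would have to write the whole script for you, which is not going to happen.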

Extract everything between the docquotes. Use a regular expression like this one to extract the text between them:
```
Set re1 = New RegExp
re1.Pattern = """""""([\s\S]*?)"""""""

For Each m In re1.Execute(txt)
  docstr = m.SubMatches(0)
Next
```
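Note that if the file contains more than one docstring and you want to process all of them, you need to set re1.Global to True. Otherwise you will only get the first match.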
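
As a quick way to check the pattern outside of VBScript, here is a minimal Python sketch of the same non-greedy match. The sample text and the name `extract_docstrings` are made up for illustration, and `findall` is used here as the rough counterpart of `Execute` with `Global = True`.

```python
import re

# Same pattern as re1.Pattern: three quotes, lazily any characters
# (including line breaks), then three quotes again.
DOCSTRING = re.compile(r'"""([\s\S]*?)"""')

def extract_docstrings(source):
    # Return the inner text of every triple-double-quoted block.
    return DOCSTRING.findall(source)

# Hypothetical file content holding two docstrings.
sample = 'x = 1\n"""first\nblock"""\ny = 2\n"""second"""\n'

for block in extract_docstrings(sample):
    print(repr(block))
```

Because the quantifier is lazy (`*?`), each match stops at the nearest closing quotes instead of running on to the last docstring in the file.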
Use a second regular expression to remove leading and trailing whitespace:
```
Set re2 = New RegExp
re2.Pattern = "^\s*|\s*$"
re2.Global  = True  'find all matches

docstr = re2.Replace(docstr, "")
```
You cannot use Trim here, because that function only removes spaces, not other whitespace such as tabs or line breaks.
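
To illustrate the difference, this small Python sketch applies the same pattern to a string padded with a tab and line breaks and compares the result with a spaces-only trim. The sample string is invented, and `strip(' ')` is only a stand-in for VBScript's `Trim`.

```python
import re

# Same pattern as re2.Pattern. Without a multiline flag, ^ and $ anchor
# at the start and end of the whole string, not at every line.
EDGE_WHITESPACE = re.compile(r'^\s*|\s*$')

padded = '\r\n\t  Replaces equals operator  \r\n\r\n'

regex_trimmed = EDGE_WHITESPACE.sub('', padded)
spaces_only = padded.strip(' ')  # stand-in for Trim: removes spaces only

print(repr(regex_trimmed))
print(repr(spaces_only))
```

The spaces-only trim leaves the string untouched, because its first and last characters are line breaks rather than spaces.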

Split the string at 2 consecutive line breaks to get the documentation sections, or use another regular expression to extract them:

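Set re3 = New RegExp
' VBScript joins strings with & and needs _ to continue a statement on the next line
re3.Pattern = "([\s\S]*?)\r\n\r\n" & _
              "Tested against:\r\n([\s\S]*?)\r\n\r\n" & _
              ...

' run the pattern against the trimmed docstring from the previous step
For Each m In re3.Execute(docstr)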
  descr  = m.SubMatches(0)
  tested = m.SubMatches(1)
  ...
Next

Keep breaking the sections down until you have the elements you want to display. Then build the HTML from those elements.
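
The last two steps can be sketched roughly as follows. This is a Python illustration, not the script the answer declines to write. The helper names `split_sections` and `to_html_row` are hypothetical, the sample docstring is shortened, and the code assumes that sections are separated by a blank line. Mapping the sections onto the five columns from the question is left out.

```python
import html
import re

def split_sections(docstr):
    # Normalise CRLF to LF, trim the edges, then split at blank lines.
    text = docstr.replace('\r\n', '\n').strip()
    return re.split(r'\n\s*\n', text)

def to_html_row(cells):
    # Escape each cell so that characters like > are safe inside HTML.
    tds = ''.join('<td>{}</td>'.format(html.escape(c)) for c in cells)
    return '<tr>{}</tr>'.format(tds)

# Shortened, hypothetical docstring with three sections.
docstr = (
    "Replaces greater than operator ('>') with 'NOT BETWEEN 0 AND #'\r\n"
    "\r\n"
    "Tested against:\r\n"
    "    * Oracle 10g\r\n"
    "\r\n"
    ">>> tamper('1 AND A > B--')\r\n"
    "'1 AND A NOT BETWEEN 0 AND B--'\r\n"
)

sections = split_sections(docstr)
print('<table>' + to_html_row(sections) + '</table>')
```

Escaping matters for this input in particular, since the docstring contains `>` and quote characters that would otherwise end up raw in the markup.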

Regexp extraction between triple double quotes and newlines
