Extracting multiple values from a string

Time: 2018-09-11 15:00:45

Tags: powershell select-string

We currently use this approach to find a single keyword:

Get-Content $SourceFile | Select-String -Pattern "search keyword value"

However, we need to extract 4 values: the embedded pound sterling (£) amounts (which vary) and two literal substrings, as shown below:

# Sample input
$String =' in the case of a single acquisition the Total Purchase Price of which (less the amount
funded by Acceptable Funding Sources (Excluding Debt)) exceeds £5,000,000 (or its
equivalent) but is less than or equal to £10,000,000 or its equivalent, the Parent shall
supply to the Agent for the Lenders not later than the date a member of the Group
legally commits to make the relevant acquisition, a copy of any financial due diligence
reports obtained by the Group in relation to the Acquisition Target, on a non-reliance
basis (subject to the Agent and any other relevant Reliance Party signing any required
hold harmless letter) and a copy of the acquisition agreement under which the
Acquisition Target is to be acquired;'

# Values to extract

$Value1 = ' in the case of a single acquisition the Total Purchase Price '

$Value2 = ' £5,000,000'

$Value3 = ' £10,000,000'

$Value4 = ' a copy of any financial due diligence
reports obtained by the Group in relation to the Acquisition Target, on a non-reliance
basis (subject to the Agent and any other relevant Reliance Party signing any required
hold harmless letter) and a copy of the acquisition agreement under which the
Acquisition Target is to be acquired;'

1 Answer:

Answer 0 (score: 0)

# Define the regex patterns to search for individually, as elements of an array.
$patterns = 
    # A string literal; escape it, to be safe.
    [regex]::Escape(' in the case of a single acquisition the Total Purchase Price '),     
    # A regex that matches a currency amount in pounds.
    # (Literal ' £', followed by at least one ('+') non-whitespace char. ('\S')
    # - this could be made more stringent by matching digits and commas only.)
    ' £\S+',     
    # A string literal that *needs* escaping due to use of '(' and ')'
    # Note the use of a literal here-string (@'<newline>...<newline>'@)
    [regex]::Escape(@'
a copy of any financial due diligence
reports obtained by the Group in relation to the Acquisition Target, on a non-reliance
basis (subject to the Agent and any other relevant Reliance Party signing any required
hold harmless letter) and a copy of the acquisition agreement under which the
Acquisition Target is to be acquired;
'@)

# - Use Get-Content -Raw to read the file *as a whole*
# - Use Select-String -AllMatches to find *multiple* matches (per input string)
# - ($patterns -join '|') joins the individual regexes with an alternation (|)
#   so that matches of any one of them are returned.
Get-Content -Raw $SourceFile | Select-String -AllMatches -Pattern ($patterns -join '|') |
  ForEach-Object {
    # Loop over the matches, each of which exposes the matched substring
    # via its .Value property, and collect them in an *array*, $capturedSubstrings
    # Note: You could use `Set-Variable` to create individual variables $Variable1, ...
    #       but it's usually easier to work with an array.
    $capturedSubstrings = foreach ($match in $_.Matches) { $match.Value }
    # Output the array elements in diagnostic form.
    $capturedSubstrings | % { "[$_]" }
  }
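
The comment on the currency pattern mentions that it could be made more stringent by matching digits and commas only. A minimal sketch of that idea follows, assuming amounts use comma-separated thousands groups and optionally two decimal places. The sample text and the variable names are illustrative, and unlike the answer's pattern this one does not include the leading space.

```powershell
# Illustrative sample text; in the answer the input comes from Get-Content -Raw $SourceFile.
$text = 'exceeds £5,000,000 (or its equivalent) but is less than or equal to £10,000,000 or its equivalent'

# '£', then 1-3 digits, then zero or more ',ddd' groups, then optional pence ('.dd').
# Assumption: amounts are formatted with comma-separated thousands groups.
$strictPattern = '£\d{1,3}(?:,\d{3})*(?:\.\d{2})?'

# Collect the matched substrings in an array.
$amounts = @([regex]::Matches($text, $strictPattern) | ForEach-Object { $_.Value })

# Output the array elements in diagnostic form.
$amounts | ForEach-Object { "[$_]" }
```

A stricter pattern like this avoids capturing trailing punctuation, which `\S+` would include if an amount were immediately followed by a character such as `)` or `;`.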

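Note that -Pattern normally accepts an array of values, so using -Pattern $patterns should work (albeit with slightly different behavior), but as of PowerShell Core 6.1.0 it does not, due to a bug.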

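Caveat: This assumes that your script uses the same newline style (CRLF vs. LF-only) as $SourceFile. If the two differ, more work is needed; the symptom is that the last pattern (the multi-line one) does not match.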
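
One possible way to handle differing newline styles is to build the multi-line pattern so that it accepts either CRLF or LF. The sketch below is not part of the original answer: the function name `ConvertTo-NewlineAgnosticPattern` is hypothetical. It splits the literal text into lines, escapes each line separately, and joins the lines with the regex `\r?\n`.

```powershell
# Hypothetical helper, not part of the original answer.
function ConvertTo-NewlineAgnosticPattern {
    param([Parameter(Mandatory)][string] $Literal)

    # Split on either newline style, escape each line as a literal,
    # then rejoin with a regex that matches CRLF or LF.
    $escapedLines = $Literal -split '\r?\n' | ForEach-Object { [regex]::Escape($_) }
    $escapedLines -join '\r?\n'
}

# Illustrative two-line literal written with LF-only newlines.
$pattern = ConvertTo-NewlineAgnosticPattern "hold harmless letter) and`na copy;"

# The resulting pattern matches input using either newline style.
$matchesCrLf = "hold harmless letter) and`r`na copy;" -match $pattern
$matchesLf   = "hold harmless letter) and`na copy;" -match $pattern
```

The result of the helper could then take the place of the last `[regex]::Escape(...)` element in `$patterns`.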

With a file containing the contents of $String above, the Select-String command yields:

[ in the case of a single acquisition the Total Purchase Price ]
[ £5,000,000]
[ £10,000,000]
[a copy of any financial due diligence
reports obtained by the Group in relation to the Acquisition Target, on a non-reliance
basis (subject to the Agent and any other relevant Reliance Party signing any required
hold harmless letter) and a copy of the acquisition agreement under which the
Acquisition Target is to be acquired;]
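
If individual variables such as the question's `$Value1` through `$Value4` are preferred over an array, PowerShell's multiple assignment is one option. This is a minimal sketch that assumes exactly four matches were captured, in the order shown above; the array contents here are shortened stand-ins for the real matches.

```powershell
# Shortened stand-ins for the four captured substrings (assumption: exactly four, in this order).
$capturedSubstrings = @(
    ' in the case of a single acquisition the Total Purchase Price ',
    ' £5,000,000',
    ' £10,000,000',
    'a copy of any financial due diligence reports'
)

# Multiple assignment: each variable receives one array element, in order.
$Value1, $Value2, $Value3, $Value4 = $capturedSubstrings

# Trim the surrounding whitespace where it is not wanted.
$Value2 = $Value2.Trim()
$Value3 = $Value3.Trim()
```

Note that if the array held more than four elements, the last variable would receive all remaining elements as an array, so checking `$capturedSubstrings.Count` first is a reasonable safeguard.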