Question

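This may be a philosophical question, but I would like to know how the following two approaches differ in speed and efficiency. In PowerShell, I have two arrays of objects that look like this: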

 $ObjectA = @()
 1..10 | foreach-object{
     $obj = New-Object System.Object
     $obj | Add-Member -Type NoteProperty -Name index -Value $_
     $ObjectA += $obj
 }

 $ObjectB = @()
 5..15 | foreach-object{
     $obj = New-Object System.Object
     $obj | Add-Member -Type NoteProperty -Name index -Value $_
     $ObjectB += $obj
 }

Now I want to get the objects that exist in both. I can do this in one of two ways.

Solution 1:

 $ObjectA | foreach-object{
        $ind = $_
        $matching = $ObjectB | where {$_.index -eq $ind.index}
        if (![string]::IsNullOrEmpty($matching)){
            ##do stuff with the match
        }
  }

Solution 2:

  $matches = Compare-Object $ObjectA $ObjectB -Property index -IncludeEqual -PassThru | where {$_.SideIndicator -eq '=='}
  $matches | foreach-object {
      ##do stuff with the matches.
  }

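My question is: when my object arrays get very large (30K+), which one will perform better? I don't know how the Compare-Object cmdlet works internally, so I really can't tell. Or does it not matter?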

Thanks in advance.

Answer 1

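As @Knows Not Much pointed out, Compare-Object generally performs better than iterating over the collections and comparing the objects yourself. However, the other answer does not use the -ExcludeDifferent parameter and instead filters the Compare-Object output, which means many unnecessary string comparisons against the SideIndicator property. For the best performance and simpler code, just use -IncludeEqual together with -ExcludeDifferent: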

$ObjectA = @()
1..10000 | %{
   $obj = New-Object System.Object
   $obj | Add-Member -Type NoteProperty -Name index -Value $_
   $ObjectA += $obj
}

$ObjectB = @()
1000..7000 | %{
   $obj = New-Object System.Object
   $obj | Add-Member -Type NoteProperty -Name index -Value $_
   $ObjectB += $obj
}

# Iterating over the result of Compare-Object takes 2.6 seconds.
Measure-Command { $matches_where_eq = Compare-Object $ObjectA $ObjectB -Property index -IncludeEqual | where {$_.SideIndicator -eq '=='} ; echo $matches_where_eq.count }

# Using -IncludeEqual and -ExcludeDifferent takes 2.1 seconds (80% of previous).
Measure-Command { $matches_ed_ie = Compare-Object $ObjectA $ObjectB -Property index -ExcludeDifferent -IncludeEqual; echo $matches_ed_ie.Count }
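
For reference, here is a minimal sketch of what that call returns on a small data set. It assumes PowerShell 3.0 or later (for `[pscustomobject]` and member enumeration), and the names `$a`, `$b` and `$common` are only illustrative:

```powershell
# Two small sample arrays whose index values overlap on 5..10
$a = 1..10 | ForEach-Object { [pscustomobject]@{ index = $_ } }
$b = 5..15 | ForEach-Object { [pscustomobject]@{ index = $_ } }

# Keep only the entries whose index value appears in both arrays
$common = @(Compare-Object $a $b -Property index -IncludeEqual -ExcludeDifferent)

# Each result carries the compared property plus a SideIndicator of '=='
$common | Format-Table index, SideIndicator
$common.Count
```

Because -Property is used, the results are new objects holding only index and SideIndicator. If you need the original input objects, adding -PassThru should return them instead.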

Answer 2

Even with a data set of size 10000, you can easily see that Compare-Object is faster.

I modified your code so that it works on PowerShell 3.0:

cls
$ObjectA = @()
 1..10000 | %{
     $obj = New-Object System.Object
     $obj | Add-Member -Type NoteProperty -Name index -Value $_
     $ObjectA += $obj
 }

 $ObjectB = @()
 1000..7000 | %{
     $obj = New-Object System.Object
     $obj | Add-Member -Type NoteProperty -Name index -Value $_
     $ObjectB += $obj
 }

 Measure-Command {
  $count = 0
  $matches = Compare-Object $ObjectA $ObjectB -Property index -IncludeEqual | where {$_.SideIndicator -eq '=='}
}

echo $matches.length
echo $matches.Count

Measure-Command {
  $count = 0
  $ObjectA | %{
        $ind = $_
        $matching = $ObjectB | where {$_.Index -eq $ind.Index}
        if (![string]::IsNullOrEmpty($matching)){
            $count = $count + 1
        }
  }  
  echo $count
}

Compare-Object returns in under 5 seconds... but the other approach takes practically forever.
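
The gap is easier to see if you count the work: the Where-Object loop rescans all of $ObjectB for every element of $ObjectA, so for the data above that is roughly 10000 x 6001, about 60 million script block calls. Here is a rough sketch on a much smaller data set, just to show that both approaches find the same matches. It assumes PowerShell 3.0 or later, and the variable names are made up for illustration:

```powershell
# Small sample arrays whose index values overlap on 10..20
$a = 1..20  | ForEach-Object { [pscustomobject]@{ index = $_ } }
$b = 10..30 | ForEach-Object { [pscustomobject]@{ index = $_ } }

# Nested scan: every element of $a is checked against every element of $b
$nested = @(foreach ($item in $a) {
    $b | Where-Object { $_.index -eq $item.index }
})

# Single Compare-Object call over the same data
$compared = @(Compare-Object $a $b -Property index -IncludeEqual -ExcludeDifferent)

# Number of comparisons performed by the nested scan
$comparisons = $a.Count * $b.Count

"nested: $($nested.Count), compared: $($compared.Count), comparisons: $comparisons"
```

Both counts should agree. Only the amount of work to get there differs, and for the nested scan it grows with the product of the two array sizes.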

Answer 3

This builds a regular expression from one array and then runs a regex match against the other array.

Solution 3:

[regex]$RegMatch = '(' + (($ObjectA |foreach {[regex]::escape($_)}) -join "|") + ')'
$ObjectB -match $RegMatch

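You may want to add some logic to build the regex from the smaller data set and then run the larger set against it to speed things up, but I'm fairly sure this will be the fastest.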
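
One caveat with the pattern as written: it is not anchored, so a value like 5 would also match 15, 25, 50 and so on. Below is a hedged sketch of the same idea with `^` and `$` anchors added, building the regex from the smaller set. It assumes the values being compared are plain scalars (for example, the index values rather than the wrapper objects), and `$small`, `$large`, `$pattern` and `$hits` are hypothetical names:

```powershell
# Build the alternation from the smaller set, then test the larger set against it
$small = 5..15
$large = 1..100

# Escape each value and anchor the alternation so only whole values match
$escaped = $small | ForEach-Object { [regex]::Escape([string]$_) }
$pattern = '^(' + ($escaped -join '|') + ')$'

# With an array on the left, -match returns the elements that match
$hits = @($large -match $pattern)
$hits -join ','
```

Whether this beats Compare-Object on 30K+ items is something you would want to confirm with Measure-Command on your own data.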

Which is better, Where-Object or Compare-Object?
