Question

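Using PowerShell (I'm relatively new to coding), I'm trying to take a large CSV file with 26 columns and manipulate the data when certain fields contain duplicate data... but keep all of the data when that field is not duplicated.

Sample data: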

Name,DOB,Address,PhoneNo,FaveSport,FaveTeam,FavePlayer, 
Nick,1/1/01,123 4th,123-456-7890,,, 
Nick,1/1/01,,,Hockey,Red Wings,Lidstrom
Calvin,2/2/02,456 7th,555-867-5309,Football,Lions,Megatron
Mickey,3/3/03,999 Yankee Way,111-222-3333,,,
Mickey,3/3/03,,,Baseball,Yankees,Mantle

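In the scenario above, I want to keep the first four columns from Nick's first row and the last three columns from the second, nearly duplicate row. The layout is always the same: the top row has the correct first 4 columns, and the second row (if there is one; sometimes there is only 1 row, as with Calvin, in which case we keep the whole row) has the data we want in the last 3 columns.

So the data we want when finished is: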

Name,DOB,Address,PhoneNo,FaveSport,FaveTeam,FavePlayer,
Nick,1/1/01,123 4th,123-456-7890,Hockey,Red Wings,Lidstrom 
Calvin,2/2/02,456 7th,555-867-5309,Football,Lions,Megatron
Mickey,3/3/03,999 Yankee Way,111-222-3333,Baseball,Yankees,Mantle

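I have no idea how to compare the first x columns of one row with the first x columns of another row to check for "duplicates", and then write the first x fields of the first row and the last x fields of the second row to a new document...

Any help is greatly appreciated. I'm trying to be a hero for my wife, who currently has to do this by manually copying and pasting over and over in a 5k+ row Excel document.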

Answer 1

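You can use a hashtable to store the first row for each name, and then, if another row with the same name shows up, copy over only the columns that actually have a value: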

$Data = @'
Name,DOB,Address,PhoneNo,FaveSport,FaveTeam,FavePlayer
Nick,1/1/01,123 4th,123-456-7890,,,
Nick,1/1/01,,,Hockey,Red Wings,Lidstrom
Calvin,2/2/02,456 7th,555-867-5309,Football,Lions,Megatron
Mickey,3/3/03,999 Yankee Way,111-222-3333,,,
Mickey,3/3/03,,,Baseball,Yankees,Mantle
'@|ConvertFrom-Csv

# Set up a hashtable to keep track of distinct player names
$Players = @{}

foreach($Row in $Data) {
    if(-not $Players.ContainsKey($Row.Name))
    {
        # First row with that player name
        $Players[$Row.Name] = $Row
    }
    else
    {
        # We've already read the first row for this guy
        foreach($Property in $Row.psobject.Properties)
        {
            # Check each property for whether it has a value
            if($Property.Value)
            {
                # Overwrite previous property value 
                $Players[$Row.Name]."$($Property.Name)" = $Property.Value
            }
        }
    }
}

# Print final results
$Players.Values |Format-Table
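
Since the question is about a real file and a new output document, here is a rough sketch of the same idea reading from and writing to disk. The paths are hypothetical (a temp folder is used so the sketch runs anywhere), the sample header is written without the trailing comma, and an ordered dictionary is swapped in for the plain hashtable so the rows come out in the order the names first appear. Note that an ordered dictionary uses Contains() rather than ContainsKey().

```powershell
# Hypothetical paths; replace them with the real input and output files
$InputPath  = Join-Path ([System.IO.Path]::GetTempPath()) 'players-input.csv'
$OutputPath = Join-Path ([System.IO.Path]::GetTempPath()) 'players-merged.csv'

# Write the sample data to disk so the sketch is self-contained
@'
Name,DOB,Address,PhoneNo,FaveSport,FaveTeam,FavePlayer
Nick,1/1/01,123 4th,123-456-7890,,,
Nick,1/1/01,,,Hockey,Red Wings,Lidstrom
Calvin,2/2/02,456 7th,555-867-5309,Football,Lions,Megatron
Mickey,3/3/03,999 Yankee Way,111-222-3333,,,
Mickey,3/3/03,,,Baseball,Yankees,Mantle
'@ | Set-Content -Path $InputPath

# Ordered dictionary: remembers the order in which names were first seen
$Players = [ordered]@{}

foreach ($Row in Import-Csv -Path $InputPath) {
    if (-not $Players.Contains($Row.Name)) {
        # First row with that name: keep it as is
        $Players[$Row.Name] = $Row
    }
    else {
        # Later row with the same name: copy over only the non-empty columns
        $Existing = $Players[$Row.Name]
        foreach ($Property in $Row.psobject.Properties) {
            if ($Property.Value) {
                $Name = $Property.Name
                $Existing.$Name = $Property.Value
            }
        }
    }
}

# Write the merged rows to the new file
$Players.Values | Export-Csv -Path $OutputPath -NoTypeInformation
```

With a 26-column file nothing else should need to change, because the loop walks whatever properties each row has.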

Answer 2

Since you only want the last columns, and building on @Mathias's great work, you could do this:

$Data = @'
Name,DOB,Address,PhoneNo,FaveSport,FaveTeam,FavePlayer
Nick,1/1/01,123 4th,123-456-7890,,,
Nick,1/1/01,,,Hockey,Red Wings,Lidstrom
Calvin,2/2/02,456 7th,555-867-5309,Football,Lions,Megatron
Mickey,3/3/03,999 Yankee Way,111-222-3333,,,
Mickey,3/3/03,,,Baseball,Yankees,Mantle
'@|ConvertFrom-Csv

# Set up a hashtable to keep track of distinct player names
$Players = @{}

# Make a named list of the columns you're wanting to keep from the second rows
$columns = @("FaveSport","FaveTeam","FavePlayer")

foreach($Row in $Data) {
    if(-not $Players.ContainsKey($Row.Name))
    {
        # First row with that player name
        $Players[$Row.Name] = $Row
    }
    else
    {
        # Check just the named columns that you want to keep the good values for
        foreach($item in $columns)
        {
            # Fill the column from this row only if the stored row has no value for it
            if (-not $Players[$Row.Name].$item) {
                $Players[$Row.Name].$item = $Row.$item
            }
        }
    }
}

# Print final results
$Players.Values |Format-Table

Basically you're just checking for and pulling in the columns you want.
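
As an alternative to the hashtable (not what either answer above uses), the "compare the first x columns" part of the question can be sketched with Group-Object. This assumes that Name and DOB together identify a person, which is a guess based on the sample data; $KeyColumns is a name introduced here for illustration, and the sample header is written without the trailing comma.

```powershell
$Rows = @'
Name,DOB,Address,PhoneNo,FaveSport,FaveTeam,FavePlayer
Nick,1/1/01,123 4th,123-456-7890,,,
Nick,1/1/01,,,Hockey,Red Wings,Lidstrom
Calvin,2/2/02,456 7th,555-867-5309,Football,Lions,Megatron
Mickey,3/3/03,999 Yankee Way,111-222-3333,,,
Mickey,3/3/03,,,Baseball,Yankees,Mantle
'@ | ConvertFrom-Csv

# Assumption: rows that share these columns describe the same person
$KeyColumns = 'Name', 'DOB'

$Merged = $Rows | Group-Object -Property $KeyColumns | ForEach-Object {
    # The first row of each group is the one we keep
    $First = $_.Group[0]

    # Any further rows only fill in columns the first row left empty
    foreach ($Other in ($_.Group | Select-Object -Skip 1)) {
        foreach ($Property in $Other.psobject.Properties) {
            $Name = $Property.Name
            if ($Property.Value -and -not $First.$Name) {
                $First.$Name = $Property.Value
            }
        }
    }

    $First
}

$Merged | Format-Table
```

Groups with a single row, like Calvin's, pass through unchanged. The order of the groups in the output may differ from the input, so sort or compare by name rather than by position.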

Compare two rows in a CSV and remove one row based on specific conditions
