Compare two rows in a CSV and remove one based on specific conditions

Date: 2016-05-19 01:08:26

Tags: csv powershell compare row

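Using PowerShell (I'm relatively new to coding), I'm trying to take a large CSV file with 26 columns and merge the data when certain fields contain duplicate data, while keeping all of the data when those fields are not duplicated.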

Sample data:

Name,DOB,Address,PhoneNo,FaveSport,FaveTeam,FavePlayer, 
Nick,1/1/01,123 4th,123-456-7890,,, 
Nick,1/1/01,,,Hockey,Red Wings,Lidstrom
Calvin,2/2/02,456 7th,555-867-5309,Football,Lions,Megatron
Mickey,3/3/03,999 Yankee Way,111-222-3333,,,
Mickey,3/3/03,,,Baseball,Yankees,Mantle

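In the scenario above, I want to keep the first four columns of Nick's first row and the last three columns of his second, nearly duplicate, row. It is always the same pattern: the top row has the correct first 4 columns, and the second row has the data we want in the last 3 columns. Sometimes there is no second row (for example, Calvin has only one), in which case we keep the whole row.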

So the data we want when we're done is:

Name,DOB,Address,PhoneNo,FaveSport,FaveTeam,FavePlayer,
Nick,1/1/01,123 4th,123-456-7890,Hockey,Red Wings,Lidstrom 
Calvin,2/2/02,456 7th,555-867-5309,Football,Lions,Megatron
Mickey,3/3/03,999 Yankee Way,111-222-3333,Baseball,Yankees,Mantle

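I'm completely lost on how to compare the first x columns of one row with the first x columns of another row to check for a "duplicate", and then write the first x fields of the first row and the last x fields of the second row to a new document...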

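Any help is greatly appreciated. I'm trying to be my wife's hero; right now she has to do this by repeatedly copying and pasting by hand in a 5k+ row Excel document.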

2 answers:

Answer 0 (score: 2)

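You can use a hashtable to store the first row for each name, and then, if another row with the same name comes along, copy over only the columns that actually have values: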

$Data = @'
Name,DOB,Address,PhoneNo,FaveSport,FaveTeam,FavePlayer
Nick,1/1/01,123 4th,123-456-7890,,,
Nick,1/1/01,,,Hockey,Red Wings,Lidstrom
Calvin,2/2/02,456 7th,555-867-5309,Football,Lions,Megatron
Mickey,3/3/03,999 Yankee Way,111-222-3333,,,
Mickey,3/3/03,,,Baseball,Yankees,Mantle
'@|ConvertFrom-Csv

# Set up a hashtable to keep track of distinct player names
$Players = @{}

foreach($Row in $Data) {
    if(-not $Players.ContainsKey($Row.Name))
    {
        # First row with that player name
        $Players[$Row.Name] = $Row
    }
    else
    {
        # We've already read the first row for this guy
        foreach($Property in $Row.psobject.Properties)
        {
            # Check each property for whether it has a value
            if($Property.Value)
            {
                # Overwrite previous property value 
                $Players[$Row.Name]."$($Property.Name)" = $Property.Value
            }
        }
    }
}

# Print final results
$Players.Values |Format-Table
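The question also asks for the result to be written to a new file, and a plain @{} hashtable does not keep rows in their original order. A minimal sketch of the same merge that reads from a CSV file, uses an [ordered] dictionary (which exposes .Contains() rather than .ContainsKey()), and writes the result with Export-Csv might look like the following. The paths in $InPath and $OutPath are hypothetical placeholders, and the sample input file is created only so the snippet runs on its own:

```powershell
# Hypothetical paths: replace these with your real input and output files
$InPath  = Join-Path ([System.IO.Path]::GetTempPath()) 'players.csv'
$OutPath = Join-Path ([System.IO.Path]::GetTempPath()) 'players-merged.csv'

# Create a small sample input file so this sketch is self-contained
@'
Name,DOB,Address,PhoneNo,FaveSport,FaveTeam,FavePlayer
Nick,1/1/01,123 4th,123-456-7890,,,
Nick,1/1/01,,,Hockey,Red Wings,Lidstrom
Calvin,2/2/02,456 7th,555-867-5309,Football,Lions,Megatron
'@ | Set-Content -Path $InPath

# [ordered] keeps players in the order they first appear in the file
$Players = [ordered]@{}

foreach ($Row in Import-Csv -Path $InPath) {
    if (-not $Players.Contains($Row.Name)) {
        # First row for this name: keep it as the base record
        $Players[$Row.Name] = $Row
    }
    else {
        # Later rows: copy over only the fields that have a value
        foreach ($Property in $Row.psobject.Properties) {
            if ($Property.Value) {
                $Players[$Row.Name].($Property.Name) = $Property.Value
            }
        }
    }
}

# Write the merged rows to a new CSV file
$Players.Values | Export-Csv -Path $OutPath -NoTypeInformation
```

Opening the output file in Excel should then show one row per person in the original order.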

Answer 1 (score: 0)

Since you only want the last three columns from the second row, and building on @Mathias's great work, you could do this:

$Data = @'
Name,DOB,Address,PhoneNo,FaveSport,FaveTeam,FavePlayer
Nick,1/1/01,123 4th,123-456-7890,,,
Nick,1/1/01,,,Hockey,Red Wings,Lidstrom
Calvin,2/2/02,456 7th,555-867-5309,Football,Lions,Megatron
Mickey,3/3/03,999 Yankee Way,111-222-3333,,,
Mickey,3/3/03,,,Baseball,Yankees,Mantle
'@|ConvertFrom-Csv

# Set up a hashtable to keep track of distinct player names
$Players = @{}

# Make a named list of the columns you're wanting to keep from the second rows
$columns = @("FaveSport","FaveTeam","FavePlayer")

foreach($Row in $Data) {
    if(-not $Players.ContainsKey($Row.Name))
    {
        # First row with that player name
        $Players[$Row.Name] = $Row
    }
    else
    {
        # Check just the named columns that you want to keep the good values for
        foreach($item in $columns)
        {
            # Copy the second row's value only where the first row left this column blank
            if ([string]::IsNullOrWhiteSpace($Players[$Row.Name].$item)) {
                $Players[$Row.Name].$item = $Row.$item
            }
        }
    }
}

# Print final results
$Players.Values |Format-Table

Basically, you check only the columns you care about and pull in just those values.
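As an alternative that follows the rule in the question more literally (first four columns from the first row, last three from the second row, the whole row when there is only one), here is a sketch using Group-Object. It assumes rows belong together when both Name and DOB match and that the column names are exactly those in the sample (shown here without the header's trailing comma); $FirstRowColumns and $SecondRowColumns are illustrative names, not part of either answer:

```powershell
$Data = @'
Name,DOB,Address,PhoneNo,FaveSport,FaveTeam,FavePlayer
Nick,1/1/01,123 4th,123-456-7890,,,
Nick,1/1/01,,,Hockey,Red Wings,Lidstrom
Calvin,2/2/02,456 7th,555-867-5309,Football,Lions,Megatron
Mickey,3/3/03,999 Yankee Way,111-222-3333,,,
Mickey,3/3/03,,,Baseball,Yankees,Mantle
'@ | ConvertFrom-Csv

# Assumption: these columns always come from the first row of a group...
$FirstRowColumns  = 'Name', 'DOB', 'Address', 'PhoneNo'
# ...and these always come from the second row, as described in the question
$SecondRowColumns = 'FaveSport', 'FaveTeam', 'FavePlayer'

# Rows with the same Name and DOB are treated as one person;
# only the first two rows of each group are used
$Merged = $Data | Group-Object -Property Name, DOB | ForEach-Object {
    $First = $_.Group[0]
    # A single-row group (like Calvin) is kept whole
    $Second = if ($_.Count -gt 1) { $_.Group[1] } else { $First }

    # Build the output row column by column, in a fixed order
    $Out = [ordered]@{}
    foreach ($Column in $FirstRowColumns)  { $Out[$Column] = $First.$Column }
    foreach ($Column in $SecondRowColumns) { $Out[$Column] = $Second.$Column }
    [pscustomobject]$Out
}

$Merged | Format-Table
```

Unlike the hashtable versions, this does not look at which fields are blank; each set of columns is simply taken from a fixed row of the group. To save the result, pipe $Merged to Export-Csv instead of Format-Table.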
