Copy fields from the previous record when the current ones are empty

Time: 2015-03-05 21:46:34

Tags: csv sed

I have a CSV or HTML table like this:

Number Name ReportTo Time
11111 John Medical 0500
22222 Jane Dental 0700
                       Medical 1100
44444 Steve HR 0900
55555 Julie Training 0800
                       Records 1400
                       Business 1700
66666 David Medical 0800

I want to find a way to fill in this table so that no fields are left blank. The table should end up looking like this:

Number Name ReportTo Time
11111 John Medical 0500
22222 Jane Dental 0700
22222 Jane Medical 1100
44444 Steve HR 0900
55555 Julie Training 0800
55555 Julie Records 1400
55555 Julie Business 1700
66666 David Medical 0800

This is similar to this, but with sed and filling in from the left. Thanks.

1 answer:

Answer 0 (score: 1)

Without knowing more about the format, this awk should do it:

awk 'NF == 4 { p1 = $1; p2 = $2; print } NF == 2 { print p1, p2, $1, $2 }' filename

That is:

NF == 4 {   # in a line with four fields
  p1 = $1   # remember the first two
  p2 = $2
  print     # print the line unchanged
}
NF == 2 {   # in a line with two fields
            # put the remembered fields before them.
  print p1, p2, $1, $2
}
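
To see the one-liner at work, here is a small self-contained run against the sample table from the question. It is only a sketch: the temporary file and the variable names `input` and `result` are made up for the demonstration, and it assumes whitespace-separated fields exactly as shown in the question.

```shell
# Write the sample table from the question to a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
Number Name ReportTo Time
11111 John Medical 0500
22222 Jane Dental 0700
                       Medical 1100
44444 Steve HR 0900
55555 Julie Training 0800
                       Records 1400
                       Business 1700
66666 David Medical 0800
EOF

# Four-field lines are printed as-is and their first two fields remembered;
# two-field lines get the remembered fields put in front of them.
result=$(awk 'NF == 4 { p1 = $1; p2 = $2; print } NF == 2 { print p1, p2, $1, $2 }' "$input")
printf '%s\n' "$result"

rm -f "$input"
```

With this input, the printed table should match the desired output given in the question.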

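Note that this assumes the whole file consists of lines with either two or four fields; lines that don't fit this pattern are silently dropped. If your file contains such lines, it isn't obvious to me how you would want them handled.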
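
If silently dropping lines is a concern, one possible policy is to pass anything unexpected through unchanged. The question does not say what should happen to such lines, so this is an assumption and not a recommendation. The `fill_fields` function name is hypothetical.

```shell
# Hypothetical wrapper around the awk script; reads the named files or stdin.
fill_fields() {
  awk '
    NF == 4 { p1 = $1; p2 = $2; print; next }   # full line: remember and print
    NF == 2 { print p1, p2, $1, $2; next }      # short line: prepend remembered fields
    { print }                                   # assumed policy: keep other lines as-is
  ' "$@"
}

# Small demonstration with an empty line and a three-field line mixed in.
result=$(printf '%s\n' \
  '11111 John Medical 0500' \
  '' \
  '        Dental 0700' \
  'one two three' | fill_fields)
printf '%s\n' "$result"
```

Without the `next` statements, a four-field or two-field line would also reach the final catch-all rule and be printed twice.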

If you really want to do it with sed, then

sed '/^[[:space:]]/ { G; s/^[[:space:]]*\(.*\)\n\(.*\)/\2 \1/; p; d; }; h; s/[[:space:]]\+/\n/2; s/\n.*//; x' filename

works, but it's a bit more complicated:

/^[[:space:]]/ {                        # if a line begins with spaces, we
                                        # assume that front tokens are missing
  G                                     # get the remembered tokens from the
                                        # hold buffer
  s/^[[:space:]]*\(.*\)\n\(.*\)/\2 \1/  # put them before the tokens in this
                                        # line
  p                                     # print
  d                                     # and we're done.
}
h                                       # otherwise: Hold the line
s/[[:space:]]\+/\n/2                    # replace the second whitespace
                                        # sequence with \n
s/\n.*//                                # then remove it and everything after
                                        # it. This isolates the two first
                                        # fields in the line.
x                                       # swap that with the saved full line,
                                        # so the first two fields are in the
                                        # hold buffer for later use.

                                        # Dropping off the end leads to the
                                        # default action (printing), so the
                                        # full line is printed unchanged.
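
As a quick check of the sed version, the following sketch runs it on part of the sample table. It assumes GNU sed, since the script relies on `\+`, on `\n` in the replacement text, and on `;` after a closing brace, none of which are guaranteed in other sed implementations. The variable name `result` is made up for the demonstration.

```shell
# Feed a few sample lines, including two consecutive incomplete ones,
# to show that the hold buffer keeps the remembered fields across lines.
result=$(printf '%s\n' \
  '55555 Julie Training 0800' \
  '                       Records 1400' \
  '                       Business 1700' \
  '66666 David Medical 0800' |
  sed '/^[[:space:]]/ { G; s/^[[:space:]]*\(.*\)\n\(.*\)/\2 \1/; p; d; }; h; s/[[:space:]]\+/\n/2; s/\n.*//; x')
printf '%s\n' "$result"
```

Because `G` copies the hold buffer without clearing it, both incomplete lines should come out prefixed with `55555 Julie`.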