Problem appending data

Time: 2017-03-28 11:31:00

Tags: python

I am trying to append a combination of words to an existing string using Python. To do this, I wrote the code below.

import subprocess
from subprocess import Popen, PIPE

cat = subprocess.Popen(["hadoop", "fs", "-cat", "/user/cloudera/rank_t/*"], stdout=subprocess.PIPE)
dumpoff = Popen(["hadoop", "fs", "-put", "-", "/user/cloudera/DATA"],stdin=PIPE)
obrInd = "0"
line1 = ""
for line in cat.stdout:
    runnno= line.split('|')[0]
    code = line.split('|')[1]
    idval = line.split('|')[2]

    if (code == "OBR"):
        obrInd = runnno
    line =line + "|"+"OBR_"+obrInd  
    dumpoff.stdin.write(line)
    print(line)

My sample data:

1|ORC||4002C3|4002C3||||||20141231|||1962
2|OBR|1||4002C3|197 HP, RX 16/L|||20141|20141||||||||196248||RJ||3711028|||||F
3|OBX|1|ST|2263||NEGATIVE FOR INTRAEPITHELIAL L.||||||F|||20141231|RJ @#L
4|NTE|1|L|NEGATIVE FOR INTRAEPITHELIAL LESION AND .
5|OBX|2|ST|1158||NIL||||||F|||20141231|RJ@#L

Expected output:

1|ORC||4002C3|4002C3||||||20141231|||1962|
2|OBR|1||4002C3|197 HP, RX 16/L|||20141|20141||||||||196248||RJ||3711028|||||F|OBR_1
3|OBX|1|ST|2263||NEGATIVE FOR INTRAEPITHELIAL L.||||||F|||20141231|RJ @#L|OBR_1
4|NTE|1|L|NEGATIVE FOR INTRAEPITHELIAL LESION AND .|OBR_1
5|OBX|2|ST|1158||NIL||||||F|||20141231|RJ@#L|OBR_1

Actual output:

    1|ORC||4002C3|4002C3||||||20141231|||1962|
    2|OBR|1||4002C3|197 HP, RX 16/L|||20141|20141||||||||196248||RJ||3711028|||||F
    |OBR_1
    3|OBX|1|ST|2263||NEGATIVE FOR INTRAEPITHELIAL L.||||||F|||20141231|RJ @#L
    |OBR_1
    4|NTE|1|L|NEGATIVE FOR INTRAEPITHELIAL LESION AND .
    |OBR_1
    5|OBX|2|ST|1158||NIL||||||F|||20141231|RJ@#L
    |OBR_1

The text I want to append ends up on a new line, but I want it appended on the same line. What am I doing wrong?

1 answer:

Answer 0: (score: 4)

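This is because every line ends with a \n. You can remove it with .strip():

line = line.strip() + "|"+"OBR_"+obrInd  

or, if you care about the whitespace at the beginning/end of the line, strip only the newline:

line = line.strip('\n') + "|"+"OBR_"+obrInd  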
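To make the fix concrete, here is a minimal sketch of the loop as a standalone function, so it can be tried without Hadoop. The name tag_lines, its initial parameter, and the shortened sample records are hypothetical and only serve as illustration. It follows the posted code in using the first field of the OBR record as the suffix; the sample output in the question shows OBR_1 for record 2, so the field index may need adjusting for the real data. It also puts the newline back after appending, on the assumption that the records should stay one per line when written to dumpoff.stdin.

```python
def tag_lines(lines, initial="0"):
    """Yield each pipe-delimited line with '|OBR_<n>' appended on the same line."""
    obr_ind = initial
    for line in lines:
        # Remove only the trailing line terminator; other whitespace is kept.
        record = line.rstrip("\r\n")
        fields = record.split("|")
        # Remember the running number of the most recent OBR record.
        if len(fields) > 1 and fields[1] == "OBR":
            obr_ind = fields[0]
        # Re-add the newline so records stay separated when written out.
        yield record + "|OBR_" + obr_ind + "\n"


# Shortened, hypothetical sample records in the same layout as the question.
sample = [
    "1|ORC||4002C3\n",
    "2|OBR|1||4002C3\n",
    "3|OBX|1|ST|2263\n",
]
tagged = list(tag_lines(sample))
for row in tagged:
    print(row, end="")
```

In the original script this would be used roughly as `for out in tag_lines(cat.stdout): dumpoff.stdin.write(out)`. Note that on Python 3 the pipes yield and expect bytes unless the Popen objects are opened in text mode, so the sketch assumes the lines are already str.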