awk find and replace

Date: 2014-05-19 08:29:14

Tags: bash awk

I have edited the question. I have an XML file (FILE1) that looks like this:

<Sector sectorNumber="1">
        <Cell cellNumber="1" cellCreated="YES" cellIdentity="" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
        <Cell cellNumber="2" cellCreated="YES" cellIdentity="" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
        <Cell cellNumber="3" cellCreated="YES" cellIdentity="" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
      </Sector>
      <Sector sectorNumber="2">
        <Cell cellNumber="1" cellCreated="YES" cellIdentity="" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
        <Cell cellNumber="2" cellCreated="YES" cellIdentity="" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
        <Cell cellNumber="3" cellCreated="YES" cellIdentity="" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
      </Sector>
      <Sector sectorNumber="3">
        <Cell cellNumber="1" cellCreated="YES" cellIdentity="" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
        <Cell cellNumber="2" cellCreated="YES" cellIdentity="" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
        <Cell cellNumber="3" cellCreated="YES" cellIdentity="" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
      </Sector>

I have another file (FILE2) that contains the input data for this XML file:

Cell11="42921"
Cell12="42925"
Cell13="42928"
Cell21="42922"
Cell22="42926"
Cell23="42929"
Cell31="42923"
Cell32="42927"
Cell33="42920"
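Whatever tool does the substitution, the first step is to turn FILE2's lines into usable data. A minimal Python sketch follows, assuming every relevant line has exactly the form Cell<digits>="<value>". The function name parse_cell_ids is hypothetical. It keeps the values in file order, which is what an order-based assignment needs.

```python
import re

def parse_cell_ids(text):
    """Map names such as 'Cell12' to their values, keeping file order.

    Lines that do not look like Cell<digits>="<value>" are skipped.
    """
    ids = {}  # dicts preserve insertion order in Python 3.7+
    for line in text.splitlines():
        m = re.fullmatch(r'(Cell\d+)="([^"]*)"', line.strip())
        if m:
            ids[m.group(1)] = m.group(2)
    return ids

sample = '''Cell11="42921"
Cell12="42925"
Cell13="42928"
'''
ids = parse_cell_ids(sample)
print(list(ids.values()))  # ['42921', '42925', '42928']
```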

What I want to do is fill in every empty cellIdentity="" value, in order, with the values from FILE2. The result should look like this:

<Sector sectorNumber="1">
        <Cell cellNumber="1" cellCreated="YES" cellIdentity="42921" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
        <Cell cellNumber="2" cellCreated="YES" cellIdentity="42925" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
        <Cell cellNumber="3" cellCreated="YES" cellIdentity="42928" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
      </Sector>
      <Sector sectorNumber="2">
        <Cell cellNumber="1" cellCreated="YES" cellIdentity="42922" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
        <Cell cellNumber="2" cellCreated="YES" cellIdentity="42926" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
        <Cell cellNumber="3" cellCreated="YES" cellIdentity="42929" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
      </Sector>
      <Sector sectorNumber="3">
        <Cell cellNumber="1" cellCreated="YES" cellIdentity="42923" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
        <Cell cellNumber="2" cellCreated="YES" cellIdentity="42927" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
        <Cell cellNumber="3" cellCreated="YES" cellIdentity="42920" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
      </Sector> 
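The transformation described above can also be done as a plain text substitution that never re-splits the line into fields. This way the indentation and the other attributes are left untouched. A minimal Python sketch follows, assuming the FILE2 values are already available in file order and that every target appears literally as cellIdentity="". The function name fill_cell_identities is hypothetical.

```python
import re

def fill_cell_identities(xml_text, values):
    """Replace each empty cellIdentity="" with the next value from `values`.

    Works on the raw text, so indentation and every other attribute are
    kept as they are. Occurrences left over once `values` runs out stay empty.
    """
    remaining = iter(values)

    def next_value(match):
        value = next(remaining, None)
        if value is None:
            return match.group(0)  # out of values: keep cellIdentity=""
        return f'cellIdentity="{value}"'

    # re.sub visits matches left to right, so values are consumed in order
    return re.sub(r'cellIdentity=""', next_value, xml_text)

# Shortened excerpt of FILE1 (some attributes omitted for brevity)
file1 = '''<Sector sectorNumber="1">
        <Cell cellNumber="1" cellIdentity="" cellRange="35000"/>
        <Cell cellNumber="2" cellIdentity="" cellRange="35000"/>
      </Sector>'''
print(fill_cell_identities(file1, ["42921", "42925"]))
```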

I used this code:

 awk 'NR==FNR{FS="=";a[NR]=$2;next}/cell/{c++;FS=OFS;$4="cellIdentity="a[c];}1' FILE2 FILE1 

But I get this:

<Sector sectorNumber="1"> 
        <Cell cellNumber "1" cellCreated "YES" cellIdentity cellIdentity= "35000" numberOfTxBranches "1"  hsCodeResourceId "0" />
<Cell cellNumber="2" cellCreated="YES" cellIdentity="42925" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />
<Cell cellNumber="3" cellCreated="YES" cellIdentity="42928" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />
</Sector>

<Sector sectorNumber="2"> 
<Cell cellNumber="1" cellCreated="YES" cellIdentity="42922" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />  
<Cell cellNumber="2" cellCreated="YES" cellIdentity="42926" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />
<Cell cellNumber="3" cellCreated="YES" cellIdentity="42929" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />
</Sector>

<Sector sectorNumber="3">   
<Cell cellNumber="1" cellCreated="YES" cellIdentity="42923" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />
<Cell cellNumber="2" cellCreated="YES" cellIdentity="42927" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />
<Cell cellNumber="3" cellCreated="YES" cellIdentity="42920" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />
</Sector>

So, as you can see, I have a problem with cellIdentity on the first line:

<Cell cellNumber "1" cellCreated "YES" cellIdentity cellIdentity= "35000" numberOfTxBranches "1"  hsCodeResourceId "0" />

The first line is not set correctly. All the other lines are fine, and I don't know why.
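Why only the first line breaks: in POSIX awk, assigning to FS changes how the next record is split, not the record currently being processed. In the command above this happens twice:

- The first line of FILE2 is split with the default FS (whitespace). That record has no $2, so a[1] stays empty.
- The first <Cell> line of FILE1 is still split on "=", because FS=OFS only takes effect from the following line.

From then on every line is split on whitespace. For those lines $4 really is cellIdentity="", so the replacement works, although the leading indentation is lost when $0 is rebuilt.

The Python sketch below is only an illustration, not awk itself. It imitates those two splitting steps with str.split, a simplification of awk's rules; the helper name awk_split is hypothetical. It then rebuilds the record with single spaces, as awk does with the default OFS.

```python
# Illustration only: imitate awk's field splitting for the two "first" records.

def awk_split(record, fs):
    """Rough model of awk splitting: fs=' ' splits on runs of whitespace
    (ignoring leading/trailing blanks); any other single character is a
    literal separator."""
    return record.split() if fs == " " else record.split(fs)

# FILE2, record 1: FS is still the default " " when this line is split,
# so there is no $2 and a[1] ends up empty.
fields = awk_split('Cell11="42921"', " ")
a1 = fields[1] if len(fields) > 1 else ""

# FILE1, first <Cell> record: FS is "=" here (FS=OFS applies only from
# the next record on).
cell_line = ('<Cell cellNumber="1" cellCreated="YES" cellIdentity="" '
             'cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>')
fields = awk_split(cell_line, "=")
fields[3] = "cellIdentity=" + a1   # awk's $4 is Python's fields[3]
rebuilt = " ".join(fields)         # assigning $4 rebuilds $0 with OFS=" "
print(rebuilt)
```

It prints the same broken line as in the question, apart from the spacing around />. In awk itself, one common way around this is to set FS before each file is read instead of inside the action, for example with command-line assignments: awk '...' FS='=' FILE2 FS=' ' FILE1.

The gawk manual also notes that some older awk implementations split fields lazily, using whatever FS is current when a field is first referenced. That may explain why the same one-liner appears to work in the test shown in Answer 0.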

2 answers:

Answer 0 (score: 1)

Try this:

awk 'FNR==NR{FS="=";a[NR]=$2;next}
     /cell/{c++;FS=OFS;
            $4="cellIdentity="a[c];}1' file2 file1

Tested below:

> cat file1
<Sector sectorNumber="1">
<Cell cellNumber="1" cellCreated="YES" cellIdentity="" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
<Cell cellNumber="2" cellCreated="YES" cellIdentity="" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
<Cell cellNumber="3" cellCreated="YES" cellIdentity="" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
</Sector>
<Sector sectorNumber="2">
<Cell cellNumber="1" cellCreated="YES" cellIdentity="" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
<Cell cellNumber="2" cellCreated="YES" cellIdentity="" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
<Cell cellNumber="3" cellCreated="YES" cellIdentity="" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
</Sector>
<Sector sectorNumber="3">
<Cell cellNumber="1" cellCreated="YES" cellIdentity="" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
<Cell cellNumber="2" cellCreated="YES" cellIdentity="" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
<Cell cellNumber="3" cellCreated="YES" cellIdentity="" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
</Sector>
> cat file2
Cell11="42921"
Cell12="42925"
Cell13="42928"
Cell21="42922"
Cell22="42926"
Cell23="42929"
Cell31="42923"
Cell32="42927"
Cell33="42920"
> awk 'FNR==NR{FS="=";a[NR]=$2;next}/cell/{c++;FS=OFS;$4="cellIdentity="a[c];}1' file2 file1
<Sector sectorNumber="1">
<Cell cellNumber="1" cellCreated="YES" cellIdentity="42921" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
<Cell cellNumber="2" cellCreated="YES" cellIdentity="42925" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
<Cell cellNumber="3" cellCreated="YES" cellIdentity="42928" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
</Sector>
<Sector sectorNumber="2">
<Cell cellNumber="1" cellCreated="YES" cellIdentity="42922" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
<Cell cellNumber="2" cellCreated="YES" cellIdentity="42926" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
<Cell cellNumber="3" cellCreated="YES" cellIdentity="42929" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
</Sector>
<Sector sectorNumber="3">
<Cell cellNumber="1" cellCreated="YES" cellIdentity="42923" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
<Cell cellNumber="2" cellCreated="YES" cellIdentity="42927" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
<Cell cellNumber="3" cellCreated="YES" cellIdentity="42920" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
</Sector>
>

Answer 1 (score: 1)

I suggest doing this in a programming language with XML support rather than with awk and bash scripts.

Python, for example. The code may be slightly longer, but in exchange it is much less likely to corrupt your XML file.

So, if you want to assign the IDs one by one, in the order they appear in the text file:

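import re
from xml.dom import minidom

sector_doc = minidom.parse('sectors.xml')
cells = sector_doc.getElementsByTagName('Cell')

# Read the ID file: one Cell<sector><cell>="<id>" entry per line
with open('cells.txt', 'r') as cell_file:
    lines = cell_file.readlines()

# Pair the n-th line with the n-th <Cell> element in document order;
# zip() stops at the shorter of the two sequences
for line, cell in zip(lines, cells):
    m = re.search(r'Cell\d+="([^"]+)"', line)
    if m:
        cell.setAttribute('cellIdentity', m.group(1))

# writexml() writes text, so open the output file in text mode
with open('sectors_out.xml', 'w') as out_file:
    sector_doc.writexml(out_file)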
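The script above pairs lines and <Cell> elements purely by position. If FILE2 could be out of order, an alternative is to match by name. Judging from the desired output, CellXY appears to mean sector X, cell Y (Cell12 goes to sector 1, cell 2).

A minimal sketch using xml.etree.ElementTree instead of minidom follows. It makes these assumptions:

- The names follow that CellXY scheme.
- Sector and cell numbers are single digits.
- The real file has a single root element. The excerpt in the question does not show one, so `Sectors` below is a hypothetical stand-in.

The function name assign_by_name is also hypothetical.

```python
import re
import xml.etree.ElementTree as ET

def assign_by_name(root, cell_lines):
    """Set cellIdentity on each Sector/Cell from lines like Cell12="42925".

    Assumes the name encodes <sectorNumber><cellNumber> as single digits.
    Returns the number of cells updated.
    """
    ids = {}
    for line in cell_lines:
        m = re.fullmatch(r'Cell(\d)(\d)="([^"]*)"', line.strip())
        if m:
            ids[(m.group(1), m.group(2))] = m.group(3)

    updated = 0
    for sector in root.iter("Sector"):
        for cell in sector.findall("Cell"):
            key = (sector.get("sectorNumber"), cell.get("cellNumber"))
            if key in ids:
                cell.set("cellIdentity", ids[key])
                updated += 1
    return updated

xml_text = '''<Sectors>
  <Sector sectorNumber="1">
    <Cell cellNumber="1" cellIdentity=""/>
    <Cell cellNumber="2" cellIdentity=""/>
  </Sector>
  <Sector sectorNumber="2">
    <Cell cellNumber="1" cellIdentity=""/>
  </Sector>
</Sectors>'''
root = ET.fromstring(xml_text)
# Deliberately out of order: matching is by name, not by position
n = assign_by_name(root, ['Cell21="42922"', 'Cell12="42925"', 'Cell11="42921"'])
print(n, ET.tostring(root, encoding="unicode"))
```

For real files, the same function can be applied to ET.parse('sectors.xml').getroot() with the lines of cells.txt, then saved with the tree's write() method.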