Replace spaces between strings

Time: 2015-04-04 06:53:35

Tags: file sed

I have a file like this:
>TCONS_00000066 +1
PPAAARTDLSPPQHVLHVYKRYGPPRQRRRPCPQTWWWQLPHRAAATHPRGEGPRASNPTRQQHFILVYNFSSFLSSWLSLSLLSSPFCYLYICDCHGNTEDEGPLMY*LVSSSLGAFVCKDFHLIDLLDLLFWIEAGYLHAVLHTILQSGRSDR*SRPKYRLTELSVCISVRTSSVINSKC*HN
>TCONS_00000066 +2
RRLLRAPTCHHPSTSSTYTSATVHRGSVDVLVRKHGGGSFLIEQQQLILEGKGPELLILHGNNTLYLCIISLRF*VHGYLCLSYLLPFAISIFVIAMEIQKTRGR*CIDL*VLVWGLSFARIFI*LIFLICYFGSKLATFMPCCIPYFSLVGQTDDRDRSID*PNFRFVYL*GQVLSSIQNVNII
>TCONS_00000066 +3
AGCCAHRLVTTPARPPRIQALRSTAAASTSLSANMVVAASSSSSSNSSSRGRAQSF*SYTATTLYTCV*FLFVSEFMAIFVSLIFSLLLSLYL*LPWKYRRRGAADVLTCEF*FGGFRLQGFSFD*SS*FVILDRSWLPSCRVAYHTSVWSVRPMIETEVSINRTFGLYICEDKFCHQFKMLT*
>TCONS_00000066 -1
YYVNILN**QNLSSQIYKPKVRLIDTSVSIIGLTDQTEVWYATRHEGSQLRSKITNQEDQSNENPCKRKPPN*NSQVNTSAAPRLLYFHGNHKYRDSKREKIRETKIAMNSETKRNYTQV*SVVAV*D*KLWALPLEDELLLLDEEAATTMFADKDVDAAAVDRSACIRGGRAGVVTSRCAQQPA
>TCONS_00000066 -2
IMLTF*IDDRTCPHRYTNRKFG*SILRSRSSV*PTRLKYGMQHGMKVASFDPK*QIKKINQMKILANESPQTRTHKSIHQRPLVFCISMAITNIEIAKGRR*ERQR*P*TQKRREIIHKYKVLLPCRIRSSGPFPSRMSCCCSMRKLPPPCLRTRTSTLPRWTVALVYVEDVLGW*QVGARSSRR
>TCONS_00000066 -3
LC*HFELMTELVLTDIQTESSVNRYFGLDHRSDRPD*SMVCNTA*R*PASIQNNKSRRSIK*KSLQTKAPKLELTSQYISGPSSSVFPWQSQI*R*QKGEDKRDKDSHELRNEEKLYTSIKCCCRVGLEALGPSPRG*VAAAR*GSCHHHVCGQGRRRCRGGP*RLYTWRTCWGGDKSVRAAAG
>TCONS_00000130 +1
LPARPRLQGALQRHRGGKPINQSINQWW*LGQLKTKKERSN*SSC*IVKWYAGEGGDSGSGGGGRGDGGGDGEPARRHHARRRPPRQELPLQVDEPVRANEEGWVQGSWHQAARHGTGRFLQRRAHPNRDHQFARTTA*NPLPNVHPSAGRAMEKKIKGKEEKMKSPCITN*FVMMQAAVRVRSSLIGSIR*ICFTKGATDRLSWLAVWVHIHTTQTQILTI*PFAKNIFTNEQLPKLISNLTLLLNAKSCGAEFRHLSAK*YGAECTLAR*LSLPSAVARHSAPADVALRCLSSAPHDLALSKKVRFEISFGSGSFVKLVFTKG*IVKICATQTHSQEDMNIK*SREGHGFSPGFVPFGCTCTEMIYVVGLTDTKEHM***MIFVLLCQSFTLVFLTCFLSSTVVLRIQ*PQLMRLKWILAN*AYSLIFWLMVIL
>TCONS_00000130 +2
FQLALAFRELCNGIAEVNQSTNQSINGGSWVNSKQRKKEAINHLVEL*NGMQAKVEIVVREGEVGETVVATVNQLAATTLVVGLHDKSFLYRSTNPYERMRRVGCRVLGIRQHATARDGSFNAELTQIETINLHVPPPKIPFPMFTLPLGVLWRKRSKAKKRK*SHHASQINL**CRLQCELGAH*LDQSDEFVLPKEQLTD*AG*LSGYIYTRHKHKF*QFNPLQKIFLQMNSYQNLFQI*PFCLTPNRVALNLDTSAPNSMALNVRWHADLVSHPPWHGIQRQLTWR*GV*VPRHMI*R*AKRSDLK*VLAAVHL*N*FLQRVKLSKFVRHKHTHKKT*TSSEAGRGTVSHLDLCHLVVLVQR*SMLLD*QTPRNTCSSK*FLFYFVKVLHLYS*PVSCLAQ*C*EFSNLS**D*NGYWPIKLIASSFGLWLYL
>TCONS_00000130 +3
SSSPSPSGSSATASRR*TNQPINQSMVVVGSTQNKERKKQLIILLNCEMVCRRRWR*WFGRGRSGRRWWRR*TSSPPPRSSSASTTRASSTGRRTRTSE*GGLGAGFLASGSTPRHGTVPSTPSSPKSRPSICTYHRLKSPSQCSPFRWACYGEKDQRQRRENEVTMHHKLICDDAGCSAS*ELTDWINPMNLFYQRSN*QIELASCLGTYTHDTNTNFDNLTLCKKYFYK*TATKTYFKSDPFA*RQIVWR*I*TPQRQIVWR*MYVGTLT*SPIRRGTAFSAS*RGAEVSKFRAT*FSVEQKGQI*NKFWQRFICKISFYKGLNCQNLCDTNTLTRRHEHQVKQGGARFLTWICAIWLYLYRDDLCCWIDRHQGTHVVVNDFCFTLSKFYTCIPDLFLV*HSSVKNSVTSVDEIKMDIGQLSL*PHLLAYGYTY
>TCONS_00000130 -1
ISITISQKMRL*A*LANIHFNLIN*GY*ILNTTVLDKKQVRNTSVKL*QSKTKIIYYYMCSLVSVNPTT*IISVQVQPNGTNPGEKPCPSLLHLMFMSSCECVCVAQILTI*PFVKTNFTNEPLPKLISNLTFLLNAKSCGAELRHLSATSAGAECRATADGRLSQRANVHSAPYYLALRCLNSAPHDLALSKRVRFEISFGSCSFVKIFFAKG*IVKICVCVVCICTQTASQLNLSVAPLVKQIHRIDPISELLTRTAACIITN*FVMHGDFIFSSLPLIFFSIARPAEG*TLGRGF*AVVRAN*WSRFG*ARR*RNRPVPWRAA*CQEPCTQPSSFARTGSSTCRGSSCRGGRRRAWWRRAGSPSPPPSPRPPPPEPLSPPSPAYHFTIQQDD*LLLSFFVLS*PNYHH*LIDWLIGLPPRCRCRAP*RRGRAG
>TCONS_00000130 -2

I want to replace the space between the strings in the ID lines with an underscore.

The new file should look like this:

>TCONS_00000066_+1
PPAAARTDLSPPQHVLHVYKRYGPPRQRRRPCPQTWWWQLPHRAAATHPRGEGPRASNPTRQQHFILVYNFSSFLSSWLSLSLLSSPFCYLYICDCHGNTEDEGPLMY*LVSSSLGAFVCKDFHLIDLLDLLFWIEAGYLHAVLHTILQSGRSDR*SRPKYRLTELSVCISVRTSSVINSKC*HN
>TCONS_00000066_+2
RRLLRAPTCHHPSTSSTYTSATVHRGSVDVLVRKHGGGSFLIEQQQLILEGKGPELLILHGNNTLYLCIISLRF*VHGYLCLSYLLPFAISIFVIAMEIQKTRGR*CIDL*VLVWGLSFARIFI*LIFLICYFGSKLATFMPCCIPYFSLVGQTDDRDRSID*PNFRFVYL*GQVLSSIQNVNII
>TCONS_00000066_+3
AGCCAHRLVTTPARPPRIQALRSTAAASTSLSANMVVAASSSSSSNSSSRGRAQSF*SYTATTLYTCV*FLFVSEFMAIFVSLIFSLLLSLYL*LPWKYRRRGAADVLTCEF*FGGFRLQGFSFD*SS*FVILDRSWLPSCRVAYHTSVWSVRPMIETEVSINRTFGLYICEDKFCHQFKMLT*
>TCONS_00000066_-1
YYVNILN**QNLSSQIYKPKVRLIDTSVSIIGLTDQTEVWYATRHEGSQLRSKITNQEDQSNENPCKRKPPN*NSQVNTSAAPRLLYFHGNHKYRDSKREKIRETKIAMNSETKRNYTQV*SVVAV*D*KLWALPLEDELLLLDEEAATTMFADKDVDAAAVDRSACIRGGRAGVVTSRCAQQPA
>TCONS_00000066_-2
IMLTF*IDDRTCPHRYTNRKFG*SILRSRSSV*PTRLKYGMQHGMKVASFDPK*QIKKINQMKILANESPQTRTHKSIHQRPLVFCISMAITNIEIAKGRR*ERQR*P*TQKRREIIHKYKVLLPCRIRSSGPFPSRMSCCCSMRKLPPPCLRTRTSTLPRWTVALVYVEDVLGW*QVGARSSRR
>TCONS_00000066_-3
LC*HFELMTELVLTDIQTESSVNRYFGLDHRSDRPD*SMVCNTA*R*PASIQNNKSRRSIK*KSLQTKAPKLELTSQYISGPSSSVFPWQSQI*R*QKGEDKRDKDSHELRNEEKLYTSIKCCCRVGLEALGPSPRG*VAAAR*GSCHHHVCGQGRRRCRGGP*RLYTWRTCWGGDKSVRAAAG
>TCONS_00000130_+1
LPARPRLQGALQRHRGGKPINQSINQWW*LGQLKTKKERSN*SSC*IVKWYAGEGGDSGSGGGGRGDGGGDGEPARRHHARRRPPRQELPLQVDEPVRANEEGWVQGSWHQAARHGTGRFLQRRAHPNRDHQFARTTA*NPLPNVHPSAGRAMEKKIKGKEEKMKSPCITN*FVMMQAAVRVRSSLIGSIR*ICFTKGATDRLSWLAVWVHIHTTQTQILTI*PFAKNIFTNEQLPKLISNLTLLLNAKSCGAEFRHLSAK*YGAECTLAR*LSLPSAVARHSAPADVALRCLSSAPHDLALSKKVRFEISFGSGSFVKLVFTKG*IVKICATQTHSQEDMNIK*SREGHGFSPGFVPFGCTCTEMIYVVGLTDTKEHM***MIFVLLCQSFTLVFLTCFLSSTVVLRIQ*PQLMRLKWILAN*AYSLIFWLMVIL
>TCONS_00000130_+2
FQLALAFRELCNGIAEVNQSTNQSINGGSWVNSKQRKKEAINHLVEL*NGMQAKVEIVVREGEVGETVVATVNQLAATTLVVGLHDKSFLYRSTNPYERMRRVGCRVLGIRQHATARDGSFNAELTQIETINLHVPPPKIPFPMFTLPLGVLWRKRSKAKKRK*SHHASQINL**CRLQCELGAH*LDQSDEFVLPKEQLTD*AG*LSGYIYTRHKHKF*QFNPLQKIFLQMNSYQNLFQI*PFCLTPNRVALNLDTSAPNSMALNVRWHADLVSHPPWHGIQRQLTWR*GV*VPRHMI*R*AKRSDLK*VLAAVHL*N*FLQRVKLSKFVRHKHTHKKT*TSSEAGRGTVSHLDLCHLVVLVQR*SMLLD*QTPRNTCSSK*FLFYFVKVLHLYS*PVSCLAQ*C*EFSNLS**D*NGYWPIKLIASSFGLWLYL
>TCONS_00000130_+3
SSSPSPSGSSATASRR*TNQPINQSMVVVGSTQNKERKKQLIILLNCEMVCRRRWR*WFGRGRSGRRWWRR*TSSPPPRSSSASTTRASSTGRRTRTSE*GGLGAGFLASGSTPRHGTVPSTPSSPKSRPSICTYHRLKSPSQCSPFRWACYGEKDQRQRRENEVTMHHKLICDDAGCSAS*ELTDWINPMNLFYQRSN*QIELASCLGTYTHDTNTNFDNLTLCKKYFYK*TATKTYFKSDPFA*RQIVWR*I*TPQRQIVWR*MYVGTLT*SPIRRGTAFSAS*RGAEVSKFRAT*FSVEQKGQI*NKFWQRFICKISFYKGLNCQNLCDTNTLTRRHEHQVKQGGARFLTWICAIWLYLYRDDLCCWIDRHQGTHVVVNDFCFTLSKFYTCIPDLFLV*HSSVKNSVTSVDEIKMDIGQLSL*PHLLAYGYTY
>TCONS_00000130_-1
ISITISQKMRL*A*LANIHFNLIN*GY*ILNTTVLDKKQVRNTSVKL*QSKTKIIYYYMCSLVSVNPTT*IISVQVQPNGTNPGEKPCPSLLHLMFMSSCECVCVAQILTI*PFVKTNFTNEPLPKLISNLTFLLNAKSCGAELRHLSATSAGAECRATADGRLSQRANVHSAPYYLALRCLNSAPHDLALSKRVRFEISFGSCSFVKIFFAKG*IVKICVCVVCICTQTASQLNLSVAPLVKQIHRIDPISELLTRTAACIITN*FVMHGDFIFSSLPLIFFSIARPAEG*TLGRGF*AVVRAN*WSRFG*ARR*RNRPVPWRAA*CQEPCTQPSSFARTGSSTCRGSSCRGGRRRAWWRRAGSPSPPPSPRPPPPEPLSPPSPAYHFTIQQDD*LLLSFFVLS*PNYHH*LIDWLIGLPPRCRCRAP*RRGRAG
>TCONS_00000130_-2

I tried sed and tr but did not get the desired output.

2 answers:

Answer 0 (score: 1)

Use tr, whose exact purpose is to replace characters with other characters.

tr ' ' '_' < file
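Applied to a header line in the question's format (a hypothetical two-line excerpt fed in with printf instead of a real file), the command turns the space into an underscore and leaves the +/- strand sign intact. Keep in mind that tr replaces every space in the input, not only in header lines; that is harmless here because the sequence lines contain no spaces.

```shell
# Hypothetical excerpt in the question's format: one header line, one sequence line.
# tr maps every space to "_"; the sequence line has no spaces, so it is unchanged.
printf '>TCONS_00000066 +1\nPPAAARTDLSPP\n' | tr ' ' '_'
```

This prints >TCONS_00000066_+1 followed by the unchanged sequence line.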

As an extra bonus, you can use the -s option to squeeze repeated occurrences into one, like this:

tr -s ' ' '_' < file

which has the following effect:

$ cat a
hello     world           this
is     a      sample       file
$ tr -s ' ' '_' < a
hello_world_this
is_a_sample_file

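Of course, if you want to save the changes in the original file, you have to redirect the output to another file and then move that file back over the original.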
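A minimal sketch of that two-step approach, assuming a hypothetical sample file named seqs.fa. Writing straight back with tr ' ' '_' < seqs.fa > seqs.fa would not work: the shell truncates the output file before tr starts reading, so the result would be an empty file. Chaining mv with && means the original is only replaced if tr succeeded.

```shell
# Create a small sample file (hypothetical name and contents).
printf '>TCONS_00000066 +1\nPPAAARTDLSPP\n>TCONS_00000066 -1\nYYVNILN\n' > seqs.fa

# tr only reads stdin and writes stdout, so write to a temporary file first,
# then move it over the original only if tr exited successfully.
tr ' ' '_' < seqs.fa > seqs.fa.tmp && mv seqs.fa.tmp seqs.fa

# Show the rewritten file.
cat seqs.fa
```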

Answer 1 (score: 0)

It seems you are trying to replace spaces with the _ symbol. If so, you could consider this:

sed 's/[[:blank:]]\+/_/g' file

OR

sed 's/\(TCONS_[0-9]\{8\}\)[[:blank:]]\+/\1_/g' file

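You need to capture the characters you want to keep. Here the characters to keep are TCONS_ followed by 8 digits, so put a pattern matching them inside a capture group \(...\). Then use the [[:blank:]]\+ pattern to match the one or more spaces that follow. You have to escape the + so that it repeats the previous token one or more times; otherwise it matches a literal + sign, because basic sed uses BRE (Basic Regular Expressions). Note that \+ in a BRE is a GNU sed extension; with a strictly POSIX sed, write [[:blank:]][[:blank:]]* instead.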
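A related variant, not part of the original answer: instead of anchoring on the TCONS_ id, restrict the substitution to header lines with a sed address (/^>/), and use the portable [[:blank:]][[:blank:]]* spelling so it also runs on non-GNU sed. The input below is a hypothetical excerpt of the question's file.

```shell
# /^>/ applies the s command only to header lines (lines starting with ">").
# [[:blank:]][[:blank:]]* is the portable BRE form of "one or more blanks".
printf '>TCONS_00000066 +1\nPPAAARTDLSPP\n>TCONS_00000130 -2\nISITISQKMRL\n' |
  sed '/^>/ s/[[:blank:]][[:blank:]]*/_/g'
```

Because of the address, a space inside a non-header line would be left alone, which the plain tr approach cannot do.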