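UNIX: How to cut columns from the right when some, but not all, fields are the same length

Asked: 2016-02-11 23:35:40

Tags: unix awk cut

I have a list of data, and I need to remove certain characters from some of the columns.

Here is the list: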

(def static-scoped 21)
(def ^:dynamic dynamic-scoped 21)

(defn some-function []
  (println "static = " static-scoped)
  (println "dynamic = " dynamic-scoped))

(defn other-function []
  (binding [dynamic-scoped 42]
    (println "Established new binding in dynamic environment")
    (some-function)))

;; Trying to establish a new binding for the static-scoped
;; variable won t affect the function defined
;; above.
(let [static-scoped 42]
  (println "This binding won't affect the variable resolution")
  (other-function))

(println "calling some-function directly")
(some-function)

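The problem here is that not all fields are the same size. Notice that Alexander Green (3rd from the bottom) has no middle initial. That keeps me from applying awk uniformly to each column. My idea is to cut everything from the right side of the file, so that the field separator doesn't throw everything off.

So how do I use the cut command to start from the rightmost column and take 7 columns?

2 answers:

Answer 0: (score: 1)

You can use cut, because your data has fixed-width fields.

Here is what I get with the OCR'd text: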

$ cut -c 33-51,73-77 input
JR-II BISS CPSC BS 9445
SO-I  BISS CPSC BS 7993
JR-II BISS CPSC BS 0437
JR-I  BISS CPSC BS 2398
FR-II BISS CPSC BS 7149
JR-I  BISS CPSC BS OOOO
SO-II BISS CPSC BS 4354
JR-I  BISS CPSC BS 8268
FR-II BISS CPSC BS 1298
SO-I  BISS CPSC BS 0313
SO-II BISS CPSC BS ZOZI
SO-II BISS CPSC BS 0581
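As a small illustration of how the character ranges in `cut -c` behave, here is a sketch on a made-up string (not the real data). Positions are 1-based and inclusive, and several ranges can be joined with commas. The helper name `pick_chars` is hypothetical and only wraps `cut -c`.

```shell
# pick_chars LIST: keep only the character positions named in LIST.
# Hypothetical wrapper around `cut -c`, used here for illustration only.
pick_chars() {
    cut -c "$1"
}

# Made-up input: characters 2-4 are "bcd", characters 8-10 are "hij".
printf '%s\n' 'abcdefghij' | pick_chars 2-4,8-10   # prints bcdhij
```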

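And to match the requirement you wrote in the comments:

All I'm trying to do is get the first character of the columns that start with JR, BISS, CPSC, INFO (in the top entry). Then I need the last 4 digits of the phone number on the right.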

$ cut -c 32-33,38-39,43-44,48-49,64-64,73-77 input
 J B C B 9445
 S B C B 7993
 J B C B 0437
 J B C B 2398
 F B C B 7149
 J B C B OOOO
 S B C B 4354
 J B C B 8268
 F B C B 1298
 S B C B 0313
 S B C B ZOZI
 S B C B 0581

You will need to adjust the ranges for your actual data.
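If the character positions turn out not to line up, another option is to count fields from the right instead. `cut` itself only counts from the left, so one common workaround is to reverse each line, cut, and reverse it back. The sketch below assumes the `rev` utility is installed (it is not part of POSIX) and that fields are separated by spaces. The function name `last_fields` and the sample line are made up for illustration.

```shell
# last_fields N: print the last N space-separated fields of each input line.
# Assumption: `rev` is available; runs of spaces are squeezed to one first,
# because `cut -d ' '` treats every single space as a separator.
last_fields() {
    tr -s ' ' | rev | cut -d ' ' -f "1-$1" | rev
}

# Made-up sample line, not taken from the real data:
printf '%s\n' 'ID1 DOE, JANE Q. JR-II BISS CPSC BS INFO TECH 555-0100' |
    last_fields 7
```

Because the count starts at the right edge, a missing middle initial on the left no longer shifts the fields that are kept.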

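Answer 1: (score: 0)

The following meets the requirements as I understand them. I have used a tab as the output field separator so that it is easy for you to adjust: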

awk 'BEGIN {OFS="\t"} { 
   # Each line is assumed to have a variable number
   # of name fields plus 8 other tokens:
   nnames = NF-8;

   # from the right:
   tel=$NF; 
   subject2=$(NF-1); 
   subject1=$(NF-2);
   bs=$(NF-3); cpsc=$(NF-4); biss=$(NF-5); data=$(NF-6);

   name=$2;
   for (i=2; i<=nnames;i++) {name=name " " $(i+1)}

   # Adjustments
   data=substr(data,2); biss=substr(biss,2); cpsc=substr(cpsc,2); 
   subject1=substr(subject1,2)
   sub( /[^-]*-/,"", tel);

   print $1, name, data, biss, cpsc, bs, subject1 " " subject2, tel;
}'

Output:

JCG2380 GREEN, JULIE C  R-II    ISS PSC BS  NFO TECH    9445
JAG1936 GREEN, JOE A.   O-I ISS PSC BS  NFO TECH    7993
ACG4636 GREEN, ADAM C.  R-II    ISS PSC BS  OMP SCI 0437
SPG1696 GREEN, SEAN P.  R-I ISS PSC BS  OMP SCI 2398
SEG8835 GREEN, SHAWN E. R-II    ISS PSC BS  OMP SCI 7149
MCGo599 GREEN, MICHAEL C.   R-I ISS PSC BS  OMP SCI OOOO
GJG1887 GREEN, GREGORY J.   O-II    ISS PSC BS  NFO TECH    4354
NGG5479 GREEN, NICHOLAS G   R-I ISS PSC BS  NFO TECH    8268
ZTG7190 GREEN, ZACHARY T.   R-II    ISS PSC BS  NFO TECH    1298
AXG9097 GREEN, ALEXANDER    O-I ISS PSC BS  NFO TECH    0313
RJG6624 GREEN, ROBERT J.    O-II    ISS PSC BS  OMP SCI ZOZI
MWG1990 GREEN, MATTHEW W    O-II    ISS PSC BS  NFO TECH    0581
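The key idea in the script above is indexing relative to `NF`, so the trailing fields are found no matter how many tokens the name has. A stripped-down sketch of just that part follows. It assumes one leading ID field and 7 trailing fields, as in the script. The function name `name_and_phone` and both sample lines are made up for illustration.

```shell
# name_and_phone: print the ID, the variable-length name, and the last field.
# Assumption: 1 leading ID field and 7 trailing fields, so the name
# occupies fields 2 through NF-7.
name_and_phone() {
    awk 'BEGIN { OFS = "\t" } {
        last_name_field = NF - 7
        name = $2
        for (i = 3; i <= last_name_field; i++) name = name " " $i
        print $1, name, $NF
    }'
}

# Made-up sample lines: one with a middle initial, one without.
printf '%s\n' \
    'ID1 DOE, JANE Q. JR-II BISS CPSC BS INFO TECH 555-0100' \
    'ID2 DOE, JOHN SO-I BISS CPSC BS COMP SCI 555-0199' |
    name_and_phone
```

Both lines produce three tab-separated columns, even though the second name is one token shorter.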