Writing partially tab-separated data into a MySQL database

Time: 2013-05-14 07:10:17

Tags: mysql regex bash

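I have a MySQL database with 7 columns (chr, pos, num, iA, iB, iC, iD) and a file with 40 million lines, each containing one dataset. Each line has 4 tab-separated columns: the first three always contain data, and the fourth can contain up to three semicolon-separated key=value pairs.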

The key=value pairs in the info column are in no particular order. I am also not sure whether a key can appear twice (I hope not).

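I want to write the data into the database. The first three columns are no problem, but extracting the values from the info column puzzles me, because the key=value pairs are unordered and not every key has to be present in a line. For a similar dataset (with an ordered info column) I used a Java program with regular expressions, which let me (1) check and (2) extract the data, but now I am stuck.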

How can I solve this task, preferably with a bash script or directly in MySQL?

2 answers:

Answer 0: (score: 2)

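You did not say exactly how you want to write the data, but the awk example below shows how to get every key and value in each line. Instead of the printf, you can use your own logic to write the data.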

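$ cat test.sh; echo "###########"; awk -f test.sh log
{
  # Only process lines that have a non-empty fourth (info) column.
  if(length($4)) {
    split($4,array,";");               # one element per key=value pair
    print "In " $1, $2, $3;
    # Note: for-in visits the elements in an unspecified order.
    for(element in array) {
      # awk strings are 1-indexed; the key as printed keeps its trailing "=".
      key=substr(array[element],1,index(array[element],"="));
      value=substr(array[element],index(array[element],"=")+1);
      printf("found %s key and %s value for %d line from %s\n",key,value,NR,array[element]);
    }
  }
}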
###########
In 1 10203 3
found iD= key and dskf12586 value for 1 line from iD=dskf12586
found iA= key and 0.34 value for 1 line from iA=0.34
found iB= key and nerv value for 1 line from iB=nerv
found iC= key and 45 value for 1 line from iC=45
In 1 10203 4
found iB= key and nerv value for 2 line from iB=nerv
found iA= key and 0.44 value for 2 line from iA=0.44
found iC= key and 45 value for 2 line from iC=45
found iD= key and dsf12586 value for 2 line from iD=dsf12586
In 1 10213 1
found iD= key and dskf12586 value for 4 line from iD=dskf12586
found iB= key and nerv value for 4 line from iB=nerv
found iC= key and 49 value for 4 line from iC=49
found iA= key and 0.14 value for 4 line from iA=0.14
In 1 10213 2
found iA= key and 0.34 value for 5 line from iA=0.34
found iB= key and nerv value for 5 line from iB=nerv
found iD= key and cap1486 value for 5 line from iD=cap1486
In 1 10225 1
found iD= key and dscf12586 value for 6 line from iD=dscf12586
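
The question also raises the possibility of a key appearing twice in one info column. A minimal sketch of a pre-check, reusing the same split-on-`;` idea as above; the function name `find_duplicate_keys` and the report format are hypothetical, and it assumes tab-separated input without a header line:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report lines whose 4th column repeats a key.
# Assumes tab-separated input and ';'-separated key=value pairs.
find_duplicate_keys() {
  awk -F'\t' '
    length($4) {
      split("", seen)                       # clear the per-line key set
      n = split($4, pairs, ";")
      for (i = 1; i <= n; i++) {
        eq = index(pairs[i], "=")
        if (eq == 0) continue               # skip fragments without "="
        key = substr(pairs[i], 1, eq - 1)
        if (key in seen)
          printf "line %d: duplicate key %s\n", NR, key
        seen[key] = 1
      }
    }
  ' "$@"
}

# Example: only the second line repeats a key (iA), so only it is reported.
printf '1\t10203\t3\tiD=a;iA=0.34\n1\t10203\t4\tiA=0.1;iB=x;iA=0.2\n' | find_duplicate_keys
```

Running this over the whole file before loading would show whether the "last value wins" behaviour of a simple parser could silently drop data.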

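Answer 1: (score: 2)

Building on the awk solution from @abasu, this version generates INSERT statements and also handles the unordered key=value pairs.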

parse.awk:

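NR>1 {   # skip the first line of the file
  # Default every info column to SQL NULL, so missing keys become null.
  col["iA"]=col["iB"]=col["iC"]=col["iD"]="null";

  if(length($4)) {
    split($4,array,";");
    for(element in array) {
      # Keys may come in any order; each one is stored under its own name.
      split(array[element],keyval,"=");
      gsub(/'/,"''",keyval[2]);   # double single quotes so the SQL literal stays valid
      col[keyval[1]] = "'" keyval[2] "'";
    }
  }
  print "INSERT INTO tbl VALUES (" $1 "," $2 "," $3 "," col["iA"] "," col["iB"] "," col["iC"] "," col["iD"] ");";
}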

Test/run:

$ awk -f parse.awk file
INSERT INTO tbl VALUES (1,10203,3,'0.34','nerv','45','dskf12586');
INSERT INTO tbl VALUES (1,10203,4,'0.44','nerv','45','dsf12586');
INSERT INTO tbl VALUES (1,10203,5,null,null,null,null);
INSERT INTO tbl VALUES (1,10213,1,'0.14','nerv','49','dskf12586');
INSERT INTO tbl VALUES (1,10213,2,'0.34','nerv',null,'cap1486');
INSERT INTO tbl VALUES (1,10225,1,null,null,null,'dscf12586');
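
For 40 million lines, bulk loading with `LOAD DATA` is typically much faster than executing one INSERT per row. The following is only a sketch under assumptions: the function name `info_to_tsv` is hypothetical, the input is tab-separated with a header line, and the table `tbl` has its seven columns in the order chr, pos, num, iA, iB, iC, iD. Missing keys are written as `\N`, which `LOAD DATA` reads as NULL in its default format:

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert the 4-column input into a 7-column TSV
# (chr, pos, num, iA, iB, iC, iD) suitable for LOAD DATA.
info_to_tsv() {
  awk -F'\t' -v OFS='\t' '
    NR > 1 {                                   # skip the assumed header line
      # \N is how LOAD DATA represents NULL in its default format
      col["iA"] = col["iB"] = col["iC"] = col["iD"] = "\\N"
      n = split($4, pairs, ";")
      for (i = 1; i <= n; i++) {
        eq = index(pairs[i], "=")
        if (eq > 0)                            # ignore fragments without "="
          col[substr(pairs[i], 1, eq - 1)] = substr(pairs[i], eq + 1)
      }
      print $1, $2, $3, col["iA"], col["iB"], col["iC"], col["iD"]
    }
  ' "$@"
}

# Usage (file names and database name are placeholders):
#   info_to_tsv file > data.tsv
#   mysql --local-infile=1 your_db \
#     -e "LOAD DATA LOCAL INFILE 'data.tsv' INTO TABLE tbl"
```

This sketch does not escape tabs or backslashes inside values, and `LOAD DATA LOCAL` only works if local infile loading is enabled on both client and server.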