Question

I'm trying to convert a CSV file to JSON.

Here are two sample lines:

-21.3214077;55.4851413;Ruizia cordata
-21.3213078;55.4849803;Cossinia pinnata

I would like to get something like this:

"occurrences": [
                 {
                "position": [-21.3214077, 55.4851413],
                "taxo": {
                    "espece": "Ruizia cordata"
                 },
                 ...
             }]

Here is my script:

    echo '"occurences": [ '

cat se.csv | while read -r line
  do
      IFS=';' read -r -a array <<< $line;
      echo -n -e '{ "position": [' ${array[0]}
      echo -n -e ',' ${array[1]} ']'
      echo -e ', "taxo": {"espece":"' ${array[2]} '"'
done
echo "]";

I get a very strange result:

   "occurences": [ 
 ""position": [ -21.3214077, 55.4851413 ], "taxo": {"espece":" Ruizia cordata
 ""position": [ -21.3213078, 55.4849803 ], "taxo": {"espece":" Cossinia pinnata

What is wrong with my code?

Answer 1

The right tool for this job is jq.

jq -Rsn '
  {"occurrences":
    [inputs
     | . / "\n"
     | (.[] | select(length > 0) | . / ";") as $input
     | {"position": [$input[0], $input[1]], "taxo": {"espece": $input[2]}}]}
' <se.csv

Given your input, this emits:

{
  "occurrences": [
    {
      "position": [
        "-21.3214077",
        "55.4851413"
      ],
      "taxo": {
        "espece": "Ruizia cordata"
      }
    },
    {
      "position": [
        "-21.3213078",
        "55.4849803"
      ],
      "taxo": {
        "espece": "Cossinia pinnata"
      }
    }
  ]
}

By the way, a less buggy version of your original script might look like this:

#!/usr/bin/env bash

items=( )
while IFS=';' read -r lat long pos _; do
  printf -v item '{ "position": [%s, %s], "taxo": {"espece": "%s"}}' "$lat" "$long" "$pos"
  items+=( "$item" )
done <se.csv

IFS=','
printf '{"occurrences": [%s]}\n' "${items[*]}"

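Note:

There is absolutely no point in using cat to pipe into a loop (and there are good reasons not to); thus, we use a redirection (<) to open the file directly as the loop's standard input.
read can be passed a list of destination variables; there is thus no need to read into an array (or to first read into a string, then generate a here-string and read from that into an array). The _ at the end ensures that extra columns are discarded (by putting them into the dummy variable named _) rather than appended to pos.
"${array[*]}" generates a string by concatenating the elements of array with the first character in IFS; we can thus use it to ensure that commas appear in the output only where they are needed.
printf is used in preference to echo, as advised in the specification for echo itself.

Even so, because it generates JSON via string concatenation, this script is still inherently buggy. Don't use it.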
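
To see why building JSON by string concatenation is fragile, here is a minimal Python sketch (Python is used only for illustration, and the helper names naive_item and safe_item are hypothetical). A made-up species name containing double quotes breaks the hand-built string, while json.dumps escapes it correctly.

```python
import json

def naive_item(lat, lon, species):
    # Mirrors the printf approach: values are pasted into a template unescaped.
    return '{"position": [%s, %s], "taxo": {"espece": "%s"}}' % (lat, lon, species)

def safe_item(lat, lon, species):
    # Build a real data structure and let the JSON encoder do the escaping.
    return json.dumps({"position": [float(lat), float(lon)],
                       "taxo": {"espece": species}})

tricky = 'Ruizia "cordata"'  # made-up value containing quotes

try:
    json.loads(naive_item("-21.3214077", "55.4851413", tricky))
    naive_ok = True
except json.JSONDecodeError:
    naive_ok = False

safe = json.loads(safe_item("-21.3214077", "55.4851413", tricky))
print(naive_ok)                # False: the hand-built string is not valid JSON
print(safe["taxo"]["espece"])  # the quotes survive the round trip
```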

Answer 2

Here is an article on the subject: https://infiniteundo.com/post/99336704013/convert-csv-to-json-with-jq

It also uses jq, but takes a slightly different approach based on split() and map().

jq --slurp --raw-input \
   'split("\n") | .[1:] | map(split(";")) |
      map({
         "position": [.[0], .[1]],
         "taxo": {
             "espece": .[2]
          }
      })' \
  input.csv > output.json

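However, it does not handle escaped separators. Note also that .[1:] drops the first line, which assumes a header row that the sample data in the question does not have.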
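
Two pitfalls of the split-on-newline approach can be illustrated with a short Python sketch. It assumes that Python's str.split behaves like jq's split for this input, and uses the two rows from the question as sample text.

```python
text = (
    "-21.3214077;55.4851413;Ruizia cordata\n"
    "-21.3213078;55.4849803;Cossinia pinnata\n"
)

lines = text.split("\n")
print(len(lines))   # 3: text ending in a newline yields a trailing empty string
print(lines[1:])    # slicing off a "header" loses the first data row here

# Filtering out empty strings plays the role of select(length > 0) in jq.
rows = [line.split(";") for line in lines if line]
print(len(rows))    # 2
```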

Answer 3

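Since the jq solutions do not handle CSV escaping, column names on the first line, commented-out lines, and other common CSV "features", I have extended the CSV Cruncher tool to allow reading CSV and writing it as JSON. It's not exactly "Bash", but neither is jq :)

It is primarily a CSV-as-SQL processing application, so it is not completely trivial, but here is the trick: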

./crunch -in myfile.csv -out output.csv --json -sql 'SELECT * FROM myfile'

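It can also write the output either as one JSON object per line or as a proper JSON array. See the documentation.

It is of beta quality, so all feedback and requests are welcome.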

Answer 4

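In general, if your jq has the inputs built-in filter (available since jq 1.5), it is better to use it than the -s command-line option.

Here, in any case, is a solution using inputs (invoke jq with the -n and -R options so that inputs sees every line as a raw string). This solution is also variable-free.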

{"occurrences":
  [inputs
   | select(length > 0)
   | . / ";"
   | {"position": [.[0], .[1]], 
      "taxo": {"espece": .[2]}} ]}
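
The line-at-a-time idea behind inputs can be sketched in Python with a generator. This is an illustration rather than a translation of the jq program: the function name occurrences is hypothetical, the in-memory sample stands in for se.csv, and the coordinates are converted to floats, which the jq filters above do not do.

```python
import io
import json

def occurrences(lines):
    """Yield one record per non-empty line, reading lazily like jq's inputs."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:  # same role as select(length > 0)
            continue
        lat, lon, species = line.split(";")[:3]
        yield {"position": [float(lat), float(lon)],
               "taxo": {"espece": species}}

# Stand-in for open("se.csv"); any iterable of lines works.
sample = io.StringIO(
    "-21.3214077;55.4851413;Ruizia cordata\n"
    "\n"
    "-21.3213078;55.4849803;Cossinia pinnata\n"
)
result = {"occurrences": list(occurrences(sample))}
print(json.dumps(result, indent=2))
```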

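SSV, CSV, and all that

The above, of course, assumes that every line of the file has semicolon-separated fields, and that none of the complications associated with CSV files are present.

If the input has fields that are strictly delimited by a single character, jq should have no problem handling it. Otherwise, it is best to use a tool that can reliably convert the data to TSV (tab-separated values), a format that jq can handle directly.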
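
As one possible way to do such a conversion, here is a small Python sketch built on the standard csv module. The function name csv_to_tsv and the backslash-escaping convention are assumptions made for this example; whatever reads the TSV afterwards would need to undo the same escapes.

```python
import csv
import io

def csv_to_tsv(src, dst, delimiter=","):
    """Rewrite CSV as TSV, escaping characters that would break a line-based reader."""
    escapes = {"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r"}
    for row in csv.reader(src, delimiter=delimiter):
        cells = ["".join(escapes.get(ch, ch) for ch in cell) for cell in row]
        dst.write("\t".join(cells) + "\n")

# Made-up sample with a quoted comma and a quoted newline.
src = io.StringIO('1997,Ford,"ac, abs, moon"\n1996,Jeep,"MUST SELL!\nair"\n', newline="")
dst = io.StringIO()
csv_to_tsv(src, dst)
print(dst.getvalue())
```

Each output line now holds exactly one record, so a line-oriented filter can split it on tabs safely.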

Answer 5

If you like JavaScript, you can use smk (https://www.npmjs.com/package/smk):

npm install -g smk

cat t1.txt | smk -a -f"(a) => JSON.stringify({\
     'occurrences': a.map((row) => row.split(';')).map((arr) => ({\
         'position': [parseFloat(arr[0]), parseFloat(arr[1])],
         'taxo': {\
             'espece': arr[2]\
         }\
     }))\
 }, null, 4)";

Answer 6

For completeness, Xidel together with some XQuery magic can do this as well:

xidel -s input.csv --xquery '
  {
    "occurrences":for $x in tokenize($raw,"\n") let $a:=tokenize($x,";") return {
      "position":[
        $a[1],
        $a[2]
      ],
      "taxo":{
        "espece":$a[3]
      }
    }
  }
'

Answer 7

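The accepted answer uses jq to parse the input. This works, but jq does not handle escaping, i.e. CSV input produced by Excel or similar tools that is quoted like this: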

foo,"bar,baz",gaz

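will result in incorrect output, since jq will see 4 fields instead of 3.

One option is to use tab-separated values instead of commas (as long as your input data does not contain tabs!), together with the accepted answer.

Another option is to combine your tools and use the best tool for each part: a CSV parser to read the input and turn it into JSON, and jq to transform that JSON into the target format.

The Python-based csvkit parses CSV intelligently and comes with a tool, csvjson, which does a much better job of turning CSV into JSON. Its output can then be piped through jq to convert the flat JSON produced by csvkit into the target form.

With the data provided by the OP, getting the desired output is as simple as: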
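
The difference is easy to reproduce in Python, used here only as a neutral illustration: a plain split on the delimiter sees four fields, while a real CSV parser honours the quotes and sees three.

```python
import csv

line = 'foo,"bar,baz",gaz'

naive = line.split(",")            # what a plain split on the delimiter does
parsed = next(csv.reader([line]))  # what a quote-aware CSV parser does

print(naive)   # ['foo', '"bar', 'baz"', 'gaz']
print(parsed)  # ['foo', 'bar,baz', 'gaz']
```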

csvjson --no-header-row se.csv |
  jq '{occurrences: map({position: [.a, .b], taxo: {espece: .c}})}'

Note that csvjson automatically detects ; as the delimiter and, when the input has no header row, assigns the JSON keys a, b, and c.
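
The same two-stage idea (parse into flat records first, reshape afterwards) can be sketched with Python's standard library alone. The function names flat_records and reshape are hypothetical, the delimiter is passed explicitly, and csvjson's delimiter detection and type inference are not reproduced; the coordinates are converted with float instead.

```python
import csv
import io
import string

def flat_records(src, delimiter=";"):
    """Stage 1: parse CSV rows into flat dicts keyed a, b, c, ..."""
    for row in csv.reader(src, delimiter=delimiter):
        if row:  # skip blank lines
            yield dict(zip(string.ascii_lowercase, row))

def reshape(records):
    """Stage 2: turn the flat records into the target structure."""
    return {"occurrences": [
        {"position": [float(r["a"]), float(r["b"])],
         "taxo": {"espece": r["c"]}}
        for r in records
    ]}

# Stand-in for open("se.csv", newline="").
sample = io.StringIO(
    "-21.3214077;55.4851413;Ruizia cordata\n"
    "-21.3213078;55.4849803;Cossinia pinnata\n",
    newline="",
)
flat = list(flat_records(sample))
result = reshape(flat)
print(result)
```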

Answer 8

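If you want to go crazy, you can write a parser in jq itself. Here is my implementation, which can be thought of as the inverse of the @csv filter. Drop it into your .jq file.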

def do_if(pred; update):
    if pred then update else . end;
def _parse_delimited($_delim; $_quot; $_nl; $_skip):
    [($_delim, $_quot, $_nl, $_skip)|explode[]] as [$delim, $quot, $nl, $skip] |
    [0,1,2,3,4,5] as [$s_start,$s_next_value,$s_read_value,$s_read_quoted,$s_escape,$s_final] |
    def _append($arr; $value):
        $arr + [$value];
    def _do_start($c):
        if $c == $nl then
            [$s_start, null, null, _append(.[3]; [""])]
        elif $c == $delim then
            [$s_next_value, null, [""], .[3]]
        elif $c == $quot then
            [$s_read_quoted, [], [], .[3]]
        else
            [$s_read_value, [$c], [], .[3]]
        end;
    def _do_next_value($c):
        if $c == $nl then
            [$s_start, null, null, _append(.[3]; _append(.[2]; ""))]
        elif $c == $delim then
            [$s_next_value, null, _append(.[2]; ""), .[3]]
        elif $c == $quot then
            [$s_read_quoted, [], .[2], .[3]]
        else
            [$s_read_value, [$c], .[2], .[3]]
        end;
    def _do_read_value($c):
        if $c == $nl then
            [$s_start, null, null, _append(.[3]; _append(.[2]; .[1]|implode))]
        elif $c == $delim then
            [$s_next_value, null, _append(.[2]; .[1]|implode), .[3]]
        else
            [$s_read_value, _append(.[1]; $c), .[2], .[3]]
        end;
    def _do_read_quoted($c):
        if $c == $quot then
            [$s_escape, .[1], .[2], .[3]]
        else
            [$s_read_quoted, _append(.[1]; $c), .[2], .[3]]
        end;
    def _do_escape($c):
        if $c == $nl then
            [$s_start, null, null, _append(.[3]; _append(.[2]; .[1]|implode))]
        elif $c == $delim then
            [$s_next_value, null, _append(.[2]; .[1]|implode), .[3]]
        else
            [$s_read_quoted, _append(.[1]; $c), .[2], .[3]]
        end;
    def _do_final($c):
        .;
    def _do_finalize:
        if .[0] == $s_start then
            [$s_final, null, null, .[3]]
        elif .[0] == $s_next_value then
            [$s_final, null, null, _append(.[3]; [""])]
        elif .[0] == $s_read_value then
            [$s_final, null, null, _append(.[3]; _append(.[2]; .[1]|implode))]
        elif .[0] == $s_read_quoted then
            [$s_final, null, null, _append(.[3]; _append(.[2]; .[1]|implode))]
        elif .[0] == $s_escape then
            [$s_final, null, null, _append(.[3]; _append(.[2]; .[1]|implode))]
        else # .[0] == $s_final
            .
        end;
    reduce explode[] as $c (
        [$s_start,null,null,[]];
        do_if($c != $skip;
            if .[0] == $s_start then
                _do_start($c)
            elif .[0] == $s_next_value then
                _do_next_value($c)
            elif .[0] == $s_read_value then
                _do_read_value($c)
            elif .[0] == $s_read_quoted then
                _do_read_quoted($c)
            elif .[0] == $s_escape then
                _do_escape($c)
            else # .[0] == $s_final
                _do_final($c)
            end
        )
    )
    | _do_finalize[3][];
def parse_delimited($delim; $quot; $nl; $skip):
    _parse_delimited($delim; $quot; $nl; $skip);
def parse_delimited($delim; $quot; $nl):
    parse_delimited($delim; $quot; $nl; "\r");
def parse_delimited($delim; $quot):
    parse_delimited($delim; $quot; "\n");
def parse_delimited($delim):
    parse_delimited($delim; "\"");
def parse_csv:
    parse_delimited(",");

For your data, you will want to change the delimiter to a semicolon.

$ cat se.csv
-21.3214077;55.4851413;Ruizia cordata
-21.3213078;55.4849803;Cossinia pinnata
$ jq -R 'parse_delimited(";")' se.csv
[
  "-21.3214077",
  "55.4851413",
  "Ruizia cordata"
]
[
  "-21.3213078",
  "55.4849803",
  "Cossinia pinnata"
]

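Parsing one line at a time like this works fine for most inputs, but if your data contains literal newlines, you will need to read the entire file as a single string.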

$ cat input.csv
Year,Make,Model,Description,Price
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1999,Chevy,"Venture ""Extended Edition, Very Large""",,5000.00
1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00
$ jq -Rs 'parse_csv' input.csv
[
  "Year",
  "Make",
  "Model",
  "Description",
  "Price"
]
[
  "1997",
  "Ford",
  "E350",
  "ac, abs, moon",
  "3000.00"
]
[
  "1999",
  "Chevy",
  "Venture \"Extended Edition\"",
  "",
  "4900.00"
]
[
  "1999",
  "Chevy",
  "Venture \"Extended Edition, Very Large\"",
  "",
  "5000.00"
]
[
  "1996",
  "Jeep",
  "Grand Cherokee",
  "MUST SELL!\nair, moon roof, loaded",
  "4799.00"
]
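
The same line-versus-whole-input distinction shows up in other CSV readers. As a hedged Python illustration using a shortened copy of the sample above: splitting into physical lines first cuts the quoted field in two, whereas handing the whole stream to csv.reader keeps the embedded newline inside one record.

```python
import csv
import io

data = (
    'Year,Make,Model,Description,Price\n'
    '1996,Jeep,Grand Cherokee,"MUST SELL!\n'
    'air, moon roof, loaded",4799.00\n'
)

# Splitting on newlines first breaks the quoted field apart.
physical_lines = data.splitlines()
print(len(physical_lines))  # 3 physical lines

# csv.reader consumes the stream itself and keeps the quoted newline.
rows = list(csv.reader(io.StringIO(data, newline="")))
print(len(rows))            # 2 logical records
print(repr(rows[1][3]))
```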

Answer 9

Here is a Python one-liner that does this:

cat my.csv | python -c 'import csv, json, sys; print(json.dumps([dict(r) for r in csv.DictReader(sys.stdin)]))'

Answer 10

John Kerl's Miller tool has this built in:

mlr --c2j --jlistwrap cat INPUT.csv > OUTPUT.json

Converting CSV to JSON in bash
