Can Hive's RegexSerDe be used for multi-level YAML data?

Asked: 2019-03-08 21:29:30

Tags: regex unix hive

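I have a custom export of several tables and their columns in a multi-level YAML format. A sample extract, modified with dummy values: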

schemas:
- name: exports
  tables:
  - name: sugar
    description: makes stuff sweet
    active_date: 2019-01-07 00:00:00
    columns:
    - name: color
      type: abcd
    - name: taste
      type: abcd
      description: xyz
      example: 21352352
    - name: structure
      type: abcd
      description: xyzasaa
      example: 10001
  - name: salt
    description: not that sweet.
      makes it salty.
    active_date: 2018-12-18 00:00:00
    columns:
    - name: strength
      type: abcdef
      description: easy to find
      example: 2018-03-03 12:30:00
    - name: color
      type: abcdeffa
      description: not sweet
      example: 21352352
    - name: quality
      type: abcd
      description: how much is needed
      example: 10001

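I need to load this data into a Hive table using some SerDe. I am familiar with the JSON SerDe, but unfortunately it does not support this format, so I am looking for an alternative. Can someone suggest the best approach? Can RegexSerDe fully do what I am trying to achieve?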

The Hive table data could be represented in either of the following ways:

<style type="text/css">
	table.tableizer-table {
		font-size: 12px;
		border: 1px solid #CCC; 
		font-family: Arial, Helvetica, sans-serif;
	} 
	.tableizer-table td {
		padding: 4px;
		margin: 3px;
		border: 1px solid #CCC;
	}
	.tableizer-table th {
		background-color: #104E8B; 
		color: #FFF;
		font-weight: bold;
	}
</style>
<table class="tableizer-table">
<thead><tr class="tableizer-firstrow"><th>Level 1 (name)</th><th>Level 2 (name)</th><th>Level 2 (type)</th><th>Level 2 (description)</th></tr></thead><tbody>
 <tr><td>sugar</td><td>color</td><td>abcd</td><td>&nbsp;</td></tr>
 <tr><td>sugar</td><td>taste</td><td>abcd</td><td>xyz</td></tr>
 <tr><td>sugar</td><td>structure</td><td>abcd</td><td>xyzasaa</td></tr>
 <tr><td>salt</td><td>strength</td><td>abcdef</td><td>easy to find</td></tr>
 <tr><td>salt</td><td>color</td><td>abcdeffa</td><td>not sweet</td></tr>
 <tr><td>salt</td><td>quality</td><td>abcd</td><td>how much is needed</td></tr>
</tbody></table>

--- or ---

<table class="tableizer-table">
<thead><tr class="tableizer-firstrow"><th>Level 1 (name.column)</th><th>Level 2 (type)</th><th>Level 2 (description)</th></tr></thead><tbody>
 <tr><td>sugar.color</td><td>abcd</td><td>&nbsp;</td></tr>
 <tr><td>sugar.taste</td><td>abcd</td><td>xyz</td></tr>
 <tr><td>sugar.structure</td><td>abcd</td><td>xyzasaa</td></tr>
 <tr><td>salt.strength</td><td>abcdef</td><td>easy to find</td></tr>
 <tr><td>salt.color</td><td>abcdeffa</td><td>not sweet</td></tr>
 <tr><td>salt.quality</td><td>abcd</td><td>how much is needed</td></tr>
</tbody></table>
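
As far as I know, RegexSerDe applies its pattern to one input line at a time, so a column line such as `- name: color` cannot see the table name that appears several lines above it. One possible workaround is to flatten the YAML into delimited rows before loading. The following is a minimal sketch of that idea, not a real YAML parser: it assumes the exact nesting shown above (schema, then table, then column, with consistent indentation), and `flatten_columns`, `ENTRY_RE` and `column_depth` are hypothetical names chosen for illustration.

```python
import re

# Matches "key: value" lines, with an optional "- " list marker in front.
ENTRY_RE = re.compile(r"^(\s*)(-\s+)?(\w+):\s*(.*)$")

# Shortened copy of the sample export from the question.
SAMPLE = """\
schemas:
- name: exports
  tables:
  - name: sugar
    description: makes stuff sweet
    columns:
    - name: color
      type: abcd
    - name: taste
      type: abcd
      description: xyz
  - name: salt
    description: not that sweet.
      makes it salty.
    columns:
    - name: strength
      type: abcdef
      description: easy to find
"""


def flatten_columns(yaml_text, column_depth=3):
    """Return (table, column, type, description) tuples, one per column.

    Assumption: list entries at nesting depth `column_depth` are columns
    and their direct parent entry is the table.
    """
    found = []  # (table name, column record) pairs in file order
    stack = []  # (indent of the "- " marker, record) per enclosing entry
    for line in yaml_text.splitlines():
        m = ENTRY_RE.match(line)
        if not m:
            continue  # blank line or continuation of a multi-line value
        indent, dash = len(m.group(1)), m.group(2)
        key, value = m.group(3), m.group(4).strip()
        # Drop entries that are not ancestors of the current line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if dash:
            record = {key: value}
            if len(stack) + 1 == column_depth:
                found.append((stack[-1][1].get("name", ""), record))
            stack.append((indent, record))
        elif stack:
            # Plain "key: value" line: attribute of the innermost entry.
            stack[-1][1][key] = value
    return [
        (table, col.get("name", ""), col.get("type", ""), col.get("description", ""))
        for table, col in found
    ]


if __name__ == "__main__":
    for row in flatten_columns(SAMPLE):
        print("\t".join(row))  # tab-separated, one column definition per line
```

The tab-separated output could then be read by an ordinary Hive text table declared with `ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'`, without any custom SerDe. Continuation lines of multi-line values (such as the second line of the `salt` description) are skipped by this sketch, which is acceptable here only because the target tables use column-level fields.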

Edit: With the simplest approach, I can extract the following:

$ grep -P '(?<=- name: ).*' export.yaml
- name: exports
  - name: sugar
    - name: color
    - name: taste
    - name: structure
  - name: salt
    - name: strength
    - name: color
    - name: quality

But how do I establish the indentation relationship, so that the result looks like this:

sugar.color,sugar.taste,sugar.structure
salt.strength,salt.color,salt.quality
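
One way to recover the parent/child relationship from the `grep` output is to keep a stack of the `- name:` entries seen so far, keyed by their indentation. The sketch below assumes the three levels shown above (schema, table, column) and consistent indentation; `dotted_groups`, `NAME_RE` and `column_depth` are hypothetical names used only for this illustration.

```python
import re

# Captures the indentation and the value of a "- name: value" line.
NAME_RE = re.compile(r"^(\s*)- name:\s*(.*)$")

# The grep output from the question.
GREP_OUTPUT = """\
- name: exports
  - name: sugar
    - name: color
    - name: taste
    - name: structure
  - name: salt
    - name: strength
    - name: color
    - name: quality
"""


def dotted_groups(lines, column_depth=3):
    """Return one "table.col,table.col,..." string per table.

    Assumption: entries at depth `column_depth` are columns and entries
    one level above them are tables.
    """
    groups = []  # one list of "table.column" names per table
    stack = []   # (indent, name) for each enclosing "- name:" entry
    for line in lines:
        m = NAME_RE.match(line)
        if not m:
            continue  # ignore anything that is not a "- name:" line
        indent, name = len(m.group(1)), m.group(2).strip()
        # Pop siblings and deeper entries; what remains are the ancestors.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        depth = len(stack) + 1
        if depth == column_depth - 1:
            groups.append([])  # a new table starts here
        elif depth == column_depth:
            groups[-1].append(f"{stack[-1][1]}.{name}")
        stack.append((indent, name))
    return [",".join(group) for group in groups if group]


if __name__ == "__main__":
    for row in dotted_groups(GREP_OUTPUT.splitlines()):
        print(row)
```

The same function could be given the lines of `export.yaml` directly, or the piped `grep` output read from `sys.stdin`, since lines that are not `- name:` entries are ignored.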

0 answers:

No answers