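We use AWS DMS to dump SQL Server databases into S3 as Parquet files. The idea is to run some analytics on the Parquet data. After the full load completes, the Parquet files cannot be read because their schema contains UINT fields. Spark refuses to read them with Parquet type not supported: INT32 (UINT_8). We use transformation rules to override the data type of the UINT columns, but they do not seem to be picked up by the DMS engine. Why?
There are a number of rules like "convert uint to int" (note that UINT1 is the 1-byte unsigned DMS data type):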
{
  "rule-type": "transformation",
  "rule-id": "7",
  "rule-name": "uintToInt",
  "rule-action": "change-data-type",
  "rule-target": "column",
  "object-locator": {
    "schema-name": "%",
    "table-name": "%",
    "column-name": "%",
    "data-type": "uint1"
  },
  "data-type": {
    "type": "int4"
  }
}
The S3 endpoint settings are DataFormat=parquet;ParquetVersion=parquet_2_0, and the DMS engine version is 3.3.2.
However, the resulting Parquet schema still contains uint. See below:
id: int32
name: string
value: string
status: uint8
Trying to read such Parquet files with Spark gives me
org.apache.spark.sql.AnalysisException: Parquet type not supported: INT32 (UINT_8);
at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.typeNotSupported$1(ParquetSchemaConverter.scala:100)
at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertPrimitiveField(ParquetSchemaConverter.scala:136)
Why are the DMS transformation rules not being triggered?
Answer 0: (score: 1)
Converting the data directly from UINT to INT in DMS solves this problem. Your mapping rules should look like this:
{
  "rules": [
    ...
    {
      "rule-type": "transformation",
      "rule-id": "2",
      "rule-name": "uint1-to-int1",
      "rule-action": "change-data-type",
      "rule-target": "column",
      "object-locator": {
        "schema-name": "acessa",
        "table-name": "%",
        "column-name": "%",
        "data-type": "uint1"
      },
      "data-type": {
        "type": "int1"
      }
    },
    {
      "rule-type": "transformation",
      "rule-id": "3",
      "rule-name": "uint2-to-int2",
      "rule-action": "change-data-type",
      "rule-target": "column",
      "object-locator": {
        "schema-name": "acessa",
        "table-name": "%",
        "column-name": "%",
        "data-type": "uint2"
      },
      "data-type": {
        "type": "int2"
      }
    },
    {
      "rule-type": "transformation",
      "rule-id": "4",
      "rule-name": "uint4-to-int4",
      "rule-action": "change-data-type",
      "rule-target": "column",
      "object-locator": {
        "schema-name": "acessa",
        "table-name": "%",
        "column-name": "%",
        "data-type": "uint4"
      },
      "data-type": {
        "type": "int4"
      }
    },
    {
      "rule-type": "transformation",
      "rule-id": "5",
      "rule-name": "uint8-to-int8",
      "rule-action": "change-data-type",
      "rule-target": "column",
      "object-locator": {
        "schema-name": "acessa",
        "table-name": "%",
        "column-name": "%",
        "data-type": "uint8"
      },
      "data-type": {
        "type": "int8"
      }
    }
  ]
}
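The four rules above differ only in byte width, so they can be generated rather than written by hand. Here is a minimal sketch in Python, assuming the same schema name ("acessa") and rule ids as above. The helper name build_uint_rules is hypothetical and not part of any AWS tooling, and whatever rules the "..." above stands for would still have to be added separately.

```python
import json


def build_uint_rules(schema_name, first_rule_id=2, widths=(1, 2, 4, 8)):
    """Build one change-data-type rule per unsigned DMS integer width.

    build_uint_rules is a hypothetical helper for illustration only.
    """
    rules = []
    for offset, width in enumerate(widths):
        rules.append({
            "rule-type": "transformation",
            "rule-id": str(first_rule_id + offset),
            "rule-name": f"uint{width}-to-int{width}",
            "rule-action": "change-data-type",
            "rule-target": "column",
            "object-locator": {
                "schema-name": schema_name,
                "table-name": "%",
                "column-name": "%",
                # Match every column whose DMS type is this unsigned width.
                "data-type": f"uint{width}",
            },
            # Same-width signed type, as in the rules above.
            "data-type": {"type": f"int{width}"},
        })
    return rules


if __name__ == "__main__":
    print(json.dumps({"rules": build_uint_rules("acessa")}, indent=2))
```

Note that a same-width signed type has a smaller maximum (127 for int1 versus 255 for uint1), so if the columns hold values in the upper half of the unsigned range, a wider target such as the int4 used in the question may be the safer choice.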
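Answer 1: (score: 0)
The only way I was able to get the transformation applied to the Parquet files was by specifying the exact column to convert. For example: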
{
  "rules": [
    ...
    {
      "rule-type": "transformation",
      "rule-id": "2",
      "rule-name": "uint1-to-int1",
      "rule-action": "change-data-type",
      "rule-target": "column",
      "object-locator": {
        "schema-name": "acessa",
        "table-name": "<table_name>",
        "column-name": "<column_name>"
      },
      "data-type": {
        "type": "int1"
      }
    }
  ]
}
Using the wildcard % as the column name in the object locator does not work.
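If every column has to be named explicitly, the rules can be generated from a list of (table, column) pairs instead of being typed out one by one. Here is a rough sketch in Python. The helper build_column_rules and the example table and column names are hypothetical, and collecting the list of unsigned columns (for example from the source database catalog) is left out.

```python
import json


def build_column_rules(schema_name, columns, target_type="int1", first_rule_id=2):
    """Build one change-data-type rule per explicitly named column.

    columns is an iterable of (table_name, column_name) pairs.
    build_column_rules is a hypothetical helper for illustration only.
    """
    rules = []
    for offset, (table, column) in enumerate(columns):
        rules.append({
            "rule-type": "transformation",
            "rule-id": str(first_rule_id + offset),
            # Rule names only need to be unique; this naming scheme is arbitrary.
            "rule-name": f"{table}-{column}-to-{target_type}",
            "rule-action": "change-data-type",
            "rule-target": "column",
            "object-locator": {
                "schema-name": schema_name,
                "table-name": table,
                "column-name": column,
            },
            "data-type": {"type": target_type},
        })
    return rules


if __name__ == "__main__":
    # Placeholder table/column names, not taken from the original post.
    example_columns = [("orders", "status"), ("users", "flags")]
    print(json.dumps({"rules": build_column_rules("acessa", example_columns)}, indent=2))
```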