AWS DMS SQL Server to S3 Parquet - change-data-type transformation rule and 'Parquet type not supported: INT32 (UINT_8)'

Time: 2020-05-26 08:11:26

Tags: apache-spark amazon-s3 aws-dms

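We use AWS DMS to dump a SQL Server database to S3 as Parquet files. The idea is to run some analytics on the Parquet data. After the full load completes, the Parquet files cannot be read because their schema contains UINT fields. Spark refuses to read them with Parquet type not supported: INT32 (UINT_8). We use transformation rules to override the data type of the UINT columns, but the DMS engine does not seem to pick them up. Why?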

There are a number of rules like "convert uint to int" (note that UINT1 is the 1-byte unsigned type in the DMS data types):

{
  "rule-type": "transformation",
  "rule-id": "7",
  "rule-name": "uintToInt",
  "rule-action": "change-data-type",
  "rule-target": "column",
  "object-locator": {
    "schema-name": "%",
    "table-name": "%",
    "column-name": "%",
    "data-type": "uint1"
  },
  "data-type": {
    "type": "int4"
  }
}

The S3 settings are DataFormat=parquet;ParquetVersion=parquet_2_0, and the DMS engine version is 3.3.2.

However, the resulting Parquet schema still contains a uint type. See below:

id: int32
name: string
value: string
status: uint8

Trying to read such Parquet files with Spark gives me:

org.apache.spark.sql.AnalysisException: Parquet type not supported: INT32 (UINT_8);
    at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.typeNotSupported$1(ParquetSchemaConverter.scala:100)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertPrimitiveField(ParquetSchemaConverter.scala:136)

Why is the DMS transformation rule not being triggered?
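To see which columns are affected, the schema listing above can be scanned for unsigned types. The following is only a minimal sketch, assuming the schema is available as plain "name: type" lines as printed in the question; the function name `unsigned_columns` is made up for illustration. Note that `uint8` in this listing is the 8-bit unsigned Parquet type (UINT_8 in the Spark error), which is the 1-byte type the question's rule targets as `uint1`.

```python
def unsigned_columns(schema_text):
    """Return (column, type) pairs whose type name starts with 'uint'."""
    found = []
    for line in schema_text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        name, _, type_name = line.partition(":")
        type_name = type_name.strip()
        if type_name.lower().startswith("uint"):
            found.append((name.strip(), type_name))
    return found


# Schema listing copied from the question.
SCHEMA = """\
id: int32
name: string
value: string
status: uint8
"""

if __name__ == "__main__":
    print(unsigned_columns(SCHEMA))
```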

2 answers:

Answer 0: (score: 1)

Converting the data directly from UINT to INT in DMS solves this problem. Your mapping rules should look like this:

{
"rules": [
    ...
    {
        "rule-type": "transformation",
        "rule-id": "2",
        "rule-name": "unit1-to-int1",
        "rule-action": "change-data-type",
        "rule-target": "column",
        "object-locator": {
            "schema-name": "acessa",
            "table-name": "%",
            "column-name": "%",
            "data-type": "uint1"
        },
        "data-type": {
            "type": "int1"
        }
    },
    {
        "rule-type": "transformation",
        "rule-id": "3",
        "rule-name": "unit2-to-int2",
        "rule-action": "change-data-type",
        "rule-target": "column",
        "object-locator": {
            "schema-name": "acessa",
            "table-name": "%",
            "column-name": "%",
            "data-type": "uint2"
        },
        "data-type": {
            "type": "int2"
        }
    },
    {
        "rule-type": "transformation",
        "rule-id": "4",
        "rule-name": "unit4-to-int4",
        "rule-action": "change-data-type",
        "rule-target": "column",
        "object-locator": {
            "schema-name": "acessa",
            "table-name": "%",
            "column-name": "%",
            "data-type": "uint4"
        },
        "data-type": {
            "type": "int4"
        }
    },
    {
        "rule-type": "transformation",
        "rule-id": "5",
        "rule-name": "unit8-to-int8",
        "rule-action": "change-data-type",
        "rule-target": "column",
        "object-locator": {
            "schema-name": "acessa",
            "table-name": "%",
            "column-name": "%",
            "data-type": "uint8"
        },
        "data-type": {
            "type": "int8"
        }
    }
]}

Documentation: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.CustomizingTasks.TableMapping.html#CHAP_Tasks.CustomizingTasks.TableMapping.SelectionTransformation.Transformations
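The four rules above differ only in the rule ID and the type pair, so they can be generated rather than written by hand. The sketch below is one possible way to do that with the Python standard library; the function name `build_uint_rules` and the `widen` option are assumptions made for illustration and are not part of DMS. The `widen` option exists because a signed type of the same width cannot hold the upper half of the unsigned range (an 8-bit unsigned value goes up to 255, an 8-bit signed value only to 127), which is presumably why the question maps `uint1` to `int4`. How DMS treats out-of-range values is not covered here.

```python
import json

# Signed DMS type of the same width for each unsigned type (mirrors the rules above).
SAME_WIDTH = {"uint1": "int1", "uint2": "int2", "uint4": "int4", "uint8": "int8"}

# Assumed alternative: the next wider signed type, so large unsigned values still fit.
# uint8 has no wider integer type in this list, so it stays int8.
WIDENED = {"uint1": "int2", "uint2": "int4", "uint4": "int8", "uint8": "int8"}


def build_uint_rules(schema_name, first_rule_id=2, widen=False):
    """Return one change-data-type rule per unsigned DMS type."""
    targets = WIDENED if widen else SAME_WIDTH
    rules = []
    for offset, (source_type, target_type) in enumerate(targets.items()):
        rules.append({
            "rule-type": "transformation",
            "rule-id": str(first_rule_id + offset),
            "rule-name": f"{source_type}-to-{target_type}",
            "rule-action": "change-data-type",
            "rule-target": "column",
            "object-locator": {
                "schema-name": schema_name,
                "table-name": "%",
                "column-name": "%",
                "data-type": source_type,
            },
            "data-type": {"type": target_type},
        })
    return rules


if __name__ == "__main__":
    # Only the transformation rules are printed; the selection rules elided
    # in the answer still have to be added to the final mapping.
    print(json.dumps({"rules": build_uint_rules("acessa")}, indent=4))
```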

Answer 1: (score: 0)

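The only way I was able to get the transformation applied to the Parquet files was by specifying the exact columns to convert. For example: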

{
   "rules": [
   ...
   {
    "rule-type": "transformation",
    "rule-id": "2",
    "rule-name": "unit1-to-int1",
    "rule-action": "change-data-type",
    "rule-target": "column",
    "object-locator": {
        "schema-name": "acessa",
        "table-name": "<table_name>",
        "column-name": "<column_name>"
    },
    "data-type": {
        "type": "int1"
    }
   }
  ]
}

Using the wildcard % as the column name in the object locator did not work.
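Writing one rule per column by hand gets tedious when many columns are affected. As a rough sketch of the explicit-column approach described above, the rules could be generated from a list of (table, column, target type) entries. The function name `build_column_rules`, the rule-name format, and the table and column names in the example list are all hypothetical.

```python
import json


def build_column_rules(schema_name, columns, first_rule_id=2):
    """Build one explicit change-data-type rule per (table, column, target_type)."""
    rules = []
    for offset, (table, column, target_type) in enumerate(columns):
        rules.append({
            "rule-type": "transformation",
            "rule-id": str(first_rule_id + offset),
            # Arbitrary naming scheme, chosen only to keep names distinct.
            "rule-name": f"{table}-{column}-to-{target_type}",
            "rule-action": "change-data-type",
            "rule-target": "column",
            "object-locator": {
                "schema-name": schema_name,
                "table-name": table,
                "column-name": column,
            },
            "data-type": {"type": target_type},
        })
    return rules


# Hypothetical table and column names, for illustration only.
UNSIGNED_COLUMNS = [
    ("orders", "status", "int1"),
    ("orders", "priority", "int1"),
]

if __name__ == "__main__":
    print(json.dumps({"rules": build_column_rules("acessa", UNSIGNED_COLUMNS)}, indent=4))
```

No wildcard appears in `table-name` or `column-name`, which matches the observation in this answer that only exact column names took effect.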
