BigQuery ML tutorial: "Failure in computing PREDICT: Null value found in input"

Time: 2018-09-02 10:20:59

Tags: google-bigquery

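I am following the BQML tutorial, which shows how to predict a child's birth weight from the baby's sex, the length of the pregnancy, and demographic information about the mother.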

When I run the SQL that evaluates the model, BigQuery returns the following error:

Failure in computing PREDICT: Null value found in input.

This is the evaluation SQL:

#standardSQL
SELECT
  *
FROM
  ML.EVALUATE(MODEL `bqml_tutorial.natality_model`,
    (
    SELECT
      weight_pounds,
      is_male,
      gestation_weeks,
      mother_age,
      CAST(mother_race AS STRING) AS mother_race
    FROM
      `bigquery-public-data.samples.natality`
    WHERE
      weight_pounds IS NOT NULL))


The SQL used to create the model is:

#standardSQL
CREATE MODEL `bqml_tutorial.natality_model`
OPTIONS
  (model_type='linear_reg',
    input_label_cols=['weight_pounds']) AS
SELECT
  weight_pounds,
  is_male,
  gestation_weeks,
  mother_age,
  CAST(mother_race AS string) AS mother_race
FROM
  `bigquery-public-data.samples.natality`
WHERE
  weight_pounds IS NOT NULL
  AND RAND() < 0.001

Interestingly, prediction works fine. The problem only appears when I try to evaluate the model.

Any ideas?

2 answers:

Answer 0 (score: 1)

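To help you understand the problem, you can run the query below, which counts the NULLs in each column of the evaluation input: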

#standardSQL
SELECT
  COUNTIF(weight_pounds IS NULL) weight_pounds_nulls,
  COUNTIF(is_male IS NULL) is_male_nulls,
  COUNTIF(gestation_weeks IS NULL) gestation_weeks_nulls,
  COUNTIF(mother_age IS NULL) mother_age_nulls,
  COUNTIF(mother_race IS NULL) mother_race_nulls
FROM (
  SELECT
    weight_pounds,
    is_male,
    gestation_weeks,
    mother_age,
    CAST(mother_race AS STRING) AS mother_race
  FROM `bigquery-public-data.samples.natality`
  WHERE weight_pounds IS NOT NULL
)

The result is:

Row  weight_pounds_nulls  is_male_nulls  gestation_weeks_nulls  mother_age_nulls  mother_race_nulls
1    0                    0              4749775                0                 9874846

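So gestation_weeks and mother_race contain NULLs. Run the query below for the evaluation; it also filters out rows where either of those columns is NULL: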

#standardSQL
SELECT
  *
FROM
  ML.EVALUATE(MODEL `bqml_tutorial.natality_model`,
    (
      SELECT
        weight_pounds,
        is_male,
        gestation_weeks,
        mother_age,
        CAST(mother_race AS STRING) AS mother_race
      FROM `bigquery-public-data.samples.natality`
      WHERE weight_pounds IS NOT NULL
      AND gestation_weeks IS NOT NULL
      AND mother_race IS NOT NULL
     ))  

It produces the following evaluation:

Row  mean_absolute_error  mean_squared_error  mean_squared_log_error  median_absolute_error  r2_score              explained_variance
1    0.957266870271064    1.6762698039982795  0.03411192361406951     0.73998132611964       0.047271288906207354  0.04732780918772106

I think you should apply the same adjustment to PREDICT.
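
A minimal sketch of what that adjustment could look like for prediction, assuming the same model and feature columns as in the question. The `predicted_weight_pounds` column is the output that ML.PREDICT names after the label, and the LIMIT is only there to keep the illustration small; neither appears in the original post.

```sql
#standardSQL
SELECT
  predicted_weight_pounds
FROM
  ML.PREDICT(MODEL `bqml_tutorial.natality_model`,
    (
      SELECT
        is_male,
        gestation_weeks,
        mother_age,
        CAST(mother_race AS STRING) AS mother_race
      FROM `bigquery-public-data.samples.natality`
      -- Drop rows with NULLs in the two feature columns that contain them
      WHERE gestation_weeks IS NOT NULL
        AND mother_race IS NOT NULL
      -- Illustrative cap on the number of rows scored (an assumption)
      LIMIT 100
    ))
```

The filters mirror the ones added to the ML.EVALUATE query, so both calls see input rows without NULL feature values.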

Answer 1 (score: 0)

BQML currently fills in these NULLs for you automatically. Please try again with the original data (without the non-null filters).