Extract a substring from a string

Time: 2017-03-09 14:07:59

Tags: google-bigquery

In Google BigQuery, I need to pull out the string between domain** and **, as in the example below. The string is in the "Site_Data" column.

Can anyone help me? Thanks!


2 answers:

Answer 0: (score: 4)

See the example below:

  
#standardSQL
WITH yourTable AS (
  SELECT '756-1__6565656565656, tagtype**unmapped,domain**www.sport.com,userarriveddirectly**False' AS Site_Data
)
SELECT 
  REGEXP_EXTRACT(Site_Data, r'domain\*\*(.*)\*\*') AS x,
  Site_Data
FROM yourTable
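
Note that .* is greedy, so on the sample row the pattern above returns www.sport.com,userarriveddirectly, that is, everything up to the last ** in the string. If only the domain itself is wanted, one possible variant stops the capture at the next comma. This is a sketch that assumes the domain value never contains a comma or an asterisk; the column aliases are illustrative only.

```sql
#standardSQL
WITH yourTable AS (
  SELECT '756-1__6565656565656, tagtype**unmapped,domain**www.sport.com,userarriveddirectly**False' AS Site_Data
)
SELECT
  -- Original pattern: captures everything up to the last ** in the string.
  REGEXP_EXTRACT(Site_Data, r'domain\*\*(.*)\*\*') AS between_markers,
  -- Assumed variant: the capture stops at the first comma or asterisk after domain**.
  REGEXP_EXTRACT(Site_Data, r'domain\*\*([^,*]+)') AS domain_only
FROM yourTable
```

On this sample row, between_markers is www.sport.com,userarriveddirectly and domain_only is www.sport.com.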

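Answer 1: (score: 2)

Do all of the strings have this format? Assuming you always want the third piece when the string is split on the ** delimiter, there are a few different options.

1) Using SPLIT, for example: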

#standardSQL
WITH SampleData AS (
  SELECT '756-1__67648582789116,tagtype**unmapped,domain**www.sport.com,userarriveddirectly**False' AS site_data
)
SELECT SPLIT(site_data, '**')[OFFSET(2)] AS visit_type
FROM SampleData;

2) Using REGEXP_EXTRACT, for example:

#standardSQL
WITH SampleData AS (
  SELECT '756-1__67648582789116,tagtype**unmapped,domain**www.sport.com,userarriveddirectly**False' AS site_data
)
SELECT REGEXP_EXTRACT(site_data, r'[^\*]+\*\*[^\*]+\*\*([^\*]+)') AS visit_type
FROM SampleData;
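
The two options behave differently on rows that do not follow the format: indexing the SPLIT result with OFFSET raises an error when a row has fewer than three pieces, while REGEXP_EXTRACT returns NULL when the pattern does not match. If such rows are possible, a hedged sketch is to index with SAFE_OFFSET, which returns NULL instead of failing. The second row below is a made-up malformed value, used only for illustration.

```sql
#standardSQL
WITH SampleData AS (
  SELECT '756-1__67648582789116,tagtype**unmapped,domain**www.sport.com,userarriveddirectly**False' AS site_data
  UNION ALL
  -- Hypothetical malformed row with no ** delimiter at all.
  SELECT 'no-delimiters-here'
)
SELECT
  site_data,
  -- SAFE_OFFSET yields NULL when index 2 is out of range.
  SPLIT(site_data, '**')[SAFE_OFFSET(2)] AS visit_type
FROM SampleData;
```

For the malformed row, visit_type comes back as NULL and the query still completes.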

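Going a step further, if you want to split out the domain and the arrival type, you can use SPLIT again (this time with its default comma delimiter):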

#standardSQL
WITH SampleData AS (
  SELECT '756-1__67648582789116,tagtype**unmapped,domain**www.sport.com,userarriveddirectly**False' AS site_data
)
SELECT
  SPLIT(visit_type)[OFFSET(0)] AS domain,
  SPLIT(visit_type)[OFFSET(1)] AS arrival_type
FROM (
  SELECT SPLIT(site_data, '**')[OFFSET(2)] AS visit_type
  FROM SampleData
);
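
The sample string can also be read as a comma-separated list of key**value pairs. Under that reading, userarriveddirectly is a key whose value is False, which is an interpretation of the sample and not something the question states. A sketch under that assumption, turning each pair into its own row (pair_key and pair_value are illustrative names):

```sql
#standardSQL
WITH SampleData AS (
  SELECT '756-1__67648582789116,tagtype**unmapped,domain**www.sport.com,userarriveddirectly**False' AS site_data
)
SELECT
  -- Text before ** is treated as the key, text after it as the value.
  SPLIT(TRIM(pair), '**')[SAFE_OFFSET(0)] AS pair_key,
  -- Pieces without a ** delimiter get a NULL value.
  SPLIT(TRIM(pair), '**')[SAFE_OFFSET(1)] AS pair_value
FROM SampleData, UNNEST(SPLIT(site_data, ',')) AS pair;
```

For the sample row, the pair whose pair_key is domain has pair_value www.sport.com, and the leading 756-1__67648582789116 piece, which contains no **, gets a NULL pair_value.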