regexp_replace

Date: 2016-12-30 14:55:33

Tags: oracle11g

My table has a few rows like the following:

row1:  abc    changed on   12 November, 2008 11:30 AM and its abc..region1  
row2:  defg   updated      14 January, 2012 08:20 PM         ......region2  
row3:  ghijkl corrected by 18 august, 2013 9:30 AM    ..something..region3 

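My requirements are as follows:

  1. All of the dates above are in the EST time zone, and the date format is exactly as shown and will not change.
  2. I want to convert the dates in these rows from EST to a different time zone, depending on the region in each row, and the format should change to 12 dec 2016 7:30 AM.
  3. The query I built (using row1 as an example) is as follows: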

    select regexp_replace(
         'abc changed on 12 November, 2008 11:30 AM and its abc..region1',
         '([0-9]{2})([[:blank:]])(January|February|March|April|May|June|July|August|September|October|November|December)(,[[:blank:]])([0-9]{4})([[:blank:]])([0-9]{2}:[0-9]{2})([[:blank:]])(AM|PM)','\1-\3-\5 \7 \9',1,0,'i')
    from dual;
    

    Output:

    abc changed on 12-November-2008 11:30 AM and its abc..region1
    

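    I'm happy with the query above, because it gives me the date as a formatted string. Even though this isn't the final date format, I can pass this date string to a function that converts it according to the region, does some processing, and finally returns a date type. For that purpose I wanted to add to_date to the query above, so I first wrapped the replacement string in substr: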

     select regexp_replace(
       'abc changed on 12 November, 2008 11:30 AM and its abc..region1',
       '([0-9]{2})([[:blank:]])(January|February|March|April|May|June|July|August|September|October|November|December)(,[[:blank:]])([0-9]{4})([[:blank:]])([0-9]{2}:[0-9]{2})([[:blank:]])(AM|PM)',
     substr('\1-\3-\5 \7 \9',1),
     1,0,'i')
     from dual;
    

    Output:

    abc changed on 12-November-2008 11:30 AM and its abc..region1 --> works fine up to here
    

    Now I add to_date to convert the date string into an actual date type, so that I can do some processing on it:

    select regexp_replace(
       'abc changed on 12 November, 2008 11:30 AM and its abc..region1',
       '([0-9]{2})([[:blank:]])(January|February|March|April|May|June|July|August|September|October|November|December)(,[[:blank:]])([0-9]{4})([[:blank:]])([0-9]{2}:[0-9]{2})([[:blank:]])(AM|PM)',
     to_date(substr('\1-\3-\5 \7 \9',1),'dd-mon-yyyy HH:MI AM'),
     1,0,'i')
     from dual;
    

    This query gives an error:

     ORA-01858: a non-numeric character found where a numeric was expected
    

    I checked whether I was passing the wrong arguments to to_date() by running the query below, but it works fine:

     Select to_date('12-November-2008 11:30 AM','dd-mon-yyyy HH:MI AM') 
       from dual; 
    

    Output:

    12-Nov-2008 
    

    (I'm not worried about the time portion, since it will be present in this date anyway.)

    To avoid confusion, I have numbered the subexpressions of the regular expression above:

    ([0-9]{2})          --> 1
    ([[:blank:]])       --> 2
    (January|February|March|April|May|June|July|August|September|October|November|December) --> 3
    (,[[:blank:]])      --> 4
    ([0-9]{4})          --> 5
    ([[:blank:]])       --> 6
    ([0-9]{2}:[0-9]{2}) --> 7
    ([[:blank:]])       --> 8
    (AM|PM)             --> 9

    select regexp_replace(
     'abc changed on 12 November, 2008 11:30 AM and its abc..region1',
     '([0-9]{2})([[:blank:]])(January|February|March|April|May|June|July|August|September|October|November|December)(,[[:blank:]])([0-9]{4})([[:blank:]])([0-9]{2}:[0-9]{2})([[:blank:]])(AM|PM)','\1-\3-\5 \7 \9',1,0,'i')
    from dual;
    

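1 answer:

Answer 0 (score: 2)

Assuming your strings always contain the date in that specific format (and there are no invalid dates, etc.), the following should work for you: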

WITH sample_data AS (SELECT ' the date is 12 November, 2008 11:30 AM' str FROM dual UNION ALL
                     SELECT 'Here''s a date of 1 March, 2015 1:43 pm' str FROM dual UNION ALL
                     SELECT '1 February,2016 9:43 AM' str FROM dual UNION ALL
                     SELECT 'And again it''s 21 May, 2016 9:43 AM and a little bit extra' str FROM dual)
SELECT str,
       to_date(regexp_replace(str, '^.*?([[:digit:]]{1,2} [[:alpha:]]{3,9}, ?[[:digit:]]{4} [[:digit:]]{1,2}\:[[:digit:]]{2} (A|P)M).*$', '\1', 1, 1, 'i'), 'dd Month yyyy, hh:mi am') dt
FROM   sample_data;

 STR                                                        DT
---------------------------------------------------------- -------------------
 the date is 12 November, 2008 11:30 AM                    12/11/2008 11:30:00
Here's a date of 1 March, 2015 1:43 pm                     01/03/2015 13:43:00
1 February,2016 9:43 AM                                    01/02/2016 09:43:00
And again it's 21 May, 2016 9:43 AM and a little bit extra 21/05/2016 09:43:00
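Note the order of the calls here: regexp_replace runs first and to_date is applied to its result. This is likely why the query in the question fails. Oracle evaluates a function's arguments before calling it, so a to_date placed inside the replacement argument receives the literal text '\1-\3-\5 \7 \9' rather than the substituted date. A minimal sketch of the two orderings (the shortened pattern in the second statement is a simplification for illustration, not the asker's exact regex):

```sql
-- Fails: to_date is evaluated on the literal backreference text,
-- before regexp_replace has substituted anything (ORA-01858).
SELECT to_date('\1-\3-\5 \7 \9', 'dd-mon-yyyy HH:MI AM') FROM dual;

-- Works: substitute first, then convert the resulting string.
SELECT to_date(
         regexp_replace('12 November, 2008 11:30 AM',
                        '([0-9]{2}) ([[:alpha:]]+), ([0-9]{4}) ([0-9]{2}:[0-9]{2}) (AM|PM)',
                        '\1-\2-\3 \4 \5', 1, 0, 'i'),
         'dd-mon-yyyy HH:MI AM') dt
FROM   dual;
```

A consequence is that regexp_replace alone cannot call a function on each match; the matched text has to be extracted, converted, and then put back.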

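The regular expression in the sample_data query above can be broken down as follows:

  1. ^.*? - matches any character (except a newline) starting from the beginning of the line, zero or more times, as few times as possible (non-greedy).
  2. ([[:digit:]]{1,2} [[:alpha:]]{3,9}, ?[[:digit:]]{4} [[:digit:]]{1,2}\:[[:digit:]]{2} (A|P)M) - this is the pattern we're looking for, and it is what replaces the whole string (it is captured as \1, which we can reference in the replacement string argument).
  3. .*$ - matches any remaining characters up to the end of the string.

The second part of the pattern can be broken down further: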

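    1. [[:digit:]]{1,2} - one or two digits
    2. " " - a single space character
    3. [[:alpha:]]{3,9} - three to nine letters (upper or lower case)
    4. , ? - a comma followed by zero or one space
    5. [[:digit:]]{4} - four digits
    6. " " - a single space character
    7. [[:digit:]]{1,2} - one or two digits
    8. \: - a single colon character
    9. [[:digit:]]{2} - exactly two digits
    10. " " - a single space character
    11. (A|P)M - the letter A or P followed by M (matched case-insensitively because of the 'i' flag)

For the rows in your question, where the date has to be reformatted in place and the surrounding text kept, this should work for you: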

      WITH sample_data AS (SELECT 'abc    changed on   12 November, 2008 11:30 AM and its abc..region1' str FROM dual UNION ALL
                           SELECT 'defg   updated      14 January, 2012 08:20 PM         ......region2' str FROM dual UNION ALL
                           SELECT 'ghijkl corrected by 18 august, 2013 9:30 AM    ..something..region3' str FROM dual)
      SELECT str,
             regexp_replace(str,
                            '(^.*?)(([[:digit:]]{1,2}) (January|February|March|April|May|June|July|August|September|October|November|December), ?([[:digit:]]{4} [[:digit:]]{1,2}\:[[:digit:]]{2} (A|P)M))(.*$)',
                            '\1\3-\4-\5\7', 1, 1, 'i') dt
      FROM   sample_data;
      
      STR                                                                 DT
      ------------------------------------------------------------------- --------------------------------------------------------------------------------
      abc    changed on   12 November, 2008 11:30 AM and its abc..region1 abc    changed on   12-November-2008 11:30 AM and its abc..region1
      defg   updated      14 January, 2012 08:20 PM         ......region2 defg   updated      14-January-2012 08:20 PM         ......region2
      ghijkl corrected by 18 august, 2013 9:30 AM    ..something..region3 ghijkl corrected by 18-august-2013 9:30 AM    ..something..region3