Extract the "To:" field from email headers

Time: 2013-10-25 16:42:24

Tags: sql sql-server

I have a query in which one column is a string of email headers, for example:

From: Media Temple user (mt.kb.user@gmail.com)
Subject: article: How to Trace a Email
Date: January 25, 2011 3:30:58 PM PDT
To: user@example.com
Return-Path: <mt.kb.user@gmail.com>
Envelope-To: user@example.com
Delivery-Date: Tue, 25 Jan 2011 15:31:01 -0700
Received: from po-out-1718.google.com ([72.14.252.155]:54907) by cl35.gs01.grid ...
Received: by po-out-1718.google.com with SMTP id y22so795146pof.4 for <user@exa ...
Received: by 10.141.116.17 with SMTP id t17mr3929916rvm.251.1214951458741; Tue,...
Received: by 10.140.188.3 with HTTP; Tue, 25 Jan 2011 15:30:58 -0700 (PDT)
Dkim-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=d...
Domainkey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:da...
Message-Id: <c8f49cec0807011530k11196ad4p7cb4b9420f2ae752@mail.gmail.com>
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="----=_Part_3927_12044027.1214951...
X-Spam-Status: score=3.7 tests=DNS_FROM_RFC_POST, HTML_00_10, HTML_MESSAGE, HTM...
X-Spam-Level: ***
Message Body: This is a KnowledgeBase article that provides information on how ...

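I want to extract only the email address contained in the 'To:' field, which is user@example.com in the example above.

How can I do this?

4 answers:

Answer 0 (score: 2)

You can use a split function. I like the version that uses a numbers table, but there are many alternatives. First, a numbers table with 1,000,000 rows: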

SET NOCOUNT ON;
DECLARE @UpperLimit INT;
SET @UpperLimit = 1000000;

WITH n(rn) AS
(
    SELECT TOP (@UpperLimit) ROW_NUMBER() OVER (ORDER BY s1.[object_id])
    FROM sys.all_columns AS s1, sys.all_objects ORDER BY s1.[object_id]
)
SELECT [Number] = rn - 1
INTO dbo.Numbers FROM n
WHERE rn <= @UpperLimit + 1;

CREATE UNIQUE CLUSTERED INDEX n ON dbo.Numbers([Number]);

Now a generic, inline table-valued split function, which turns a delimited string into a set:

CREATE FUNCTION dbo.SplitString
(
    @List NVARCHAR(MAX),
    @Delim VARCHAR(255)
)
RETURNS TABLE
AS
    RETURN ( SELECT [Value] FROM 
      ( 
        SELECT 
          [Value] = LTRIM(RTRIM(SUBSTRING(@List, [Number],
          CHARINDEX(@Delim, @List + @Delim, [Number]) - [Number])))
        FROM dbo.Numbers WHERE Number <= LEN(@List)
        AND SUBSTRING(@Delim + @List, [Number], LEN(@Delim)) = @Delim
      ) AS x
    );
GO
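
As a quick sanity check of the function, here is a minimal sketch. It assumes dbo.Numbers and dbo.SplitString were created exactly as above; the sample list is made up for illustration.

```sql
-- Assumes dbo.Numbers and dbo.SplitString from this answer already exist.
-- The input list is a hypothetical example, not data from the question.
SELECT [Value]
  FROM dbo.SplitString(N'alpha, beta,gamma', ',');
-- Should yield three rows: alpha, beta, gamma
-- (each value is trimmed; row order is not guaranteed without an ORDER BY).
```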

Then it's simple:

DECLARE @x NVARCHAR(MAX) = N'From: Media Temple user (mt.kb.user@gmail.com)
Subject: article: How to Trace a Email
Date: January 25, 2011 3:30:58 PM PDT
To: user@example.com
Return-Path: <mt.kb.user@gmail.com>
Envelope-To: user@example.com
...';

SELECT LTRIM(SUBSTRING(Value, 4, 4000)) 
  FROM dbo.SplitString(@x, CHAR(13)+CHAR(10))
  WHERE Value LIKE 'To: %@%';

Data in a table? OK, no problem:

DECLARE @a TABLE(id INT, email NVARCHAR(MAX));

INSERT @a VALUES
(1,N'From: Media Temple user (mt.kb.user@gmail.com)
Subject: article: How to Trace a Email
Date: January 25, 2011 3:30:58 PM PDT
To: user@example.com
Return-Path: <mt.kb.user@gmail.com>
Envelope-To: user@example.com
...'),
(2,N'From: Media Temple user (mt.kb.user@gmail.com)
Subject: article: How to Trace a Email
Date: January 25, 2011 3:30:58 PM PDT
To: differentUser@somewhereelse.com
Return-Path: <mt.kb.user@gmail.com>
Envelope-To: user@example.com
...');

SELECT a.id, LTRIM(SUBSTRING(x.Value, 4, 4000))
FROM @a AS a
CROSS APPLY dbo.SplitString(a.email, CHAR(13)+CHAR(10)) AS x
WHERE x.Value LIKE 'To: %@%';

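Now, you may need to experiment with the delimiter: it might be just CHAR(10), or just CHAR(13), or the two might appear in a different order. I'm not sure, and I can't tell which it is from your code...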
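
If the line endings are unknown or mixed, one option is to normalize them before splitting. The following is only a sketch under that assumption: it reuses dbo.SplitString from this answer, and the @x sample with deliberately mixed endings is made up for illustration.

```sql
-- Assumes dbo.Numbers and dbo.SplitString from this answer already exist.
-- Hypothetical sample with mixed line endings (LF, CR, and CRLF).
DECLARE @x NVARCHAR(MAX) =
      N'Date: January 25, 2011 3:30:58 PM PDT' + NCHAR(10)             -- LF only
    + N'To: user@example.com'                  + NCHAR(13)             -- CR only
    + N'Return-Path: <mt.kb.user@gmail.com>'   + NCHAR(13) + NCHAR(10) -- CRLF
    + N'Envelope-To: user@example.com';

-- Collapse CRLF first, then any remaining lone CR, into a single LF.
DECLARE @normalized NVARCHAR(MAX) =
    REPLACE(REPLACE(@x, NCHAR(13) + NCHAR(10), NCHAR(10)), NCHAR(13), NCHAR(10));

-- Split on LF only; the filter is the same as in the queries above.
SELECT LTRIM(SUBSTRING(Value, 4, 4000)) AS ToAddress
  FROM dbo.SplitString(@normalized, CHAR(10))
  WHERE Value LIKE 'To: %@%';
```

With this sample the query should return the single row user@example.com. The CRLF replacement has to run first; otherwise each CRLF would become two LFs and produce empty values.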

Answer 1 (score: 1)

You can use XML functionality to split the lines and find what you want:

DECLARE @X XML

-- Escape & first, then <, so the header text is valid XML content
SELECT @X = CONVERT(XML, '<y><x>' + 
                REPLACE(REPLACE(REPLACE(value, '&', '&amp;'), '<', '&lt;'), CHAR(10), '</x><x>') + 
                 '</x></y>')
FROM test

SELECT [Value] = T.c.value('.','NVARCHAR(MAX)')
FROM @X.nodes('/y/x') T(c)
WHERE T.c.value('.','NVARCHAR(MAX)') LIKE 'To: %'

An SQLfiddle to test with

Answer 2 (score: 0)

Try this:

select substring(@s, charindex(char(13)+char(10)+'To: ', @s) + 6, charindex(char(13), @s, charindex(char(13)+char(10)+'To: ', @s)+6) - (charindex(char(13)+char(10)+'To: ', @s)+6))

Here is a complete test script:

declare @s varchar(500)

set @s = 'Date: January 25, 2011 3:30:58 PM PDT
To: user@example.com
Return-Path: <mt.kb.user@gmail.com>
Envelope-To: user@example.com'

select substring(@s, charindex(char(13)+char(10)+'To: ', @s) + 6, charindex(char(13)+char(10), @s, charindex(char(13)+char(10)+'To: ', @s)+6) - (charindex(char(13)+char(10)+'To: ', @s)+6))

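Note that in a well-formed email, headers must be separated by CRLF (char(13)+char(10)) per the RFC 2822 specification, and the code above makes the same assumption.

If your emails have different line endings, you may have to change every occurrence of char(13)+char(10) to just char(13) or just char(10). If you do, remember to also change +6 to +5 (because the delimiter is one character shorter).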
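
For illustration, this is roughly what that adjustment looks like when the lines end in char(10) only. The input is a hypothetical LF-only variant of the test script above, built with explicit char(10) so that it does not depend on the editor's line endings.

```sql
declare @s varchar(500)

-- Hypothetical input whose lines end in LF (char(10)) only.
set @s = 'Date: January 25, 2011 3:30:58 PM PDT' + char(10)
       + 'To: user@example.com' + char(10)
       + 'Return-Path: <mt.kb.user@gmail.com>'

-- Same expression as above, with char(13)+char(10) -> char(10) and +6 -> +5.
select substring(@s,
         charindex(char(10)+'To: ', @s) + 5,
         charindex(char(10), @s, charindex(char(10)+'To: ', @s) + 5)
           - (charindex(char(10)+'To: ', @s) + 5))
```

For this input the expression should return user@example.com. As with the original, it assumes the To: line is neither the first nor the last line of the string.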

Answer 3 (score: 0)

If the email address sits between the first 'To:' and 'Return-Path:', you can use this (Fiddle demo):

declare @s nvarchar(max) = 'From: Media Temple user (mt.kb.user@gmail.com)
                        Subject: article: How to Trace a Email
                        Date: January 25, 2011 3:30:58 PM PDT
                        To: user@example.com
                        Return-Path: <mt.kb.user@gmail.com>...'

select substring(@s, charindex('To:',@s)+3, 
             charindex('Return-Path:',@s)- charindex('To:',@s)-3)

--Results
user@example.com

A more generic version, assuming the email address comes just before the first Return-Path:

;with cte as (
 select reverse(left(@s, charindex('Return-Path:',@s)-1)) rs
)
select reverse(left(rs, charindex(':oT', rs)-1)) 
from cte

In a table query, replace @s with your column name.
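
A minimal sketch of the first expression applied to a table follows. The table variable @mail and its headers column are hypothetical names used only for illustration. The CASE guard and the removal of CR/LF characters are additions that are not part of the expressions above; they are there so that rows without both markers return NULL instead of raising an invalid-length error.

```sql
-- Hypothetical table variable and column name, used only for illustration.
DECLARE @mail TABLE (id INT, headers NVARCHAR(MAX));

INSERT @mail VALUES
(1, N'Date: January 25, 2011 3:30:58 PM PDT' + NCHAR(13) + NCHAR(10)
  + N'To: user@example.com' + NCHAR(13) + NCHAR(10)
  + N'Return-Path: <mt.kb.user@gmail.com>'),
(2, N'Subject: article: How to Trace a Email');  -- no To: or Return-Path:

SELECT m.id,
       -- Only cut the substring when both markers exist in the expected order.
       CASE WHEN p.to_pos > 0 AND p.rp_pos > p.to_pos
            THEN LTRIM(RTRIM(REPLACE(REPLACE(
                   SUBSTRING(m.headers, p.to_pos + 3, p.rp_pos - p.to_pos - 3),
                   NCHAR(13), N''), NCHAR(10), N'')))
       END AS to_address
FROM @mail AS m
CROSS APPLY (SELECT CHARINDEX(N'To:', m.headers)          AS to_pos,
                    CHARINDEX(N'Return-Path:', m.headers) AS rp_pos) AS p;
```

With these two rows, id 1 should give user@example.com and id 2 should give NULL.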