Parsing the sending IP from the last Received: from header

Time: 2016-06-20 22:35:19

Tags: c# email parsing

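I want to parse the last IP address in an email's headers using the Received: from header. I expect to find the last Received: from header and identify any IP address in it. My code does not work properly because Received: from headers contain many special characters such as ", {, } and so on. I also run into trouble because the IP may not be on the same line as the header name. Is there an easy way to identify the last sending IP in the email headers, given that it may sit on a separate line?

Here is my initial attempt: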

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Text.RegularExpressions;
using System.Net;
using System.IO;

namespace IP
{
    class Program
    {
        static void Main(string[] args)
        {
            int counter = 0;
            string line;
            System.IO.StreamReader file =
                new System.IO.StreamReader("C:\\ip.txt");

            while ((line = file.ReadLine()) != null)
            {
                const string x_orig_ip = "Received: from";
                line = line.Trim();
                if (line.StartsWith(x_orig_ip, StringComparison.OrdinalIgnoreCase))
                {
                    string sIpAddress = line.Substring(x_orig_ip.Length, line.Length - x_orig_ip.Length).Trim(new char[] { ' ', '\t', '[', ']', '(', ')' });
                    var ipAddress = System.Net.IPAddress.Parse(sIpAddress);
                    Console.WriteLine(ipAddress);
                    counter++;
                }
            }

            Console.ReadLine();
        }
    }
}

So, from the headers below, I want to get 101.123.148.12 from the last Received: from entry:

Received: from test (subdomain.domain.com [192.168.0.1])
  Mon, 20 Jun 2016 10:46:57 -0400 (EDT)
Received: from test123 ([192.168.0.1])
  by test.test; Mon, 20 Jun 2016 10:46:57 -0400
Received: from test.engine.com (localhost [127.0.0.1])
  by test.testty.com (Postfix) with ESMTP id ABCDEF
  for <cpound@stackoverflow.com>; Sun, 19 Jun 2016 09:06:35 -0400 (EDT)
Received: from test.message.com (localhost [127.0.0.1])
    by from test.message.com (Authentication) with ESMTP
    Sun, 19 Jun 2016 09:06:35 -0400  
Authentication-Results: 
    spf=none smtp.mailfrom= smtp.helo
Received-SPF: none
    (192.168.0.1: No applicable sender policy available)
Received: from 192.168.0.1 (unknown [192.168.0.1])
  by with SMTP
Received: from unknown (HELO localhost)
  by 101.123.148.12 with ESMTPA; Sun, 19 Jun 2016 10:00:20 -0300
X-Originating-IP: 101.123.148.12
From: test@test.net
To: cpound@stackoverflow.com
Subject: Test
Date: Sun, 19 Jun 2016 09:56:41 -0300

1 answer:

Answer 0: (score: 1)

You can try this regular expression:

// Requires: using System.Text.RegularExpressions;
var re = new Regex(@"Received: (.|\n  )*([^\d](\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}))+", RegexOptions.Multiline);
var matches = re.Matches(headers);
if (matches.Count > 0)
{
    // Group 3 holds the IP address; take the last capture of the last match.
    var group = matches[matches.Count - 1].Groups[3];
    string ip = group.Captures[group.Captures.Count - 1].Value;
    // do something with ip...
}

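Here headers is a string variable containing all the headers (not just one line).

The pattern extracts the IP addresses found in Received headers into capture group 3. Take the last capture of the last match to get what you want. Note that the pattern only follows continuation lines that begin with a bare \n followed by at least two spaces, so headers with \r\n line endings or tab-indented continuation lines need to be normalized first.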
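If the regular expression is hard to maintain, an alternative is to unfold the headers first (join every continuation line onto the header it belongs to) and only then look for IP addresses. The following is a minimal sketch of that idea, not a full RFC 5322 parser. The names ReceivedHeaderParser, UnfoldHeaders and LastReceivedIp are made up for this illustration, and it handles IPv4 addresses only.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
public static class ReceivedHeaderParser
{
    // Dotted-quad candidates; IPAddress.TryParse validates them afterwards.
    private static readonly Regex Ipv4Candidate =
        new Regex(@"(?<![\d.])\d{1,3}(?:\.\d{1,3}){3}(?![\d.])");

    // Joins folded lines (those starting with a space or tab) onto the
    // header they continue. Stops at the first blank line (end of headers).
    public static List<string> UnfoldHeaders(string rawHeaders)
    {
        var headers = new List<string>();
        var lines = rawHeaders.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);
        foreach (var line in lines)
        {
            if (line.Length == 0) break;
            bool isContinuation = line[0] == ' ' || line[0] == '\t';
            if (isContinuation && headers.Count > 0)
                headers[headers.Count - 1] += " " + line.Trim();
            else
                headers.Add(line.TrimEnd());
        }
        return headers;
    }

    // Returns the last IPv4 address in the last "Received:" header,
    // or null when there is no such header or it contains no address.
    public static string LastReceivedIp(string rawHeaders)
    {
        string lastReceived = null;
        foreach (var header in UnfoldHeaders(rawHeaders))
        {
            // "Received-SPF:" does not match because of the colon.
            if (header.StartsWith("Received:", StringComparison.OrdinalIgnoreCase))
                lastReceived = header;
        }
        if (lastReceived == null) return null;

        string lastIp = null;
        foreach (Match m in Ipv4Candidate.Matches(lastReceived))
        {
            IPAddress parsed;
            if (IPAddress.TryParse(m.Value, out parsed)) lastIp = m.Value;
        }
        return lastIp;
    }
}
```

With the sample headers from the question, the last Received header is the one reading "from unknown (HELO localhost) by 101.123.148.12 ...", so LastReceivedIp returns 101.123.148.12, provided the input string starts directly with the first header line rather than a blank line.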

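Note that you would not normally count 101.123.148.12, because no header states that the message was received from 101.123.148.12, only that it was received by 101.123.148.12, which is something entirely different.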
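To make the from/by distinction concrete, here is a rough sketch that looks only at the part of a single, already unfolded Received header that precedes its by clause. ReceivedFromClause and FromClauseIp are hypothetical names, and splitting on a whitespace-delimited "by" is a heuristic assumption rather than a complete parser for the Received grammar.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
public static class ReceivedFromClause
{
    private static readonly Regex Ipv4Candidate =
        new Regex(@"(?<![\d.])\d{1,3}(?:\.\d{1,3}){3}(?![\d.])");

    // Heuristic: the "by" clause starts at the first standalone word "by".
    private static readonly Regex ByClause =
        new Regex(@"\sby\s", RegexOptions.IgnoreCase);

    // Expects one unfolded "Received:" header (continuation lines joined).
    // Returns the last valid IPv4 address before the "by" clause, or null.
    public static string FromClauseIp(string unfoldedReceived)
    {
        const string prefix = "Received:";
        if (!unfoldedReceived.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
            return null;

        string value = unfoldedReceived.Substring(prefix.Length);
        Match by = ByClause.Match(value);
        string fromPart = by.Success ? value.Substring(0, by.Index) : value;

        // The bracketed address usually comes last in the from clause,
        // so keep the last candidate that parses as an IP address.
        string ip = null;
        foreach (Match m in Ipv4Candidate.Matches(fromPart))
        {
            IPAddress parsed;
            if (IPAddress.TryParse(m.Value, out parsed)) ip = m.Value;
        }
        return ip;
    }
}
```

For the last Received header in the question this returns null, since its from clause only contains "unknown (HELO localhost)". The header before it, "Received: from 192.168.0.1 (unknown [192.168.0.1]) by with SMTP", yields 192.168.0.1.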