Regex: splitting a string into an array

Time: 2015-08-14 15:48:14

Tags: php arrays regex string

Hello, I need to split some strings, but I am not very experienced with regular expressions or PHP arrays. The string looks like this:

A N K U N F T   11.08.15
*** N ***
11.08.15  xxx  xxx  X3 2830  14:25   17:50
18.08.15  xxx  xxx  X3 2830  18:40  F882129  dsdsaidsaia  F882129  xxxyxyagydaysd

I need to turn it into an array of strings like this:

date1 -> 11.08.15

date2 -> 18.08.15

fnr1 -> X3 2830

h1 -> 17:50

fnr2 -> X3 2830

h2 -> 18:40

n1 -> dsdsaidsaia

n2 -> xxxyxyagydaysd

I did the following on regex101:

For fnr:

(\w{2}\s\d{4})

For date:

(\n\s\d{2}\W\d{2}\W\d{2})

For h:

(\s{2}\d{2}\:\d{2}\n)

But I don't know how to separate date1 from date2, fnr1 from fnr2, and h1 from h2.

I also tried the date pattern in PHP, but it did not output the date I wanted.

Can anyone help me? Thanks in advance!

1 Answer:

Answer 0 (score: 0)

This will do exactly what you want:

^((?:\d{2}\.?){3}).*?(\w{2}\s\d{4}).*?(\d{2}:\d{2})(?:.*?(\b[a-z]+\b).*?(\b[a-z]+\b))?$

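It splits everything on each line into separate capture groups. Let me know if you have any questions.

Note: be sure to turn on the gm flags so that ^ and $ match the start and end of each line, not of the whole string.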
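For anyone applying this in PHP, here is a minimal sketch of how the pattern above might be used. PHP has no g flag, so preg_match_all() takes its place, and the m modifier is appended to the pattern. The sample text is the string from the question, assumed to use "\n" line breaks; the variable names ($text, $pattern, $rows) and the array keys are only illustrative and follow the naming in the question.

```php
<?php
// Sample input from the question; "\n" line endings are an assumption.
$text = "A N K U N F T   11.08.15\n"
      . "*** N ***\n"
      . "11.08.15  xxx  xxx  X3 2830  14:25   17:50\n"
      . "18.08.15  xxx  xxx  X3 2830  18:40  F882129  dsdsaidsaia  F882129  xxxyxyagydaysd";

// Pattern from the answer; the trailing "m" makes ^ and $ anchor at every line.
$pattern = '/^((?:\d{2}\.?){3}).*?(\w{2}\s\d{4}).*?(\d{2}:\d{2})(?:.*?(\b[a-z]+\b).*?(\b[a-z]+\b))?$/m';

// preg_match_all() collects every matching line (the role of the "g" flag).
// PREG_SET_ORDER yields one array per matched line.
preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

$rows = [];
foreach ($matches as $m) {
    $rows[] = [
        'date' => $m[1],
        'fnr'  => $m[2],
        'h'    => $m[3],
        // The last two groups are optional, so they may be absent.
        'n1'   => $m[4] ?? null,
        'n2'   => $m[5] ?? null,
    ];
}

print_r($rows);
```

On the sample text only the two data lines match, because the header lines do not start with a date. The first row ends up with h set to 17:50 rather than 14:25: the optional name groups find nothing on that line, so the time group has to backtrack to the last time before the end of the line.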

Regex101