Perl and PostgreSQL UTF8 directory problem

Time: 2015-12-17 19:02:58

Tags: perl postgresql utf-8

I'm having trouble with directory names that contain UTF-8 characters when using Perl 5.22 and PostgreSQL 9.4 on a Mac (OS X 10.11.2). The text encoding in PostgreSQL is set to UTF8.

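If a directory name contains non-ASCII UTF-8 characters, I can chdir() into it when the name is read in by the Perl script or written into a string in the script. If I insert that name into a PG table and read it back (SELECT dirname FROM utfdirs), I cannot chdir() into the directory. Yet the two strings print identically on screen, a Perl string comparison on them reports that they are equal, and guess_encoding() reports utf8 for both.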

#!/opt/local/bin/perl5.22  
use strict;
use Cwd;
use DBI;
use Encode;
use Encode qw/from_to/;
use Encode::Detect;
use Encode::Guess;
use Encode::UTF8Mac;
#
Encode::Guess->add_suspects(qw/utf-8-mac/);
#
my $dbname = 'test';
my $dbh = DBI->connect("dbi:Pg:dbname=$dbname;host=localhost");
$dbh->do("SET client_min_messages TO WARNING");
#
my $homeDir = '/Users/jldasch';
chdir($homeDir) or die "Cannot cd to [$homeDir]\n";
opendir(D,".");
my @tdlist = sort grep(/(Lambda?)|(Delta?)/,readdir(D));
closedir(D);
$dbh->do("DELETE FROM utfdirs");
my $ins = $dbh->prepare("INSERT INTO utfdirs (dirname) VALUES (?)");
foreach my $d (@tdlist) {
    chdir($homeDir);
    my $ok = chdir($d) ? 1 : 0;
    my $fp = "${homeDir}/${d}";
    printf("%2d %s\n",$ok,$fp);
    $ins->execute($fp);
}
my $rset = $dbh->selectall_arrayref("SELECT dirname FROM utfdirs ORDER BY dirname");
my $i = 0;
foreach my $r (@$rset) {
    my $dbdir = $r->[0];
    my $pdir = ${homeDir} . '/' . $tdlist[$i++];
    print "$r->[0]  $pdir\n";
    my $encPerl = guess_encoding($pdir);
    my $encDb = guess_encoding($dbdir);
    print "Perl Encoding [$encPerl->{Name}]\n";
    print "Db   Encoding [$encDb->{Name}]\n";
    unless ( chdir($dbdir) ) {
        print "Cannot CD to DbDir [$dbdir]\n";
        # Only reached when chdir() failed: are the two paths equal as strings?
        print "DbDir and PerlDir Match\n" if ($dbdir eq $pdir);
    }
}
exit;

输出:

bash-3.2$ ./utfstuff2.pl 
1 /Users/jldasch/DeltaΔ
1 /Users/jldasch/Lambdaλ
/Users/jldasch/DeltaΔ  /Users/jldasch/DeltaΔ
Perl Encoding [utf8]
Db   Encoding [utf8]
Cannot CD to DbDir [/Users/jldasch/DeltaΔ]
DbDir and PerlDir Match
/Users/jldasch/Lambdaλ  /Users/jldasch/Lambdaλ
Perl Encoding [utf8]
Db   Encoding [utf8]
Cannot CD to DbDir [/Users/jldasch/Lambdaλ]
DbDir and PerlDir Match

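So at every level I have checked so far (string comparison and guess_encoding()), Perl tells me the strings are identical, and they print identically, yet they are not the same.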
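One way to see why two paths can print identically yet behave differently is to inspect their code points rather than their rendered glyphs. The minimal sketch below uses only the core Encode and Unicode::Normalize modules; the codepoints() helper and the sample strings are illustrative and not from the post. It shows two independent sources of invisible differences: raw UTF-8 octets versus a decoded character string, and precomposed (NFC) versus decomposed (NFD) forms. Which of these, if either, applies to the directories above is not established by the post. Note also that Encode::Guess is documented as guessing the encoding of octet data, so guess_encoding() reporting utf8 does not by itself show whether a string has already been decoded.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFD);

# Illustrative helper (not from the post): list a string's code points
# so that strings which print identically can be told apart.
sub codepoints {
    my ($s) = @_;
    return join ' ', map { sprintf('U+%04X', ord($_)) } split //, $s;
}

# Raw UTF-8 octets, the form readdir() returns: "Delta" + bytes CE 94.
my $octets  = "Delta\xCE\x94";
# The same name decoded into characters: "Delta" + U+0394.
my $decoded = decode('UTF-8', $octets);

print codepoints($octets),  "\n";   # ends in U+00CE U+0094
print codepoints($decoded), "\n";   # ends in U+0394
print $octets eq $decoded ? "eq\n" : "not eq\n";   # prints "not eq"

# Precomposed e-acute (NFC) versus "e" + U+0301 combining acute (NFD).
my $nfc = "caf\x{E9}";
my $nfd = NFD($nfc);
print codepoints($nfc), "\n";
print codepoints($nfd), "\n";
print $nfc eq $nfd ? "eq\n" : "not eq\n";   # prints "not eq"
```

Printing codepoints($dbdir) and codepoints($pdir) inside the loop of the script above would show which representation each variable actually holds.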

How can I convert the UTF-8 string returned by PostgreSQL into a string that Perl will accept as a valid directory name for chdir()?

1 answer:

Answer 0 (score: 0)

There is a module, Encode::UTF8Mac, that seems to solve this problem:

my $macOkDir = Encode::decode('utf-8-mac', $dbdir);

- John Daschbach
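If the value coming back from DBI is a decoded character string, the usual Perl practice is to encode it back into octets explicitly before passing it to chdir(), open() or other filesystem calls, rather than relying on Perl's internal string representation. The minimal sketch below demonstrates that round trip in a scratch directory. It assumes the driver hands back decoded text (simulated here with decode() instead of a real DBD::Pg connection), and it uses plain 'UTF-8'. Encode::UTF8Mac, the module named in the answer, registers a 'utf-8-mac' encoding that can be used with the same encode()/decode() calls when the decomposed form HFS+ uses for file names is needed; it is not a core module, so it is not used here.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use Encode qw(encode decode);
use File::Spec;
use File::Temp qw(tempdir);

my $start = getcwd();

# Scratch directory standing in for $homeDir; removed at program exit.
my $base = tempdir(CLEANUP => 1);

# Create a subdirectory from raw UTF-8 octets ("Delta" + bytes CE 94),
# the same form readdir() returns.
my $name_octets = "Delta\xCE\x94";
my $path_octets = File::Spec->catdir($base, $name_octets);
mkdir($path_octets) or die "mkdir $path_octets: $!";

# Assumption: the database driver returns decoded text, so simulate a
# dirname read back through DBI as a character string containing U+0394.
my $dbdir = decode('UTF-8', $path_octets);

# Encode explicitly before handing the path to the operating system.
my $fs_dir = encode('UTF-8', $dbdir);
my $ok = chdir($fs_dir) ? 1 : 0;
print "chdir ok=$ok\n";

# Return to the starting directory so the scratch tree can be removed.
chdir($start) or die "Cannot return to $start: $!";
```

In the question's script this corresponds to calling chdir() on an explicitly encoded copy of $dbdir instead of on $dbdir itself.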