How to handle the trademark character ™ in Perl

Date: 2014-01-17 19:58:58

Tags: perl encoding character-encoding

I am doing web scraping in Perl, pulling data from a SQLite database and using the WWW::Mechanize module.

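Some of the data I post (which comes from the database) contains special characters. When I look at the resulting text on the website, it shows a few strange characters, â¢, instead of ™.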
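Garbling of this kind is consistent with the UTF-8 bytes of ™ being read back as a single-byte encoding somewhere along the way. Here is a minimal sketch of that effect, assuming the misreading is Latin-1 (the question does not say which encoding was actually involved):

```perl
use strict;
use warnings;
use Encode qw( encode decode );

my $tm = "\x{2122}";    # TRADE MARK SIGN, U+2122

# Encoding the character as UTF-8 gives three bytes: E2 84 A2.
my $bytes = encode('UTF-8', $tm);
my $hex   = join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;

# Misreading those bytes as Latin-1 (an assumption for illustration) yields
# three characters: U+00E2 (â), U+0084 (an invisible control), U+00A2 (¢).
my $garbled    = decode('ISO-8859-1', $bytes);
my @codepoints = map { sprintf 'U+%04X', ord($_) } split //, $garbled;

print "UTF-8 bytes: $hex\n";
print "Misread as:  @codepoints\n";
```

The middle character is not printable, which would explain why only â and ¢ are visible on the page.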

I have the following set at the top of my Perl program. I use it to suppress the "Wide character" warnings in the terminal.

binmode(STDOUT, ":utf8");
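To see what an output layer does to a wide character, here is a small sketch that writes through `:encoding(UTF-8)` to an in-memory filehandle. The in-memory handle is only a stand-in for STDOUT so that the resulting bytes can be inspected; it is not part of the original program.

```perl
use strict;
use warnings;

# Write through an explicit encoding layer to an in-memory filehandle
# (a stand-in for STDOUT, used here only so the bytes can be examined).
my $buffer = '';
open(my $out, '>:encoding(UTF-8)', \$buffer) or die "open failed: $!";
print {$out} "\x{2122}";
close($out) or die "close failed: $!";

# The layer turned the single character U+2122 into three UTF-8 bytes.
my $byte_count = length($buffer);
my $hex = join ' ', map { sprintf '%02X', ord($_) } split //, $buffer;
print "$byte_count bytes: $hex\n";
```

For STDOUT itself, the equivalent call would be `binmode(STDOUT, ':encoding(UTF-8)')`. This only covers what the program prints; data sent elsewhere (for example, in an HTTP request) is a separate output and needs its own encoding step.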

I don't know much about encoding and decoding characters, so any help would be appreciated.

Edit: After reading up on Perl IO, I found a Stack Overflow answer that solved my problem.

2 Answers:

Answer 0: (score: 5)

Decode inputs, encode outputs.

use open ':std', ':encoding(UTF-8)';  # Outputs are UTF-8
BEGIN { binmode STDIN; }              # ...but not the raw CGI request.

use CGI qw( -utf8 );                  # Decode parameters
use DBI qw( );

{
   my $cgi = CGI->new();
   print $cgi->header(
      -type    => "text/plain",  # Just cause it's shorter.
      -charset => "UTF-8",       # Tell browser encoding used.
   );

   my $dbh = DBI->connect(
      "dbi:SQLite:dbname=/tmp/tmp.sqlite", "", "",
      {
         AutoCommit     => 1,
         RaiseError     => 1,
         PrintError     => 0,
         PrintWarn      => 1,
         sqlite_unicode => 1,   # Encode and decode for us.
      },
   );

   $dbh->do("CREATE TABLE Testing ( str TEXT )");

   my $from_html_parser = "\x{2122}";

   # Should be 2122, since the trademark symbol is U+2122.
   printf("from_html_parser = %v04X\n", $from_html_parser);

   print("$from_html_parser\n");

   $dbh->do("INSERT INTO Testing VALUES (?)", undef, $from_html_parser);

   my $from_database = $dbh->selectrow_array("SELECT * FROM Testing");

   # Should be 2122, since the trademark symbol is U+2122.
   printf("from_database = %v04X\n", $from_database);

   print("$from_database\n");
}

END { unlink("/tmp/tmp.sqlite"); }
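The same rule can be applied by hand with the core Encode module when a data source has no option such as `sqlite_unicode` to do it for you. This is a rough sketch only; the input string is made up for illustration and stands for bytes as they might arrive from a file or socket.

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# Hypothetical input: raw UTF-8 bytes ("Acme" followed by the bytes of U+2122).
my $raw_in = "Acme\xE2\x84\xA2";

# Decode at the input boundary: bytes become characters.
my $text       = decode('UTF-8', $raw_in);
my $char_count = length($text);       # counts characters, not bytes

# Work on characters inside the program.
my $shouted = uc($text);

# Encode at the output boundary: characters become bytes again.
my $raw_out    = encode('UTF-8', $shouted);
my $byte_count = length($raw_out);

print "characters: $char_count, bytes out: $byte_count\n";
```

Skipping the decode step leaves the program working on seven bytes instead of five characters, and encoding those bytes again on output is one way double-encoded text like â¢ can end up on a page.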

Answer 1: (score: 0)

These docs helped me: Perl IO

Then, after a few Google searches, I found a Stack Overflow answer that solved my problem.