Question

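I need some help getting a MySQL table to store and output characters from the following languages:

English
French
Russian
Turkish
German

These are the languages I know are in the data. It also uses mathematical symbols such as: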

b ∈ A. Define s(A) := sup_{n ≥ 0} r_A(n) for each A ⊆ ? ∪ {0}.

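I encode the text with htmlentities. The ? above is meant to display as ℕ. When I look at the data in phpMyAdmin, it displays as ℕ there. The other characters are encoded as expected.

The table is set to utf8_unicode_ci, and every part of the site is set to UTF-8 (including via the .htaccess file, PHP headers, and meta tags).

Can anyone help?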

Additional information:

Hosting environment:

Linux, Apache
MySQL 5.5.38
PHP Version 5.4.4-14

Connection string:

ini_set('default_charset', 'UTF-8');
$mysqli = new mysqli($DB_host , $DB_username, $DB_password);
$mysqli->set_charset("utf8");
$mysqli->select_db($DB_name);

Output of SHOW CREATE TABLE mydatabase.mytable:

CREATE TABLE `tablename` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `created` datetime NOT NULL,
 `updated` datetime NOT NULL,
 `product` int(11) NOT NULL,
 `ppub` tinytext COLLATE utf8_unicode_ci NOT NULL,
 `pubdate` date NOT NULL,
 `numerous_other_tinytext_cols` tinytext COLLATE utf8_unicode_ci NOT NULL,
 `numerous_other_tinytext_cols` tinytext COLLATE utf8_unicode_ci NOT NULL,
 `text` text COLLATE utf8_unicode_ci NOT NULL,
 `keywords` tinytext COLLATE utf8_unicode_ci NOT NULL,
 `active` int(11) NOT NULL DEFAULT '1',
 `orderid` int(11) NOT NULL,
 `src` tinytext CHARACTER SET latin1 NOT NULL,
 `views` int(11) NOT NULL,
 PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=17780 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

Output of SELECT DEFAULT_CHARACTER_SET_NAME FROM information_schema.SCHEMATA:

DEFAULT_CHARACTER_SET_NAME
utf8 [->UTF-8 Unicode]
utf8mb4 [->UTF-8 Unicode]

Font used:

Arial

Sample of the text as stored in the database:

Let &lt;em&gt;A&lt;/em&gt; be a subset of the set of nonnegative integers ℕ &cup; {0}, and let &lt;em&gt;r&lt;/em&gt;&lt;sub&gt;&lt;em&gt;A&lt;/em&gt;&lt;/sub&gt; (&lt;em&gt;n&lt;/em&gt;) be the number of representations of &lt;em&gt;n&lt;/em&gt; &ge; 0 by the sum &lt;em&gt;a&lt;/em&gt; + &lt;em&gt;b&lt;/em&gt; with &lt;em&gt;a, b&lt;/em&gt; &isin; &lt;em&gt;A&lt;/em&gt;.

Output on the web page:

Let <em>A</em> be a subset of the set of nonnegative integers ? ∪ {0}, and let <em>r</em><sub><em>A</em></sub> (<em>n</em>) be the number of representations of <em>n</em> ≥ 0 by the sum <em>a</em> + <em>b</em> with <em>a, b</em> ∈ <em>A</em>.

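Which renders as:

Let A be a subset of the set of nonnegative integers ? ∪ {0}, and let r_A(n) be the number of representations of n ≥ 0 by the sum a + b with a, b ∈ A.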

Answer 1

Although your database and table are configured to use UTF-8, one of your columns still is not:

CREATE TABLE `tablename` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `created` datetime NOT NULL,
 `updated` datetime NOT NULL,
 `product` int(11) NOT NULL,
 `ppub` tinytext COLLATE utf8_unicode_ci NOT NULL,
 `pubdate` date NOT NULL,
 `numerous_other_tinytext_cols` tinytext COLLATE utf8_unicode_ci NOT NULL,
 `numerous_other_tinytext_cols` tinytext COLLATE utf8_unicode_ci NOT NULL,
 `text` text COLLATE utf8_unicode_ci NOT NULL,
 `keywords` tinytext COLLATE utf8_unicode_ci NOT NULL,
 `active` int(11) NOT NULL DEFAULT '1',
 `orderid` int(11) NOT NULL,
 `src` tinytext CHARACTER SET latin1 NOT NULL,  -- <-- this one
 `views` int(11) NOT NULL,
 PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=17780 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

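Since all the other symbols are HTML-encoded, they survive in any character set. ℕ does not, because it has no named entity reference in the HTML 4.01 table that htmlentities uses by default, so it is stored as a raw character.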
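Columns with a stray character set can also be found without reading the whole CREATE TABLE output. The query below is a minimal sketch that assumes the schema and table names from the question (`mydatabase`, `mytable`); substitute the real ones.

```sql
-- List text columns whose character set is not utf8.
-- Non-character columns (int, date, ...) have a NULL character set
-- and are skipped.
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'mydatabase'
  AND TABLE_NAME = 'mytable'
  AND CHARACTER_SET_NAME IS NOT NULL
  AND CHARACTER_SET_NAME <> 'utf8';
```

For the table shown above, this should report only the `src` column.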

You need to convert the column:

ALTER TABLE tablename MODIFY src TINYTEXT CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL;

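Note: I see that you like mathematical symbols. Some of them lie outside the Basic Multilingual Plane (BMP), i.e. they have code points > 0xFFFF, for example the mathematical letter variants (fraktur, double-struck, semantic italic etc.).

If you want to support them, you need to switch the encoding in MySQL (tables, columns, connection) to utf8mb4 with the utf8mb4_unicode_ci collation. utf8mb4 is true UTF-8; what MySQL calls utf8 is a subset of UTF-8 that covers only the BMP. Here is how to do the migration.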
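A minimal sketch of that switch, assuming the names `mydatabase` and `tablename` from the question and a server that supports utf8mb4 (MySQL 5.5.3 or later, which includes the 5.5.38 mentioned above). Take a backup first; this is an outline under those assumptions, not a complete migration procedure.

```sql
-- Default for tables created in this schema from now on.
ALTER DATABASE mydatabase
  CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Convert the table default and every character column, including
-- the latin1 `src` column. MySQL may widen TEXT types (for example
-- TINYTEXT to TEXT) so the existing data is guaranteed to fit.
ALTER TABLE tablename
  CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- The connection must use utf8mb4 as well.
SET NAMES utf8mb4;
```

From PHP, the connection part is better done with `$mysqli->set_charset("utf8mb4")` in place of the existing `set_charset("utf8")` call, rather than by issuing SET NAMES as a query.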

Also, I see that you are HTML-encoding HTML. Maybe you have a reason, but in my opinion it makes no sense to store this:

&lt;em&gt;A&lt;/em&gt;

To put it into an HTML document, you now have to HTML-decode it at least once, sometimes twice. I would rather store it the way almost everyone does:

<em>A</em>

That way you store Unicode characters natively, which is the best option.
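If the existing rows are to be converted to the unencoded form, the escaping can be undone one level at a time. The query below is only a sketch: it assumes the table and column names from the question (`tablename`, `text`), handles just the four entities produced by escaping markup, and previews the result instead of changing any data.

```sql
-- Preview one level of HTML decoding for a few rows (nothing is modified).
-- '&amp;' is replaced last so that '&amp;lt;' becomes '&lt;', not '<'.
SELECT id,
       REPLACE(REPLACE(REPLACE(REPLACE(`text`,
           '&lt;', '<'),
           '&gt;', '>'),
           '&quot;', '"'),
           '&amp;', '&') AS decoded_text
FROM tablename
LIMIT 10;
```

Named entities such as `&cup;` or `&ge;` are left untouched here; they remain valid in HTML. A full decode is easier in application code, for example with PHP's html_entity_decode.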

MySQL character set for multiple European languages + mathematical symbols