Ruby: comparing two identical strings returns false

Time: 2016-05-11 18:01:46

Tags: ruby

> [x[1].txt,x[0].txt]
[
    [0] "Put your weight on to the shoulders and upper back.",
    [1] "Put your weight on to the shoulders and upper back."
]
> [x[1].txt,x[0].txt].map &:class
[
    [0] String < Object,
    [1] String < Object
]
> x[1].txt == x[0].txt
false

How is this possible?

Update

After some reading I found this:

y = x.map{|z| z.txt.toutf8 }
[
    [0] "Put your weight on to the shoulders and upper back.",
    [1] "窶ィPut your weight on to the shoulders and upper back.",
    [2] "窶ィPut your weight on to the shoulders and upper back."
]

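So the strings are not the same, but without .toutf8 they look exactly identical. What causes this?

Most importantly, how do I remove these characters?

1 Answer:

Answer 0 (score: 0)

The strings may be in different encodings. To find out the encoding of each string, try the following: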

[x[1].txt.encoding,x[0].txt.encoding]

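If the encodings do differ, the problem may come from an interface (for example a view, a REST API endpoint, or a file source), or it may be a storage/conversion issue in the database.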
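If both encodings turn out to be the same, looking at the characters themselves can show where the strings differ. Below is a minimal sketch with made-up sample strings. It assumes the hidden character is U+2028 (LINE SEPARATOR), which seems likely because "窶ィ" in the question is what the UTF-8 bytes E2 80 A8 look like when misread as Shift_JIS; the variable names are illustrative only.

```ruby
# Hypothetical sample data: b starts with an invisible U+2028 LINE SEPARATOR.
a = "Put your weight on to the shoulders and upper back."
b = "\u2028" + a

# The encodings alone may not reveal the difference.
p [a.encoding, b.encoding]

# Comparing byte sizes shows that b carries extra data.
extra_bytes = b.bytesize - a.bytesize   # U+2028 occupies 3 bytes in UTF-8

# List every non-ASCII character by its code point.
hidden = b.each_char.reject(&:ascii_only?).map { |c| format("U+%04X", c.ord) }

p a == b        # false
p extra_bytes   # 3
p hidden        # ["U+2028"]
puts b.dump     # dump escapes non-ASCII characters so they become visible
```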

If your string encodings do not match, you can do the following:

x.map { |z| z.txt.encode("UTF-8", invalid: :replace, undef: :replace) }
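To illustrate why mismatched encodings break equality, here is a small sketch with made-up data: the same word stored once as UTF-8 and once as ISO-8859-1. The sample strings and variable names are assumptions, not taken from the question.

```ruby
# Hypothetical input: the same text under two different encodings.
utf8   = "caf\u00E9"                  # UTF-8, 5 bytes
latin1 = utf8.encode("ISO-8859-1")    # ISO-8859-1, 4 bytes

# Strings with non-ASCII content and different encodings never compare equal.
p utf8 == latin1                      # false

# Transcode everything to UTF-8, replacing anything that cannot be converted.
normalized = [utf8, latin1].map do |s|
  s.encode("UTF-8", invalid: :replace, undef: :replace)
end

p normalized[0] == normalized[1]      # true
```

Note that this only helps when the encodings really differ. A hidden character that is valid UTF-8 survives transcoding unchanged.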

If your encodings already match, you can use this gsub call to remove the non-ASCII characters from the strings:

x.map { |z| z.txt.gsub(/[^\001-\176]+/, "") }

After doing this, you will get the following:

[
  [0] "Put your weight on to the shoulders and upper back.", 
  [1] "Put your weight on to the shoulders and upper back.", 
  [2] "Put your weight on to the shoulders and upper back."
]

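The regular expression removes every character that is not between ASCII code 1 (octal 001) and ASCII code 126 (octal 176). This effectively strips all non-ASCII characters (as well as ASCII 0 and ASCII 127, DEL) from the string.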
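A short sketch of how that pattern behaves, using made-up sample strings. It also shows one thing to watch for: gsub! returns nil when nothing was replaced, so inside map the non-destructive gsub is usually the safer choice.

```ruby
non_ascii = /[^\001-\176]+/   # matches anything outside ASCII 1..126

# Hypothetical sample data: the second string starts with U+2028.
texts = ["Put your weight.", "\u2028Put your weight."]

# gsub always returns a string, changed or not.
cleaned = texts.map { |t| t.gsub(non_ascii, "") }
p cleaned.uniq.size             # 1

# gsub! returns nil for strings that needed no change.
bang = texts.map { |t| t.dup.gsub!(non_ascii, "") }
p bang[0]                       # nil

# Tabs and other control characters in 1..126 are kept; DEL (127) is removed.
p "a\tb\x7F".gsub(non_ascii, "")   # "a\tb"
```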

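If you need "extended ASCII" for international character sets, such as the ISO-8859 family or Windows-1252, or even specific Unicode characters, you can widen the range by changing the numbers so that it includes those characters.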
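As a rough sketch of what a wider range could look like: the first pattern below keeps everything up to U+00FF (the Latin-1 block, an assumption about what "extended" should mean here), and the second takes the opposite approach and deletes only a hand-picked list of invisible characters. Both patterns and the sample text are illustrative.

```ruby
# Assumption: "extended" means the Latin-1 range U+0001..U+00FF.
keep_latin1 = /[^\u0001-\u00FF]+/

# Alternative: delete only selected invisible characters
# (line separator, paragraph separator, zero-width space, byte order mark).
invisible = /[\u2028\u2029\u200B\uFEFF]/

# Hypothetical text: a line separator, an accented letter, and two Japanese characters.
text = "\u2028Caf\u00E9 \u65E5\u672C"

latin1_only = text.gsub(keep_latin1, "")   # keeps the accented letter, drops the rest
visible     = text.gsub(invisible, "")     # drops only U+2028

p latin1_only
p visible
```

The second approach is the more conservative one when the text may legitimately contain non-Latin characters.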