Question

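Is there a way to remove the BOM from a UTF-8 encoded file?

I know that all of my JSON files are encoded in UTF-8, but the data entry person who edited the JSON files saved them as UTF-8 with a BOM.

When I run my Ruby script to parse the JSON, it fails with an error. I don't want to manually open 58+ JSON files and convert each one to UTF-8 without a BOM.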

Answer 1

With ruby >= 1.9.2 you can use the mode r:bom|utf-8

This should work (I haven't tested it in combination with json):

require 'json'

json = nil # define the variable outside the block to keep the data
File.open('file.txt', "r:bom|utf-8"){|file|
  json = JSON.parse(file.read)
}

It doesn't matter whether the file actually contains a BOM or not.
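To illustrate that claim, here is a minimal sketch that writes the same JSON document twice, once with a BOM and once without, and parses both through the r:bom|utf-8 mode. The helper names parse_json_file and write_sample and the sample document are made up for this illustration.

```ruby
require 'json'
require 'tempfile'

# Hypothetical helper: parse a JSON file whether or not it starts with a BOM.
def parse_json_file(path)
  File.open(path, 'r:bom|utf-8') { |file| JSON.parse(file.read) }
end

# Hypothetical helper: write a small JSON document, optionally prefixed
# with the three UTF-8 BOM bytes (EF BB BF), and return the Tempfile.
def write_sample(with_bom)
  file = Tempfile.new(['sample', '.json'])
  file.binmode
  file.write("\xEF\xBB\xBF".b) if with_bom
  file.write('{"name": "game"}')
  file.close
  file
end

with_bom    = write_sample(true)
without_bom = write_sample(false)

# Both calls go through the same code path and yield the same hash.
p parse_json_file(with_bom.path)
p parse_json_file(without_bom.path)
```

Keeping the Tempfile objects in variables prevents the files from being deleted before they are read.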

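Andrew remarked in a comment that File#rewind can't be used together with the BOM mode, because it goes back to position 0, in front of the BOM.

If you need rewind functionality, you have to remember the position and replace rewind with pos=: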

# Prepare test file
File.open('file.txt', "w:utf-8"){|f|
  f << "\xEF\xBB\xBF" # add BOM
  f << 'some content'
}

# Read file and skip BOM if available
File.open('file.txt', "r:bom|utf-8"){|f|
  pos = f.pos          # position right after the BOM (0 if there is none)
  p content = f.read   # read and print file content
  f.pos = pos          # f.rewind would go to pos 0, in front of the BOM
  p content = f.read   # (re)read and print file content
}

Answer 2

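So, the solution was to search for the BOM and replace it via gsub! I forced the encoding of the string to UTF-8 and also forced the encoding of the search pattern to UTF-8.

I was able to derive a solution by looking at http://self.d-struct.org/195/howto-remove-byte-order-mark-with-ruby-and-iconv and http://blog.grayproductions.net/articles/ruby_19s_string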

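require 'json'

# file_name is the directory that contains game.json; index is not used here.
def read_json_file(file_name, index)
  # File.read closes the file again, unlike File.open without a block
  content = File.read(File.join(file_name, 'game.json')).force_encoding("UTF-8")

  # Remove the UTF-8 BOM (bytes EF BB BF) if it is present
  content.gsub!("\xEF\xBB\xBF".force_encoding("UTF-8"), '')

  json = JSON.parse(content)

  print json
end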

Answer 3

You can also specify the encoding with the File.read and CSV.read methods; in that case you don't specify the read mode.

File.read(path, :encoding => 'bom|utf-8')
CSV.read(path, :encoding => 'bom|utf-8')
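As a rough sketch of the difference the encoding option makes, the snippet below reads the same file once as plain UTF-8 and once with bom|utf-8. It only exercises File.read; the helper name read_without_bom and the sample content are made up for this illustration.

```ruby
require 'tempfile'

# Hypothetical helper: read a whole file as UTF-8, dropping a leading BOM.
def read_without_bom(path)
  File.read(path, encoding: 'bom|utf-8')
end

# Sample file: the three BOM bytes followed by "hello".
file = Tempfile.new(['notes', '.txt'])
file.binmode
file.write("\xEF\xBB\xBFhello".b)
file.close

plain = File.read(file.path, encoding: 'utf-8')
clean = read_without_bom(file.path)

puts plain.start_with?("\uFEFF") # => true, the BOM is still the first character
puts clean.start_with?("\uFEFF") # => false, the BOM was skipped
puts clean                       # => hello
```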

Answer 4

The "bom|UTF-8" encoding works well if you only read the file once, but it fails if you ever call File#rewind, as I was doing in my code. To address this, I did the following:

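def ignore_bom
  return unless @file.pos == 0
  first = @file.getc
  # IO#ungetc needs the character to push back; only do so if it is not the BOM
  @file.ungetc(first) if first && first != "\xEF\xBB\xBF".force_encoding("UTF-8")
end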

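Seems to work well. I'm not sure whether there are other characters of a similar kind to look out for, but they could easily be built into this method, which can be called any time you rewind or open the file.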

Answer 5

Server-side cleanup of the UTF-8 BOM bytes that worked for me:

csv_text.gsub!("\xEF\xBB\xBF".force_encoding(Encoding::BINARY), '')
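Since the question is about 58+ files, the same byte-level idea can be applied to files on disk. The following is only a sketch, assuming it is acceptable to rewrite the files in place and that they all end in .json; the helper names strip_bom_in_place and strip_bom_in_directory are made up for this illustration. It works on raw bytes throughout, so no string encodings get mixed.

```ruby
require 'tmpdir'

# UTF-8 BOM as raw bytes, so the comparison never depends on string encodings.
BOM_BYTES = "\xEF\xBB\xBF".b

# Hypothetical helper: rewrite the file only if it starts with a BOM.
# Returns true if the file was changed.
def strip_bom_in_place(path)
  bytes = File.binread(path)
  return false unless bytes.start_with?(BOM_BYTES)

  File.binwrite(path, bytes.byteslice(BOM_BYTES.bytesize..-1))
  true
end

# Hypothetical helper: process every *.json file below dir and return
# the paths that were rewritten.
def strip_bom_in_directory(dir)
  Dir.glob(File.join(dir, '**', '*.json')).sort.select { |path| strip_bom_in_place(path) }
end

# Demo in a throwaway directory.
Dir.mktmpdir do |dir|
  File.binwrite(File.join(dir, 'with_bom.json'), BOM_BYTES + '{"a": 1}')
  File.binwrite(File.join(dir, 'plain.json'), '{"b": 2}')

  changed = strip_bom_in_directory(dir)
  puts changed.map { |path| File.basename(path) }.inspect # => ["with_bom.json"]
  puts File.binread(File.join(dir, 'with_bom.json'))      # => {"a": 1}
end
```

Running it a second time over the same directory changes nothing, because files without a BOM are left untouched.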
