Creating a gettext binary MO file with Java

Asked: 2012-02-06 12:14:35

Tags: java gettext

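I am trying to create a utility that parses gettext po files and generates binary mo files. The parser is simple (my company does not use fuzzy entries, plurals and so on, just msgid / msgstr), but the generator does not work.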

Here is the description of the mo file, here is the original generator source (it is in C), and I also found a PHP script (https://github.com/josscrowcroft/php.mo/blob/master/php-mo.php).

My code:

public void writeFile(String filename, Map<String, String> polines) throws FileNotFoundException, IOException {

  DataOutputStream os = new DataOutputStream(new FileOutputStream(filename));
  HashMap<String, String> bvc = new HashMap<String, String>();
  TreeMap<String, String> hash = new TreeMap(bvc);
  hash.putAll(polines);


  StringBuilder ids = new StringBuilder();
  StringBuilder strings = new StringBuilder();
  ArrayList<ArrayList> offsets = new ArrayList<ArrayList>();
  ArrayList<Integer> key_offsets = new ArrayList<Integer>();
  ArrayList<Integer> value_offsets = new ArrayList<Integer>();
  ArrayList<Integer> temp_offsets = new ArrayList<Integer>();

  for (Map.Entry<String, String> entry : hash.entrySet()) {
    String id = entry.getKey();
    String str = entry.getValue();

    ArrayList<Integer> offsetsItems = new ArrayList<Integer>();
    offsetsItems.add(ids.length());
    offsetsItems.add(id.length());
    offsetsItems.add(strings.length());
    offsetsItems.add(str.length());
    offsets.add((ArrayList) offsetsItems.clone());

    ids.append(id).append('\0');
    strings.append(str).append('\0');
  }
  Integer key_start = 7 * 4 + hash.size() * 4 * 4;
  Integer value_start = key_start + ids.length();

  Iterator e = offsets.iterator();
  while (e.hasNext()) {
    ArrayList<Integer> offEl = (ArrayList<Integer>) e.next();
    key_offsets.add(offEl.get(1));
    key_offsets.add(offEl.get(0) + key_start);
    value_offsets.add(offEl.get(3));
    value_offsets.add(offEl.get(2) + value_start);
  }

  temp_offsets.addAll(key_offsets);
  temp_offsets.addAll(value_offsets);


  os.writeByte(0xde);
  os.writeByte(0x12);
  os.writeByte(0x04);
  os.writeByte(0x95);

  os.writeByte(0x00);
  os.writeInt(hash.size() & 0xff);
  os.writeInt((7 * 4) & 0xff);
  os.writeInt((7 * 4 + hash.size() * 8) & 0xff);
  os.writeInt(0x00000000);
  os.writeInt(key_start & 0xff);

  Iterator offi = temp_offsets.iterator();
  while (offi.hasNext()) {
    Integer off = (Integer) offi.next();
    os.writeInt(off & 0xff);
  }
  os.writeUTF(ids.toString());
  os.writeUTF(strings.toString());

  os.close();
}

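Everything up to os.writeInt(key_start); seems fine; the differences from the mo file generated by the original tool start after these bytes.

What is wrong? (Apart from my terrible English...)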

1 answer:

Answer 0 (score: 2)

Comparing your implementation with the documentation, I noticed a few things:

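  1. The revision, directly after the magic number, should be an int. This seems to work, probably because the single byte written by writeByte acts as padding that shifts the following ints. Using writeInt would be clearer, though.
  2. The & 0xFF part in the writeInt calls is probably wrong. That operation is needed to convert a signed byte to an unsigned integer value; for positive ints it is not needed, and it truncates any value above 255.
  3. To parse po files, you could also look at the zanata/tennera project on GitHub.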
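The byte-order and masking points can be checked in isolation. The following is only an illustrative sketch, not part of the original code: the class name `ByteOrderDemo` and the offset value 300 are made up for the demonstration. It compares the bytes that `DataOutputStream.writeInt` emits (always big-endian) with the little-endian layout that matches the magic bytes de 12 04 95, and shows what `& 0xFF` does to a value above 255.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteOrderDemo {
    // Bytes produced by DataOutputStream.writeInt, which always writes big-endian.
    public static byte[] bigEndian(int value) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(bos);
        dos.writeInt(value);
        dos.flush();
        return bos.toByteArray();
    }

    // The same value in little-endian order, matching the magic bytes de 12 04 95.
    public static byte[] littleEndian(int value) {
        return ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(value).array();
    }

    // Hex dump helper; here & 0xFF is appropriate because b is a signed byte.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        int offset = 300; // hypothetical offset larger than 255
        System.out.println(hex(bigEndian(offset)));    // 0000012c
        System.out.println(hex(littleEndian(offset))); // 2c010000
        System.out.println(offset & 0xFF);             // 44, the upper bits are lost
    }
}
```

With a single one-byte shift in front, big-endian ints below 256 happen to line up as little-endian ones, which would explain why the header looked right for small files.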

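    Edit: The writeUTF calls are also a problem, because writeUTF prefixes its output with a two-byte length and uses Java's modified UTF-8 encoding, which alters the '\0' bytes. You can replace them with: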

    os.write(ids.toString().getBytes("utf-8"));
    os.write(strings.toString().getBytes("utf-8"));
    

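To see the difference concretely, here is a small sketch (the class name `WriteUtfDemo` and the sample string are invented for illustration) that dumps the bytes produced by `writeUTF` next to those from `getBytes` for a NUL-terminated string.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class WriteUtfDemo {
    // Bytes emitted by DataOutputStream.writeUTF: 2-byte length prefix + modified UTF-8.
    public static byte[] viaWriteUTF(String s) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(bos);
        dos.writeUTF(s);
        dos.flush();
        return bos.toByteArray();
    }

    // Plain UTF-8 bytes, with no prefix and a real 0x00 terminator.
    public static byte[] viaGetBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String ids = "a\0"; // one msgid followed by its NUL terminator
        System.out.println(hex(viaWriteUTF(ids))); // 000361c080
        System.out.println(hex(viaGetBytes(ids))); // 6100
    }
}
```

In the first dump, 0003 is the length prefix and c080 is the modified UTF-8 form of the NUL character, so neither the offsets nor the terminators match what a gettext reader expects.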
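    Another edit: I kept looking at this code, and there are more problems: the string lengths are counted in characters rather than UTF-8 bytes, and DataOutputStream is big-endian instead of little-endian. I think the following code should work; the difference is that files generated by msgfmt contain an optional hash table to speed up access: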

    public static void writeInt(OutputStream os, int i) throws IOException {
        os.write((i) & 0xFF);
        os.write((i >>> 8) & 0xFF);
        os.write((i >>> 16) & 0xFF);
        os.write((i >>> 24) & 0xFF);
    }
    
    public static void writeFile(String filename, TreeMap<String, String> polines) throws IOException {
        OutputStream os = new BufferedOutputStream(new FileOutputStream(filename));
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        int size = polines.size();
        int[] indices = new int[size*2];
        int[] lengths = new int[size*2];
        int idx = 0;
        // write the strings and translations to a byte array and remember offsets and length in bytes
        for (String key : polines.keySet()) {
            byte[] utf = key.getBytes("utf-8");
            indices[idx] = bos.size();
            lengths[idx] = utf.length;
            bos.write(utf);
            bos.write(0);
            idx++;
        }
        for (String val : polines.values()) {
            byte[] utf = val.getBytes("utf-8");
            indices[idx] = bos.size();
            lengths[idx] = utf.length;
            bos.write(utf);
            bos.write(0);
            idx++;
        }
    
        try {
            int headerLength = 7*4;
            int tableLength = size*2*2*4;
            writeInt(os, 0x950412DE);                   // magic
            writeInt(os, 0);                            // file format revision
            writeInt(os, size);                         // number of strings
            writeInt(os, headerLength);                 // offset of table with original strings
            writeInt(os, headerLength + tableLength/2); // offset of table with translation strings
            writeInt(os, 0);                            // size of hashing table
            writeInt(os, headerLength + tableLength);   // offset of hashing table, not used since length is 0
    
            for (int i=0; i<size*2; i++) {
                writeInt(os, lengths[i]);
                writeInt(os, headerLength + tableLength + indices[i]);
            }
    
            // copy keys and translations
            bos.writeTo(os);
    
        } finally {
            os.close();
        }
    }
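
One way to sanity-check a generated file is to read it back. The reader below is a minimal sketch under the same assumptions as the writer above (little-endian layout, revision 0, hash table ignored); the class name `MoReaderSketch` is hypothetical and it does no bounds checking.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class MoReaderSketch {
    // Parses a little-endian MO file image into msgid -> msgstr pairs.
    public static Map<String, String> read(byte[] mo) {
        ByteBuffer buf = ByteBuffer.wrap(mo).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt(0) != 0x950412DE) {
            throw new IllegalArgumentException("not a little-endian MO file");
        }
        int count = buf.getInt(8);       // number of strings
        int origTable = buf.getInt(12);  // offset of table with original strings
        int transTable = buf.getInt(16); // offset of table with translation strings
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            String id = readString(mo, buf, origTable + i * 8);
            String str = readString(mo, buf, transTable + i * 8);
            result.put(id, str);
        }
        return result;
    }

    // Each table entry is a (length, offset) pair of 32-bit ints.
    private static String readString(byte[] mo, ByteBuffer buf, int entry) {
        int length = buf.getInt(entry);
        int offset = buf.getInt(entry + 4);
        return new String(mo, offset, length, StandardCharsets.UTF_8);
    }
}
```

For example, after calling writeFile with some file name, passing `Files.readAllBytes(Paths.get(filename))` to `MoReaderSketch.read` should return the same pairs that were written, in sorted key order.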