Decoding a String that has been encoded multiple times

Time: 2016-10-14 13:52:29

Tags: java encoding character-encoding pentaho url-encoding

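I wrote Java code to decode a string that was URL-encoded with "UTF-8". The string has been encoded three times. I use this code in an ETL, so I could chain the same ETL step three times in a row, but that would be somewhat inefficient. I searched online but did not find anything promising. Is there a way in Java to decode a string that has been encoded multiple times?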

This is my input string "uri":

file:///C:/Users/nikhil.karkare/dev/pentaho/data/ba-repo-content-original/public/Development+Activity/Defects+Unresolved+%252528by+Non-Developer%252529.xanalyzer

This is my code to decode this string:

import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.io.*;

String decodedValue;
public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException {
// First, get a row from the default input hop
//
Object[] r = getRow();
// If the row object is null, we are done processing.
//
if (r == null) {
    setOutputDone();
    return false;
}

// It is always safest to call createOutputRow() to ensure that your output row's Object[] is large
// enough to handle any new fields you are creating in this step.
//
Object[] outputRow = createOutputRow(r, data.outputRowMeta.size());

String newFileName = get(Fields.In, "uri").getString(r);

try {
    decodedValue = URLDecoder.decode(newFileName, "UTF-8");
} catch (UnsupportedEncodingException e) {
    throw new AssertionError("UTF-8 is unknown");
}
// Set the value in the output field
//
get(Fields.Out, "decodedValue").setValue(outputRow, decodedValue);

// putRow will send the row on to the default output hop.
//
putRow(data.outputRowMeta, outputRow);

return true;
}

The output of this code is:

file:///C:/Users/nikhil.karkare/dev/pentaho/data/ba-repo-content-original/public/Development Activity/Defects Unresolved %2528by Non-Developer%2529.xanalyzer

When I run this code three times in the ETL, I get the output I want, which is:

file:///C:/Users/nikhil.karkare/dev/pentaho/data/ba-repo-content-original/public/Development Activity/Defects Unresolved (by Non-Developer).xanalyzer

2 answers:

Answer 0: (score: 1)

A simple for loop does the job:

String newFileName = get(Fields.In, "uri").getString(r);
decodedValue = newFileName;
// The input was encoded three times, so decode it exactly three times.
for (int i = 0; i < 3; i++) {
    try {
        decodedValue = URLDecoder.decode(decodedValue, "UTF-8");
    } catch (UnsupportedEncodingException e) {
        throw new AssertionError("UTF-8 is unknown");
    }
}
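
Outside of a Pentaho step, the same loop can be wrapped in a small helper. The following is only a minimal sketch: the class name RepeatedUrlDecoder and the method decodeTimes are hypothetical names chosen for illustration, and it assumes the number of encoding rounds is known in advance.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper class, not part of Pentaho or the JDK.
final class RepeatedUrlDecoder {

    // URL-decodes `value` exactly `times` times using UTF-8.
    static String decodeTimes(String value, int times) {
        String result = value;
        for (int i = 0; i < times; i++) {
            try {
                result = URLDecoder.decode(result, "UTF-8");
            } catch (UnsupportedEncodingException e) {
                // Every Java platform is required to support UTF-8.
                throw new IllegalStateException("UTF-8 is not supported", e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String uri = "Defects+Unresolved+%252528by+Non-Developer%252529.xanalyzer";
        // prints "Defects Unresolved (by Non-Developer).xanalyzer"
        System.out.println(decodeTimes(uri, 3));
    }
}
```

Inside the step, processRow could then call decodeTimes(newFileName, 3) once per row instead of chaining three ETL steps.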

Answer 1: (score: 1)

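URL encoding replaces %, ( and ) with %25, %28 and %29 respectively, so each additional round of encoding turns every % into %25, and each call to decode undoes one round: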

String s = "file:///C:/Users/nikhil.karkare/dev/pentaho/data/"
    + "ba-repo-content-original/public/Development+Activity/"
    + "Defects+Unresolved+%252528by+Non-Developer%252529.xanalyzer";

// %252528 ... %252529
s = URLDecoder.decode(s, "UTF-8");
// %2528 ... %2529
s = URLDecoder.decode(s, "UTF-8");
// %28 .. %29
s = URLDecoder.decode(s, "UTF-8");
// ( ... )
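
If the number of encoding rounds is not known in advance, one possible variation is to keep decoding until the string stops changing. This is a sketch under assumptions, not a general solution: the names UrlDecodeUntilStable, decodeFully and maxRounds are hypothetical, and the approach will over-decode any value that legitimately contains a literal + or a %XX sequence. URLDecoder.decode also throws IllegalArgumentException for a malformed escape such as a % that is not followed by two hex digits.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper class, not part of Pentaho or the JDK.
final class UrlDecodeUntilStable {

    // Decodes repeatedly until a round leaves the string unchanged,
    // or until maxRounds rounds have been applied.
    static String decodeFully(String value, int maxRounds) {
        String current = value;
        for (int i = 0; i < maxRounds; i++) {
            String next;
            try {
                next = URLDecoder.decode(current, "UTF-8");
            } catch (UnsupportedEncodingException e) {
                // Every Java platform is required to support UTF-8.
                throw new IllegalStateException("UTF-8 is not supported", e);
            }
            if (next.equals(current)) {
                break; // nothing left to decode
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        String s = "Development+Activity/Defects+Unresolved+"
            + "%252528by+Non-Developer%252529.xanalyzer";
        // prints "Development Activity/Defects Unresolved (by Non-Developer).xanalyzer"
        System.out.println(decodeFully(s, 10));
    }
}
```

The maxRounds cap is only a safety bound. When the round count is known, as it is in the question, decoding a fixed number of times is the more predictable choice.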