How do I connect two non-keyed streams in Flink and share state between them?

Time: 2019-01-02 06:57:14

Tags: apache-flink flink-streaming

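Flink version 1.6.1

In the example below, I want to connect two non-keyed streams. However, the two streams do not seem to share state correctly, and I don't know the right way to implement this.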

Code:


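import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.datastream.ConnectedStreams;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.CoProcessFunction;
import org.apache.flink.util.Collector;

public class TransactionJob {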
public static void main(String[] args) throws Exception {
    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    DataStream<String> stream1 = env.fromElements("1", "2");
    DataStream<Integer> stream2 = env.fromElements(3, 4, 5);
    ConnectedStreams<String, Integer> connectedStreams = stream1.connect(stream2);
    DataStream<String> resultStream = connectedStreams.process(new StringIntegerCoProcessFunction());
    resultStream.print().setParallelism(1);
    env.execute();
}

private static class StringIntegerCoProcessFunction extends CoProcessFunction<String, Integer, String> implements CheckpointedFunction {
    private transient ListState<String> state1;
    private transient ListState<Integer> state2;

    @Override
    public void processElement1(String value, Context ctx, Collector<String> out) throws Exception {
        state1.add(value);
        print(value);
    }

    @Override
    public void processElement2(Integer value, Context ctx, Collector<String> out) throws Exception {
        state2.add(value);
        print(value.toString());
    }

    private void print(String value) throws Exception {
        StringBuilder builder = new StringBuilder();
        builder.append("input value is " + value + ".");
        builder.append("state1 has ");
        for (String str : state1.get()) {
            builder.append(str + ",");
        }
        builder.append("state2 has ");
        for (Integer integer : state2.get()) {
            builder.append(integer.toString() + ",");
        }
        System.out.println(builder.toString());
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {

    }

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        ListStateDescriptor<String> descriptor1 =
                new ListStateDescriptor<>(
                        "state1",
                        TypeInformation.of(new TypeHint<String>() {
                        }));
        ListStateDescriptor<Integer> descriptor2 =
                new ListStateDescriptor<>(
                        "state2",
                        TypeInformation.of(new TypeHint<Integer>() {
                        }));
        state1 = context.getOperatorStateStore().getListState(descriptor1);
        state2 = context.getOperatorStateStore().getListState(descriptor2);
    }
}
}

Output:

input value is 4.state1 has state2 has 4,
input value is 2.state1 has 2,state2 has 4,
input value is 3.state1 has state2 has 3,
input value is 1.state1 has 1,state2 has 3,
input value is 5.state1 has state2 has 5,

I want the final output to look like this: input value is XX .state1 has 1,2 state2 has 3,4,5. Instead, the output looks as if the input elements had been partitioned: 4 and 2 are in one partition, 3 and 1 in another. I want processElement1 to have access to all the data stored in state1 and state2.

1 answer:

Answer 0 (score: 1)

You should change the beginning of your job to look like this:

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(1);
...

This makes the entire job run with a parallelism of 1. You did have

resultStream.print().setParallelism(1);

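which sets the parallelism of the print sink to 1, but the rest of the job was still running with the default parallelism (evidently greater than 1). Operator state is scoped to each parallel instance of the operator, so each instance only saw the elements that were routed to it.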
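The effect can be modelled without Flink. The following is a plain-Java sketch, not Flink code: the class SubtaskStateModel and its run method are hypothetical names, and the round-robin distribution starting at subtask 0 is an assumption chosen to mirror the grouping seen in the question's output. Each modelled subtask owns private lists, just as each parallel operator instance owns its own operator state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SubtaskStateModel {

    /** One parallel instance of the operator, holding its own private state. */
    static final class Subtask {
        final List<String> state1 = new ArrayList<>();
        final List<Integer> state2 = new ArrayList<>();
    }

    /**
     * Distributes both inputs round-robin over the given number of subtasks
     * (assumed to start at subtask 0) and returns one summary line per subtask.
     */
    static List<String> run(int parallelism, List<String> in1, List<Integer> in2) {
        List<Subtask> subtasks = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            subtasks.add(new Subtask());
        }
        for (int i = 0; i < in1.size(); i++) {
            subtasks.get(i % parallelism).state1.add(in1.get(i));
        }
        for (int i = 0; i < in2.size(); i++) {
            subtasks.get(i % parallelism).state2.add(in2.get(i));
        }
        List<String> summary = new ArrayList<>();
        for (Subtask s : subtasks) {
            summary.add("state1=" + s.state1 + " state2=" + s.state2);
        }
        return summary;
    }

    public static void main(String[] args) {
        List<String> in1 = Arrays.asList("1", "2");
        List<Integer> in2 = Arrays.asList(3, 4, 5);
        // Three subtasks: no single subtask ever sees all five elements.
        System.out.println(run(3, in1, in2));
        // One subtask: all elements end up in the same state.
        System.out.println(run(1, in1, in2));
    }
}
```

With three subtasks the model pairs 1 with 3 and 2 with 4 and leaves 5 alone, which matches the grouping in the question. With one subtask, a single instance holds 1,2 and 3,4,5.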

Alternatively, you could key both streams by the same constant key and then use keyed state.
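A minimal sketch of the keyed variant might look like the following. It is untested and assumes the Flink 1.x DataStream API (in particular the open(Configuration) lifecycle method). The names KeyedTransactionJob and SharedStateFunction are hypothetical, as is the choice of 0 as the constant key. Because both inputs are keyed by the same value, processElement1 and processElement2 read and write the same keyed state.

```java
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.CoProcessFunction;
import org.apache.flink.util.Collector;

public class KeyedTransactionJob {

    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> stream1 = env.fromElements("1", "2");
        DataStream<Integer> stream2 = env.fromElements(3, 4, 5);

        stream1
                .connect(stream2)
                // Key both inputs by the same constant (assumed here to be 0).
                .keyBy(
                        new KeySelector<String, Integer>() {
                            @Override
                            public Integer getKey(String value) {
                                return 0;
                            }
                        },
                        new KeySelector<Integer, Integer>() {
                            @Override
                            public Integer getKey(Integer value) {
                                return 0;
                            }
                        })
                .process(new SharedStateFunction())
                .print();

        env.execute();
    }

    /** Hypothetical function: keeps both inputs in keyed state under the shared key. */
    private static class SharedStateFunction extends CoProcessFunction<String, Integer, String> {
        private transient ListState<String> state1;
        private transient ListState<Integer> state2;

        @Override
        public void open(Configuration parameters) throws Exception {
            // Keyed state is obtained from the runtime context, not the operator state store.
            state1 = getRuntimeContext().getListState(new ListStateDescriptor<>("state1", String.class));
            state2 = getRuntimeContext().getListState(new ListStateDescriptor<>("state2", Integer.class));
        }

        @Override
        public void processElement1(String value, Context ctx, Collector<String> out) throws Exception {
            state1.add(value);
            out.collect(describe(value));
        }

        @Override
        public void processElement2(Integer value, Context ctx, Collector<String> out) throws Exception {
            state2.add(value);
            out.collect(describe(value.toString()));
        }

        private String describe(String value) throws Exception {
            StringBuilder builder = new StringBuilder("input value is " + value + ".state1 has ");
            Iterable<String> strings = state1.get();
            if (strings != null) { // guard: empty keyed list state may be returned as null
                for (String str : strings) {
                    builder.append(str).append(',');
                }
            }
            builder.append("state2 has ");
            Iterable<Integer> integers = state2.get();
            if (integers != null) {
                for (Integer integer : integers) {
                    builder.append(integer).append(',');
                }
            }
            return builder.toString();
        }
    }
}
```

Note that a constant key routes every element to the same parallel instance, so this operator gains nothing from a higher parallelism. The order in which elements from the two streams arrive is also not guaranteed, so intermediate lines may differ between runs.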