Why does this simple Flink test fail occasionally?

Time: 2017-12-23 19:53:21

Tags: java apache-flink stream-processing

I am fairly sure this must be a Flink issue, because the code under test is very simple.

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Not needed for this particular example, but I use it elsewhere in my code.
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

SingleOutputStreamOperator<String> linesSource = env.readTextFile(inputFile).setParallelism(1);

SingleOutputStreamOperator<PositionEvent> mappedlines = linesSource.map(new Tokenizer());

SpeedRadar.run(mappedlines)
          .writeAsCsv(String.format("%s/%s", outputFolder, SPEED_RADAR_FILE));

where the SpeedRadar class is:

public final class SpeedRadar {

    private static final int MAXIMUM_SPEED = 90;

    public static SingleOutputStreamOperator<SpeedEvent> run(SingleOutputStreamOperator<PositionEvent> stream) {
        return stream
                .filter((PositionEvent e) -> e.f2 > MAXIMUM_SPEED)
                .map(new ToSpeedEvent());
    }
}

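I don't think it is important to show you the POJOs and the other missing classes. The point is that I am reading lines from a CSV file such as 130,1,65,0,3,0,49,100000, and keeping only the lines whose third field is greater than 90.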
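To make the filtering rule concrete without pulling in Flink, here is a minimal plain-Java sketch. It assumes the speed is the third comma-separated field, as described above; the class name `OverSpeedFilterSketch` and its methods are hypothetical and are not part of my actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the Tokenizer + SpeedRadar filter, without Flink.
final class OverSpeedFilterSketch {

    // Same threshold as MAXIMUM_SPEED in SpeedRadar.
    static final int MAXIMUM_SPEED = 90;

    // Returns the third comma-separated field (index 2) parsed as an int.
    static int speedOf(String line) {
        return Integer.parseInt(line.split(",")[2].trim());
    }

    // Keeps only the lines whose speed is strictly greater than the limit.
    static List<String> overSpeed(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            if (speedOf(line) > MAXIMUM_SPEED) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "30,1,91,1,3,0,10,100000",
                "60,2,90,2,2,1,20,200000",
                "90,3,99,3,1,0,30,300000");
        // Speeds are 91, 90 and 99; only 91 and 99 exceed the limit.
        System.out.println(overSpeed(lines).size()); // prints 2
    }
}
```

The comparison is strict, so a line with a speed of exactly 90 is dropped. That is why the three-line test input is expected to yield two events.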

Here is my simple test case:

public class SpeedRadarTests extends StreamingMultipleProgramsTestBase {

    private StreamExecutionEnvironment env;

    @Before
    public void createEnv() {
        env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        SpeedEventSink.values.clear();
    }

    @Test
    public void shouldDetectTwoOverSpeedEvents() throws Exception {

        String[] data = new String[]{
                "30,1,91,1,3,0,10,100000",
                "60,2,90,2,2,1,20,200000",
                "90,3,99,3,1,0,30,300000"
        };

        SingleOutputStreamOperator<PositionEvent> source
                = new PositionStreamBuilder(env).fromLines(data).build();

        SpeedRadar.run(source).addSink(new SpeedEventSink());
        env.execute();

        Map<String, SpeedEvent> events = SpeedEventSink.values;
        assertEquals(2, events.size());
    }

    private static class SpeedEventSink implements SinkFunction<SpeedEvent> {

        static final Map<String, SpeedEvent> values = new HashMap<>();

        @Override
        public synchronized void invoke(SpeedEvent speedEvent) throws Exception {
            // I'm sure f1 is unique
            values.put(speedEvent.f1, speedEvent);
        }
    }

}

This is how I create the "test stream":

public class PositionStreamBuilder {

    private StreamExecutionEnvironment env;
    private SingleOutputStreamOperator<PositionEvent> stream;

    public PositionStreamBuilder(StreamExecutionEnvironment env) {
        this.env = env;
    }

    public PositionStreamBuilder fromLines(String[] lines) {
        stream = env.fromElements(lines)
                .setParallelism(1)
                .map(new VehicleTelematics.Tokenizer());  // the same Tokenizer as before
        return this;
    }

    // more methods here

    public SingleOutputStreamOperator<PositionEvent> build() {
        return stream;
    }

}

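The problem is that sometimes, and I don't know why, the assertion fails because the Map contains only one element. I followed the steps in the Flink documentation; the only difference is that I did not set the parallelism to 1 (though that should not make a difference in this test anyway).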

The problem is not limited to this test: other tests that should not fail sometimes fail too. It is as if Flink occasionally drops an event.

When I run the code with flink run, I never lose any elements.
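One possible explanation worth checking, offered as an assumption rather than a verified diagnosis: since the parallelism is not set to 1, the test environment may run several instances of SpeedEventSink in parallel. In that case the `synchronized` on `invoke` would lock each instance separately, while all of them write to the same static, non-thread-safe `HashMap`. The sketch below simulates that setup in plain Java, with threads standing in for parallel sink instances, and uses a `ConcurrentHashMap` as the shared map. The names `SharedSinkSketch`, `CollectingSink` and `runParallel` are hypothetical and not part of Flink or of my code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Plain-Java simulation; no Flink classes are involved.
final class SharedSinkSketch {

    // Hypothetical stand-in for SpeedEventSink.values, made thread-safe.
    static final Map<String, String> VALUES = new ConcurrentHashMap<>();

    // Hypothetical stand-in for one parallel sink instance.
    static final class CollectingSink {
        // No instance-level lock: safety comes from the shared concurrent map.
        void invoke(String key, String event) {
            VALUES.put(key, event);
        }
    }

    // Runs `instances` sinks on separate threads, each writing `perInstance`
    // unique keys, and returns how many entries ended up in the shared map.
    static int runParallel(int instances, int perInstance) {
        VALUES.clear();
        Thread[] workers = new Thread[instances];
        for (int i = 0; i < instances; i++) {
            final int id = i;
            final CollectingSink sink = new CollectingSink();
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perInstance; j++) {
                    sink.invoke(id + "-" + j, "event");
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return VALUES.size();
    }
}
```

If this assumption holds, two changes would be worth trying in the test: replacing the static `HashMap` with a concurrent map, or calling `setParallelism(1)` on the sink so that only one instance writes to it.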

0 answers:

No answers yet