Elasticsearch query: range query on two fields, where one field is optional

Date: 2019-07-31 09:00:50

Tags: elasticsearch

I need to search an index based on a timestamp.

The documents have one of the following field combinations:

  • start_time and end_time

  • start_time (no end_time field)

Pseudo-query. For a given timestamp, I want all documents with a matching ID returned, where:
 timestamp >= start_time && timestamp < end_time

But if there is no end_time field, the query must instead be:
(not exists end_time) && (timestamp > start_time)
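To make the intent concrete, here is a minimal in-memory sketch of that predicate in Python (a hypothetical helper, assuming plain comparable timestamp values):

```python
def matches(doc, ts):
    """Return True if the document's time window contains ts.

    Documents with an end_time use a half-open interval
    [start_time, end_time); documents without one only require
    ts to be strictly after start_time.
    """
    if "end_time" in doc:
        return doc["start_time"] <= ts < doc["end_time"]
    return ts > doc["start_time"]
```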

The Elasticsearch query. This is where I'm going crazy. I can't come up with an Elasticsearch query equivalent to the pseudo-query above. Maybe I'm approaching it the wrong way (entirely possible). Here is what I have:

{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "id_s": "SomeIdValue"
          }
        },
        {
          "bool": {
            "should": [
              {
                "must": [
                  {
                    "must_not": [
                      {
                        "exists": {
                          "field": "end_time_dt"
                        }
                      }
                    ]
                  },
                  {
                    "range": {
                      "start_time_dt": {
                        "lte": "2019-07-12T03:20:22"
                      }
                    }
                  }
                ]
              },
              {
                "filter": [
                  {
                    "range": {
                      "start_time_dt": {
                        "lte": "2019-07-12T03:20:22"
                      }
                    }
                  },
                  {
                    "range": {
                      "end_time_dt": {
                        "gte": "2019-07-12T03:20:22"
                      }
                    }
                  }
                ]
              }
            ]
          }
        }
      ]
    }
  }
}

But this gives me `[must] query malformed, no start_object after query name`.
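For reference, the parse error arises because every element of a `should` array must be a complete query object; bare `must` / `must_not` keys are not queries on their own and need an enclosing `bool`. A sketch of the fixed nesting for the no-`end_time_dt` branch (field names and date taken from the query above):

```json
{
  "bool": {
    "should": [
      {
        "bool": {
          "must_not": [
            { "exists": { "field": "end_time_dt" } }
          ],
          "must": [
            { "range": { "start_time_dt": { "lte": "2019-07-12T03:20:22" } } }
          ]
        }
      }
    ]
  }
}
```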

How do I construct this query? Am I on the right track?

Thanks in advance!

3 answers:

Answer 0 (score: 1)

Your query is syntactically wrong. The correct query is:

{{1}}

Answer 1 (score: 0)

There is a small logical error. Ideally the comparison should be gte on start_time_dt and lte on end_time_dt. You did it the other way around, so it translates to timestamp <= start_time && timestamp > end_time.

The correct query is:

{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "id_s": "SomeIdValue"
          }
        },
        {
          "bool": {
            "should": [
              {
                "bool": {
                  "must_not": [
                    {
                      "exists": {
                        "field": "end_time_dt"
                      }
                    }
                  ],
                  "must": [
                    {
                      "range": {
                        "start_time_dt": {
                          "gte": "2019-07-12T03:20:22"
                        }
                      }
                    }
                  ]
                }
              },
              {
                "bool": {
                  "must": [
                    {
                      "range": {
                        "start_time_dt": {
                          "gte": "2019-07-12T03:20:22"
                        }
                      }
                    },
                    {
                      "range": {
                        "end_time_dt": {
                          "lte": "2019-07-12T03:20:22"
                        }
                      }
                    }
                  ]
                }
              }
            ]
          }
        }
      ]
    }
  }
}

Hope this helps!

Answer 2 (score: 0)

I believe the clause should be `should` rather than `must`. The reason I bring up the two conditions: I think the OP's intent is must (not exists AND range).