How can I prevent attachments from being stored in _source with Elasticsearch and Tire?

Time: 2012-08-08 21:11:40

Tags: elasticsearch attachment tire

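I have indexed some PDF attachments in Elasticsearch using the Tire gem. It all works well, but I am going to have many GB of PDFs, and we will probably store them in S3 for access. Right now the base64-encoded PDFs are stored in the Elasticsearch _source, which will make the index huge. I want the attachments to be indexed but not stored, and I have not yet found the right incantation to put in Tire's "mapping" block to prevent it. The block currently looks like this: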

mapping do
  indexes :id, :type => 'integer'
  indexes :title
  indexes :last_update, :type => 'date'
  indexes :attachment, :type => 'attachment'
end

I have tried a few variations on:

indexes :attachment, :type => 'attachment', :_source => { :enabled => false }

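It looks fine when I run the tire:import rake task, but it does not seem to make any difference. Does anyone know A) whether this is possible, and B) how to do it?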

Thanks in advance.

2 answers:

Answer 0 (score: 4)

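The _source field settings include a list of fields that should be excluded from the source. I would guess that, with Tire, something like this should do it: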

mapping :_source => { :excludes => ['attachment'] } do
  indexes :id, :type => 'integer'
  indexes :title
  indexes :last_update, :type => 'date'
  indexes :attachment, :type => 'attachment'
end
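For reference, here is a minimal sketch of the mapping body that a block like the one above would be expected to produce, written as a plain Ruby hash so it can be inspected without Tire or a running cluster. The type name `document` is a placeholder, the `string` type for `title` assumes Tire's default for fields without an explicit type, and the exact JSON that Tire sends is an assumption; only the `_source` `excludes` setting itself comes from the answer.

```ruby
require 'json'

# Hypothetical type name; Tire derives the real one from the model class.
TYPE_NAME = 'document'

# Assumed shape of the mapping: `_source` sits at the type level, next to
# `properties`, rather than inside an individual field definition.
def attachment_mapping(type_name, excluded_fields)
  {
    type_name => {
      '_source'    => { 'excludes' => excluded_fields },
      'properties' => {
        'id'          => { 'type' => 'integer' },
        'title'       => { 'type' => 'string' }, # assumed default type
        'last_update' => { 'type' => 'date' },
        'attachment'  => { 'type' => 'attachment' }
      }
    }
  }
end

MAPPING = attachment_mapping(TYPE_NAME, ['attachment'])
puts JSON.pretty_generate(MAPPING)
```

If this shape is right, it would also explain why passing `:_source` as an option on the `:attachment` field, as tried in the question, had no visible effect: the setting belongs to the type as a whole, not to a single field.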

Answer 1 (score: 0)

@imotov's solution did not work for me. When I run the curl command

curl -X GET "http://localhost:9200/user_files/user_file/_search?pretty=true" -d '{"query":{"query_string":{"query":"rspec"}}}'

I can still see the contents of the attachment file included in the search results.

"_source" : {"user_file":{"id":5,"folder_id":1,"updated_at":"2012-08-16T11:32:41Z","attachment_file_size":179895,"attachment_updated_at":"2012-08-16T11:32:41Z","attachment_file_name":"hw4.pdf","attachment_content_type":"application/pdf","created_at":"2012-08-16T11:32:41Z","attachment_original":"JVBERi0xL .....

Here is my implementation:

include Tire::Model::Search
include Tire::Model::Callbacks

def self.search(folder, params)
  tire.search() do
    query { string params[:query], default_operator: "AND"} if params[:query].present?
    filter :term, folder_id: folder.id
    highlight :attachment_original, :options => {:tag => "<em>"}
  end
end

mapping :_source => { :excludes => ['attachment_original'] } do
  indexes :id, :type => 'integer'
  indexes :folder_id, :type => 'integer'
  indexes :attachment_file_name
  indexes :attachment_updated_at, :type => 'date'
  indexes :attachment_original, :type => 'attachment'
end

def to_indexed_json
  to_json(:methods => [:attachment_original])
end

def attachment_original
  if attachment_file_name.present?
    path_to_original = attachment.path
    Base64.encode64(open(path_to_original) { |f| f.read })
  end
end
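
One detail in the _source output above may be relevant: the document is wrapped in a `user_file` root key, so the full path of the field would be `user_file.attachment_original` rather than `attachment_original`. If the excludes list is matched against full dotted paths (an assumption, not verified against this Elasticsearch version), a bare `attachment_original` entry would not match. The sketch below is a simplified pure-Ruby model of that matching, not Elasticsearch's implementation; `filter_source` and the sample documents are hypothetical.

```ruby
# Simplified model of source filtering: each exclude pattern is compared
# against the full dotted path of a field, with "*" treated as a wildcard.
def filter_source(doc, excludes, prefix = nil)
  patterns = excludes.map do |pattern|
    Regexp.new('\A' + Regexp.escape(pattern).gsub('\*', '.*') + '\z')
  end

  doc.each_with_object({}) do |(key, value), kept|
    path = prefix ? "#{prefix}.#{key}" : key.to_s
    next if patterns.any? { |regex| regex.match(path) }
    kept[key] = value.is_a?(Hash) ? filter_source(value, excludes, path) : value
  end
end

# Placeholder documents; '<base64>' stands in for the encoded PDF.
WRAPPED = { 'user_file' => { 'id' => 5, 'attachment_original' => '<base64>' } }
FLAT    = { 'id' => 5, 'attachment_original' => '<base64>' }

p filter_source(WRAPPED, ['attachment_original'])           # field is kept
p filter_source(WRAPPED, ['user_file.attachment_original']) # field is dropped
p filter_source(FLAT, ['attachment_original'])              # field is dropped
```

Under this model, either excluding `user_file.attachment_original` or indexing the document without the root key would be worth trying; neither has been tested against the setup in this answer.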