Splitting a string based on a pattern and a size limit

Time: 2018-03-08 02:45:01

Tags: ruby string

I want to split a query string such as:

"(first_name:zach AND last_name:woods) OR (first_name:thomas AND last_name:middleditch) OR (first_name:martin AND last_name:starr) OR "...

into substrings of no more than 5000 characters each, splitting on the pattern " OR ".

Any help would be appreciated.

2 answers:

Answer 0 (score: 1)

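If your query looks like the example, you can split it on OR and then iterate over the pieces, joining them back together until adding the next piece would exceed 5000 characters.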

original_query = "(first_name:zach AND last_name:woods) OR ..."
max_length = 5000
split_arr = original_query.split(" OR ")   # Split on the " OR " delimiter
result = []
pattern = ""
split_arr.each do |query|
  # Length the current substring would have with this clause appended
  candidate = pattern.empty? ? query : pattern + " OR " + query
  if candidate.length > max_length && !pattern.empty?  # If the limit would be exceeded
    result.push(pattern)                               # Store the current substring
    pattern = query                                    # Start a new substring
  else
    pattern = candidate                                # Otherwise keep extending it
  end
end

result.push(pattern) unless pattern.empty?  # Store the final substring

puts result

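You then get the array result, whose substrings each stay within the 5000-character limit unless a single clause is itself longer than that. However, if your string is an SQL query (which seems likely), whether each substring is syntactically valid on its own depends on your original query.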
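To try the greedy packing on a small input, it can be wrapped in a method. This is a minimal sketch under a few assumptions: split_query is a hypothetical name, it splits on the literal " OR " delimiter, and the demo uses a 30-character limit instead of 5000 so the result is easy to read. A clause longer than the limit is kept as its own oversized chunk rather than being cut.

```ruby
# Greedily pack " OR "-separated clauses into chunks of at most max_length
# characters. split_query is a hypothetical helper name for this sketch.
def split_query(query, max_length)
  chunks = []
  current = ""
  query.split(" OR ").each do |clause|
    # What the current chunk would look like with this clause appended
    candidate = current.empty? ? clause : "#{current} OR #{clause}"
    if candidate.length > max_length && !current.empty?
      chunks << current  # current chunk is full; close it
      current = clause   # start the next chunk with this clause
    else
      current = candidate
    end
  end
  chunks << current unless current.empty? # keep the last chunk
  chunks
end

query = "(a:1 AND b:2) OR (a:3 AND b:4) OR (a:5 AND b:6)"
p split_query(query, 30)
```

Joining the chunks with " OR " gives back the original query, which is a quick way to check that nothing was lost.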

Answer 1 (score: 0)

It would be better to enforce these size constraints while building the query itself.
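Building the query in size-limited pieces from the start could look roughly like this sketch. It assumes a hypothetical people array of [first_name, last_name] pairs as the source data and a small demo limit of 90 characters; it collects clauses into batches while tracking the length each batch will have once joined, then joins every batch with " OR ".

```ruby
# Hypothetical source data: [first_name, last_name] pairs the query is built from.
people = [["zach", "woods"], ["thomas", "middleditch"], ["martin", "starr"]]
max_length = 90     # demo limit; the question uses 5000
separator = " OR "

batches = [[]]      # each batch is an array of clause strings
batch_length = 0    # length of the current batch once joined with separator

people.each do |first, last|
  clause = "(first_name:#{first} AND last_name:#{last})"
  # Characters this clause adds: a separator is needed unless the batch is empty.
  extra = batches.last.empty? ? clause.length : separator.length + clause.length
  if batch_length + extra > max_length && !batches.last.empty?
    batches << [clause]            # current batch is full; start a new one
    batch_length = clause.length
  else
    batches.last << clause         # clause still fits in the current batch
    batch_length += extra
  end
end

queries = batches.reject(&:empty?).map { |batch| batch.join(separator) }
puts queries
```

Tracking the length as a number avoids rebuilding the joined string for every clause; each batch is joined only once at the end.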

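If you still want to split an already-built query string, one way is to scan it for the conditions and join them back together in batches of whatever size you need.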

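# Scan all matching conditions in str, the original query string
# (this pattern only matches lowercase names)
conditions = str.scan(/first_name:[a-z]+ AND last_name:[a-z]+/)

# Final queries array
result = []

# Iterate over the conditions in batches and build one query per batch.
# Assuming each condition averages about 35 characters plus 4 for " OR ",
# a batch of 125 items stays under 5000 characters on average.
# each_slice is core Ruby; ActiveSupport's in_groups_of would pad the last group with nil.
conditions.each_slice(125) { |group| result << group.join(' OR ') }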

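The result array then holds the query split into batches. Because the batch size is based on an average condition length, an individual batch can still come out longer than 5000 characters if its names are unusually long.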
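Since a fixed count only holds on average, one option is to derive the batch size from the longest scanned condition instead. If every condition has at most L characters and batches are joined with " OR " (4 characters), a batch of k conditions is at most k * L + 4 * (k - 1) characters long, so any k <= (5000 + 4) / (L + 4) is guaranteed to fit. The sketch uses a hypothetical helper name, safe_batch_size, and a demo limit of 100 characters.

```ruby
# k conditions of at most `longest` characters joined by `separator` take at most
# k * longest + (k - 1) * separator.length characters, so any
# k <= (max_length + separator.length) / (longest + separator.length) fits.
# safe_batch_size is a hypothetical helper name.
def safe_batch_size(conditions, max_length, separator)
  longest = conditions.map(&:length).max || 0
  # Always return at least 1; a single over-long condition still exceeds the limit.
  [(max_length + separator.length) / (longest + separator.length), 1].max
end

str = "(first_name:zach AND last_name:woods) OR (first_name:thomas AND last_name:middleditch) OR (first_name:martin AND last_name:starr)"
conditions = str.scan(/first_name:[a-z]+ AND last_name:[a-z]+/)

size = safe_batch_size(conditions, 100, " OR ")
queries = conditions.each_slice(size).map { |group| group.join(" OR ") }
puts queries
```

The trade-off is that batches sized for the longest condition are usually shorter than necessary when most names are short.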