Ignoring Multiple Whitespace Characters in a MongoDB Query

Time: 2016-07-11 21:59:17

Tags: python regex mongodb

I have a MongoDB query that searches for addresses. The problem is that if a user accidentally adds an extra whitespace character, the query will not find the address. For example, if the user types "123  Fakeville St" (two spaces after 123) instead of "123 Fakeville St", the query will not return any results.

Is there a simple way to deal with this issue, perhaps using $regex? I guess the space would need to be ignored between the house number (123) and the street name (Fakeville). My query is set up like this:

@app.route('/getInfo', methods=['GET'])
def getInfo():
    address = request.args.get("a")
    addressCollection = myDB["addresses"]
    addressJSON = []
    # Anchor the pattern so it matches from the start of the stored address
    regex = "^" + address

    # Case-insensitive prefix match, returning at most three addresses
    for doc in addressCollection.find({'Address': {'$regex': regex, '$options': 'i'}}, {"Address": 1, "_id": 0}).limit(3):
        addressJSON.append({"Address": doc["Address"]})
    return jsonify(addresses=addressJSON)

2 Answers:

Answer 0 (score: 1)

Clean up the query before sending it off:

>>> import re
>>> re.sub(r'\s+', ' ', '123  abc')
'123 abc'
>>> re.sub(r'\s+', ' ', '123    abc def   ghi')
'123 abc def ghi'

You'll probably want to make sure that the data in your database is similarly normalised. Also consider similar strategies for things like punctuation.
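As a rough sketch of how that clean-up could feed into the regex from the question, the helpers below collapse whitespace and build a prefix pattern that also tolerates extra spaces in the stored data. The function names normalise_whitespace and build_prefix_regex are made up for illustration, and the sketch assumes that the output of Python's re.escape is also valid in the regex dialect MongoDB uses.

```python
import re


def normalise_whitespace(text):
    """Collapse every run of whitespace to one space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def build_prefix_regex(user_input):
    """Build a '^'-anchored pattern that allows any amount of whitespace between words.

    Hypothetical helper: the returned string is meant to be passed as the
    '$regex' value in the query from the question.
    """
    tokens = user_input.split()  # split() with no argument already ignores repeated whitespace
    # re.escape stops characters such as "." or "(" from being read as regex syntax
    return "^" + r"\s+".join(re.escape(token) for token in tokens)


if __name__ == "__main__":
    pattern = build_prefix_regex("123   Fakeville St")
    print(pattern)
    print(bool(re.match(pattern, "123  fakeville street", re.IGNORECASE)))
```

In the route from the question, regex = build_prefix_regex(address) would replace regex = "^" + address. Applying normalise_whitespace to addresses before they are stored keeps both sides consistent.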

In fact, using a regex for this seems overly strict, as well as reinventing the wheel. Consider using a proper search engine such as Lucene or Elasticsearch.

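Answer 1 (score: 0)

An alternative to a regex that you could try is MongoDB text indexes. By adding a text index on the field, you can perform text searches using the $text operator.

For example: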

db.coll.find(
    { $text: { $search: "123 Fakeville St" } },
    { score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } }).limit(1)

This should work for entries such as "123 Fakeville St.", "123 fakeville street", and so on, as long as the important parts of the address make it in.

See more information about $text behaviour.
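Since the question uses Python, here is a minimal sketch of the same text search through a PyMongo collection object. The helper names build_text_query and search_addresses are hypothetical, and the sketch assumes a text index already exists on the Address field, created for example with addressCollection.create_index([("Address", "text")]).

```python
def build_text_query(user_input):
    """Return a $text filter with repeated whitespace in the input collapsed."""
    return {"$text": {"$search": " ".join(user_input.split())}}


def search_addresses(collection, user_input, limit=3):
    """Return up to `limit` addresses, best text-score match first.

    `collection` is assumed to be a PyMongo collection that has a text index
    on the "Address" field; nothing here creates that index.
    """
    cursor = (
        collection.find(
            build_text_query(user_input),
            # Project the relevance score so the results can be sorted on it
            {"Address": 1, "_id": 0, "score": {"$meta": "textScore"}},
        )
        .sort([("score", {"$meta": "textScore"})])
        .limit(limit)
    )
    return [doc["Address"] for doc in cursor]


if __name__ == "__main__":
    print(build_text_query("123   Fakeville  St"))
```

Inside the route from the question, search_addresses(myDB["addresses"], address) would take the place of the $regex loop. Note that $text matches whole words rather than prefixes, so it behaves differently from the original "^" pattern while the user is still typing.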