Question

ENV:

MongoDB (3.2.0) with MongoS

Collection:

Users

Text index creation:

  BasicDBObject keys = new BasicDBObject();
  keys.put("name","text");

  BasicDBObject options = new BasicDBObject();
  options.put("name", "userTextSearch");
  options.put("unique", Boolean.FALSE);
  options.put("background", Boolean.TRUE);

  userCollection.createIndex(keys, options); // using MongoTemplate

Document:

{ "name" : "LEONEL" }

Queries:

db.users.find( { "$text" : { "$search" : "LEONEL" } } ) => FOUND
db.users.find( { "$text" : { "$search" : "leonel" } } ) => FOUND (search caseSensitive is false)
db.users.find( { "$text" : { "$search" : "LEONÉL" } } ) => FOUND (search with diacriticSensitive is false)
db.users.find( { "$text" : { "$search" : "LEONE" } } ) => FOUND (partial search)
db.users.find( { "$text" : { "$search" : "LEO" } } ) => NOT FOUND (partial search)
db.users.find( { "$text" : { "$search" : "L" } } ) => NOT FOUND (partial search)

Any idea why I get 0 results using the query "LEO" or "L"?

Regex with text index search is not allowed.

db.getCollection('users')
     .find( { "$text" : { "$search" : "/LEO/i", 
                          "$caseSensitive": false, 
                          "$diacriticSensitive": false }} )
     .count() // 0 results

db.getCollection('users')
     .find( { "$text" : { "$search" : "LEO", 
                          "$caseSensitive": false, 
                          "$diacriticSensitive": false }} )
     .count() // 0 results

Mongo docs:

Answer 1

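As of MongoDB 3.4, the text search feature is designed to support case-insensitive searches on text content with language-specific rules for stop words and stemming. Stemming rules for supported languages are based on standard algorithms, which generally handle common verbs and nouns but are unaware of proper nouns.

There is no explicit support for partial or fuzzy matches, but terms that stem to a similar result may appear to work that way. For example, "taste", "tastes", and "tasteful" all stem to "tast". Try the Snowball Stemming Demo page to experiment with more words and stemming algorithms.

Your results that match are all variations on the same word "LEONEL", and vary only by case and diacritics. Unless "LEONEL" can be stemmed to something shorter by the rules of your selected language, these are the only types of variation that will match.

If you want to do efficient partial matches, you'll need to take a different approach. For some helpful ideas see: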

Efficient Techniques for Fuzzy and Partial matching in MongoDB James Tan
Efficient Partial Keyword Searches

You can watch/upvote the relevant improvement request in the MongoDB issue tracker: SERVER-15090: Improve Text Indexes to support partial word match.
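To make the whole-word behaviour described in this answer concrete, here is a toy model in plain JavaScript. It is only a sketch and not MongoDB's implementation: `foldToken` and `textMatches` are made-up names, and stemming and stop words are left out entirely, so it cannot account for any result that depends on stemming. It only illustrates why case and diacritic variants match while a bare prefix such as "LEO" does not.

```javascript
// Toy model only: NOT MongoDB's implementation; stemming and stop words are omitted.
// foldToken and textMatches are hypothetical helper names.

// Lowercase a token and strip combining diacritics (e.g. "LEONÉL" -> "leonel").
function foldToken(token) {
  return token
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase();
}

// A search term matches only if it equals a whole token of the indexed text after folding.
function textMatches(indexedText, searchTerm) {
  const tokens = indexedText.split(/\s+/).filter(Boolean).map(foldToken);
  return tokens.includes(foldToken(searchTerm));
}

console.log(textMatches("LEONEL", "leonel")); // true
console.log(textMatches("LEONEL", "LEONÉL")); // true
console.log(textMatches("LEONEL", "LEO"));    // false
```

In this model a prefix never equals the stored token, which is the same reason the "LEO" and "L" queries come back empty.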

Answer 2

As Mongo currently does not support partial search by default...

I created a simple static method.

import mongoose from 'mongoose'

const PostSchema = new mongoose.Schema({
    title: { type: String, default: '', trim: true },
    body: { type: String, default: '', trim: true },
});

PostSchema.index({ title: "text", body: "text",},
    { weights: { title: 5, body: 3, } })

PostSchema.statics = {
    searchPartial: function(q, callback) {
        return this.find({
            $or: [
                { "title": new RegExp(q, "gi") },
                { "body": new RegExp(q, "gi") },
            ]
        }, callback);
    },

    searchFull: function (q, callback) {
        return this.find({
            $text: { $search: q, $caseSensitive: false }
        }, callback)
    },

    search: function(q, callback) {
        this.searchFull(q, (err, data) => {
            if (err) return callback(err, data);
            if (!err && data.length) return callback(err, data);
            if (!err && data.length === 0) return this.searchPartial(q, callback);
        });
    },
}

export default mongoose.models.Post || mongoose.model('Post', PostSchema)

How to use:

import Post from '../models/post'

Post.search('Firs', function(err, data) {
   console.log(data);
})
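One thing to watch with the regex fallback: the search string is passed to `new RegExp` as-is, so user input containing regex metacharacters changes the pattern, and something like "C++" makes the constructor throw. A possible guard, sketched below, is to escape the input first. `escapeRegExp` and `buildPartialQuery` are hypothetical helpers, not part of mongoose.

```javascript
// escapeRegExp and buildPartialQuery are hypothetical helpers, not part of mongoose.

// Escape regex metacharacters so the input is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build the same $or filter shape as searchPartial, but from escaped input.
function buildPartialQuery(q, fields) {
  const pattern = new RegExp(escapeRegExp(q), "i");
  return { $or: fields.map((field) => ({ [field]: pattern })) };
}

const query = buildPartialQuery("C++ (intro)", ["title", "body"]);
console.log(query.$or[0].title.test("Learning c++ (intro) today")); // true
```

The resulting object could be passed to `this.find(...)` in place of the inline `$or` filter.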

Answer 3

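If you want all the benefits of MongoDB's full-text search AND want partial matches (maybe for auto-complete), the n-gram based approach mentioned by Shrikant Prabhu was the right solution for me. Obviously your mileage may vary, and this might not be practical when indexing huge documents.

In my case I mainly needed partial matches to work for the title field (and a few other short fields) of my documents.

I used an edge n-gram approach. What does that mean? In short, you turn a string like "Mississippi River" into a string like "Mis Miss Missi Missis Mississ Mississi Mississip Mississipp Mississippi Riv Rive River".

Inspired by this code by Liu Gen, I came up with this method: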

function createEdgeNGrams(str) {
    // Strings of 3 characters or fewer are returned unchanged.
    if (str && str.length > 3) {
        const minGram = 3
        const maxGram = str.length

        return str.split(" ").reduce((ngrams, token) => {
            if (token.length > minGram) {
                // Emit every prefix of the token, from minGram characters up to the whole token.
                for (let i = minGram; i <= maxGram && i <= token.length; ++i) {
                    ngrams.push(token.slice(0, i))
                }
            } else {
                // Short tokens are kept whole.
                ngrams.push(token)
            }
            return ngrams
        }, []).join(" ")
    }

    return str
}

let res = createEdgeNGrams("Mississippi River")
console.log(res)

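Now, to make use of this in Mongo, I add a searchTitle field to my documents and set its value by converting the actual title field into edge n-grams with the function above. I also create a "text" index on the searchTitle field.

I then exclude the searchTitle field from my search results by using a projection: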

db.collection('my-collection')
  .find({ $text: { $search: mySearchTerm } }, { projection: { searchTitle: 0 } })
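For anyone wiring this up, here is a rough sketch of the write side. `edgeNGrams` is a compact restatement of the idea above and `withSearchTitle` is a made-up helper name; the driver calls in the trailing comments assume a `collection` object from the Node.js driver and were not run here. Whether the index language's stemming interferes with short prefixes is something to check against your own data.

```javascript
// Compact restatement of the edge n-gram idea from this answer (minimum prefix length 3).
function edgeNGrams(str, minGram = 3) {
  return str
    .split(" ")
    .filter(Boolean)
    .flatMap((token) => {
      if (token.length <= minGram) return [token];
      const prefixes = [];
      for (let i = minGram; i <= token.length; i++) prefixes.push(token.slice(0, i));
      return prefixes;
    })
    .join(" ");
}

// withSearchTitle is a hypothetical helper: it returns a copy of the document
// with the derived searchTitle field added.
function withSearchTitle(doc) {
  return { ...doc, searchTitle: edgeNGrams(doc.title) };
}

const doc = withSearchTitle({ title: "Mississippi River" });
console.log(doc.searchTitle);

// With a driver collection (assumed), the remaining steps would look roughly like:
//   await collection.createIndex({ searchTitle: "text" });
//   await collection.insertOne(doc);
//   await collection.find({ $text: { $search: "Missi" } },
//                         { projection: { searchTitle: 0 } }).toArray();
```

The derived field has to be regenerated whenever title changes, otherwise the index goes stale.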

Answer 4

Without creating an index, we could simply use:

db.users.find({ name: /<full_or_partial_text>/i}) (case insensitive)

Answer 5

I wrapped @Ricardo Canelas' answer in a mongoose plugin on npm.

Two changes made:
- Uses promises
- Searches on any field with type String

Here's the important source code:

// mongoose-partial-full-search

module.exports = exports = function addPartialFullSearch(schema, options) {
  schema.statics = {
    ...schema.statics,
    makePartialSearchQueries: function (q) {
      if (!q) return {};
      const $or = Object.entries(this.schema.paths).reduce((queries, [path, val]) => {
        // Only String-typed schema paths can be matched with a regex.
        if (val.instance === "String") {
          queries.push({
            [path]: new RegExp(q, "gi")
          });
        }
        return queries;
      }, []);
      return { $or }
    },
    searchPartial: function (q, opts) {
      return this.find(this.makePartialSearchQueries(q), opts);
    },

    searchFull: function (q, opts) {
      return this.find({
        $text: {
          $search: q
        }
      }, opts);
    },

    search: function (q, opts) {
      return this.searchFull(q, opts).then(data => {
        return data.length ? data : this.searchPartial(q, opts);
      });
    }
  }
}

exports.version = require('../package').version;

Answer 6

If you are using a variable to store the string or value to be searched:

It will work with a Regex, as follows:

collection.find({ name_of_mongodb_field: new RegExp(variable_name, 'i') })

Here, i is the ignore-case option.

Answer 7

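Full/partial search in MongoDB for a "pure" Meteor project

I modified flash's code to use it with Meteor collections and simpleSchema but without mongoose (meaning: I removed the use of the .plugin() method and of schema.path, which looks like a simpleSchema attribute in flash's code but did not resolve for me), and to return the result array instead of a cursor.

Thought this might help someone, so I'm sharing it.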

export function partialFullTextSearch(meteorCollection, searchString) {

    // builds an "or"-mongoDB-query for all fields with type "String" with a regEx as search parameter
    const makePartialSearchQueries = () => {
        if (!searchString) return {};
        const $or = Object.entries(meteorCollection.simpleSchema().schema())
            .reduce((queries, [name, def]) => {
                def.type.definitions.some(t => t.type === String) &&
                queries.push({[name]: new RegExp(searchString, "gi")});
                return queries
            }, []);
        return {$or}
    };

    // returns a promise with result as array
    const searchPartial = () => meteorCollection.rawCollection()
        .find(makePartialSearchQueries(searchString)).toArray();

    // returns a promise with result as array
    const searchFull = () => meteorCollection.rawCollection()
        .find({$text: {$search: searchString}}).toArray();

    return searchFull().then(result => {
        if (result.length === 0) throw null
        else return result
    }).catch(() => searchPartial());

}

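This returns a Promise, so call it like this (i.e. as the return value of an async server-side Meteor method searchContact). It assumes you attached a simpleSchema to your collection before calling this method.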

return partialFullTextSearch(Contacts, searchString).then(result => result);

Answer 8

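Quick and dirty solution that worked for me: use text search first, and if nothing is found, make another query with a regex. If you don't want to make two queries, $or works too, but requires all fields in the query to be indexed.

Also, you'd better not use a case-insensitive regex, because it can't rely on indexes. In my case, I made lowercase copies of the fields used.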
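A sketch of the lowercase-copy idea, under some assumptions: `nameLower` is a made-up field name, and the helpers below are hypothetical. It relies on the documented behaviour that a case-sensitive regex anchored with `^` (a prefix expression) can use an ordinary index on the field. Note that this covers "starts with" matches only, not matches in the middle of the value.

```javascript
// nameLower, withLowercaseCopy and buildPrefixQuery are hypothetical names for illustration.

// Escape regex metacharacters so the input is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Store a lowercased copy next to the original field at write time.
function withLowercaseCopy(user) {
  return { ...user, nameLower: user.name.toLowerCase() };
}

// Case-sensitive, anchored prefix regex against the lowercased copy.
function buildPrefixQuery(input) {
  return { nameLower: { $regex: "^" + escapeRegExp(input.toLowerCase()) } };
}

console.log(withLowercaseCopy({ name: "LEONEL" }));
console.log(buildPrefixQuery("LEO"));

// Assumed usage in the shell, with an index on the copy:
//   db.users.createIndex({ nameLower: 1 })
//   db.users.find(buildPrefixQuery("LEO"))
```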

Answer 9

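A good n-gram based approach for fuzzy matching is explained here (it also explains how to score results higher using prefix matching): https://medium.com/xeneta/fuzzy-search-with-mongodb-and-python-57103928ee5d

Note: n-gram based approaches can be storage-hungry, and the MongoDB collection size will increase.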
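To give a feel for the storage cost, here is a minimal trigram sketch in JavaScript. It is not the code from the linked article; `ngrams` is a hypothetical helper that simply lists every contiguous substring of length n.

```javascript
// ngrams is a hypothetical helper: all contiguous substrings of length n, lowercased.
function ngrams(text, n = 3) {
  const s = text.toLowerCase();
  const grams = [];
  for (let i = 0; i + n <= s.length; i++) grams.push(s.slice(i, i + n));
  return grams;
}

const grams = ngrams("Leonel");
console.log(grams); // [ 'leo', 'eon', 'one', 'nel' ]
console.log(grams.join("").length); // 12 characters of n-grams for a 6-character name
```

Even for this short name, the stored n-grams take twice as many characters as the original value, before any index overhead.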

Answer 10

I create an additional field that combines all the fields within a document that I want to search. Then I just use a regex:

const user = {
    firstName: 'Bob',
    lastName: 'Smith',
    address: {
        street: 'First Ave',
        city: 'New York City',
    },
    notes: 'Bob knows Mary'
}

// add combined search field with '+' separator to preserve spaces
user.searchString = `${user.firstName}+${user.lastName}+${user.address.street}+${user.address.city}+${user.notes}`

db.users.find({searchString: {$regex: 'mar', $options: 'i'}})
// returns Bob because 'mar' matches his notes field

// TODO write a client-side function to highlight the matching fragments
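As for the highlighting TODO, one possible shape for it is sketched below. `highlightMatches` is a hypothetical helper, and the square brackets are just stand-ins for whatever markup the client actually uses.

```javascript
// highlightMatches is a hypothetical helper; "[" and "]" stand in for real markup.

// Escape regex metacharacters so the search term is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Wrap every case-insensitive occurrence of term in the given markers.
function highlightMatches(text, term, open = "[", close = "]") {
  if (!term) return text;
  const pattern = new RegExp(escapeRegExp(term), "gi");
  return text.replace(pattern, (match) => open + match + close);
}

console.log(highlightMatches("Bob knows Mary", "mar")); // Bob knows [Mar]y
```

It would be applied to the original fields (notes, firstName, and so on) rather than to the combined searchString.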

Answer 11

import re

db.collection.find({"$or": [{"your field name": re.compile(text, re.IGNORECASE)},{"your field name": re.compile(text, re.IGNORECASE)}]})

MongoDB Full and Partial Text Search
