MongoDB - 全文索引 - 全文搜索 - 词干

时间:2014-03-30 22:20:36

标签: mongodb indexing full-text-search

我注意到如果我在某个集合的全文搜索启用字符串字段中输入值'seasons',那么MongoDB在查询“season”时会找到此值。但如果我输入更复杂的东西,例如“老鼠”或“标准”,当我分别查询“鼠标”或“标准”时,它找不到这些值。这是正常的,是否有明确的规则,MongoDB能够阻止什么不是?

[test] 2014-03-30 18:25:09.551 >>> db.TestFullText7.find();
{
        "_id" : ObjectId("53389720063ab25d2d55c94c"),
        "dt" : ISODate("2014-03-30T22:13:52.717Z"),
        "title" : "mice",
        "txt" : "mice"
}
{
        "_id" : ObjectId("5338994c063ab25d2d55c94d"),
        "dt" : ISODate("2014-03-30T22:23:08.259Z"),
        "title" : "criteria",
        "txt" : "criteria"
}
{
        "_id" : ObjectId("533899c5063ab25d2d55c94e"),
        "dt" : ISODate("2014-03-30T22:25:09.551Z"),
        "title" : "seasons",
        "txt" : "seasons"
}
[test] 2014-03-30 18:25:13.295 >>> db.runCommand({"text" : "TestFullText7", "search" : "season"});
{
        "queryDebugString" : "season||||||",
        "language" : "english",
        "results" : [
                {
                        "score" : 2,
                        "obj" : {
                                "_id" : ObjectId("533899c5063ab25d2d55c94e"),
                                "dt" : ISODate("2014-03-30T22:25:09.551Z"),
                                "title" : "seasons",
                                "txt" : "seasons"
                        }
                }
        ],
        "stats" : {
                "nscanned" : 1,
                "nscannedObjects" : 0,
                "n" : 1,
                "nfound" : 1,
                "timeMicros" : 148
        },
        "ok" : 1
}
[test] 2014-03-30 18:25:22.406 >>> db.runCommand({"text" : "TestFullText7", "search" : "mouse"});
{
        "queryDebugString" : "mous||||||",
        "language" : "english",
        "results" : [ ],
        "stats" : {
                "nscanned" : 0,
                "nscannedObjects" : 0,
                "n" : 0,
                "nfound" : 0,
                "timeMicros" : 110
        },
        "ok" : 1
}
[test] 2014-03-30 18:25:30.986 >>> db.TestFullText7.getIndexes();
[
        {
                "v" : 1,
                "key" : {
                        "_id" : 1
                },
                "ns" : "test.TestFullText7",
                "name" : "_id_"
        },
        {
                "v" : 1,
                "key" : {
                        "_fts" : "text",
                        "_ftsx" : 1
                },
                "ns" : "test.TestFullText7",
                "name" : "$**_text",
                "weights" : {
                        "$**" : 1
                },
                "default_language" : "english",
                "language_override" : "language",
                "textIndexVersion" : 1
        }
]
[test] 2014-03-30 18:25:45.228 >>>

1 个答案:

答案 0 :(得分:1)

MongoDB使用Snowball词干库。不幸的是,这看起来是这个库的局限之一。

您可以看到英语词干分析器的页面 here。比较词汇+词干等效页面,你可以看到"鼠标"成为" Mous"和"小鼠"仍然存在"老鼠"。

您可以在代码库herehere

中看到MongoDB使用Snowball