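Combining full text with other indexes

Time: 2016-02-07 23:32:52

Tags: mongodb mongodb-query

I have a full-text index and an index on the creation date.

My query on the date quickly (in under a second) returns a nice small set of 44 records: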

> db.oneMillionDocumentsIndexed.count({created: {$lte: ISODate("2016-02-06T15:34:59.019Z")} })
44

However, if I combine it with a text search, the query is very slow:

> db.oneMillionDocumentsIndexed.count({
                                created: {$lte: ISODate("2016-02-06T15:34:59.019Z")}, 
                                $text: { $search: "raven" } })

It appears to use both indexes:

{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "test.oneMillionDocumentsIndexed",
        "indexFilterSet" : false,
        "parsedQuery" : {
            "$and" : [
                {
                    "created" : {
                        "$lte" : ISODate("2016-02-06T15:34:59.019Z")
                    }
                },
                {
                    "$text" : {
                        "$search" : "raven",
                        "$language" : ""
                    }
                }
            ]
        },
        "winningPlan" : {
            "stage" : "FETCH",
            "filter" : {
                "created" : {
                    "$lte" : ISODate("2016-02-06T15:34:59.019Z")
                }
            },
            "inputStage" : {
                "stage" : "TEXT",
                "indexPrefix" : {

                },
                "indexName" : "$**_text",
                "parsedTextQuery" : {

                }
            }
        },
        "rejectedPlans" : [ ]
    },
    "serverInfo" : {
        "host" : "Plod",
        "port" : 27017,
        "version" : "3.0.7",
        "gitVersion" : "6ce7cbe8c6b899552dadd907604559806aa2e9bd"
    },
    "ok" : 1
}    

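Shouldn't the condition on the created date reduce the number of documents and therefore speed up the query?

The documents are not tiny, but they are not large either. Here is a sample document: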

{
    "_id" : ObjectId("56b612a2b6c13d2bec221d22"),
    "created" : ISODate("2016-02-06T15:34:57.954Z"),
    "adoptability-integer" : 1885631649,
    "impoverisher-double" : 0.78982932576436,
    "auriga-short-string" : "unpunished",
    "pistillate-long-string" : "raven nationalistic supergalaxies shit candidacy vengefulness baghla inharmony breviaries subcoracoid facet numbles Achaian hyksos g¥ᄀtterdï¿¥ï¾ exsecant costliness assertively cufic neurotomy subfebrile reassess eruption calciphobous epithecium adipopectic eruption neurotomy impaste shrugging oxytone depredating abb¥ᄑ unfaithfulness clive amman meteorology dollond del cussed malversation Determinateness wadset busher precedent warder lithest tuberculinize kythera swiping hyperopic installation otosclerosis costly joyance neenah saliently bicepses myograph blackmur. salable radiational copaiva seisure animism franglais chalkboard astride preaortic machinelike criseyde easternmost theological. goloshes amber assertively universalism pterylological abortifacient entrepï¾¢t nordic intricate canvasser unscholastic caria marginal prakritic gal tambur seascouting branchiform vaticide hysteroidal. vario chefoo permanganic solidillu lashings permanganic denatured chartres Nonenergetically pabx coinheritance koulibiaca wrathless unrejoicing kodly confutable Juru changelessness ratite pol lightener pansy portadown unpeg iontophoresis Ruddily overcorrupt rondure midair mobocrat. Rals sind teaser hussism definiteness piperidine septicity procryptic salicaceous catalpa Stingy panegyrise Baddie wodan preoccasioned ndebele sanitizing mulga grantedly selectman dep overscruple mealies subsellia noncompressible lepidoptera nonequilateral vï¿¥ï¾ racemiform carob preaccredit parramatta. piatigorsky unmanifest eulogized bolometric circumnavigating stare. prewitt branchiform canadianizing untinselled crossruf anthozoic del dragrope pronative foulness incessancy sultanate debunker guncotton reindictment uninstalled pieter buying prestwick anguish dicrotism permissible. 
nonscarcity labialising underswamp nondegradation incubating unwillable dealer Rewinded jaggedness jasmine flatfootedness edgily choregraphic unpenetrating unwhited devotedly thornton irremediably reentry cordilleras inhospitable blenchingly hedgehop. nontribesman semiexhibitionist streetlike outgeneral Spatiality hyacinthides prometheus tingly tenacious Aerologist promonarchy nonsophistical uhuru unsprayable countrywoman proequality schickard. antagonize Cart undocumented heteroplastic cyclostome keratin specification tombless lambie extricating feticide reacceded redwing autokinetic ferias underpart dupr¥ᄑ preexperimental besancon dvm riksm'' unharmonised bradykinetic unforeseeableness ryukyu rootstalk aquarial uredospore kame nondissenting pachyderm southeasterner comminute excitant torturing reasoningly restabilize isotopy emergency boathouses plowmanship decidedness skeptophylaxis kelebe clive furred abuttals variometer indamine wreathe. guymon rubinstein monotriglyph inaction. bedazzle foreordinated proportioner pursy beryl slogging forbearer abirritant concur. nonleprous veriax overservility mirza relitigate richness dipteroi mischarged. inquisitress nav unimpressibility teratoma brilliantined untensing vlaardingen theorbo shostakovich appia maximally fingered ashkenazim soap unpick isocheimenal gingili synonymical interannular patronising knaggiest cleaver lassie interwound osculated unobliging portobello boxer impactive.Bladderwort wish aerothermodynamics lymphadenomata nonfundamental interdiffuse injector chaussure. 
polyphyletically irishising ayous sinecurist decant carbonized flickeringly stomatitic emily luteotrophin anginous Syllabic permeameter Carthal brachiator farinose justicelike azotized getaway electroencephalographically puglia unconfound appendiceal premedical vassal rubric overhearing Conative heartaching shammer staphylorrhaphy bulgar spilikin phagocytosing adenitis syntypic dissertate collyrium sonless anoxia archil mimosis irreversibly unhabituated scholiast rcs portadown mishima preimport bonavist jointedly aspergillus farinose condemnation chough blanc descanter mephistopheles ongoing unsurgical unclassifiableness namtar corniest disbudding disklike zap wheyface teetotally nonsubmission delian enrober canadian nasi hypermetabolism animadversion Unbantering recompile ineradicable blindly mren Schorlaceous viperous latish unstationed decastylos catalpa beflagged pellicular demark gassendi. macmonnies deserve subsidizer generous reassess colorfully unsummonable clave hderlin borges aechmagoras misbegotten uncontradictory unfelicitous plunderage presynsacral backband amagasaki unsavorily proenzyme ney slipslop unrhythmical Debenture rosy unreprehended sulfuryl outpeep fichtean jellylike anginous foil pixies columella nonsuggestion unwhited icier archbishop masan oireachtas coxcomb pseudosiphonic rubinstein cockerel fidel swingle submembranous despondent sarajevo camshaft inclusiveness reynard deducibility Counselling velveteen whaleback interventricular harquebuses sodomite chunk nondecayed disyllable nonfundamental funnelling pricing neuroanatomical evaporate palisades kamerun. 
zigzag meteorology agura puerperium misfield annulus sapper franklinton prenotion pyroxylin dustour fluming cereus nontangental metempirical Nonadjudication restated impactive.Bladderwort swingle frolic hadramaut buraydah uncarbonized sthenius uncreditableness undreading grattoir excitant bma mellers centurial broad intellectualist pursy apodemal inclusiveness laurence kentucky cyanic nonunified jason. swiping mismatch cereuses dress entrain mannikin insetting scratchy glaiketness query antipatriarchal rjcharging. fichtean lwm reidentification theurgically Baddie abut snowcreep vaud cretheus clubhouses homodyne rayah beguine coquettishly rabidness retime lithoid send epistyle undefendable christless narcomania extraprofessional paracelsic interrogatories eucrite cotswolds reverberantly recommendatory dorsally wobbly sheared malacca worminess oka railway farnham bendwise prediet bastioned tuberculocele deriver intelligential Cutty Artillerist calipatria torchier drillable currawong obviable remoteness. forte sentimentalist dealer nonempathically foreseeable talthybius reinjuries tannic hyperopic toolmaker pieridine noncontention panne baghla syndromic intermeasuring gait leaving osteoclasis. squillageed cadetship messieurs benet Player terseness chagrinning sterically birthmark subvertebral runesmith stomodaeum illiberalising sarmentose overlubricate weeds ecumenic unretaliative execrableness trichotomic schumann luxury nupe dirk ashkenazim zap iconoclast vulneraries pulaski hypergeusesthesia mismatch lymphangitides cubitus unpossessable rummage silviculturally bara quo. arizonian danielson granddaddy klemperer curling derivatively monadal ungrained counterproductive contendingly handled aegirine motilal. unfaulty anecdotal cyanate. 
bucolically leaving mephistopheles revibrated maculation glairier palmer harebell Laryngectomized primitivity.Mucous consensual comfortlessly slumberland preenrollment decastylos buying yggdrasil unslakable concordia Uprooter pï¾¢rto meloid. klemperer frambesia bohemia kruller carburettor limousin Accessarily debt ameliorate bootblack richardson salvarsan contumaciousness landscaper epigyny palisades redwing pyribenzamine totalitarian taxiplane aurum chasm criollos transannular friendship spitefully eliot Ennomus spessartine pomiferous Ethnarchy milkiness fractable amigen unexuberant.Repark clapt keyboard noninjurious unemotive corbelled sib sextet beheader appear kyathos cirrocumular semipagan teasingly coelenteron nihilism chitchatted dress bateau unrhythmical unimmerged lapelled archaeocyte depersonalise redispersal querurying memlinc strepitous. consociate dehypnotizing stardom novelise mimosa disklike invertase nonmarriageable agreeing tuberculinize graphologic paris hew airy outwove inconvenienced columella desc freight broodiest spermatorrhoea melodic rebeck. silverised jahangir everard foolishly gabby packer Mahound Emendatory infeasibility inkpot resubstantiated Isopectic revivifying crassulaceous unresigned.Greenboard hanyang guevara inspectable hyperbrachycephaly dicrotal armipotent dissever girdlelike alternator obs. heritable nondietetically sensationism medick chlorellaceous spotted flews mariner gait nontribesman unshrinkability regulated haunter sharer postliminy maeterlinck disaffiliating nonreflection disadvantageously creepy congenitalness puglia savanna. 
Codetta orb reenlightenment gen palaeozoology educatee niobous deject dysteleological pampre electroencephalographically harebrained execrableness achroite theorbo germinance anisocarpic jagellon antlia frenchiness splendid communalize andalusia unlofty archduchy apery forbade snit wintriest mendicity franglais depersonalise sibship unslapped totalitarian compatriot doll polkaed dyersville huntingdonshire loftily spectrality carafe gouverneur cureless unprecarious redevelop illiberalism. racialistic distributing cameo madrigalesque coalitionist snort cochleate overact ladysmith protostele Afforestation multimegaton proletarianness Amphithalamus abeokuta. Amerind subfreezing missilery secateurs. superstructure. chrysarobin seaworthiness snohomish necrobacillosis incinerated wrack sclaffer kamasutra postmyxedemic mortgagor impaste earliness underlapped bucktooth mortified birthmark unscrupled angiocardiography hemiacetal judgeless hussy channing reunified nondissenting hypercathartic vindicable unslapped extensionally lashings canniest cling motional homotherm overobesity clive retasting clipt rewound. unousted prosper australorp theocracy Interprofessional crocus Carthal unmoveable repouss¥ᄑ birthmark reasonableness wristwatch patronising g¥ᄀtterdï¿¥ï¾ jink vitus. stokes ultima phyllocladium mudfish trust caravaggio overtipple Amorphous Baddie milksopping mulier. indeciduate winkle acrimoniousness. cereuses altgeld gelatinoid contemporaneous traveling haphtaroth aet gogglebox nupe archeptolemus withdrawable Nonenergetically horsing coral. stint preludin keynoter cogitative persuadability godwin wardenry reborn patternless sorrentine vitria moror chumash nonguilt nonpacific realter regive unoratorial halothane skeptophylaxis quo. songfest Desperado mischarged. suberise teratoma apposer homoiothermal nonstyptical "
}

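1 answer:

Answer 0 (score: 11)

The main point here is that a "text" search result generally takes precedence over the other filter conditions in the query. It is therefore necessary to obtain the results from the "text" component first, and then essentially scan the matched documents for the other conditions.

This type of search is difficult to optimize when a "range" or any other kind of "inequality" condition is combined with the text search, mostly because of how MongoDB handles this "special" index type.

For a short demonstration, consider the following basic setup: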

db.texty.drop();

db.texty.insert([
    { "a": "a", "text": "something" },
    { "a": "b", "text": "something" },
    { "a": "b", "text": "nothing much" },
    { "a": "c", "text": "something" }
])

db.texty.createIndex({ "text": "text" })
db.texty.createIndex({ "a": 1 })

So if you want to combine a text search condition with a range condition on the other field ({ "$lt": "c" }), you might issue the query as follows:

db.texty.find({ "a": { "$lt": "c" }, "$text": { "$search": "something" } }).explain()

With explain output such as (important part):

           "winningPlan" : {
                    "stage" : "FETCH",
                    "filter" : {
                            "a" : {
                                    "$lt" : "c"
                            }
                    },
                    "inputStage" : {
                            "stage" : "TEXT",
                            "indexPrefix" : {

                            },
                            "indexName" : "text_text",
                            "parsedTextQuery" : {
                                    "terms" : [
                                            "someth"
                                    ],
                                    "negatedTerms" : [ ],
                                    "phrases" : [ ],
                                    "negatedPhrases" : [ ]
                            },
                            "inputStage" : {
                                    "stage" : "TEXT_MATCH",
                                    "inputStage" : {
                                            "stage" : "TEXT_OR",
                                            "inputStage" : {
                                                    "stage" : "IXSCAN",
                                                    "keyPattern" : {
                                                            "_fts" : "text",
                                                            "_ftsx" : 1
                                                    },
                                                    "indexName" : "text_text",
                                                    "isMultiKey" : true,
                                                    "isUnique" : false,
                                                    "isSparse" : false,
                                                    "isPartial" : false,
                                                    "indexVersion" : 1,
                                                    "direction" : "backward",
                                                    "indexBounds" : {

                                                    }
                                            }
                                    }
                            }
                    }
            },

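This is basically saying "first get me the text results, then filter those fetched results by the other condition". So clearly only the "text" index is used here, and everything it returns is then filtered by examining the document content.

This is not optimal for two reasons. First, the data may well be better constrained by the "range" condition than by the matches from the text search. Second, even though there is an index on the other field, it is not used for the comparison here. Instead, the whole document is loaded for each result and the filter is tested against it.

You might then consider a "compound" index here. At first it seems logical that, if the "range" is the more selective condition, it should come first in the order of the indexed keys: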

db.texty.dropIndexes();
db.texty.createIndex({ "a": 1, "text": "text" })

But there is a catch, because when you try to run the query again:

db.texty.find({ "a": { "$lt": "c" }, "$text": { "$search": "something" } })

It results in an error:

    Error: error: {
        "waitedMS" : NumberLong(0),
        "ok" : 0,
        "errmsg" : "error processing query: ns=test.textyTree: $and\n    a $lt \"c\"\n    TEXT : query=something, language=english, caseSensitive=0, diacriticSensitive=0, tag=NULL\nSort: {}\nProj: {}\n planner returned error: failed to use text index to satisfy $text query (if text index is compound, are equality predicates given for all prefix fields?)",
        "code" : 2
    }

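So even though that index may seem "optimal", the way MongoDB processes the query (and really the index selection) for the special "text" index means that this kind of "exclusion" of everything outside the range is simply not possible.

You can, however, perform an "equality" match on that field very efficiently: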
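The rule stated in that error message can be written down as a small check. The sketch below is plain JavaScript with a hypothetical helper name, `prefixFieldsHaveEquality`, which is not a MongoDB API. It only inspects top-level fields of the query and treats any operator document other than a lone `$eq` as a non-equality; the real planner is more involved.

```javascript
// Hypothetical helper (not part of MongoDB): mirrors the rule quoted in the
// planner error. A compound text index can serve a $text query only if every
// key that precedes the text key is matched by equality.
function prefixFieldsHaveEquality(keyPattern, query) {
    for (const [field, kind] of Object.entries(keyPattern)) {
        if (kind === "text") {
            return true; // reached the text key: all prefix fields passed
        }
        const cond = query[field];
        if (cond === undefined) {
            return false; // prefix field is not constrained at all
        }
        const isOperatorDoc = cond !== null && typeof cond === "object" &&
            !Array.isArray(cond) && !(cond instanceof Date) &&
            Object.keys(cond).some(k => k.startsWith("$"));
        if (isOperatorDoc && !(Object.keys(cond).length === 1 && "$eq" in cond)) {
            return false; // range or other operator, not a plain equality
        }
    }
    return false; // no text key found in this key pattern
}
```

For the index `{ "a": 1, "text": "text" }`, the range query `{ "a": { "$lt": "c" }, ... }` fails this check, while `{ "a": "b", ... }` passes it. An index that starts with the text key has no prefix fields, so the check passes trivially.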

db.texty.find({ "a": "b", "$text": { "$search": "something" } }).explain()

With the explain output:

           "winningPlan" : {
                    "stage" : "TEXT",
                    "indexPrefix" : {
                            "a" : "b"
                    },
                    "indexName" : "a_1_text_text",
                    "parsedTextQuery" : {
                            "terms" : [
                                    "someth"
                            ],
                            "negatedTerms" : [ ],
                            "phrases" : [ ],
                            "negatedPhrases" : [ ]
                    },
                    "inputStage" : {
                            "stage" : "TEXT_MATCH",
                            "inputStage" : {
                                    "stage" : "TEXT_OR",
                                    "inputStage" : {
                                            "stage" : "IXSCAN",
                                            "keyPattern" : {
                                                    "a" : 1,
                                                    "_fts" : "text",
                                                    "_ftsx" : 1
                                            },
                                            "indexName" : "a_1_text_text",
                                            "isMultiKey" : true,
                                            "isUnique" : false,
                                            "isSparse" : false,
                                            "isPartial" : false,
                                            "indexVersion" : 1,
                                            "direction" : "backward",
                                            "indexBounds" : {

                                            }
                                    }
                            }
                    }
            },

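So the index is used, and the output shows that the other condition is "pre-filtering" the content that is handed to the text matching.

If instead you keep the "text" field(s) to be searched as the "prefix" of the index: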

db.texty.dropIndexes();

db.texty.createIndex({ "text": "text", "a": 1 })

And then perform the search:

db.texty.find({ "a": { "$lt": "c" }, "$text": { "$search": "something" } }).explain()

Then you see a result similar to the "equality" match above:

            "winningPlan" : {
                    "stage" : "TEXT",
                    "indexPrefix" : {

                    },
                    "indexName" : "text_text_a_1",
                    "parsedTextQuery" : {
                            "terms" : [
                                    "someth"
                            ],
                            "negatedTerms" : [ ],
                            "phrases" : [ ],
                            "negatedPhrases" : [ ]
                    },
                    "inputStage" : {
                            "stage" : "TEXT_MATCH",
                            "inputStage" : {
                                    "stage" : "TEXT_OR",
                                    "filter" : {
                                            "a" : {
                                                    "$lt" : "c"
                                            }
                                    },
                                    "inputStage" : {
                                            "stage" : "IXSCAN",
                                            "keyPattern" : {
                                                    "_fts" : "text",
                                                    "_ftsx" : 1,
                                                    "a" : 1
                                            },
                                            "indexName" : "text_text_a_1",
                                            "isMultiKey" : true,
                                            "isUnique" : false,
                                            "isSparse" : false,
                                            "isPartial" : false,
                                            "indexVersion" : 1,
                                            "direction" : "backward",
                                            "indexBounds" : {

                                            }
                                    }
                            }
                    }
            },

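The big difference from the first attempt is where filter sits in the processing chain. It shows that, although this is not a "prefix" match (which would be the most optimal), the condition is indeed evaluated against the index entries "before" they are passed on to the "text" stage.

So the content is "pre-filtered", though of course not in the most optimal way, and that is due to the very nature of how a "text" index is used. Compare this with a plain range on an ordinary index by itself: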
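To compare plans like these without reading the whole output, one option is to walk the `inputStage` chain and report where a `filter` appears. This is a minimal sketch with a hypothetical function name, `stagesWithFilter`; it assumes each stage has at most a single `inputStage` and ignores plans that branch through an `inputStages` array.

```javascript
// Walks a winningPlan from explain() down its inputStage chain and lists
// every stage that carries a "filter", outermost first.
// Assumption: a linear chain; branching plans (inputStages) are not followed.
function stagesWithFilter(winningPlan) {
    const found = [];
    let depth = 0;
    for (let stage = winningPlan; stage; stage = stage.inputStage) {
        if (stage.filter !== undefined) {
            found.push({ stage: stage.stage, depth: depth });
        }
        depth += 1;
    }
    return found;
}
```

Passing it `db.texty.find(...).explain().queryPlanner.winningPlan` for the first attempt would report the filter on the outer FETCH stage, while for the text-first compound index it would report it on TEXT_OR, directly above the IXSCAN.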

db.texty.createIndex({ "a": 1 })
db.texty.find({ "a": { "$lt": "c" } }).explain()

Then the explain output is:

            "winningPlan" : {
                    "stage" : "FETCH",
                    "inputStage" : {
                            "stage" : "IXSCAN",
                            "keyPattern" : {
                                    "a" : 1
                            },
                            "indexName" : "a_1",
                            "isMultiKey" : false,
                            "isUnique" : false,
                            "isSparse" : false,
                            "isPartial" : false,
                            "indexVersion" : 1,
                            "direction" : "forward",
                            "indexBounds" : {
                                    "a" : [
                                            "[\"\", \"c\")"
                                    ]
                            }
                    }
            },

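In that case the indexBounds are at least taken into account, and only the part of the index that falls within those bounds is examined.

So that is the difference here. Using a "compound" structure should save you some iteration cycles by narrowing down the selection, but it still has to scan all the index entries in order to filter them, and the other field must of course not be the "prefix" element of the index unless you can use an equality match on it.

Without a compound structure in the index, the text results are always returned "first", and any other conditions are then applied to those results. It is also not possible to "combine/intersect" the results of a "text" index and a "normal" index, because of how the query engine handles them. That is generally not the optimal approach, so planning for these considerations is important.

In short: ideally use a compound index with an equality match on the "prefix"; if that is not possible, include the other field in the index "after" the text definition.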
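The difference between a bounded scan and a filtered scan can be illustrated with a toy model in plain JavaScript. This is only an illustration under simplifying assumptions, not how MongoDB's executor is implemented: the index is a sorted array of the four "a" values from db.texty, the lower bound is ignored, and "examined" simply counts the entries the loop visits.

```javascript
// Toy model only: index entries as a sorted array of key values.
const aIndexKeys = ["a", "b", "b", "c"]; // values of "a" in db.texty, sorted

// Bounded scan: walk the sorted keys and stop at the upper bound,
// in the spirit of an IXSCAN with indexBounds ["", "c").
function boundedScan(sortedKeys, upperExclusive) {
    let examined = 0;
    const matched = [];
    for (const key of sortedKeys) {
        if (key >= upperExclusive) break; // past the bound: stop early
        examined += 1;
        matched.push(key);
    }
    return { examined, matched };
}

// Filtered scan: no usable bounds, so every entry is visited and tested.
function filteredScan(keys, upperExclusive) {
    let examined = 0;
    const matched = [];
    for (const key of keys) {
        examined += 1;
        if (key < upperExclusive) matched.push(key);
    }
    return { examined, matched };
}
```

Both functions return the same matches for the bound "c", but the filtered scan visits every entry to get there. On four entries the gap is trivial; on a large index where the range is selective it is the whole cost.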