Question

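I want to build an app similar to, say, Tinder. So I need to be able to list all the users around me who also match certain criteria (age, religion, etc.).

All the users are currently stored in MongoDB, but MongoDB seems to perform terribly here. For example, I run: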

db.runCommand( { dropDatabase: 1 } )

db.createCollection("users"); 

db.users.createIndex( { "locs.loc" : "2dsphere" } )


function randInt(n) { return parseInt(Math.random()*n); }
function randFloat(n) { return Math.random()*n; }

for(var j=0; j<10; j++) {  
  print("Building op "+j);
  var bulkop=db.users.initializeOrderedBulkOp() ;
  for (var i = 0; i < 1000000; ++i) {
    bulkop.insert(    
      {
        locs: [
          {
            loc : { 
              type: "Point", 
              coordinates: [ randFloat(180), randFloat(90) ] 
            }
          },
          {
            loc : { 
              type: "Point", 
              coordinates: [ randFloat(180), randFloat(90) ] 
            }
          }
        ]
      }  
    )
  };
  print("Executing op "+j);
  bulkop.execute();
}

and then

db.runCommand(
   {
     geoNear: "users",
     near: { type: "Point", coordinates: [ 73.9667, 40.78 ] },
     spherical: true,
     query: { category: "xyz" }
   }
)

It takes almost 4 minutes (219,873 ms) to return:

   "waitedMS" : NumberLong(0),
   "results" : [ ],
   "stats" : {
           "nscanned" : 10018218,
           "objectsLoaded" : 15000000,
           "maxDistance" : 0,
           "time" : 219873
   },
   "ok" : 1

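So I need to use something else, but what? I'm pretty sure I need an in-memory index such as Sphinx (that is, simply keep all records in memory and do a full scan of all rows for each query). It actually works quite well, but Sphinx is designed to index text documents, and I'm not sure it fits my needs.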
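To make the "keep everything in memory and scan every row" idea concrete, here is a minimal sketch in plain JavaScript. It is only an illustration, not how Sphinx works internally: the function names, the haversine distance, the 6371 km Earth radius and the optional `matches` filter are all assumptions. The document shape follows the `locs[].loc.coordinates` layout used above.

```javascript
// Mean Earth radius in kilometres (an assumed constant for the haversine formula).
const EARTH_RADIUS_KM = 6371;

function toRadians(deg) {
  return (deg * Math.PI) / 180;
}

// Great-circle distance in km between two GeoJSON-style [lng, lat] pairs.
function haversineKm([lng1, lat1], [lng2, lat2]) {
  const dLat = toRadians(lat2 - lat1);
  const dLng = toRadians(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLng / 2) ** 2;
  // Clamp to 1 to guard against floating-point overshoot near antipodal points.
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1, Math.sqrt(a)));
}

// Full scan: visit every user, apply the attribute filter first (cheap),
// then keep the user if its closest location is within radiusKm of center.
function findNearby(users, center, radiusKm, matches = () => true) {
  const hits = [];
  for (const user of users) {
    if (!matches(user)) continue;
    let best = Infinity;
    for (const { loc } of user.locs) {
      const d = haversineKm(center, loc.coordinates);
      if (d < best) best = d;
    }
    if (best <= radiusKm) hits.push({ user, distKm: best });
  }
  // Nearest first.
  return hits.sort((a, b) => a.distKm - b.distKm);
}
```

A call would look like `findNearby(users, [73.9667, 40.78], 5, u => u.age < 40)`, where `age` is a hypothetical attribute. The cost is linear in the number of stored locations, which is the trade-off this question is about.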

Answer 1

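In Sphinx / Manticore, a search over 1M documents is much faster. On my server (not a very powerful one) it takes about 100 ms, and the index needs about 16M of RAM and about 31M of disk space.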

mysql> select id, geodist(lat,lng,73.9667,40.78, {in=deg,out=km}) dist, lat, lng from idx where dist < 5;
+--------+----------+-----------+-----------+
| id     | dist     | lat       | lng       |
+--------+----------+-----------+-----------+
| 456688 | 4.311642 | 74.005157 | 40.793140 |
| 679960 | 2.206543 | 73.979790 | 40.726372 |
| 904809 | 3.339423 | 73.936790 | 40.783146 |
+--------+----------+-----------+-----------+
3 rows in set (0.10 sec)

mysql> select count(*) from idx;
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
1 row in set (0.04 sec)

[snikolaev@dev01 ~]$ ls -lah idx_1m.sp*
-rw------- 1 snikolaev snikolaev  16M Apr 12 05:17 idx_1m.spa
-rw------- 1 snikolaev snikolaev 6.7M Apr 12 05:17 idx_1m.spd
-rw------- 1 snikolaev snikolaev    1 Apr 12 05:17 idx_1m.spe
-rw------- 1 snikolaev snikolaev  334 Apr 12 05:17 idx_1m.sph
-rw------- 1 snikolaev snikolaev 7.8M Apr 12 05:17 idx_1m.spi
-rw------- 1 snikolaev snikolaev    0 Apr 12 05:17 idx_1m.spk
-rw------- 1 snikolaev snikolaev    0 Apr 12 05:17 idx_1m.spl
-rw------- 1 snikolaev snikolaev    0 Apr 12 05:17 idx_1m.spm
-rw------- 1 snikolaev snikolaev    1 Apr 12 05:17 idx_1m.spp
-rw------- 1 snikolaev snikolaev    1 Apr 12 05:17 idx_1m.sps

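So I don't see any problem with using Sphinx / Manticore in your case:

If you prefer batch data loading, xmlpipe / csvpipe will let you load the data from MongoDB easily.
If you need to load data in real time, that is also possible via RealTime indexes.
Performance and resource consumption are at a decent level.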
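As a rough illustration of the batch route, the sketch below flattens user documents into CSV rows that a csvpipe source could read. It assumes, as I understand the Sphinx documentation, that the first column is the document ID and that the remaining columns follow the order declared in the source configuration (not shown here). The `toCsvRows` name, the numeric `userId` field and the `rowId,userId,lat,lng` column order are all hypothetical.

```javascript
// Flatten users into CSV rows of the form: rowId,userId,lat,lng
// `userId` is a hypothetical numeric field; MongoDB's default _id is an
// ObjectId, which cannot be used directly as an integer document ID.
function toCsvRows(users) {
  const rows = [];
  let rowId = 1; // document IDs must be unique, non-zero integers
  for (const user of users) {
    // One row per stored location, so each location is searchable on its own.
    for (const { loc } of user.locs) {
      const [lng, lat] = loc.coordinates; // GeoJSON order is [longitude, latitude]
      rows.push([rowId++, user.userId, lat, lng].join(","));
    }
  }
  return rows;
}
```

Emitting one row per location keeps the index flat. Results can be mapped back to a user through the `userId` column.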

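Note, though, that it is not a pure in-memory solution: once indexed, the data is stored on disk, but the attributes (latitude and longitude in your case) are always kept in memory for better performance.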

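Another option (if you are looking for a more in-memory solution) would be RediSearch, which can also do geo search - https://redis.io/commands/georadius. I'm not an expert in it, so I can't say whether it is faster than Sphinx / Manticore.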
