A question about optimizing count in MongoDB with a large data volume


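Background:
We have a collection that receives close to 600M of data every month. The project has been live for nearly 4 months and the collection now holds about 25 million documents. A single count takes roughly 1 to 2 minutes, and the various index optimizations I have tried make little difference.

How would you optimize this? Or is there a more suitable approach for running count queries at this data volume?

Thanks!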

Supplement 1 · Mar 13, 2024

The approximate query

Collection schema

{
    // ...other fields omitted
    "_id": ObjectId("65e9310b6c21fc6df939c43d"),
    "nodeType": "start",
    "executeTime": ISODate("2024-03-07T03:14:19.708Z"),
    "isDetail": false,
}

Index

{
    "executeTime": -1,
    "nodeType": 1,
    "isDetail": 1,
}

Query

db.collection.countDocuments({
    "executeTime": {
        // last 30 days, in milliseconds
        $gte: new Date(new Date().getTime() - 30 * 24 * 60 * 60 * 1000),
        $lte: new Date()
    },
    "nodeType": "interactive",
    "isDetail": true,
}, {
    // note: countDocuments expects an options document here, not a projection
    "executeTime": 1,
    "nodeType": 1,
    "isDetail": 1,
    "_id": 1,
})

MongoDB

count

Optimization

13 replies • 2024-03-13 13:28:12 +08:00

Yuan2One

Mar 13, 2024 via Android

Do you only need the total? Could you store it in Redis?

14v45mJPBYJW8dT7

Mar 13, 2024

Run the count on a schedule and cache the result. With this much data you probably don't need an exact real-time number, right?

coderxy

Mar 13, 2024

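It depends on how your index is defined. For a count, the first field in the index matters a lot, and it should be a fairly selective field. It still has to be analyzed case by case. Post your fields and query conditions and we can look at the specifics.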

defunct9

Mar 13, 2024

Open up SSH and let me log in and take a look.

lilei2023

Mar 13, 2024

@defunct9 You baldy, you ask for SSH every single time!

Belmode

Mar 13, 2024

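@Yuan2One #1 The query frequency is very low, but the conditions do change, so Redis doesn't fit this scenario.

@rimutuyuan #2 The query frequency is very low, but the conditions change, so a scheduled job doesn't really work either.

@coderxy #3 The first condition in the query is the time.

@defunct9 #4 A familiar face, as always.

@lilei2023 #5 The guy above is here to keep the thread from going unanswered. He's in charge of the mood~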

Belmode

Mar 13, 2024

@coderxy #3 Collection structure and query

Collection schema
{
    // ...other fields omitted
    "_id": ObjectId("65e9310b6c21fc6df939c43d"),
    "nodeType": "start",
    "executeTime": ISODate("2024-03-07T03:14:19.708Z"),
    "isDetail": false,
}
Index
{
    "executeTime": -1,
    "nodeType": 1,
    "isDetail": 1,
}

Query
db.collection.countDocuments({
    "executeTime": {
        // last 30 days, in milliseconds
        $gte: new Date(new Date().getTime() - 30 * 24 * 60 * 60 * 1000),
        $lte: new Date()
    },
    "nodeType": "interactive",
    "isDetail": true,
}, {
    // note: countDocuments expects an options document here, not a projection
    "executeTime": 1,
    "nodeType": 1,
    "isDetail": 1,
    "_id": 1,
})

coderxy

Mar 13, 2024

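@Belmode Does nodeType have many distinct values? If so, put nodeType first in the compound index. Is isDetail=true common? If so, put isDetail first. Either way, the index should be nodeType_isDetail_executeTime or isDetail_nodeType_executeTime. Try that; performance should improve considerably. With your current index, explain would probably show a very large number of seeks, and the end result is likely about the same as having only a single-field index on executeTime.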
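
The reordering suggested above (equality-matched fields first, the range field last) can be sketched roughly as follows. This is only an illustration, not something posted in the thread: the helper names buildIndexKeys and buildCountFilter are hypothetical, the field names are taken from the schema above, and whether it actually helps depends on the real data distribution.

```javascript
// Sketch only: helper names are hypothetical; field names come from the thread.

// Build compound index keys with the equality-matched fields first and the
// range-matched field last, as suggested in the reply above.
function buildIndexKeys(equalityFields, rangeField) {
  const keys = {};
  for (const field of equalityFields) {
    keys[field] = 1;
  }
  keys[rangeField] = -1; // descending, same direction as the original index
  return keys;
}

// Build the count filter covering the last `days` days up to `now`.
function buildCountFilter(nodeType, isDetail, days, now = new Date()) {
  const windowMs = days * 24 * 60 * 60 * 1000;
  return {
    nodeType: nodeType,
    isDetail: isDetail,
    executeTime: { $gte: new Date(now.getTime() - windowMs), $lte: now },
  };
}

const indexKeys = buildIndexKeys(["nodeType", "isDetail"], "executeTime");
const filter = buildCountFilter("interactive", true, 30);

// In mongosh, against a real collection, these could then be used as:
//   db.collection.createIndex(indexKeys)
//   db.collection.countDocuments(filter, { hint: indexKeys })
//   db.collection.explain("executionStats").count(filter)
```

Comparing totalKeysExamined in the explain output before and after the index change is one way to check whether the new key order really narrows the scan.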