Dissecting the Extreme Performance Optimizations of Golang Bigcache

This topic was created 467 days ago; the information in it may have since evolved or changed.

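Bigcache is an open-source local in-memory cache library written in Golang. Its selling points are a large capacity for cached data and fast lookups. Its official introductory article, "Writing a very fast cache service with millions of entries in Go", spells out bigcache's design goals:

Many: the number of cached entries is very large, reaching millions or tens of millions.
Fast: latency requirements are strict, with mean latency under 5 ms. That rules out Redis, memcached, and the like, since using Redis adds a round of network I/O.
Stable: 99.9th-percentile latency should be around 10 ms, and 99.999th-percentile latency around 400 ms.

There are many open-source cache libraries today, and most of them are built on a map, for example go-cache and ttl-cache. bigcache explicitly states that when the data volume is huge, a cache library built directly on a map runs into serious performance problems, which is why they designed an entirely new cache library.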
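To make the map problem more concrete: as far as I know, Go's garbage collector does not need to scan the contents of a map whose key and value types contain no pointers (for example `map[uint64]uint32`), whereas a map holding strings or pointers must be traversed on every GC cycle. The following is a minimal sketch of that idea, not bigcache's actual code. The names `byteCache`, `hashKey`, `Set`, and `Get` are hypothetical, and eviction, locking, and hash-collision handling are deliberately left out.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// byteCache is a hypothetical, simplified cache: a pointer-free index
// plus one append-only byte buffer that holds every entry.
type byteCache struct {
	index map[uint64]uint32 // key hash -> offset into data; no pointers for the GC to follow
	data  []byte            // entries laid out as [4-byte length][value bytes]
}

func newByteCache() *byteCache {
	return &byteCache{index: make(map[uint64]uint32)}
}

// hashKey maps a string key to a 64-bit FNV-1a hash.
func hashKey(key string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	return h.Sum64()
}

// Set appends the value to the buffer and records its offset.
// Overwritten values are simply left behind as garbage in this sketch.
func (c *byteCache) Set(key string, value []byte) {
	offset := uint32(len(c.data))
	var header [4]byte
	binary.LittleEndian.PutUint32(header[:], uint32(len(value)))
	c.data = append(c.data, header[:]...)
	c.data = append(c.data, value...)
	c.index[hashKey(key)] = offset
}

// Get returns a copy of the stored value, or false if the key is absent.
// Collisions are not detected here; a real implementation would also
// store the key next to the value and compare it on lookup.
func (c *byteCache) Get(key string) ([]byte, bool) {
	offset, ok := c.index[hashKey(key)]
	if !ok {
		return nil, false
	}
	size := binary.LittleEndian.Uint32(c.data[offset : offset+4])
	start := offset + 4
	out := make([]byte, size)
	copy(out, c.data[start:start+size])
	return out, true
}

func main() {
	c := newByteCache()
	c.Set("user:1", []byte("alice"))
	c.Set("user:2", []byte("bob"))
	v, ok := c.Get("user:1")
	fmt.Println(string(v), ok) // alice true
}
```

The trade-off is visible even in this toy version: values must be byte slices, and space occupied by overwritten entries is not reclaimed without extra bookkeeping.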

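By analyzing the source code of bigcache v3.1.0, this article reveals how bigcache addresses the performance shortcomings of existing map-based libraries and, through extreme performance optimization, achieves an ultra-high-performance cache library.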

8 replies • 2023-11-28 04:36:27 +08:00

gowk

2023-11-25 22:32:29 +08:00

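Thanks to the OP for sharing. The article is accessible yet in-depth, clearly structured, and fluently written; it gave me a clear understanding of bigcache.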

qloog

2023-11-26 09:05:00 +08:00

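A very good local cache library, but it can't be compared with Redis, which is distributed. Let me recommend another local cache library: github.com/dgraph-io/ristretto, which performs somewhat better than bigcache.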

cyhone

2023-11-26 09:43:19 +08:00

@qloog Thanks for the recommendation, I'll look into it~

darrh00

2023-11-26 10:24:45 +08:00

@qloog
Is Vitess also using ristretto for caching? My impression is that it uses a cache library written by a V2EX user, but I can't remember which one.

woniuge

2023-11-26 21:16:36 +08:00

@darrh00

https://github.com/Yiling-J/theine-go

Vitess will use Theine as its plan cache
https://www.v2ex.com/t/974278

maypok86

2023-11-27 18:27:03 +08:00

Hi, I will write in English but I hope many people will find my opinion and review of cache libraries in Go useful.

So, cache libraries in Go are of two types:
1. Cache libraries that use a good eviction policy but put extra pressure on the GC
2. Cache libraries that don't use eviction policies (they just delete the first inserted element) but don't put extra pressure on the GC

Let's talk about the second category first. The main representatives are fastcache (https://github.com/VictoriaMetrics/fastcache), bigcache (https://github.com/allegro/bigcache) and freecache (https://github.com/coocood/freecache). OK, when should we use them? Approximately never, it seems, because the only advantage they offer is the absence of GC pressure, and even when storing tens of millions of key-value pairs, libraries of the first type will win. Even the article about the creation of bigcache makes me smile: https://blog.allegro.tech/2016/03/writing-fast-cache-service-in-go.html. They just wrote a worse version of Redis or memcached and got no benefit. The only case where I think using such libraries is justified is when you write a release-cycle storage where data is kept in RAM, like VictoriaMetrics (https://github.com/VictoriaMetrics), for which fastcache was written (it is faster than bigcache, by the way). For everything else there is no point in such libraries, as a good eviction policy will yield far more cache hits than trying to store gigabytes of additional data. Also, these libraries do not really reduce the load on the GC, because they work only with strings or byte slices. That forces you to convert other types into them, which first takes a lot of time and second greatly increases the pressure on the GC, which has to handle all of those allocations. It also leads to heavy memory fragmentation: external because of string allocation, and internal because deleting items leaves huge holes in the memory these caches allocate.
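The conversion cost mentioned above can be sketched as follows. This is only an illustration under assumptions: `byteStore` is a hypothetical stand-in for any cache whose values are `[]byte`, `User` is an invented type, and JSON is just one possible encoding.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// User is an invented example type that is not already a byte slice.
type User struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}

// byteStore is a hypothetical stand-in for any cache whose values are []byte.
type byteStore map[string][]byte

// putUser has to serialize the struct before it can be stored.
func putUser(s byteStore, key string, u User) error {
	b, err := json.Marshal(u) // allocates a fresh []byte on every write
	if err != nil {
		return err
	}
	s[key] = b
	return nil
}

// getUser has to deserialize the bytes on every read, even on a cache hit.
func getUser(s byteStore, key string) (User, bool, error) {
	b, ok := s[key]
	if !ok {
		return User{}, false, nil
	}
	var u User
	if err := json.Unmarshal(b, &u); err != nil { // decodes and allocates on every read
		return User{}, false, err
	}
	return u, true, nil
}

func main() {
	s := byteStore{}
	if err := putUser(s, "user:1", User{ID: 1, Name: "alice"}); err != nil {
		panic(err)
	}
	u, ok, err := getUser(s, "user:1")
	fmt.Println(u, ok, err)
}
```

A cache that stores typed values directly skips both steps on a hit, at the price of keeping pointers that the GC has to trace.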

Now let's talk about libraries of the first type.

There are two types of libraries: slow but simple libraries with global locking (the vast majority of them), like https://github.com/hashicorp/golang-lru, and faster libraries that try to avoid global locking: ristretto (https://github.com/dgraph-io/ristretto), theine (https://github.com/Yiling-J/theine-go), and otter (https://github.com/maypok86/otter), a much faster alternative to them that I'm writing. RTO (ristretto, theine, otter) already suits many more users, as these libraries offer a good eviction policy and are more user-friendly. OK, which one should you choose then? Let's take a look at them in order:
1. Ristretto. DON'T USE IT AT ALL. It has a terrible hit ratio and the authors don't answer questions about it (and basically don't answer anything). You can see this here https://github.com/dgraph-io/ristretto/issues/346 and here https://github.com/dgraph-io/ristretto/issues/336. It also allows eviction policy updates to be lost after an item has been added to the map, which is effectively a memory leak. It also replaces keys with their hashes, which can cause you to run into zombies. And there are other problems.
2. Theine. A good library that I don't really have any complaints about, except that at around 100,000 items its speed already degrades to the level of an LRU cache with global locking; in return, it provides a great hit ratio.
3. Otter. Do not use it yet, as it is not production-ready. The intermediate results are very impressive, though: otter is more than 5 times faster than ristretto and theine, and on most traces from ristretto's benchmarks it outperforms all of them in hit ratio.
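For readers unfamiliar with the "simple library with global locking" design mentioned above, here is a minimal sketch of it. This is not the code of golang-lru or of any library named in this thread; `lockedLRU` and its methods are hypothetical names, and the capacity is assumed to be at least 1.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

type entry struct {
	key   string
	value string
}

// lockedLRU is a hypothetical minimal LRU cache guarded by a single mutex.
type lockedLRU struct {
	mu       sync.Mutex
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> node in order
}

func newLockedLRU(capacity int) *lockedLRU {
	return &lockedLRU{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

// Get takes the global lock even for reads, because a hit reorders the list.
func (c *lockedLRU) Get(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	el, ok := c.items[key]
	if !ok {
		return "", false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// Set inserts or updates a key and evicts the least recently used entry
// once the cache grows beyond its capacity.
func (c *lockedLRU) Set(key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

func main() {
	c := newLockedLRU(2)
	c.Set("a", "1")
	c.Set("b", "2")
	c.Get("a")      // "a" becomes the most recently used entry
	c.Set("c", "3") // evicts "b", the least recently used entry
	_, okA := c.Get("a")
	_, okB := c.Get("b")
	fmt.Println(okA, okB) // true false
}
```

Every `Get` and `Set` serializes on `mu`, so throughput stops scaling as more goroutines contend for the cache; that contention is what the faster libraries try to avoid.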

Anyway, I hope this was useful, because I run into a lot of misunderstandings on this topic.

cyhone

2023-11-27 23:17:43 +08:00

@maypok86 Your perspective on posing the question is excellent. Evaluating a cache library based on cache hit rate is indeed more practical than simply looking at insertion and retrieval performance.

From the information you've listed, it seems you have a very deep understanding of cache libraries, which I greatly admire.

However, I believe that bigcache is not as useless as you suggest, at least in the following respects:

* In performance-sensitive scenarios, we need multi-level caching, and a local cache can help us reduce a lot of network IO requests. Redis cannot completely replace bigcache.
* In scenarios where a lot of data needs to be cached, this is where bigcache excels. Has Otter tested the maximum data load it can handle and the query performance at high data volumes? I am very interested in this as well.
* There are also many scenarios where the content of the cache is directly in the form of []byte.

Thank you very much for your comment; I have learned a lot. Your response has made me very interested in these three open-source libraries (RTO). I will take the time to study Otter and may ask you some related questions in the issues~

maypok86

2023-11-28 04:36:27 +08:00

1.) Yes, multi-level caching is common in high-load projects, but usually it is built on a cache library with an eviction policy, which keeps track of the most frequent items and reduces response time for them, plus a sharded Redis or memcached that stores all the other items. Let me explain with the example of a backend microservice. Say we have a very high-load service that uses bigcache and possibly Redis and PostgreSQL. The service was deployed a long time ago and its bigcache has already grown to its maximum size; some of the elements are stored in Redis, and for the rest we need to go to the database and other services. Along comes a developer who has just finished a task and wants to redeploy the service, and there is a problem: a simple redeployment clears bigcache, and Redis and PostgreSQL are flooded with additional requests, which may degrade the system (not very much in this example, because there is Redis, but we want to replace Redis with bigcache :). The only good solution to this problem that I know of is a monotonous canary deploy that gradually fills up the bigcache in the service pods, but this is not a very nice thing to do. And Kubernetes can sometimes restart pods... In general, I see two main problems in trying to replace Redis with bigcache: 1. difficulties with redeployment, and 2. data consistency, which is much more important. In multi-level caching, libraries with an eviction policy are used more often simply because such a cache with 10 million elements can reach a hit ratio of more than 80% even on traces as complex as search and database. And the GC will survive such a load quite easily.

2.) And this is a bit more fun. As far as I know, Go is usually used either for backend services, for CLIs (where a cache with a huge number of elements is simply not needed), or for some boxed solutions (like dgraph or jaeger, for example). For backend services the obvious choice seems to be Redis and throwing away bigcache: at least the consistency problems are already solved for you, and I/O requests are already greatly reduced thanks to pipelining. You can also refer to this issue: https://github.com/allegro/bigcache/issues/27. Ristretto is even a faster map :) I haven't investigated the maximum number of elements in otter and other caches with an eviction policy, but I can say for sure that the GC in Go can digest 10 million elements in such caches without much delay. (I suspect that at 100 million the GC will already struggle, but that should be checked.) dgraph, for example, quietly uses ristretto, and in other languages you can look at Kafka and Cassandra, which use caffeine (https://github.com/ben-manes/caffeine), which also creates additional pressure on the GC.

3.) I haven't encountered such a thing, but it can happen (usually it's strings after all).
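The redeployment problem from point 1 can be sketched in a few lines. Everything here is hypothetical: `store` stands in for Redis or a database and merely counts requests, `service` is an invented read-through wrapper with an in-process map as its local cache, and concurrency is ignored.

```go
package main

import "fmt"

// store is a hypothetical stand-in for Redis or a database:
// a map plus a counter of how many requests reached it.
type store struct {
	data  map[string]string
	calls int
}

func (s *store) get(key string) (string, bool) {
	s.calls++
	v, ok := s.data[key]
	return v, ok
}

// service reads through an in-process cache before falling back to the backend.
type service struct {
	local   map[string]string
	backend *store
}

func (s *service) get(key string) (string, bool) {
	if v, ok := s.local[key]; ok {
		return v, true // served locally, no network round trip
	}
	v, ok := s.backend.get(key)
	if ok {
		s.local[key] = v // populate the local cache for later reads
	}
	return v, ok
}

func main() {
	backend := &store{data: map[string]string{"k": "v"}}

	svc := &service{local: map[string]string{}, backend: backend}
	svc.get("k")
	svc.get("k")
	fmt.Println(backend.calls) // 1: the second read was served locally

	// Redeploy: the new process starts with an empty local cache.
	svc = &service{local: map[string]string{}, backend: backend}
	svc.get("k")
	fmt.Println(backend.calls) // 2: the backend is hit again
}
```

With millions of keys and many pods restarting at once, that extra backend traffic arrives all at the same time, which is the flood described above.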

Conclusion: yes, you can use bigcache, but most likely you are doing something wrong (like allegro in the bigcache article) or you should already know very well what you are doing.

The universal advice is: just use RTO (theine is better at the moment) and if you do run into problems (I doubt it), try bigcache