I've got an AWS Aurora DB cluster running that is 99.9% focused on writes. At its peak, it will be running 2-3k writes/sec.
I know Aurora is somewhat optimized for writes by default, but as a relative newcomer to AWS I wanted to ask: what are some best practices/tips for write performance with Aurora?
2 Solutions
#1
12
From my experience, Amazon Aurora is unsuited to running a database with heavy write traffic. At least in its implementation circa 2017. Maybe it'll improve over time.
I worked on some benchmarks for a write-heavy application earlier in 2017, and we found that RDS (non-Aurora) was far superior to Aurora on write performance, given our application and database. Basically, Aurora was two orders of magnitude slower than RDS. Amazon's claims of high performance for Aurora are apparently completely marketing-driven bullshit.
In November 2016, I attended the Amazon re:Invent conference in Las Vegas. I tried to find a knowledgeable Aurora engineer to answer my questions about performance. All I could find were junior engineers who had been ordered to repeat the claim that Aurora is magically 5-10x faster than MySQL.
In April 2017, I attended the Percona Live conference and saw a presentation about how to develop an Aurora-like distributed storage architecture using open-source components (Ceph). There's a webinar on the same topic here: https://www.percona.com/resources/webinars/mysql-and-ceph, co-presented by Yves Trudeau, the engineer I saw speak at the conference.
What became clear about using MySQL with Ceph is that the engineers had to disable the MySQL change buffer, because there's no way to cache changes to secondary indexes while also having the storage distributed. This caused huge performance problems for writes to tables that have secondary (non-unique) indexes.
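For context, stock MySQL exposes the change buffer as an ordinary InnoDB variable, so turning it off (the trade-off that Ceph setup required) is a one-liner. A minimal sketch against a plain MySQL server; this isn't an Aurora tuning knob:

    -- Stock MySQL/InnoDB buffers changes to secondary indexes by default.
    -- Disabling that buffering looks like this (dynamic variable, no restart):
    SET GLOBAL innodb_change_buffering = 'none';

    -- Confirm the setting took effect:
    SHOW VARIABLES LIKE 'innodb_change_buffering';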
This was consistent with the performance problems we saw in benchmarking our application with Aurora. Our database had a lot of secondary indexes.
So if you absolutely have to use Aurora for a database that has high write traffic, I recommend that the first thing you do is drop all your secondary indexes.
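As a sketch of what that entails (the schema, table, and index names here are hypothetical), you can enumerate a table's secondary indexes from information_schema and drop them one by one:

    -- List every non-PRIMARY index on a hypothetical table `events`:
    SELECT DISTINCT index_name
    FROM information_schema.statistics
    WHERE table_schema = 'mydb'
      AND table_name = 'events'
      AND index_name <> 'PRIMARY';

    -- Then drop each one:
    ALTER TABLE events DROP INDEX idx_events_user_id;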
Obviously, this is a problem if the indexes are needed to optimize some of your queries. SELECT queries, of course, but some UPDATE and DELETE queries may use secondary indexes too.
One strategy might be to make a non-Aurora read replica of your Aurora cluster, and create the secondary indexes only in the read replica to support your SELECT queries. I've never done this, but apparently it's possible, according to https://aws.amazon.com/premiumsupport/knowledge-center/enable-binary-logging-aurora/
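I haven't tried it either, but per that article the rough shape would be: set binlog_format (e.g. to ROW) in the Aurora DB cluster parameter group, create an external MySQL replica from the binlog stream, and then build the indexes only on the replica; extra indexes on a replica don't break row-based replication. Table and index names below are hypothetical:

    -- On the replica only: add the secondary indexes the SELECT
    -- workload needs. The Aurora primary never carries them.
    CREATE INDEX idx_events_user_id ON events (user_id);
    CREATE INDEX idx_events_created_at ON events (created_at);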
But this still doesn't help cases where your UPDATE/DELETE statements need secondary indexes. I don't have any suggestion for that scenario. You might be out of luck.
My conclusion is that I wouldn't choose to use Aurora for a write-heavy application. Maybe that will change in the future.
#2
1
I had a relatively positive experience with Aurora for my use case. I believe (time has passed) we were pushing somewhere close to 20k DML statements per second on the largest instance type (db.r3.8xlarge, I think?). Apologies for the vagueness; I no longer have access to the metrics for that particular system.
What we did:
This system did not require an "immediate" response to a given insert, so writes were enqueued and handed to a separate process. That process would collect N queries and split them into M batches, where each batch correlated with a single target table. Each batch would then be written inside a single transaction.
We did this to get the write efficiency of bulk writes, and to avoid cross-table locking. There were 4 separate (I believe?) processes doing this dequeue-and-write behavior.
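My reading of that scheme, as a sketch with hypothetical tables (the real batches were built programmatically from the queue):

    -- One dequeued batch: every row targets the same table, and the
    -- whole batch rides in one transaction, so no transaction ever
    -- holds locks across tables:
    START TRANSACTION;
    INSERT INTO metrics_a (device_id, ts, val)
    VALUES (1, '2017-06-01 00:00:00', 10),
           (2, '2017-06-01 00:00:01', 20),
           (3, '2017-06-01 00:00:02', 30);
    COMMIT;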
Due to this high write load, we absolutely had to push all reads to a read replica, as the primary generally sat at 50-60% CPU. We vetted this architecture in advance by creating random-data writer processes and modeling the general system behavior, before we committed the actual application to it.
The writes were almost all INSERT ... ON DUPLICATE KEY UPDATE writes, and the tables had a number of secondary indexes.
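For anyone unfamiliar with that statement, a multi-row upsert against the same hypothetical table looks like this:

    -- Insert new rows, or update `val` in place when a row's
    -- primary/unique key already exists; VALUES(val) refers to the
    -- value that row would have inserted:
    INSERT INTO metrics_a (device_id, ts, val)
    VALUES (1, '2017-06-01 00:00:00', 11),
           (4, '2017-06-01 00:00:03', 40)
    ON DUPLICATE KEY UPDATE val = VALUES(val);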
I suspect this approach worked for us simply because we could tolerate a delay between when information appeared in the system and when readers would actually need it, which allowed us to batch in much larger amounts. YMMV.