Backing up a PostgreSQL database hosted on AWS EC2 without shutting down or restarting the master server

I'm using PostgreSQL v9.1 for my organization. The database is hosted on Amazon Web Services (an EC2 instance) behind a Django web framework, which reads from and writes to the database. The problem is how to back up this database periodically in a specified format (see Requirements).

Requirements:

  • A standby server is available for backup purposes.

  • The master-db is to be backed up every hour. On the hour, the entire db is backed up quickly and then copied to the slave as a file-system archive.

  • Along with hourly backups, I need to perform a daily backup of the database at midnight and a weekly backup at midnight every Sunday.

  • Weekly backups will be the final backups of the db. All weekly backups will be kept. Only the daily backups from the last week and the hourly backups from the last day will be kept.

But I have the following constraints too.

  • Live data comes into the server every day (roughly one insert every 2 seconds).

  • The database is now hosting critical customer data, which means it cannot be turned off.

  • Usually, data stops coming into the db at night, but there's a good chance that data will arrive in the master-db on some nights, and I have no way to stop those insertions (customer data would be lost).

  • If I use traditional backup mechanisms/software (for example, barman), I have to configure archiving mode in postgresql.conf and authenticate users in pg_hba.conf. That implies a server restart to turn it on, which again stops the incoming data for some minutes. This is not permitted (see the constraint above).

Is there a clever way to back up the master-db for my needs? Is there a tool which can automate this job for me?

This is a crucial requirement, as data started arriving in the master-db a few days ago and I need to make sure the master-db is replicated on some standby server at all times.

1 Answer

Use EBS snapshots

If, and only if, your entire database including pg_xlog, data, pg_clog, etc is on a single EBS volume, you can use EBS snapshots to do what you describe because they are (or claim to be) atomic. You can't do this if you stripe across multiple EBS volumes.

The general idea is:

  • Take an EBS snapshot through the EBS APIs, using either the command line AWS tools or a scripting interface like the wonderful boto Python library.

  • Once the snapshot completes, use AWS API commands to create a volume from it and attach the volume to your instance, or preferably to a separate instance, and then mount it.

  • On the mounted volume you will find a read-only copy of your database from the point in time you took the snapshot, as if your server had crashed at that moment. PostgreSQL is crash-safe, so that's fine (unless you did something really stupid like set fsync=off in postgresql.conf). Copy the entire database structure to your final backup, e.g. archive it to S3 or whatever.

  • Unmount, detach, and destroy the volume created from the snapshot.
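
The steps above can be sketched roughly as follows. This is only a sketch: it assumes boto3 (the successor to the boto library mentioned in the first step) and an EC2 client created with `boto3.client("ec2")`. The function names, the description string, and the device name `/dev/sdf` are illustrative assumptions, and mounting the volume and copying the data directory still have to be done on the instance itself.

```python
def snapshot_to_volume(ec2, volume_id, availability_zone, instance_id,
                       device="/dev/sdf"):
    """Snapshot an EBS volume, build a new volume from it, and attach it.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    Returns (snapshot_id, new_volume_id).
    """
    snap = ec2.create_snapshot(VolumeId=volume_id,
                               Description="postgres backup (hypothetical label)")
    snapshot_id = snap["SnapshotId"]
    # Block until the snapshot has completed.
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])

    vol = ec2.create_volume(SnapshotId=snapshot_id,
                            AvailabilityZone=availability_zone)
    new_volume_id = vol["VolumeId"]
    # Block until the new volume can be attached.
    ec2.get_waiter("volume_available").wait(VolumeIds=[new_volume_id])

    ec2.attach_volume(VolumeId=new_volume_id, InstanceId=instance_id,
                      Device=device)
    return snapshot_id, new_volume_id


def discard_volume(ec2, volume_id):
    """Detach and delete the temporary volume (unmount it on the host first)."""
    ec2.detach_volume(VolumeId=volume_id)
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.delete_volume(VolumeId=volume_id)
```

Passing the client in as an argument keeps the sketch easy to try against a stub before pointing it at a real account.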

This is a terribly inefficient way to do what you want, but it will work.

It is vitally important that you regularly test your backups by restoring them to a temporary server and making sure they're accessible and contain the expected information. Automate this, then check manually anyway.

Can't use EBS snapshots?

If your volume is mapped via LVM, you can do the same thing at the LVM level in your Linux system. This works for the lvm-on-md-on-striped-ebs configuration. You use LVM snapshots instead of EBS snapshots, and can only do it on the main machine, but it's otherwise the same.

You can only do this if your entire DB is on one file system.

No LVM, can't use EBS?

You're going to have to restart the database. You do not need to restart it to change pg_hba.conf, a simple reload (pg_ctl reload, or SIGHUP the postmaster) is sufficient, but you do indeed have to restart to change the archive mode.
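
For the pg_hba.conf case, the reload can be scripted. The sketch below assumes the standard layout in which the first line of `postmaster.pid` in the data directory holds the postmaster's PID; the function names and the data directory path in the usage comment are placeholders, and it must run as a user allowed to signal the postmaster.

```python
import os
import signal
from pathlib import Path


def postmaster_pid(data_dir):
    """Return the postmaster PID recorded on the first line of postmaster.pid."""
    first_line = Path(data_dir, "postmaster.pid").read_text().splitlines()[0]
    return int(first_line.strip())


def reload_postgres(data_dir):
    """Send SIGHUP so PostgreSQL re-reads pg_hba.conf and reloadable settings."""
    os.kill(postmaster_pid(data_dir), signal.SIGHUP)


# Usage (path is a placeholder for your own data directory):
# reload_postgres("/var/lib/postgresql/9.1/main")
```

This does the same job as `pg_ctl reload`; it does not apply restart-only settings such as the archive mode.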

This is one of the many reasons why backups are not an optional extra; they're part of the setup you should be doing before you go live.

If you don't change the archive mode, you can't use PITR, pg_basebackup, WAL archiving, pgbarman, etc. You can use database dumps, and only database dumps.

So you've got to find a time to restart. Sorry. If your client applications aren't entirely stupid (i.e. they can handle waiting on a blocked tcp/ip connection), here's how I'd try to do it after doing lots of testing on a replica of my production setup:

  • Set up a PgBouncer instance

  • Start directing new connections to the PgBouncer instead of the main server

  • Once all connections are via pgbouncer, change postgresql.conf to set the desired archive mode. Make any other desired restart-only changes at the same time; see the configuration documentation for restart-only parameters.

  • Wait until there are no active connections

  • SIGSTOP pgbouncer, so it doesn't respond to new connection attempts

  • Check again and make sure nobody made a connection in the interim. If they did, SIGCONT pgbouncer, wait for it to finish, and repeat.

  • Restart PostgreSQL

  • Make sure I can connect manually with psql

  • SIGCONT pgbouncer
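
The SIGSTOP/SIGCONT part of the steps above could be wrapped as below. This is a sketch only: the `paused` helper and the names in the usage comment are hypothetical, it assumes you already know the pgbouncer PID, and whether suspending pgbouncer this way really holds client connections safely is exactly the thing to verify on a replica first.

```python
import contextlib
import os
import signal


@contextlib.contextmanager
def paused(pid):
    """Suspend a process with SIGSTOP and always resume it with SIGCONT."""
    os.kill(pid, signal.SIGSTOP)
    try:
        yield
    finally:
        # Resume even if the body raised, so clients are not left hanging.
        os.kill(pid, signal.SIGCONT)


# Usage (hypothetical names; restart_postgres and check_psql are your own steps):
# with paused(pgbouncer_pid):
#     restart_postgres()
#     check_psql()
```

Using a context manager means pgbouncer is resumed even if the restart step fails partway through.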

I'd rather explicitly set pgbouncer to a "hold all connections" mode, but I'm not sure it has one, and don't have time to look into it right now. I'm not at all certain that SIGSTOPing pgbouncer will achieve the desired effect, either; you must experiment on a replica of your production setup to ensure that this is the case.

Once you've restarted

Use WAL archiving and PITR, plus periodic pg_dump backups for extra assurance.
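
As a rough illustration of the periodic pg_dump part, the sketch below builds a timestamped custom-format dump command. The function names, the file naming scheme, the database name, and the output directory are all assumptions; connection options (host, user, password handling) are left out and would need adding for a real setup.

```python
import subprocess
from datetime import datetime, timezone


def pg_dump_command(dbname, out_dir, now=None):
    """Build a pg_dump command that writes a timestamped custom-format dump."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    out_file = f"{out_dir}/{dbname}-{stamp}.dump"
    return ["pg_dump", "--format=custom", f"--file={out_file}", dbname]


def run_pg_dump(dbname, out_dir):
    """Run pg_dump and raise CalledProcessError if it exits non-zero."""
    cmd = pg_dump_command(dbname, out_dir)
    subprocess.run(cmd, check=True)
    return cmd


# Usage (placeholder names): run_pg_dump("mydb", "/backups")
```

A custom-format dump is restored with pg_restore, so a scheduled restore test can be built around the same file.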

See:

... and of course, the backup chapter of the user manual, which explains your options in detail. Pay particular attention to the "SQL Dump" and "Continuous Archiving and Point-in-Time Recovery (PITR)" chapters.

PgBarman automates the PITR option for you, including scheduling, and supports hooks for storing WAL and base backups in S3 instead of local storage. Alternatively, WAL-E is a bit less automated, but is pre-integrated with S3. You can implement your retention policies with S3, or via barman.

(Remember that you can use retention policies in S3 to shove old backups into Glacier, too).
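
If you end up pruning backups yourself rather than through S3 or barman, the retention rules from the question (keep every weekly backup, daily backups for the last week, hourly backups for the last day) might be expressed like this. It is a sketch under the assumption that weekly backups are the ones taken at Sunday midnight and daily backups the ones taken at midnight; `should_keep` is a hypothetical helper, not part of any tool named above.

```python
from datetime import datetime, timedelta


def should_keep(taken_at, now):
    """Decide whether a backup taken at `taken_at` is still retained at `now`."""
    age = now - taken_at
    is_midnight = taken_at.hour == 0 and taken_at.minute == 0
    # Weekly backups (Sunday midnight; weekday() == 6 is Sunday) are kept forever.
    if is_midnight and taken_at.weekday() == 6:
        return True
    # Daily backups (midnight) are kept for one week.
    if is_midnight and age <= timedelta(days=7):
        return True
    # Everything else counts as an hourly backup, kept for one day.
    return age <= timedelta(hours=24)
```

A pruning job would list the stored backups, parse each timestamp, and delete those for which `should_keep` returns False.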

Reducing future pain

Outages happen.

Outages of single-machine setups on something as unreliable as Amazon EC2 happen a lot.

You must get failover and replication in place. This means that you must restart the server. If you do not do this, you will eventually have a major outage, and it will happen at the worst possible time. Get your HA setup sorted out now, not later; it's only going to get harder.

You should also ensure that your client applications can buffer writes without losing them. Relying on a remote database on an Internet host to be available all the time is stupid, and again, it will bite you unless you fix it.
