
Prerequisites

It is assumed you already have the MongoDB database tools and the PostgreSQL client/CLI (psql) installed.

https://www.mongodb.com/try/download/database-tools

https://www.postgresql.org/download/

Additionally, if you are using AWS or Kubernetes, it is assumed you have the AWS CLI and kubectl installed, respectively.

Assumptions

It is assumed that you are comfortable using a terminal and familiar with CLIs. It is additionally assumed that the database in question is only small to medium in size and that the machine in question has local storage with a decent amount of space remaining (30+ GB); otherwise, you will need access to the AWS CLI/S3 or another bucket interface that supports stdin/stdout streaming and can store large amounts of data. Finally, it is assumed you know or have access to all the necessary credentials for the databases and any other cloud services that must be accessed.

Overview

The process for backing up and restoring both databases is conceptually simple. Assuming the database isn't too large, you can dump it directly into a local dump file using that database's specific tools, then restore another instance of the same database type from that dump. There are some variations of this process, discussed later, including uploading the dump file directly to S3 and restoring from S3, but the overall procedure remains the same.

It is important to note that these commands can stream unencrypted data over the network. If that is a concern, there are a few ways to address it. If you are working exclusively within an AWS VPC ecosystem, running the commands from an EC2 node keeps all your data inside your private AWS network. If you are port-forwarding a database instance from Kubernetes, the connection is already encrypted by the kubectl tunnel. Finally, if SSL is configured for the database connection, simply set the environment variable PGSSLMODE=require for Postgres or add the --ssl flag for Mongo. This will allow you to securely transfer data across the network.
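For example, a minimal sketch of the encrypted variants of the dump commands used later on this page (hostnames and credentials are placeholders):

PGSSLMODE=require pg_dump dbname -h hostname -p port -U username -W > dumpfile.sql

mongodump --ssl --host=mongodb1.example.net --port=3017 --username=user --password="pass" --out mongodump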

PostgreSQL

This section discusses the various ways to dump and restore a Postgres DB instance. The tools used to back up a Postgres DB are the pg_dump and psql utilities. Additionally, the AWS CLI is used in the parts covering S3 uploads of, and restores from, the dump file.

Everything should be done as the postgres user, which has superuser privileges.

Local Dump File

Here we will just be saving the dump file to local storage from a remote host.

This only works if your local device has enough storage space for the database dump file; otherwise, the later methods, including but not limited to S3 uploads, are necessary.

Dump the remote database to a file in your current working directory with the following (replacing all parameters with the necessary credentials):

pg_dump dbname -h hostname -p port -U username -W > dumpfile.sql

This will dump the single specified database into the file. If you would like to dump all information from a Postgres instance, including all databases, run this instead:

pg_dumpall -h hostname -p port -U username -W > dumpfile.sql

Superuser access is required to dump with pg_dumpall and to restore the resulting file.

To restore the database, run:

psql dbname -h hostname -p port -U username -W < dumpfile.sql

The database dbname will not be created by this command, so you must create it yourself from template0 before executing psql (e.g., with createdb -T template0 dbname)
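For example, a sketch of creating the target database against the same remote host, assuming the same connection parameters as above:

createdb -T template0 -h hostname -p port -U username -W dbname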

Or, if you used pg_dumpall, run:

psql -f dumpfile.sql postgres

If the database is just slightly too large for your local storage to comfortably handle, please refer to https://www.postgresql.org/docs/current/backup-dump.html#BACKUP-DUMP-LARGE for ways to compress the dump. These methods can also be used to upload a compressed dump to an S3 bucket.
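For example, a minimal sketch of the gzip approach described in those docs, using the same placeholder credentials:

pg_dump dbname -h hostname -p port -U username -W | gzip > dumpfile.sql.gz

gunzip -c dumpfile.sql.gz | psql dbname -h hostname -p port -U username -W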

Pipe dump file directly to new database

If the database you want to restore to is already up and running, you can simply run the command:

pg_dump dbname -h hostname -p port -U username -W | psql dbname -h hostname -p port -U username -W

The database must still be created in the new instance before piping over the dump; additionally, this method does not seem to work with pg_dumpall.
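Note that -W forces a password prompt on both ends of the pipe, which can interleave awkwardly. One possible workaround is to supply the passwords via libpq's PGPASSWORD environment variable (a sketch; source_hostname, target_hostname, and the passwords are placeholders, and a ~/.pgpass file is preferable for anything sensitive):

PGPASSWORD=source_password pg_dump dbname -h source_hostname -p port -U username | PGPASSWORD=target_password psql dbname -h target_hostname -p port -U username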

Backup and restore to and from S3

To dump to an S3 bucket (using your local machine as an intermediary pipe), run:

pg_dump dbname -h hostname -p port -U username -W | aws s3 cp - s3://my-bucket/dumpfile.sql

If the size of the file is larger than 5 GB, you will have to set the --expected-size parameter on aws s3 cp or the upload may fail.
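For example, a sketch for a dump expected to be roughly 30 GB (the byte count is a hypothetical value):

pg_dump dbname -h hostname -p port -U username -W | aws s3 cp - s3://my-bucket/dumpfile.sql --expected-size 32212254720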

To restore a database from an S3 bucket, run:

aws s3 cp s3://my-bucket/dumpfile.sql - | psql dbname -h hostname -p port -U username -W

MongoDB

This section discusses the various ways to dump and restore a MongoDB instance. The tools used to back up and restore a MongoDB database are the mongodump and mongorestore utilities. As with Postgres, the AWS CLI is used in the parts covering S3 uploads of, and restores from, the dump.

If this is being done on a live server, specify the additional command parameter --oplog on mongodump and --oplogReplay on mongorestore, so that writes made while the dump is running are captured and replayed on restore.
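For example, a sketch of a full-instance dump and restore with oplog replay (note that --oplog only applies to full-instance dumps, not single-database ones):

mongodump --host=mongodb1.example.net --port=3017 --username=user --password="pass" --oplog --out mongodump

mongorestore --host=mongodb1.example.net --port=3017 --username=user --password="pass" --oplogReplay mongodump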

Local Dump File

Here we will just be saving the dump file to local storage from a remote host.

This only works if your local device has enough storage space for the database dump file; otherwise, the later methods, including but not limited to S3 uploads, are necessary.

Dump the remote database into a dump directory under your current working directory with the following (replacing all parameters with the necessary credentials):

mongodump --host=mongodb1.example.net --port=3017 --username=user --password="pass" --out mongodump

To restore the database, run:

mongorestore --host=mongodb1.example.net --port=3017 --username=user --password="pass" mongodump

If the database is just slightly too large for your local storage to comfortably handle, please refer to https://www.mongodb.com/docs/database-tools/mongodump/#std-option-mongodump.--gzip for ways to compress the dump. These methods can also be used when uploading to an S3 bucket.
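For example, a sketch of the --gzip variant described there, which compresses the dump files on write and decompresses them on restore:

mongodump --host=mongodb1.example.net --port=3017 --username=user --password="pass" --gzip --out mongodump

mongorestore --host=mongodb1.example.net --port=3017 --username=user --password="pass" --gzip mongodump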

If you only want to copy one database, specify the additional command-line parameter --db=dbname to mongodump, and on restore pass --db=dbname along with the path to that database's subdirectory of the dump, as in the sketch below. Unlike Postgres, MongoDB will automatically create the new database for you on restore.
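A sketch of the single-database variant (dbname and the mongodump output directory match the commands above):

mongodump --host=mongodb1.example.net --port=3017 --username=user --password="pass" --db=dbname --out mongodump

mongorestore --host=mongodb1.example.net --port=3017 --username=user --password="pass" --db=dbname mongodump/dbname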

Pipe dump file directly to new database

If the database you want to restore to is already up and running, you can simply run the command:

mongodump --host=mongodb1.example.net --port=3017 --username=user --password="pass" --archive \
| mongorestore --host=mongodb1.example.net --port=3017 --username=user --password="pass" --archive

If you only want to copy one database, specify the additional command-line parameter --db=dbname for mongodump and --nsFrom='SOURCE_DB.*' --nsTo='TARGET_DB.*' for mongorestore. Unlike Postgres, MongoDB will automatically create the new database for you on restore.
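For example, a sketch copying SOURCE_DB on one host to TARGET_DB on another (both hostnames are placeholders):

mongodump --host=source.example.net --port=3017 --username=user --password="pass" --db=SOURCE_DB --archive \
| mongorestore --host=target.example.net --port=3017 --username=user --password="pass" --nsFrom='SOURCE_DB.*' --nsTo='TARGET_DB.*' --archive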

Backup and restore to and from S3

To dump to an S3 bucket (using your local machine as an intermediary pipe), run the following; note that --archive produces a single archive file rather than a dump directory, so the S3 object is named accordingly:

mongodump --host=mongodb1.example.net --port=3017 --username=user --password="pass" --archive | aws s3 cp - s3://my-bucket/dumpfile.archive

If the size of the file is larger than 5 GB, you will have to set the --expected-size parameter on aws s3 cp (as in the Postgres section above) or the upload may fail.

To restore a database from an S3 bucket, run:

aws s3 cp s3://my-bucket/dumpfile.archive - | mongorestore --host=mongodb1.example.net --port=3017 --username=user --password="pass" --archive

Special Cases

Kubernetes

If your database is running in a Kubernetes cluster as a pod or deployment, you will have to port-forward the instance in order to access it. This is done with the command:

kubectl port-forward name-of-the-pod :port-pod-is-listening-to

This will return output similar to:

Forwarding from 127.0.0.1:port-you-can-connect-to -> 27017
Forwarding from [::1]:port-you-can-connect-to -> 27017

Now you should be able to access the database locally using the given port and using localhost as the host parameter. For further information, please visit https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/.
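For example, a sketch assuming a hypothetical MongoDB pod named mongodb-0 listening on 27017; the local port (63412 here) is whatever kubectl reports, and the dump runs in a second terminal while the port-forward stays open:

kubectl port-forward mongodb-0 :27017

mongodump --host=localhost --port=63412 --username=user --password="pass" --out mongodump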

References and further reading

https://www.postgresql.org/docs/current/backup-dump.html#BACKUP-DUMP-LARGE

https://www.mongodb.com/docs/database-tools/mongodump/#std-option-mongodump.--gzip

https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/