Published: 27. 9. 2019 Category: GNU/Linux

Faster data transfer with AWS EFS

AWS EFS (Elastic File System) is a remotely accessed cloud file system with almost infinite storage, mounted via NFS 4.1. It is very handy when you need to share data between different servers and your application just uses standard operating system syscalls for disk/data storage access. AWS claims that it is a high-availability solution, and I do not remember any major outage in the past two years (unlike, for example, the S3 outage in 2017, which affected a lot of global services like Netflix or Facebook). On the other hand, our monitoring sometimes reports short (a couple of minutes) NFS timeouts during night hours, but this happens very sporadically.

EFS development has gone through several milestones: from unencrypted remote storage, to storage with userspace encryption (encfs), and finally to data storage fully encrypted at rest. I had to migrate data several times during these milestones to stay up to date and, of course, to move towards encryption.

I used rsync, and copying data was very slow. It was so slow that it took days to transfer hundreds of gigabytes of data. What is wrong? I was googling, but I did not find any solution, only other people's complaints about slow write operations.

The solution for faster migration was not obvious, and can be found in this Amazon presentation: https://youtu.be/PlTuJx4VnGw?t=1840

It is quite simple: you just need to use cp and create as many parallel copy processes as possible! This was quite surprising for me: first, I am a big rsync fanatic, and second, I am usually happy with how the OS handles processes and do not need to force process creation manually. (On the other hand, would you think that forcing more copy processes would speed up a transfer through a network bottleneck?)

Based on the previous presentation, I have created this script, called parallel_sync.sh, which uses the GNU parallel utility:

#!/bin/bash
#
# Author: Martin 'BruXy' Bruchanov, bruchy at gmail.com
# GitHub: https://github.com/BruXy/bash-utils
#
#%A
# Parallel sync
# =============
#
# EFS data migration needs to run multiple copy processes concurrently. Set
# the number of processes in the N_PROC variable.
#
# Usage example:
#
# parallel_sync.sh [source_dir] [dest_dir]
# parallel_sync.sh /mnt/source/ /mnt/efs/destination/
#
# The script was inspired by Amazon presentation:
# https://youtu.be/PlTuJx4VnGw?t=1840
#%B

SOURCE_DIR=$1
DEST_DIR=$2
N_PROC=20
CPY_CMD="cp -up"

# Input check
# -----------

[[ "$1" =~ ^-?-h(elp)?$ ]] || [ $# -ne 2 ] && {
    sed -ne '/^#%A/,/^#%B/s/\(^#\|^#%[AB]\)//p' "$0" >&2
    echo "$0: Please provide source and destination directories." >&2
    exit 1;
}
[ -d "$SOURCE_DIR" ] || {
    echo "$0: Source directory '$SOURCE_DIR' does not exist! " 1>&2; exit 1;
}
[ -d "$DEST_DIR" ] || {
    echo "$0: Destination directory '$DEST_DIR' does not exist! " 1>&2; exit 1;
}

# Is parallel present?
# --------------------

parallel --version > /dev/null 2>&1 || {
    echo "$0: Please install GNU parallel." >&2
    exit 1
}

# Clone directory tree
# --------------------

echo "$0: Cloning directory tree from '$SOURCE_DIR' to '$DEST_DIR'."
find "$SOURCE_DIR" -type d | \
        sed -e "s:$SOURCE_DIR::" | \
        parallel -j $N_PROC "mkdir -p ${DEST_DIR}/{}"

# Copy data
# ---------

echo "$0: Running $N_PROC copy processes ($CPY_CMD) in parallel."
find "$SOURCE_DIR" ! \( -type d \) | \
        sed -e "s:$SOURCE_DIR::" | \
        parallel -j $N_PROC "$CPY_CMD $SOURCE_DIR/{} $DEST_DIR/{}"

The script has two steps after the initial input check. The first step traverses the source directory structure and creates the same structure in the destination. The second step runs 20 parallel copy processes; you can change this number in the N_PROC variable. I kept it relatively low because I was doing this migration on live production machines and did not want to overload them. The result: transfer time decreased from days to hours!
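The same two steps can also be sketched without GNU parallel, using find -print0 with xargs -0 -P. This is a swapped-in technique, not the script above: the function name parallel_copy is hypothetical, and it assumes an xargs that supports -0, -P and -I (GNU findutils does) plus GNU cp for the -u option. Null-delimited names keep files with spaces or newlines intact.

```shell
#!/bin/bash
# Sketch: two-step parallel copy with find + xargs instead of GNU parallel.
# parallel_copy is a hypothetical helper name, not part of the original script.

parallel_copy() {
    local src=$1 dest=$2 n_proc=${3:-20}

    # Resolve the destination to an absolute path, because we cd into the source.
    dest=$(cd "$dest" && pwd) || return 1

    (
        cd "$src" || exit 1

        # Step 1: recreate the directory tree in the destination.
        find . -type d -print0 |
            xargs -0 -P "$n_proc" -I{} mkdir -p "$dest/{}"

        # Step 2: copy everything that is not a directory,
        # running up to n_proc cp processes at a time.
        find . ! -type d -print0 |
            xargs -0 -P "$n_proc" -I{} cp -up {} "$dest/{}"
    )
}
```

It would be called the same way as the script, with the process count as an optional third argument, for example: parallel_copy /mnt/source/ /mnt/efs/destination/ 20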

Note: I also like it when a script has some documentation at the beginning, and because users usually expect to get some information with --help/-h, I use a regex to check the CLI options and sed to display the documentation between the #%A/#%B tags. The rest is quite self-explanatory.
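The documentation trick can be tried in isolation. The sketch below is a slightly more explicit variant of the sed expression, wrapped in a hypothetical function show_doc: it selects the lines between the #%A and #%B tags, deletes the tag lines themselves, and strips the leading "#" (plus one optional space) from the rest.

```shell
#!/bin/bash
# show_doc is a hypothetical helper; it prints the comment block that a
# script file keeps between its #%A and #%B tag lines.
show_doc() {
    # -n          print only what the 'p' flag asks for
    # /A/,/B/     restrict the commands to the tagged range
    # /^#%[AB]/d  drop the tag lines themselves
    # s/...//p    strip '#' and one optional space, then print the line
    sed -n '/^#%A/,/^#%B/{ /^#%[AB]/d; s/^# \{0,1\}//p; }' "$1"
}
```

Inside a script, calling show_doc "$0" when the first argument is -h or --help prints the script's own header.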

EFS benchmarking of write operations

I have tested several variants of EFS for write speed. The test was quite simple: copy a 1-gigabyte block of zeros to the partition. I repeated this 5 times and calculated the average speed:

Without sync: dd if=/dev/zero of=/mnt/efs/test.raw bs=1G count=1
Enforced sync: dd if=/dev/zero of=/mnt/efs/test0.raw bs=1G count=1 conv=fdatasync
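The repetition and averaging can be scripted. This is a minimal sketch, not the exact procedure used for the numbers below: the function name bench_write and its parameters are hypothetical, it assumes GNU dd and GNU date (for %N nanoseconds), it times each run itself instead of parsing dd's summary line, and it reports total bytes divided by total time in decimal megabytes, the unit GNU dd uses. It writes in 1 MiB blocks so that dd does not need a 1 GiB buffer.

```shell
#!/bin/bash
# bench_write is a hypothetical helper: write SIZE_MB MiB of zeros to FILE,
# RUNS times, and print the average write throughput in MB/s.
bench_write() {
    local file=$1 size_mb=${2:-1024} runs=${3:-5}
    local i start end total=0

    for ((i = 1; i <= runs; i++)); do
        start=$(date +%s.%N)
        # conv=fdatasync makes dd flush the data before it exits,
        # so the page cache does not inflate the result.
        dd if=/dev/zero of="$file" bs=1M count="$size_mb" conv=fdatasync \
            2>/dev/null || return 1
        end=$(date +%s.%N)
        # Accumulate the elapsed time of this run (floating point via awk).
        total=$(awk -v t="$total" -v s="$start" -v e="$end" \
            'BEGIN { printf "%.6f\n", t + (e - s) }')
        rm -f "$file"
    done

    # Total bytes written over total seconds, in decimal MB (10^6 bytes).
    awk -v mb="$size_mb" -v n="$runs" -v t="$total" \
        'BEGIN { printf "%.2f MB/s\n", (mb * 1048576 * n) / t / 1000000 }'
}
```

For example, bench_write /mnt/efs/test.raw 1024 5 would repeat the enforced-sync test five times with 1 GiB per run.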

EFS has two different performance modes:

General Purpose performance mode, meant for most file systems (the default, which we are currently using).
Max I/O performance mode, optimized for applications where tens, hundreds, or thousands of EC2 instances access the file system. It scales to higher levels of aggregate throughput and operations per second, with a tradeoff of slightly higher latencies for file operations.

I have also compared it with EBS (Elastic Block Store), which is used as the default system volume, and with two EBS volumes with AWS-side encryption in software RAID1 (mirroring with mdadm).

The AWS environment is quite tricky for measuring I/O performance, because a lot of services support a "burst" mode: you can push I/O to the maximum for a short time, but in long-term usage the I/O performance is spread over a longer period.

Configuration	Average write speed
Encrypted EFS, Max I/O on c4.8xlarge	74.22 MB/s
Encrypted EFS, GP on c4.8xlarge	102.4 MB/s
EBS RAID1 on c4.8xlarge	509.5 MB/s
EBS RAID1 with write sync on c4.8xlarge	100.66 MB/s
EncFS, GP on c4.8xlarge	54.28 MB/s
EFS, GP on c4.8xlarge	143.4 MB/s
S3FS on c5.9xlarge	79.6 MB/s
S3FS with fdatasync on c5.9xlarge	76.7 MB/s
EFS MaxIOPS 1024 MB/s, Network 10 Gbit/s of c5.9xlarge	83.5 MB/s
EFS MaxIOPS 1024 MB/s, Network 10 Gbit/s of c5.9xlarge, fdatasync	79.0 MB/s
c5.24xlarge MaxIOPS 1024 MB/s, Network 25 Gbit/s	78.22 MB/s
c5.24xlarge MaxIOPS 1024 MB/s, Network 25 Gbit/s with file sync enabled	79.34 MB/s

Notes: I did not measure read speed because I did not find it critical for my application. For writing, EBS (with encryption enabled) was the fastest; however, when sync is enforced, its performance is quite similar to EFS. I tried a similar test with S3 via s3fs, which in this case uses HTTPS over TCP. There was no difference between c5.9xlarge and c5.24xlarge, even though Amazon claims that the 24xlarge is on a 25 Gbit/s network (weird :P). In the end, we are using encrypted EFS in General Purpose mode. The application also keeps filenames and paths in a separate database, so files are accessed directly without any overhead like traversing directories or searching a long list of files.


BruXy.REGNET.CZ
