AWS Adventures, part 2 – high-availability FTP service

In our AWS migration, we found it necessary to run an FTP server. Yeah, I know — “FTP? In the 20-teens?”. Look, I get it — nobody wants to run an FTP server in this day and age. But it is still a convenient way for partner companies to transfer data to us via automation. This isn’t highly sensitive data; our main concern is keeping the FTP server isolated from our other services so that any vulnerabilities there don’t propagate to more critical systems.

At any rate, we found it surprisingly challenging to build a highly available FTP service in AWS.

Our environment is designed as a dual-AZ architecture. Everything we add to this stack is intended to be resilient against the failure of an AZ. So any service has to be located in both AZs.

Here’s a quick outline of the general approach:

  • two Elastic IPs
  • two ASGs, each in a different AZ, with one instance each (min = 1, max = 1; we’re just using this to auto-restart any failed instances)
  • separate launch configurations for each ASG, whose user data associates a specific EIP with the instance (hosts launched in AZ1 get one EIP and hosts launched in AZ2 get the other)
  • an EFS filesystem to share the FTP user database and the users’ “home” directories
  • Route 53 failover that can route a single hostname to either of the IPs in case the primary fails
  • CentOS 7 as the operating system
  • vsftpd for the FTP server, with its passive ports restricted to a limited range
  • a security group that opens the ports needed for FTP

Let’s dive in a little deeper with the Terraform scripts used to set up this environment.

Security groups

We need to define some security groups. The first will allow outside hosts to connect to our FTP service on port 21 (the control channel) and on ports 1024-1048 for passive-mode data connections. Note that we will have to configure our FTP daemon to restrict its passive ports to this same range.

We are only opening the FTP ports to hosts in a specific CIDR block (173.236.176.0/24), along with the CIDR blocks of Route 53’s health checkers. You’ll need to change the ftp-allowed-networks variable to a list of CIDR blocks that makes sense for you.
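
Here’s a minimal sketch of that first group in Terraform. The resource name and the vpc_id variable are assumptions for illustration; ftp-allowed-networks is the variable mentioned above and should already include the Route 53 health-checker ranges.

    resource "aws_security_group" "ftp" {
      name_prefix = "ftp-"
      description = "FTP control and passive data ports from approved networks"
      vpc_id      = "${var.vpc_id}"

      # FTP control channel
      ingress {
        from_port   = 21
        to_port     = 21
        protocol    = "tcp"
        cidr_blocks = "${var.ftp-allowed-networks}"
      }

      # Passive-mode data channel; vsftpd is restricted to this same range
      ingress {
        from_port   = 1024
        to_port     = 1048
        protocol    = "tcp"
        cidr_blocks = "${var.ftp-allowed-networks}"
      }

      # Outbound access for yum, EFS, and AWS API calls
      egress {
        from_port   = 0
        to_port     = 0
        protocol    = "-1"
        cidr_blocks = ["0.0.0.0/0"]
      }
    }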

The second security group will control access to the EFS filesystem that we are going to create. We will only allow hosts in our FTP security group to connect. You could add other security groups if you wanted; for example, you might have an automation security group with automation hosts that read the data FTP-ed in and process it. So you’d add that security group to the ingress rule in this security group.
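
A sketch of that second group, again with assumed names; it only needs to allow NFS (port 2049) from members of the ftp security group:

    resource "aws_security_group" "efs-ftp" {
      name_prefix = "efs-ftp-"
      description = "NFS access to the FTP EFS filesystem"
      vpc_id      = "${var.vpc_id}"

      # Only hosts in the ftp security group may mount the filesystem;
      # add further security groups here if other hosts need the data.
      ingress {
        from_port       = 2049
        to_port         = 2049
        protocol        = "tcp"
        security_groups = ["${aws_security_group.ftp.id}"]
      }
    }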

EFS

We want a central place to manage user accounts for the FTP server, and we need a fixed location for all the files that are uploaded or downloaded. EFS is the perfect service for that. If you aren’t storing a ton of data, it won’t cost much at all. If you’re going to store terabytes of data, you will probably need to figure out a way to use S3, but that’s beyond the scope of this document (I’d probably start with s3fs-fuse, but I have no experience with that).

Note that we create a mount target in each AZ, and we assign those to the efs-ftp security group we created above.
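
Roughly, the Terraform looks like this; the subnet variables are assumptions, so substitute the subnet IDs for your two AZs:

    resource "aws_efs_file_system" "ftp" {
      creation_token = "ftp"
    }

    # One mount target per AZ, both protected by the efs-ftp security group
    resource "aws_efs_mount_target" "ftp-az1" {
      file_system_id  = "${aws_efs_file_system.ftp.id}"
      subnet_id       = "${var.az1-subnet-id}"
      security_groups = ["${aws_security_group.efs-ftp.id}"]
    }

    resource "aws_efs_mount_target" "ftp-az2" {
      file_system_id  = "${aws_efs_file_system.ftp.id}"
      subnet_id       = "${var.az2-subnet-id}"
      security_groups = ["${aws_security_group.efs-ftp.id}"]
    }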

On this EFS filesystem, you’ll want to create the following directory structure:
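
The exact listing isn’t reproduced here; pieced together from the paths referenced later in this article, it looks roughly like the following (the user_conf and users directory names are assumptions, so match them to whatever your vsftpd configuration references). The filesystem gets mounted at /mnt/vsftpd on each FTP host.

    conf/
        accounts.db     Berkeley DB of virtual FTP user accounts
        user_conf/      per-user vsftpd config files
    users/              the virtual users' "home" directories, one subdirectory per user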

Elastic IPs

We define two Elastic IPs, one per AZ.
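
A minimal sketch, with assumed resource names:

    resource "aws_eip" "ftp-az1" {
      vpc = true
    }

    resource "aws_eip" "ftp-az2" {
      vpc = true
    }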

User data init scripts

The init script is going to do a few things (a sketch of the full script follows this list):

  • Associate the appropriate Elastic IP (passed in from Terraform’s data resource)
  • Install vsftpd
  • Mount the shared EFS filesystem
  • Create a vsftpd user (all our virtual users will use this user’s privileges)
  • Configure vsftpd to use virtual users (so we don’t have to create actual Linux users for FTP)
  • Set up symlinks to the shared vsftpd accounts database, per-user config files, and the virtual users’ “home” directories
  • Configure SELinux
  • Enable and start vsftpd
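
The original template isn’t reproduced verbatim here; the following is a sketch of what such a script can look like. The ${...} placeholders (eip_allocation_id, efs_dns_name, aws_region) are filled in by Terraform’s template rendering, the paths under /mnt/vsftpd follow the directory layout shown earlier, and it assumes an instance profile that allows ec2:AssociateAddress. It also points the virtual users’ home directories at the EFS mount directly rather than through a symlink.

    #!/bin/bash
    set -e

    # Install vsftpd, the Berkeley DB tools, NFS support, and the AWS CLI
    yum install -y epel-release
    yum install -y vsftpd libdb-utils nfs-utils awscli

    # Associate this instance's Elastic IP
    INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
    aws ec2 associate-address --region "${aws_region}" \
        --instance-id "$INSTANCE_ID" --allocation-id "${eip_allocation_id}"
    sleep 10   # give the association a moment before reading public-ipv4
    PUBLIC_IP=$(curl -s http://169.254.169.254/latest/meta-data/public-ipv4)

    # Mount the shared EFS filesystem
    mkdir -p /mnt/vsftpd
    mount -t nfs4 -o nfsvers=4.1 "${efs_dns_name}:/" /mnt/vsftpd
    echo "${efs_dns_name}:/ /mnt/vsftpd nfs4 nfsvers=4.1 0 0" >> /etc/fstab

    # The local account whose privileges all virtual FTP users share
    useradd --system --shell /sbin/nologin vsftpd

    # vsftpd configuration: virtual users, restricted passive port range
    cat > /etc/vsftpd/vsftpd.conf <<'EOF'
    listen=YES
    anonymous_enable=NO
    local_enable=YES
    write_enable=YES
    chroot_local_user=YES
    allow_writeable_chroot=YES
    guest_enable=YES
    guest_username=vsftpd
    virtual_use_local_privs=YES
    pam_service_name=vsftpd-virtual
    user_config_dir=/etc/vsftpd/user_conf
    user_sub_token=$USER
    local_root=/mnt/vsftpd/users/$USER
    pasv_enable=YES
    pasv_min_port=1024
    pasv_max_port=1048
    EOF
    # Passive mode must advertise the public (Elastic) IP, not the private one
    echo "pasv_address=$PUBLIC_IP" >> /etc/vsftpd/vsftpd.conf

    # PAM service backed by the shared Berkeley DB of accounts
    cat > /etc/pam.d/vsftpd-virtual <<'EOF'
    auth    required pam_userdb.so db=/etc/vsftpd/accounts
    account required pam_userdb.so db=/etc/vsftpd/accounts
    EOF

    # Symlink the shared account database and per-user configs from EFS
    ln -sf  /mnt/vsftpd/conf/accounts.db /etc/vsftpd/accounts.db
    ln -sfn /mnt/vsftpd/conf/user_conf   /etc/vsftpd/user_conf

    # SELinux: allow vsftpd to read and write the NFS-backed directories
    setsebool -P ftpd_use_nfs on
    setsebool -P ftpd_full_access on

    systemctl enable vsftpd
    systemctl start vsftpd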

And here are the data resources that create a unique startup script for each AZ. The only difference in the variables we use for each one is the eip_allocation_id.
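
Something like the following, assuming the script above is saved as ftp_userdata.sh.tpl alongside the Terraform code; the efs_dns_name and aws_region variables are identical in both:

    data "template_file" "ftp-userdata-az1" {
      template = "${file("${path.module}/ftp_userdata.sh.tpl")}"

      vars = {
        eip_allocation_id = "${aws_eip.ftp-az1.id}"
        efs_dns_name      = "${aws_efs_file_system.ftp.dns_name}"
        aws_region        = "${var.aws_region}"
      }
    }

    data "template_file" "ftp-userdata-az2" {
      template = "${file("${path.module}/ftp_userdata.sh.tpl")}"

      vars = {
        eip_allocation_id = "${aws_eip.ftp-az2.id}"
        efs_dns_name      = "${aws_efs_file_system.ftp.dns_name}"
        aws_region        = "${var.aws_region}"
      }
    }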

Launch configurations

Now we set up two launch configurations so that we can use the appropriate rendered user-data script in each AZ.
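
A sketch, with the AMI, key pair, instance type, and instance profile left as variables you’d supply (the instance profile needs permission to call ec2:AssociateAddress):

    resource "aws_launch_configuration" "ftp-az1" {
      name_prefix          = "ftp-az1-"
      image_id             = "${var.centos7-ami-id}"
      instance_type        = "t2.micro"
      key_name             = "${var.key-name}"
      iam_instance_profile = "${var.ftp-instance-profile}"
      security_groups      = ["${aws_security_group.ftp.id}"]
      user_data            = "${data.template_file.ftp-userdata-az1.rendered}"

      lifecycle {
        create_before_destroy = true
      }
    }

    resource "aws_launch_configuration" "ftp-az2" {
      # identical to the AZ1 version apart from the rendered user data
      name_prefix          = "ftp-az2-"
      image_id             = "${var.centos7-ami-id}"
      instance_type        = "t2.micro"
      key_name             = "${var.key-name}"
      iam_instance_profile = "${var.ftp-instance-profile}"
      security_groups      = ["${aws_security_group.ftp.id}"]
      user_data            = "${data.template_file.ftp-userdata-az2.rendered}"

      lifecycle {
        create_before_destroy = true
      }
    }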

Autoscaling groups

Now we set up two autoscaling groups, each referencing the appropriate launch configuration. Note that we are just using the default health checks (EC2 status checks), which are not quite adequate: they only detect instance-level failures, so if the OS is up and running but the FTP daemon is hung, the check won’t notice. Route 53 should still fail over to the secondary, but the ASG won’t replace the failed instance on its own.
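
A sketch of the two groups, again with assumed names and the same per-AZ subnet variables used for the EFS mount targets:

    resource "aws_autoscaling_group" "ftp-az1" {
      name_prefix          = "ftp-az1-"
      launch_configuration = "${aws_launch_configuration.ftp-az1.name}"
      vpc_zone_identifier  = ["${var.az1-subnet-id}"]
      min_size             = 1
      max_size             = 1
      health_check_type    = "EC2"

      tag {
        key                 = "Name"
        value               = "ftp-az1"
        propagate_at_launch = true
      }
    }

    resource "aws_autoscaling_group" "ftp-az2" {
      name_prefix          = "ftp-az2-"
      launch_configuration = "${aws_launch_configuration.ftp-az2.name}"
      vpc_zone_identifier  = ["${var.az2-subnet-id}"]
      min_size             = 1
      max_size             = 1
      health_check_type    = "EC2"

      tag {
        key                 = "Name"
        value               = "ftp-az2"
        propagate_at_launch = true
      }
    }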

You could write your own custom health monitor (maybe it would FTP a file and verify that it was transferred properly), which could use the AWS SDK to report the failed instance.
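
As a rough illustration of that idea (not something from the original setup), a cron job along these lines could flag the instance for replacement; the healthcheck credentials, file name, and INSTANCE_ID lookup are placeholders:

    # Attempt a test transfer; on failure, tell the ASG the instance is unhealthy
    # (INSTANCE_ID would be looked up from whichever instance holds the primary EIP)
    if ! curl -s --max-time 30 -o /dev/null "ftp://healthcheck:secret@ftp.ourcorp.com/healthcheck.txt"; then
        aws autoscaling set-instance-health \
            --instance-id "$INSTANCE_ID" \
            --health-status Unhealthy
    fi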

Route 53

We aren’t currently managing our zone in Route 53, so I don’t have a Terraform script for this, but here’s how you set it up by hand (a rough Terraform equivalent is sketched after the steps). For this example, suppose you want to name your FTP server “ftp.ourcorp.com”.

  • Create a health check under Route 53 / Health checks:
    • name: ourcorp-ftp-az1
    • what to monitor: endpoint
    • specify endpoint by: IP address
    • protocol: TCP
    • IP address: your AZ1 Elastic IP address
    • port: 21
    • you can opt to create an alarm if you want
  • Create a second health check under Route 53 / Health checks:
    • name: ourcorp-ftp-az2
    • what to monitor: endpoint
    • specify endpoint by: IP address
    • protocol: TCP
    • IP address: your AZ2 Elastic IP address
    • port: 21
    • you can opt to create an alarm if you want
  • Create a primary record set under Route 53 / Hosted zones / ourcorp.com:
    • name: ftp.ourcorp.com
    • type: A
    • alias: no
    • ttl: 60
    • value: your AZ1 Elastic IP address
    • routing policy: failover
    • failover record type: primary
    • health check to associate: ourcorp-ftp-az1
  • Create a secondary record set:
    • name: ftp.ourcorp.com
    • type: A
    • alias: no
    • ttl: 60
    • value: your AZ2 Elastic IP address
    • routing policy: failover
    • failover record type: secondary
    • health check to associate: ourcorp-ftp-az2
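
For reference, a rough Terraform equivalent of the AZ1 half looks like this (the zone variable is an assumption); the AZ2 health check and the SECONDARY record mirror it with the other Elastic IP:

    resource "aws_route53_health_check" "ourcorp-ftp-az1" {
      ip_address        = "${aws_eip.ftp-az1.public_ip}"
      port              = 21
      type              = "TCP"
      request_interval  = 30
      failure_threshold = 3
    }

    resource "aws_route53_record" "ftp-primary" {
      zone_id         = "${var.ourcorp-zone-id}"
      name            = "ftp.ourcorp.com"
      type            = "A"
      ttl             = 60
      records         = ["${aws_eip.ftp-az1.public_ip}"]
      set_identifier  = "ftp-az1"
      health_check_id = "${aws_route53_health_check.ourcorp-ftp-az1.id}"

      failover_routing_policy {
        type = "PRIMARY"
      }
    }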

Manage the users

The tools here are a little rough; it wouldn’t take too much to roll them up into a comprehensive script to add or delete a user, but I’ll leave that as an exercise for the reader.

User accounts will go in a Berkeley DB, /mnt/vsftpd/conf/accounts.db. You have to use the Berkeley DB command-line tools (the db4-utils package on older CentOS releases; libdb-utils on CentOS 7) to update the file.

Fortunately, pam_userdb_admin.pl serves as a good wrapper around these tools to help manage the database, though I had to make a couple of configuration changes to it to get it working with this setup.

To create a user “foo” with password “bar”, you would do the following:
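
The exact wrapper invocation isn’t reproduced here; under the hood it comes down to loading a username/password pair into the Berkeley DB with db_load and creating the user’s directory on the shared filesystem. A rough sketch, run on one of the FTP hosts and assuming plaintext passwords in the database and the users/ directory from the layout above:

    # db_load reads alternating key/value lines: username, then password
    printf 'foo\nbar\n' > /tmp/newuser.txt
    db_load -T -t hash -f /tmp/newuser.txt /mnt/vsftpd/conf/accounts.db
    rm -f /tmp/newuser.txt

    # Give the new virtual user a home directory on the shared filesystem
    mkdir -p /mnt/vsftpd/users/foo
    chown vsftpd:vsftpd /mnt/vsftpd/users/foo

Because the database and the home directories live on EFS, the new account is immediately visible to both FTP hosts.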

Congratulations if you made it this far. You should now have a working HA cluster of FTP servers. Try creating a test user and FTPing in (be sure to try from a host that is in the allowed CIDR blocks of the ftp security group!).

Comments

  1. Hi,

    Quick question: if you have a transfer going on and the AZ fails, and thus you’re rerouted to the secondary server, you have to completely reinitiate the FTP session, right?

    Or did you find some way that the two FTP servers can share the sessions’ state?

    1. You are right. The current session would terminate, and probably not in a pretty way. The client might hang until it times out.

      It all depends on your use case as to whether or not this is adequate. In our case, we are receiving recurring uploads on a daily basis, and the stakes are pretty low. So if one transfer fails, it’s not a big deal.
