We recently decided we want to upgrade one of our Aurora RDS clusters from db.r4.large. Our entire environment is managed by Terraform, and it was not clear from the Terraform documentation what would happen if we just changed the instance class and applied the change. Would Terraform be smart enough to upgrade the writer instance in AZ1, failing over to the reader in AZ2, and then, when that was complete, upgrade the newly promoted writer instance, failing back to the new instance in AZ1?
Continue reading Upgrading instance class on Aurora cluster with terraform
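For context, the change in question is a one-attribute edit to the cluster instance resource. A minimal sketch, assuming a standard `aws_rds_cluster_instance` setup; the resource names and the target class `db.r5.large` here are hypothetical, not from the post:

```hcl
resource "aws_rds_cluster_instance" "aurora" {
  count              = 2
  identifier         = "aurora-instance-${count.index}"
  cluster_identifier = aws_rds_cluster.aurora.id
  engine             = "aurora-mysql"

  # Changing this one attribute is the whole upgrade; how Terraform and
  # RDS sequence the writer and reader replacements is the open question.
  instance_class = "db.r5.large" # was "db.r4.large"
}
```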
I’m addicted to performance metrics on my server infrastructure. For years in our on-prem environment,
we monitored thousands of data points using Ganglia, Nagios, and Zabbix. In our new AWS infrastructure,
we thought we’d look for some more sophisticated options.
Continue reading AWS Adventures, part 5 – monitoring infrastructure
CloudWatch is a great concept — super-easy to configure and inexpensive. And at first glance, it actually looks pretty nice. But after I spent about 30 minutes with it, I realized it wasn’t easy to use. The units used are especially hard to interpret. This is my best attempt to explain what the network values mean.
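The short version of the confusion: a NetworkIn or NetworkOut datapoint is a byte count, not a rate. Assuming you request the Sum statistic over a period, turning it into throughput is simple arithmetic (the helper name is my own):

```python
def network_sum_to_mbps(sum_bytes: float, period_seconds: int) -> float:
    """Convert a CloudWatch NetworkIn/NetworkOut Sum (total bytes
    received or sent during the period) into an average rate in
    megabits per second."""
    return sum_bytes * 8 / period_seconds / 1_000_000

# e.g. 37.5 MB received over a 5-minute (300 s) period:
print(network_sum_to_mbps(37_500_000, 300))  # → 1.0 Mbps
```

Requesting the Average statistic instead gives you the average of the underlying byte-count samples, which is what makes the console graphs so hard to read at a glance.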
Continue reading AWS Adventures, Part 4 – CloudWatch network monitoring
For a number of years, we have streamed HLS video via CloudFront, using a
Wowza Streaming Engine server to convert our RTMP streams to HLS on the fly. CloudFront
provides almost infinite scalability for the HLS stream, since the static chunk files are
easily cacheable at the edge.
For high availability purposes, we want to use two independent WSE servers in two AWS
availability zones. But this has been problematic. The two servers are never 100% in
sync with their HLS chunking of the incoming live stream. This can cause the client
to get a bad response to a request, thereby dropping the live stream.
After a lot of experimentation, I have come up with a way to assemble a multi-AZ,
high-availability cluster of WSE servers that can reliably stream HLS video from an incoming
RTMP stream.
Continue reading AWS Adventures, Part 3 – HA Wowza Live HLS
In our AWS migration, we found it necessary to run an FTP server. Yeah, I know — “FTP? In the 20-teens?” Look, I get it — nobody wants to run an FTP server in this day and age. But it is still a convenient way for partner companies to transfer data to us via automation. This isn’t highly sensitive data; our main concern is keeping the FTP server isolated from our other services so that any vulnerabilities there don’t propagate to more critical systems.
At any rate, we found it surprisingly challenging to build a highly available FTP service in AWS.
Continue reading AWS Adventures, part 2 – high-availability FTP service
After a lot of reading about AWS and the failures that have happened over the years, I’ve come to the conclusion that to be truly resilient against a complete AZ failure, you need enough capacity running in each AZ to handle the entire load of your application on its own.
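In fleet-sizing terms that means roughly doubling your steady-state capacity. A sketch of the arithmetic, with illustrative numbers and a function name of my own:

```python
import math

def instances_per_az(peak_load: float, per_instance_capacity: float) -> int:
    """Instances each AZ must run so that one AZ alone can carry the
    full peak load if the other AZ fails completely."""
    return math.ceil(peak_load / per_instance_capacity)

# e.g. 900 req/s at peak, 200 req/s per instance:
per_az = instances_per_az(900, 200)  # 5 instances in EACH of two AZs
total = per_az * 2                   # 10 running, vs. 5 if sized for load alone
print(per_az, total)                 # → 5 10
```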
Continue reading AWS: hedging against AZ failure
We are in the middle of a massive migration to the AWS cloud. While we are excited by the prospects of ditching a lot of our hardware responsibilities, you can’t make a change this big without some pain.
So far, Snowball has been the biggest source of frustration.
Continue reading AWS Adventures, part 1 – Snowball