AWS Adventures, part 1 – Snowball

We are in the middle of a massive migration to the AWS cloud. While we are excited by the prospects of ditching a lot of our hardware responsibilities, you can’t make a change this big without some pain.

So far, Snowball has been the biggest source of frustration.

We have a 6.5TB archive of over 25 million web media files (images, audio, video, and closed caption files). The most cost-effective way to store them in AWS is on S3. We looked at the time it would take to transfer those files over the network, and it was going to be a huge pain: 10-15 days of transfer, assuming nothing went wrong along the way and forced us to start over. Plus, our current co-lo provider would charge us a fair amount of money for the bandwidth.
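For a sense of where an estimate like that comes from, it’s just archive size divided by whatever throughput you can realistically sustain over a WAN link. A quick sketch (the rates here are assumptions for illustration, not measurements of our actual links):

```python
# Back-of-the-envelope transfer-time arithmetic. The sustained rates below are
# illustrative assumptions, not measurements of our actual links.
ARCHIVE_BYTES = 6.5e12  # 6.5 TB, decimal

def days_to_transfer(size_bytes: float, sustained_mbps: float) -> float:
    """Days needed to move size_bytes at a sustained rate given in megabits/s."""
    seconds = (size_bytes * 8) / (sustained_mbps * 1e6)
    return seconds / 86_400

for mbps in (40, 60, 1000):
    print(f"{mbps:>5} Mbps sustained -> {days_to_transfer(ARCHIVE_BYTES, mbps):5.1f} days")

# Roughly 40-60 Mbps of sustained WAN throughput puts 6.5 TB in the 10-15 day
# range; even a fully saturated gigabit link is still over half a day of
# perfect, uninterrupted transfer.
```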

So we opted to use Snowball, a service where AWS ships you a hardware appliance; you connect it to your server, copy up to 50TB of data onto it, and ship it back, and Amazon loads the data into your S3 bucket. All this for $200. Sounds great, right?

First, know that Snowball is slow. We connected it to our network, which uses a Brocade gigabit switch with plenty of backplane. We were pulling from a replicated copy of our primary file server, so the machine we were reading from had virtually no load on it. We had hoped to get transfer rates in the neighborhood of 800Mbps. Nope. We got rates more along the lines of 80Mbps. So it took days to copy the files.

Verification was worse. The Snowball client provides a verification step, but it ran so slowly that we just didn’t see the point. We couldn’t even tell how long it was going to take; our best guess was another 5-7 days. We opted to skip the step and send the device back to AWS (you only get the device for 10 days before you start incurring charges for additional days).

Shortly after it arrived at AWS, we got this from AWS support:

We are contacting you concerning your Snowball job JID280de124-d633-4652-939f-xxxxxxxxxxxx. We have received your Snowball in our data center and are attempting to process it but have encountered an error. We do not suspect that your data is at risk at this point and we will continue to debug the issue. There is no action required in your part at this time. We will notify you if we need more information from you, in the mean time you can look for progress in the Snowball console. We apologize for this delay.

Uh, OK (warm fuzzy feelings dissolving rapidly).

The following day, we got this:

Your job JID280de124-d633-4652-939f-xxxxxxxxxxxx with AWS Snowball has completed. For more information, including downloadable success and error logs, check details for this job in the AWS Snowball Management Console.

No explanation of what that error was or anything. The Snowball management console had a “failure log” but it just listed two files, and it was very unclear whether anything had actually even gone wrong with those files. So now we’re in a situation where we aren’t really confident in what has been loaded into S3. Oh, and BTW — it took AWS three days to import the data. That was for 6.5TB. Imagine if we had filled that Snowball with 50TB!!!

At this point, we knew we had to validate the integrity of the files ourselves, using checksums. We were expecting the success log to contain the MD5 checksum of each file. It didn’t. We found that we could get the MD5 checksum from the ETag of some of the objects, but for the larger files the ETag is not the same as the MD5 checksum (presumably these were files that were uploaded via multipart upload).
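For anyone hitting the same thing, the distinction is visible right in the ETag format: a single-part object’s ETag is the hex MD5 of its contents, while a multipart object’s ETag carries a trailing “-&lt;part count&gt;” and can’t be compared to an MD5 directly. A minimal sketch with boto3 (the bucket and key names are placeholders):

```python
# Minimal sketch: decide whether an object's ETag is a usable MD5.
# Bucket and key names are placeholders, not our actual resources.
import boto3

s3 = boto3.client("s3")

def etag_is_plain_md5(bucket: str, key: str) -> bool:
    etag = s3.head_object(Bucket=bucket, Key=key)["ETag"].strip('"')
    return "-" not in etag  # a dash means multipart: not a plain MD5

print(etag_is_plain_md5("example-media-bucket", "images/foo.jpg"))
```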

So we actually had to GET the larger files and compute MD5 checksums on them ourselves. We settled for a random sample to see whether there was a substantial error rate; we encountered no errors. The smaller files we could validate via their MD5-based ETags.
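Here’s roughly what that spot check of the multipart objects looked like, sketched with boto3 under the assumption that a local replica of each file is still available; the bucket name, local path, and sample size are all illustrative.

```python
# Rough sketch of a random-sample integrity check for multipart objects.
# Assumes a local replica of each file exists under LOCAL_ROOT; names and
# the sample size are placeholders.
import hashlib
import random
import boto3

s3 = boto3.client("s3")
BUCKET = "example-media-bucket"   # placeholder
LOCAL_ROOT = "/data/media"        # placeholder path to the replicated copy

def md5_of_s3_object(bucket, key):
    body = s3.get_object(Bucket=bucket, Key=key)["Body"]
    digest = hashlib.md5()
    for chunk in iter(lambda: body.read(8 * 1024 * 1024), b""):
        digest.update(chunk)
    return digest.hexdigest()

def md5_of_local_file(path):
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(8 * 1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def spot_check(multipart_keys, sample_size=500):
    for key in random.sample(multipart_keys, min(sample_size, len(multipart_keys))):
        if md5_of_s3_object(BUCKET, key) != md5_of_local_file(f"{LOCAL_ROOT}/{key}"):
            print(f"MISMATCH: {key}")
```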

So after all that, the files did get transferred properly, but the whole process left our confidence shaken, it wasted tons of time, and it cost us lots of unnecessary S3 requests.

Now for the real kicker: content-types. When we went to serve the contents of our S3 bucket through a CloudFront distribution, we were surprised to find that all of the files had a content-type of “application/octet-stream”. Files that we upload to S3 via the API automatically get the content-type set according to the file extension (right or wrong, we have become very accustomed to this sort of mapping, and many developers, including us, take it for granted). We filed a ticket with AWS support.
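For reference, this is the sort of extension-to-content-type mapping we mean: when we control the upload, the type can be guessed from the extension and set explicitly. A tiny sketch with boto3 and Python’s mimetypes module (the names are placeholders):

```python
# Sketch of the extension-to-content-type mapping we take for granted.
# Bucket and file names are placeholders; the type is guessed from the
# extension and passed explicitly on upload.
import mimetypes
import boto3

s3 = boto3.client("s3")

def upload_with_content_type(path: str, bucket: str, key: str) -> None:
    ctype = mimetypes.guess_type(path)[0] or "application/octet-stream"
    s3.upload_file(path, bucket, key, ExtraArgs={"ContentType": ctype})

upload_with_content_type("clips/foo.mp4", "example-media-bucket", "clips/foo.mp4")
```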

It took seven days for AWS support to come back and tell us, in essence, that Snowball does not set content-type metadata on the objects it imports, and that this is simply not supported.

I was shocked on a number of levels:

  • Why doesn’t Snowball support such basic functionality in the first place? Serving web content from an S3 bucket is a common use case, and it doesn’t work if the content-type isn’t set properly. Lots of sites have archives big enough to require a Snowball transfer, so it follows that lots of customers will use Snowball to move an archive of web content into S3 and be disappointed.
  • Why isn’t this *documented* somewhere? We might have opted to just send the data over the wire instead of using Snowball, and if we had done that, we would have been done by now.
  • Why did it take a week for tech support to tell us that this is not supported? I can’t believe there aren’t multiple Snowball customers hitting this problem every single day; it should be a fact that every AWS support tech knows.

So now we’re going to have to use the SDK or CLI to correct 25 million files’ content-type metadata. Granted, I think AWS is going to work with us on the cost associated with that, but this has been more hassle than it was worth.
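In case it saves someone else the research, here’s a rough sketch of what that cleanup looks like with boto3; the bucket name is a placeholder, and this is an illustration rather than the exact script we’ll run. S3 metadata can’t be edited in place, so each object gets copied onto itself with a replaced content-type.

```python
# Sketch of the in-place content-type fix: copy each object onto itself with
# MetadataDirective="REPLACE" and a content type guessed from the key's
# extension. Bucket name is a placeholder. At 25 million objects this is one
# COPY request per file, so you'd want to parallelize and budget for the
# request charges. Note: copy_object only handles objects up to 5 GB; anything
# larger needs a multipart copy instead.
import mimetypes
import boto3

s3 = boto3.client("s3")
BUCKET = "example-media-bucket"  # placeholder

def fix_content_types(bucket: str) -> None:
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            ctype = mimetypes.guess_type(key)[0] or "application/octet-stream"
            s3.copy_object(
                Bucket=bucket,
                Key=key,
                CopySource={"Bucket": bucket, "Key": key},
                MetadataDirective="REPLACE",
                ContentType=ctype,
            )

fix_content_types(BUCKET)
```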
