Cheaply retrieve data from Amazon AWS Glacier

When it launched, Amazon Glacier was applauded for providing a super-cheap long-term storage solution. While there are no surprises when uploading and storing files, retrieving them can get expensive. The pricing reflects the fact that Amazon needs to retrieve your files from tape, which is expensive and takes a long time. Several users reported high charges after retrieving their backups. In its defence, Amazon published a very detailed FAQ on this topic.

The key to getting your files back on the cheap is time. You can retrieve 5% of your stored data for free each month, but that allowance is calculated per hour, rather than per month or day. Example: You keep 500GB in Glacier. 500GB * 5% / 30 / 24 ≈ 36MB/hour (counting 1GB as 1024MB).
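
The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name is made up for illustration, the 5% rate and the 30-day month come from the example, and it assumes 1GB = 1024MB, which is what yields roughly 36MB.

```python
def hourly_free_allowance_mb(stored_gb, free_share=0.05, days=30):
    """Free retrieval allowance per hour in MB, following the example above."""
    stored_mb = stored_gb * 1024              # assumption: 1 GB = 1024 MB
    monthly_free_mb = stored_mb * free_share  # 5% of what is stored
    return monthly_free_mb / days / 24        # spread over days, then hours


print(round(hourly_free_allowance_mb(500)))  # 36
```

With decimal units (1GB = 1000MB) the same calculation gives just under 35MB, so it is safer to plan with the lower figure.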

That’s great to know, but how can you keep retrieving 36MB per hour for days or months without doing it manually? If you’re on OS X or Linux you can use mt-aws-glacier. It’s a Perl script to track and retrieve your Glacier files. Using mtglacier for slow retrieval isn’t straightforward. There is an issue to improve it, but for now let’s work with what’s available.

The tool has two commands to restore files. One initiates the retrieval job on Amazon’s side. The other downloads the files once the job has completed, which takes about 4h. With that in mind, we need to tell mtglacier to request only a few files and then sleep for an hour before requesting more. There is no way to limit the total size of a request, but the number of files can be limited. As long as you have many small files, this works great.

If your hourly allowance is 35MB and your average file size is 10MB, 3 files per hour is about right, with some margin for error.
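
The request-then-sleep loop described above could be sketched in Python as follows. Treat it as an illustration under assumptions, not a tested recipe: the command names (`restore`, `restore-completed`) and the `--max-number-of-files` flag are as I remember them from the mt-aws-glacier README, so check them against your installed version. The config file, directory, vault and journal names are placeholders, and the helper functions are invented for this example.

```python
import subprocess
import time

# Placeholder values, not taken from the article: adjust to your setup.
COMMON_ARGS = [
    "--config", "glacier.cfg",
    "--dir", "/data/backup",
    "--vault", "myvault",
    "--journal", "journal.log",
]


def files_per_hour(allowance_mb, avg_file_mb):
    """Whole number of average-sized files that fit into the hourly allowance."""
    return int(allowance_mb // avg_file_mb)


def restore_command(max_files):
    """Ask Glacier to prepare at most `max_files` files for download."""
    return ["mtglacier", "restore", *COMMON_ARGS,
            "--max-number-of-files", str(max_files)]


def download_command():
    """Download whatever earlier restore jobs have finished."""
    return ["mtglacier", "restore-completed", *COMMON_ARGS]


def slow_restore(hours, max_files, run=subprocess.run, sleep=time.sleep):
    """Request a few files, fetch finished ones, then wait an hour; repeat."""
    for _ in range(hours):
        run(restore_command(max_files))
        run(download_command())
        sleep(60 * 60)
```

Calling `slow_restore(24 * 30, files_per_hour(35, 10))` would request 3 files per hour for about a month. Because the limit is on the number of files rather than their size, a single large file can still exceed the hourly allowance.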

Hope this helps.

Yahoo: Email not accepted for policy reasons

Yahoo failed as an internet company for a reason. Try sending an email with a link to a bank website, e.g. CIMB (popular across Asia).

Your email will be rejected by Yahoo. Just awesome…

Workaround: Use a shortlink to hide your URL. Now your phishing emails will arrive safely. 😉

Unit testing for Jupyter (iPython) notebooks

At Quantego, we do most high-level work that supports energy analysts in Jupyter Notebooks. This allows us to pull several Java and Python packages together for a highly productive work environment.

Sample notebooks are hosted on GitHub and distributed with our Docker images. Of course we want our sample notebooks to work when people run them. They also uncover potential problems, because they run at a very high level and thus use almost all available features.

If you have a similar setup – for example in a data analytics-driven environment – the following could work for you as well:

  1. Make sure your notebooks run correctly with “Run All”. While developing you may try different things and run cells out of order. For automatic testing to work, the cells must run in sequence from top to bottom.
  2. Test locally with

    This will convert your notebooks to HTML. We’re not interested in the output, only in potential errors during conversion. This only works with Jupyter and iPython >=4. Previous versions simply ignore errors.
  3. Next you could just run the same command in an isolated Docker container or in a CI step

    A full working example for CircleCI can be found in our sample-repo.
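
The steps above can be sketched as a small Python helper. The exact command is not shown in this post, so the sketch assumes the usual `jupyter nbconvert --to html --execute` invocation. The function names and the notebook paths are hypothetical.

```python
import subprocess


def nbconvert_command(notebook):
    """Command that executes a notebook and converts it to HTML."""
    return ["jupyter", "nbconvert", "--to", "html", "--execute", str(notebook)]


def failing_notebooks(notebooks, run=subprocess.run):
    """Return the notebooks whose conversion exits with a non-zero status."""
    failed = []
    for notebook in notebooks:
        result = run(nbconvert_command(notebook))
        if result.returncode != 0:   # a cell raised an error during execution
            failed.append(notebook)
    return failed
```

In a CI step you could call `failing_notebooks(["examples/intro.ipynb"])` and fail the build if the returned list is not empty. The `run` parameter exists only so the helper can be tried without Jupyter installed.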

Online iPython Notebook Viewer

We recently started using the slide function of iPython notebooks. Basically it allows you to partition your notebook into slides, slide fragments and subslides. Those can be exported to reveal.js.

There is already a great viewer for notebooks online. To save some steps in exporting, converting and adding reveal.js, I took the idea and added a slide viewer. Anyone can use it to link to their slides on GitHub, Gist or any other place. We even support Basic Auth. Check it out at:


New nameservers

About a year ago EditDNS was bought by Dyn Inc. They have in fact ruined the old site and tried to lure as many customers as possible to their site. They didn’t honor lifetime memberships at EditDNS and even charged money for migration. Their prices are absolutely unrealistic as well. Hosting your DNS with them costs more than hosting a whole server. Fortunately there are some alternatives left.

Currently the site’s nameservers are mirrored in four locations, which should provide plenty of redundancy.

Additionally, the most popular nameservers run by ISPs are monitored hourly to detect any anomalies.

Email Arrival Times

I was interested in the arrival times of emails this morning. They should reflect the world’s work and communication patterns. The sample size is around 80,000. Here are the results: most messages arrive in the late morning and from Monday to Thursday. Saturday is rather quiet. Times are normalized to UTC+1.
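
A count like this can be produced from the `Date` headers of the messages. The following is a minimal sketch using only the Python standard library. It is not the script used for the numbers above, the function name is made up, and it assumes every header carries an explicit UTC offset.

```python
from collections import Counter
from datetime import timedelta, timezone
from email.utils import parsedate_to_datetime

UTC_PLUS_1 = timezone(timedelta(hours=1))
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def arrival_histogram(date_headers):
    """Count messages per (weekday, hour) after converting to UTC+1."""
    counts = Counter()
    for header in date_headers:
        # Parse the RFC 2822 date, then shift it to the common time zone.
        local = parsedate_to_datetime(header).astimezone(UTC_PLUS_1)
        counts[(DAYS[local.weekday()], local.hour)] += 1
    return counts
```

Normalizing before counting matters: a message sent on Sunday at 23:30 in UTC-5 lands on Monday morning in UTC+1.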

Email-Privacy and the Law

After having sent and received as many as 13,196 emails in 2010, I started thinking about how well this kind of communication is actually protected. The problem has a technical and a legal side. I’ve long focused on the technical side. SSL, good passwords and some hard drive encryption should offer reasonable protection. The legal side is also not too bad. At least in Austria.

Unlike German law, Austrian law gives emails protection similar to that of letters, as long as they are in transit and haven’t been downloaded to a user’s personal computer (= the letter is still closed).

In conclusion, it can be said that password-protected emails in Austria fall under the protection of § 118 StGB (secrecy of correspondence). In Germany, applying the secrecy of correspondence (§ 202 dStGB) to emails fails on the requirement of physicality. Under § 202a dStGB, only emails that are password-protected or encrypted in transit are protected.
by Prof Dr. Thomas Hoeren, Briefgeheimnis im Strafrecht

This general protection has been substantially weakened by a variety of “anti-terror laws” that have been imposed in the US and Europe. In fact most big providers who want to display advertising already weaken your privacy in their terms of service. Moreover, once they surpass a certain number of users, they are usually obliged to install a backdoor for government bodies.

For that reason, I strongly encourage everyone to run a private email server. If you share it with your friends, the costs won’t be more than a few euros per year and it’s a good learning opportunity. Moreover, if your admin lives in Austria you can hold him accountable, as if he were opening your love letters.

If you can somehow emphasize the educational point of view, Amazon might even give you a free server for some time.