Snapdragon Tech Blog

Musings of a systems administrator and open source developer

About

Welcome to my blog. I’m Manu and my passion is all things devops on Unix systems. Whenever I learn something useful, you will find it here.

Extend Pandas DataFrame with custom functions and attributes

At Quantego.com we love working with Pandas DataFrames. We use them to store and analyze results from simulation runs. On top of our data matrix and a multi-level index, we also need to accommodate custom plotting functions and attributes from the previous simulation run. Subclassing pandas.DataFrame for this task was a no-brainer. The new version 0.16.1 (to be released in the next few days) includes some fixes that make working with subclasses of complex DataFrames easier.
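As a rough illustration of the subclassing idea, here is a minimal sketch. The class name SimulationFrame, the run_label attribute and the spread() method are made up for this example and are not the Quantego code. It relies on two documented pandas hooks: _metadata, which lists the custom attributes pandas should carry over to derived frames, and _constructor, which tells pandas what type to build when an operation returns a new frame.

```python
import pandas as pd


class SimulationFrame(pd.DataFrame):
    """DataFrame subclass with one custom attribute and one custom method.

    All names here are hypothetical and only serve the illustration.
    """

    # Custom attributes that pandas should treat as metadata, not columns.
    _metadata = ["run_label"]

    @property
    def _constructor(self):
        # Operations that return a new frame should build a SimulationFrame.
        return SimulationFrame

    def spread(self):
        """Custom helper: range (max minus min) of every column."""
        return self.max() - self.min()


df = SimulationFrame({"price": [1.0, 2.5, 4.0], "volume": [10, 30, 20]})
df.run_label = "run-001"  # stored as an attribute, not as a column

# Boolean indexing returns the subclass, so spread() stays available.
subset = df[df["price"] > 1.0]
print(type(subset).__name__)
print(subset.spread())
```

How reliably custom attributes survive each operation depends on the pandas version, so it is worth checking the operations you actually use.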

Regex to find phone numbers in every format

I couldn’t find a truly universal regular expression (regex) to match phone numbers regardless of country and format. They all seemed to be limited in some way. Even named entity extraction APIs require you to set a country to find phone numbers. In the end I rolled my own regex. It simply looks for a certain number of digits, together with the characters generally used to make phone numbers human-readable.
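The approach can be sketched roughly as follows. This is not the regex from the post, just an illustration of the same idea: match a run of digits and common separator characters, then keep only candidates with a plausible digit count. The separator set and the 7 to 15 digit bounds are assumptions chosen for this example.

```python
import re

# Candidate: optional "+", optional "(", a digit, then digits and common
# separators (space, parentheses, dot, slash, hyphen), ending in a digit.
CANDIDATE = re.compile(r"(?<!\w)\+?\(?\d[\d ()./-]{5,18}\d(?!\w)")


def find_phone_numbers(text, min_digits=7, max_digits=15):
    """Return substrings that look like phone numbers.

    The digit bounds are assumptions for this sketch; 15 mirrors the
    maximum length of an international (E.164) number.
    """
    found = []
    for match in CANDIDATE.finditer(text):
        digits = sum(ch.isdigit() for ch in match.group())
        if min_digits <= digits <= max_digits:
            found.append(match.group())
    return found


sample = "Call +49 (0)30 1234-5678 or 555.123.4567, room 42."
print(find_phone_numbers(sample))
```

A heuristic like this trades precision for coverage: other digit groups with separators, such as ISO dates, can match as well, so some post-filtering is usually needed.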

Python clipboard access

I was using Python and Jinja2 to generate some tables with 100+ rows for WordPress. This package saved me the extra step of opening a file and copy-pasting from there. There should be many other ways to integrate it into semi-automated workflows. Check it out here.

Extract structured data from PDF invoices

Most invoices exist in electronic format. They are generated from structured data and need to be entered as structured data. It’s a shame that we still need humans to manually extract data points like amount, date or issuer from them. Over the last few days, I tried a few online invoicing solutions, like shoeboxed, but none of them does a good job of automatically recognizing new invoices. Some do it manually and charge accordingly.

Upodder – command line podcast client

Today I refactored an old open source project and published it on PyPI. It was a great learning experience, as I hadn’t done Python packaging before. Upodder reads a text file with RSS feeds and downloads podcasts to a local folder. It keeps old entries in a database and only downloads entries published in the last X days. Check it out at: https://pypi.python.org/pypi/upodder/
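The selection logic described above (remember entries in a database, download only what was published in the last X days) could look roughly like this. It is a standard-library sketch and not Upodder’s actual code: the table layout, the function names and the entry format (a dict with "guid" and "published" keys) are all assumptions made for the example, and feed parsing and downloading are left out.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def open_seen_db(path=":memory:"):
    """Open (or create) the database that remembers processed entries."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS seen (guid TEXT PRIMARY KEY)")
    return db


def select_new_entries(entries, db, max_age_days, now=None):
    """Return entries that are unseen and at most max_age_days old.

    Every unseen entry is recorded, so older items are skipped on later runs.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    fresh = []
    for entry in entries:
        row = db.execute(
            "SELECT 1 FROM seen WHERE guid = ?", (entry["guid"],)
        ).fetchone()
        if row is not None:
            continue  # already handled in an earlier run
        db.execute("INSERT INTO seen (guid) VALUES (?)", (entry["guid"],))
        if entry["published"] >= cutoff:
            fresh.append(entry)
    db.commit()
    return fresh


# Hypothetical feed entries; a real client would parse these from RSS.
now = datetime(2015, 5, 11, tzinfo=timezone.utc)
entries = [
    {"guid": "ep-2", "published": now - timedelta(days=1)},
    {"guid": "ep-1", "published": now - timedelta(days=30)},
]
db = open_seen_db()
first_run = select_new_entries(entries, db, max_age_days=7, now=now)
second_run = select_new_entries(entries, db, max_age_days=7, now=now)
print([e["guid"] for e in first_run], [e["guid"] for e in second_run])
```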

Scalable Docker Monitoring with Fluentd, Elasticsearch and Kibana 4

Docker is a great set of technologies. Once you are comfortable using it, you are presented with a set of challenges you didn’t have before. To name two: log consolidation (how do you retrieve log files from dozens of containers?) and monitoring (how much RAM and CPU is each container using?). There are a few articles on this topic out there. After reading them, none of the solutions fully convinced me, but they all had some nice features, which I chose to combine here.

Linksnappy Command Line Downloader (Python)

Simple Python script to download files via Linksnappy.

md2pdf – Command line Markdown to PDF converter with support for CSS stylesheets and custom fonts, Python

I like to write my notes and reports in Markdown and then send them out as PDF. Gimli worked OK for a while, but it rasterizes files and doesn’t work with UTF-8 characters. I finally came across a similar project in Python and now I’m very happy with it. You can define a custom style sheet in your .profile and md2pdf will use it.

SSLv3 no longer supported

I had SSLv3 disabled for HTTP for quite some time. In light of recent events, it is now also disabled for IMAP and SMTP. If you run into any trouble, let me know or update your clients.
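For anyone scripting against the mail services from Python, a client-side context that never falls back to SSLv3 can be sketched like this. The function name is made up for the example, and the TLS 1.2 floor is my assumption here; the change above only rules out SSLv3.

```python
import ssl


def make_client_context():
    """Build a TLS client context that cannot negotiate SSLv3."""
    # The default context verifies certificates and host names.
    context = ssl.create_default_context()
    # Assumption for this sketch: require TLS 1.2 or newer.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_client_context()
print(context.minimum_version)
```

The resulting context can be passed to imaplib.IMAP4_SSL via its ssl_context argument, or to smtplib.SMTP.starttls via its context argument.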