Extend Pandas DataFrame with custom functions and attributes

At Quantego.com we love working with pandas DataFrames. We use them to store and analyze results from simulation runs. On top of our data matrix and a multi-level index, we also need to accommodate custom plotting functions and attributes from the previous simulation run.

Subclassing pandas.DataFrame for this task was a no-brainer. The new version 0.16.1 (to be released in the next few days) includes some fixes that make working with subclasses of complex DataFrames (DFs) easier. Here is an example of what can be done. First, define two new classes, one for pandas.Series (a single-column DF) and one for pandas.DataFrame. You can define new functions or attributes as needed.

Notice _constructor and _constructor_sliced. They make sure you get the correct class back when slicing the DF.

Via self you have convenient access to all pandas functions and can even roll your own.
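The subclassing approach described above can be sketched roughly as follows. This is a minimal illustration rather than our production code: the class names SimulationFrame and SimulationSeries, the run_name attribute and the spread method are hypothetical, and the sketch assumes a reasonably recent pandas release.

```python
import pandas as pd


class SimulationSeries(pd.Series):
    """Series counterpart, so single columns keep the custom type."""

    @property
    def _constructor(self):
        return SimulationSeries

    @property
    def _constructor_expanddim(self):
        return SimulationFrame


class SimulationFrame(pd.DataFrame):
    """DataFrame that carries a run label and a custom helper."""

    # Names listed here are treated as attributes, not columns, and
    # pandas tries to pass them on to the results of many operations.
    _metadata = ["run_name"]

    @property
    def _constructor(self):
        # Class used when an operation returns a new frame.
        return SimulationFrame

    @property
    def _constructor_sliced(self):
        # Class used when an operation returns a single column or row.
        return SimulationSeries

    def spread(self):
        """Per-column range (max minus min), built from plain pandas calls."""
        return self.max() - self.min()


frame = SimulationFrame({"a": [1, 2, 3], "b": [10, 20, 40]})
frame.run_name = "baseline"  # hypothetical custom attribute

column = frame["a"]    # single column, comes back as SimulationSeries
subset = frame[["a"]]  # column subset, comes back as SimulationFrame
```

Custom plotting functions would be added as methods in the same way as spread.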

Regex to find phone numbers in every format

I couldn’t find a truly universal regular expression (regex) to match phone numbers, regardless of country and format. They all seemed to be limited in some way. Even named entity extraction APIs require you to set a country to find phone numbers.

In the end I rolled my own regex. It simply looks for a certain number of digits and of the characters generally used to make phone numbers human-readable. If you want to match longer or shorter numbers, you can just change the quantifiers. Some examples it will match:

While not matching:

And here is the regex:

To use it in Python:

 

Let me know if this is useful for you or if you see room for improvement. Currently the biggest issue I see is that the allowed range for the number of digits and the allowed range for the total number of characters are unrelated. Because of the many filler characters, higher values are needed, and those can lead to false negatives. Best test it for yourself.
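To make the idea concrete, here is a simplified pattern in the same spirit. It is not the regex from this post: the filler characters, the digit limits (7 to 15) and the function name find_phone_numbers are assumptions chosen for illustration, and a pattern this loose will also match other digit groups such as dates.

```python
import re

# Illustrative pattern, not the original regex from this post.
# One digit, then 6 to 14 more digits, each optionally preceded by up to
# two filler characters (whitespace, parentheses, dot, hyphen, slash).
PHONE_RE = re.compile(
    r"(?<!\w)"                      # do not start inside a word or number
    r"\+?\(?\d"                     # optional "+", optional "(", first digit
    r"(?:[\s().\-/]{0,2}\d){6,14}"  # remaining digits with optional fillers
    r"(?!\w)"                       # do not stop inside a word or number
)


def find_phone_numbers(text):
    """Return all substrings of text that look like phone numbers."""
    return [match.group(0) for match in PHONE_RE.finditer(text)]


sample = "Call +49 (0) 30 1234567 or (555) 123-4567, room 42."
numbers = find_phone_numbers(sample)
# numbers == ["+49 (0) 30 1234567", "(555) 123-4567"]
```

Changing the {6,14} quantifier shifts the accepted number of digits, and {0,2} controls how many filler characters may sit between two digits.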

Python clipboard access

I was using Python and Jinja2 to generate some tables with 100+ rows for WordPress. This package saved me the extra step of opening a file and copying the output from there.

It should be just as useful in many other semi-automated workflows.

Check it out here.
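A rough sketch of that workflow is shown below. The package linked above is not named in this text, so the sketch assumes the third-party pyperclip package and its copy function. The helper names are hypothetical, and plain string formatting stands in for the Jinja2 template.

```python
def render_table(rows):
    """Render an iterable of (name, value) pairs as a small HTML table."""
    lines = ["<table>"]
    for name, value in rows:
        lines.append(f"  <tr><td>{name}</td><td>{value}</td></tr>")
    lines.append("</table>")
    return "\n".join(lines)


def copy_to_clipboard(text, copy=None):
    """Send text to the clipboard, via pyperclip unless a copy function is given."""
    if copy is None:
        # Assumption: pyperclip is installed (pip install pyperclip).
        import pyperclip
        copy = pyperclip.copy
    copy(text)


html = render_table([("alpha", 1), ("beta", 2)])
# copy_to_clipboard(html)  # uncomment on a machine with a clipboard
```

After the call, the generated markup can be pasted straight into the WordPress editor.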

Scalable Docker Monitoring with Fluentd, Elasticsearch and Kibana 4


Docker is a great set of technologies. Once you are comfortable using it, you face a set of challenges you didn’t have before. To name a few:

  • log consolidation: How to retrieve log files from dozens of containers?
  • monitoring: How much RAM and CPU is each container using?

There are a few articles on this topic out there. After reading them, none of the solutions fully convinced me, but they all had some nice features, which I chose to combine here. Continue reading

Linksnappy Command Line Downloader (Python)

Simple Python script to download files via Linksnappy.

Access Docker container attributes in Ansible

Ansible is a great automation solution. I mainly use it to provision servers and launch Docker instances on them. Sometimes I need container attributes, like the PID or a port, to configure Nginx or monitoring tools.

While the Ansible documentation gives you some hints, it wasn’t 100% obvious to me how to solve this. Basically, all your newly created containers end up in a list called docker_containers. Each entry has the same structure as the output of docker inspect.

For the PID:

For the host port:

So you could add a PID-file for a container like this:

Also read the full docs here.
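The lookups described above can be illustrated in plain Python on a trimmed stand-in for one docker_containers entry. This is only a sketch: the values are made up and the helper names are hypothetical, while State.Pid and NetworkSettings.Ports are fields of docker inspect output. In an Ansible task the same keys would be addressed from a template expression.

```python
# Trimmed stand-in for one entry of docker_containers (values are made up).
container = {
    "State": {"Pid": 4242},
    "NetworkSettings": {
        "Ports": {"80/tcp": [{"HostIp": "0.0.0.0", "HostPort": "49153"}]}
    },
}


def container_pid(info):
    """Return the PID of the container's main process."""
    return info["State"]["Pid"]


def host_port(info, container_port):
    """Return the first host port bound to a container port such as '80/tcp'."""
    bindings = info["NetworkSettings"]["Ports"][container_port]
    return int(bindings[0]["HostPort"])


pid = container_pid(container)
port = host_port(container, "80/tcp")
```

The PID found this way is what would go into the PID file, and the host port is what an Nginx upstream would point at.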

Advanced monit: Keep track of daemons, websites, RAIDs and partitions

Introduction

Are you already hosting your own mail or web server, and do you enjoy the flexibility, control and freedom self-hosting gives you? Besides the many advantages it gives you personally, like better privacy and the power to customize, you can also offer your services to other people. Even though there are a large number of budget hosting companies, many customers are willing to pay for better support or the comfort of having you around for questions. Continue reading

Defragment Mac OSX from Recovery Mode

Despite some notions that SSDs or HFS drives don’t need defragmenting, I have often read and experienced myself that defragmenting your Mac every few years will clearly make it faster.

I had some trouble running iDefrag and would like to share a little trick I learnt. Basically, it will refuse to run a full defrag while your system drive is mounted. Restarting didn’t help. Here is what I did in the end:
Continue reading

High-performance SSH: Install HPN-SSH on OSX with keychain integration


I use SSH for pretty much everything, from VPN and server administration to database connections and IPython work on remote machines. When working from weird places and with weird internet connections, SSH becomes painfully slow. I already use Mosh, but that also relies on ordinary SSH to initiate the connection.

The Pittsburgh Supercomputing Center offers an OpenSSH patch, HPN-SSH, that removes some bottlenecks and makes it 1000% faster (they claim). Continue reading

Lazy admin’s guide to automated updates (Part 2: Python pip)

Last week we discussed Debian Linux’s apt-get update mechanism and how to fully automate essential updates. This week I’d like to demonstrate how to do the same thing for Python. I admit that keeping Python packages up to date is probably not half as essential as keeping internet-facing server infrastructure updated. Nonetheless, I like to work with the latest versions of packages, as they might fix problems or add features. Continue reading