Snapdragon Tech Blog

Musings of a systems administrator and open source developer

Find sites vulnerable to WordPress Content Injection Vulnerability

WordPress' update cycle is reaching the speed of Windows XP. Even Google is sending out warnings, urging site owners to update. In my case the warnings were not accurate, but there are still many vulnerable sites out there. One could, for example, use Nerdydata to search the internet’s source code for vulnerable WP versions. A simple search across their “Popular sites” dataset reveals close to 300 matches. Regex used: ver=4.7(\.1)?'
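To try the same idea on a single page's HTML, here is a minimal Python sketch. It is not part of the original post. The pattern is a slightly stricter variant of the regex above (an assumption on my side): the dot in "4.7" is escaped and either quote character is accepted. The function name and sample markup are made up for illustration.

```python
import re

# Stricter variant of the regex from the post (assumption): escaped dot,
# and either a single or a double quote may close the attribute value.
VULNERABLE_VER = re.compile(r"ver=4\.7(\.1)?['\"]")


def looks_vulnerable(html: str) -> bool:
    """Return True if the page source references assets versioned 4.7 or 4.7.1."""
    return VULNERABLE_VER.search(html) is not None


# Hypothetical snippets of page source, as WordPress appends ?ver= to its assets.
sample_old = "<link rel='stylesheet' href='/wp-includes/css/dashicons.min.css?ver=4.7.1' />"
sample_new = "<link rel='stylesheet' href='/wp-includes/css/dashicons.min.css?ver=4.7.2' />"

print(looks_vulnerable(sample_old))
print(looks_vulnerable(sample_new))
```

A match is only a hint: the version string can be hidden or altered by plugins, so a hit should be verified before drawing conclusions.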

Optimize Spamassassin Detection Rate

Email is a terrible way to communicate and should be avoided where possible. Unfortunately it is also the lowest common denominator on the web and will continue to be for the near future. In the early days of the internet it was easy to run your own mailserver. Due to the absurd quantity of spam this task became increasingly hard, and many tech-savvy people gave up and switched to Gmail or other services. This is a pity because a decentralized email infrastructure is harder to surveil, subpoena or shut down. I encourage everyone to run their own mail service if possible. In this guide I will summarize the steps needed to get an effective SpamAssassin (SA) setup.

Use Piwik for more Reliable Visitor Statistics (avoid referral spam and ad blockers)

Online marketers rely on statistics about their visitors to constantly adapt offers and learn about traffic sources. Google Analytics is the current tool of choice for most of them. Unfortunately GA is suffering from two major issues that won’t go away any time soon

Cheaply retrieve data from Amazon AWS Glacier

When it launched, Amazon Glacier was applauded for providing a super-cheap long-term storage solution. While there are no surprises when uploading and storing files, retrieving them can get expensive. The pricing reflects the fact that Amazon needs to retrieve your files from tape, which is expensive and takes a long time. Several users reported high charges after retrieving their backups. In its defence, Amazon published a very detailed FAQ on this topic.

Yahoo: Email not accepted for policy reasons

Yahoo failed as an internet company for a reason. Try sending an email with a link to a bank website, e.g. CIMB (popular across Asia): http://www.cimb-bizchannel.com.my/index.php?ch=srvpack Your email will be rejected by Yahoo. Just awesome… Workaround: use a shortlink to hide your URL, e.g. http://goo.gl/tPb19A. Now your phishing emails will arrive safely. 😉

Download Uber ride history to Python Pandas

With Uber rides this cheap and self-driving cars around the corner, I doubt that future generations will have their own cars, except for extreme use cases like commuting from the countryside. Personally I spent EUR 91 on Uber this year (2 months) and it got me 260 km. That’s about 0.35 EUR/km. There is an API to download your rides, but getting receipts/prices didn’t work for me, so I had to scrape them from the website directly.

New Release of invoice2data

Thanks to some awesome contributors, there is a new release of invoice2data. This Python package allows you to get structured data from PDF invoices. Major enhancements: a powerful YAML-based template format for new invoice issuers; improved date parsing thanks to dateparser; improved PDF conversion thanks to a new feature in xpdf; better testing and CI; an option to add multiple keywords and regexes to each field; an option to define currency and date format (day or month first?)
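To illustrate the general idea of template-driven extraction (keywords to recognise an issuer, one regex per field, an explicit date format), here is a small sketch in plain Python. It does not use invoice2data and does not reproduce its template format or API. The template layout, the issuer, the field names and the sample text are all hypothetical.

```python
import re
from datetime import datetime

# Hypothetical template: NOT the invoice2data format, just the same idea.
TEMPLATE = {
    "issuer": "Example Hosting Ltd",
    "keywords": ["Example Hosting", "Invoice"],  # all must appear in the text
    "fields": {
        "invoice_number": r"Invoice No\.?\s*(\w+)",
        "date": r"Date:\s*(\d{2}/\d{2}/\d{4})",
        "amount": r"Total:\s*EUR\s*([\d.]+)",
    },
    "date_format": "%d/%m/%Y",  # day first
    "currency": "EUR",
}


def extract(text, template):
    """Return a dict of parsed fields, or None if the template does not fit."""
    if not all(keyword in text for keyword in template["keywords"]):
        return None
    result = {"issuer": template["issuer"], "currency": template["currency"]}
    for name, pattern in template["fields"].items():
        match = re.search(pattern, text)
        if match is None:
            return None  # a required field is missing
        result[name] = match.group(1)
    # Convert the raw strings into typed values.
    result["date"] = datetime.strptime(result["date"], template["date_format"]).date()
    result["amount"] = float(result["amount"])
    return result


# Text as it might come out of a PDF-to-text conversion (made up).
text = "Example Hosting Ltd\nInvoice No. A1001\nDate: 03/04/2016\nTotal: EUR 91.00\n"
data = extract(text, TEMPLATE)
print(data)
```

The explicit date format is what resolves the day-or-month-first question: with "%d/%m/%Y" the sample date is read as 3 April, not 4 March.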

Unit testing for Jupyter (iPython) notebooks

At Quantego, we do most high-level work that supports energy analysts in Jupyter Notebooks. This allows us to pull several Java and Python packages together for a highly productive work environment. Sample notebooks are hosted on Github and distributed with our Docker images. Of course we prefer our sample notebooks to work when people run them. They also uncover potential problems, by running at a very high level and thus using almost all available features.

Shell Function to Remove all Metadata from PDF

A handy function to remove all metadata from a PDF file. When done it will show all the remaining metadata for inspection. Needs pdftk and exiftool installed. Combines commands from here and here. Good job, guys. After adding this snippet to ~/.profile or copying and pasting it into the shell, you can just run

Incremental FTP backups

If you only have FTP access to a server or account (CPanel) you’re looking after, LFTP is an efficient tool to keep incremental backups. This will make hard links of the previous backup and update it, copying and storing only changed files.
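The hard-link step can be sketched in Python as follows. This is an illustration of the principle only, not the command from the post: the function name and the directory names in the usage comment are made up, and the actual download of changed files (done by LFTP in the post) is left out.

```python
import os


def hardlink_snapshot(previous, new):
    """Recreate the directory tree of `previous` under `new`.

    Every file is hard-linked instead of copied, so the new snapshot takes
    almost no extra disk space until a file is replaced by the sync step.
    Both directories must live on the same filesystem.
    """
    for root, _dirs, files in os.walk(previous):
        rel = os.path.relpath(root, previous)
        target_dir = os.path.normpath(os.path.join(new, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            os.link(os.path.join(root, name), os.path.join(target_dir, name))


# Usage (hypothetical paths):
#   hardlink_snapshot("backups/2016-01-01", "backups/2016-01-02")
# Afterwards, sync the remote site into the new directory.
```

One caveat that follows from how hard links work: older snapshots stay intact only if the sync step replaces a changed file with a new one. Writing into the existing file in place would change every snapshot that links to it.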