Shell Function to Remove All Metadata from a PDF

Aug 28, 2015 01:45

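A handy function to remove all metadata from a PDF file. When it finishes, it prints whatever metadata remains so you can inspect it. Needs pdftk, exiftool, qpdf, and pdfinfo (part of poppler-utils) installed. The cleaned file is written as clean2-<name>, next to an intermediate clean-<name>.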

Combines commands from here and here. Good job, guys.
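Since the function calls several external tools (pdftk, exiftool, qpdf, and pdfinfo), it can help to confirm they are on your PATH first. The helper below is only a sketch: `check_pdf_tools` is a hypothetical name, not part of the original snippet, and it relies on nothing beyond the standard `command -v` builtin.

```shell
# Hypothetical helper (not from the original post): report which of the
# given commands cannot be found on PATH. Returns 0 if all are present,
# 1 if at least one is missing.
check_pdf_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_pdf_tools pdftk exiftool qpdf pdfinfo` prints one line per missing tool on stderr and stays silent if everything is installed.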

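clean_pdf() {
 # Blank every InfoValue in the Info dictionary, then write the result back.
 pdftk "$1" dump_data | \
  sed -e 's/\(InfoValue:\)[[:space:]].*/\1 /' | \
  pdftk "$1" update_info - output "clean-$1"
 # Strip all remaining tags (exiftool keeps a backup as clean-$1_original).
 exiftool -all:all= "clean-$1"
 # List whatever tags are left, including embedded ones.
 exiftool -all:all "clean-$1"
 exiftool -extractEmbedded -all:all "clean-$1"
 # exiftool's PDF edits are reversible; rewriting with qpdf drops the old data.
 qpdf --linearize "clean-$1" "clean2-$1"
 # Show the remaining metadata for inspection.
 pdftk "clean2-$1" dump_data
 exiftool "clean2-$1"
 pdfinfo -meta "clean2-$1"
}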

After adding this snippet to ~/.profile, or copying and pasting it into the shell, you can just run

clean_pdf my-unclean.pdf
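To see what the sed step does in isolation, here is a small sketch that runs the same substitution over a few sample lines in the format pdftk's dump_data prints. The wrapper name `blank_info_values` and the sample values are made up for illustration, and the pattern uses the portable `[[:space:]]` class rather than GNU sed's `\s`.

```shell
# Hypothetical wrapper (for illustration only): blank the value of every
# "InfoValue:" line read from stdin, leaving all other lines untouched.
blank_info_values() {
  sed -e 's/\(InfoValue:\)[[:space:]].*/\1 /'
}

# Sample input in dump_data format; the values are invented.
printf '%s\n' \
  'InfoKey: Author' \
  'InfoValue: Jane Doe' \
  'NumberOfPages: 3' |
  blank_info_values
# Prints:
#   InfoKey: Author
#   InfoValue:          (followed by a single space, value removed)
#   NumberOfPages: 3
```

Only the values are blanked; the InfoKey lines stay, so pdftk's update_info still knows which fields to overwrite.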