  • Compress PDF for College Applications

    It’s always a rush: the deadline is approaching, there are exams coming up, and you are applying for a chance to study at your dream college next year.  As you near the end of the application form, they always want soft copies of your certificates, in PDF.

    And of course, the online application always has some ridiculously low file size limit.  They often have file number limits too, meaning that you have to combine your scanned certificates all into a single PDF.

    Recently, I was applying for a Ph.D. where I had to upload all my certificates, supporting documents, SOP, filled application, etc. as a single PDF file.  And of course there was an unrealistically small size limit.  For another application, I could upload three documents, but they all had to be under 2MB.  Yet another application required that the PDF file have all pages rotated to the correct orientation.

    How is a full A4 certificate or transcript, two pages, full color, scanned at 300dpi, supposed to fit in 2MB?!?  My PDF, all told, was 13MB, and about 5 pages (Bachelor’s certificates, MSc certificates, diplomas, etc).  And it had to fit in 2MB???  I assume that it should also be legible?

    Well, the answer is, yes, we can!  The answer to all your PDF worries is ImageMagick and PDF Tool Kit (aka pdftk) – wonderful life-saver programs present on most GNU/Linux systems.

    Not on Linux?  My friend, my friend, it’s time to free yourself!  Download Linux Mint and give it a spin!  You can even run Linux Mint from the DVD, just for the purpose of using PDF Tool Kit, if you’re that desperate.

    Still not on Linux?  Well, you can always use Adobe Acrobat Pro, but it is costly!  I happen to have a licensed version of Adobe software, which I do use on occasion.  However, more and more often, I don’t need to use it because there’s a perfectly fine free, open source alternative.

    Well, if you’ve come here looking for answers on “How do I compress my PDF?”, I shouldn’t keep beating around the bush.

    Compress your PDF of Scanned Documents:

    There are a number of solutions you’ll find via Google.  I’ve tried them all, with little success.  Finally, I found a command that actually works:

    convert -density 300x300 -quality 5 -compress jpeg input.pdf output.pdf

    Change input.pdf and output.pdf to suit your needs.  I have found that (surprisingly!) a quality of 5 is usually still legible.  You can increase it up to 100.  I usually convert the big file once using this command, take a look at the result, and then increase the quality if the file is under the size limit, or decrease it if the file is too big.
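
    If you get tired of repeating that try-and-check loop by hand, it can be scripted.  What follows is only a rough sketch: the function name compress_to_limit and the list of quality steps are my own invention, and it assumes that the file size grows as the quality goes up.  It prints the highest quality that still fit under the limit.

```shell
# Sketch: keep the highest JPEG quality whose output still fits the size limit.
# compress_to_limit and the quality steps below are made up for this example.

compress_to_limit() {
    local src="$1" dst="$2" limit_bytes="$3"
    local try="${dst%.pdf}.try.pdf"     # keep the .pdf extension so convert writes a PDF
    local q size best=""

    for q in 5 10 20 30 40 50 60 70 80; do
        convert -density 300x300 -quality "$q" -compress jpeg "$src" "$try" || return 1
        size=$(( $(wc -c < "$try") ))   # size of this attempt, in bytes
        if [ "$size" -le "$limit_bytes" ]; then
            mv "$try" "$dst"            # fits: keep it and try a higher quality
            best=$q
        else
            rm -f "$try"                # too big: the previous attempt stands
            break
        fi
    done

    if [ -z "$best" ]; then
        echo "even the lowest quality is over the limit" >&2
        return 1
    fi
    echo "$best"                        # report the quality that was kept
}
```

    You would call it as something like: compress_to_limit input.pdf output.pdf 2000000.  Whether "2MB" on the form means 2,000,000 or 2,097,152 bytes is anyone's guess, so aim for the smaller number, and still open the result to check that it is legible.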

    There’s a lot more that ImageMagick can do for you.  To see some more examples, check out CatlingMindSwipe (including how to convert a JPG into a PDF).

    Re-organize / Rotate pages in a PDF file:

    Another task usually involved in submitting applications is rotating your PDF files, chopping out pages that you don’t want to send, or combining many PDF files into a single PDF.

    Yes, there are a number of online sites that will combine files for free, edit them, etc., but why would you use them?  There’s a better way to do this, right at your fingertips!

    PDF Tool Kit (from now on, just pdftk) lets you do all this through a very nice, simple interface.  To see its full power, just type “man pdftk”, and be overwhelmed with the power suddenly in your control!  For a quick start, read on:

    Scenario: You scanned your certificate, and saved it as a PDF, but it’s sideways.  You need to rotate it.

    pdftk A=input.pdf cat AE output output.pdf

    Explanation: You read the input file to A, then you put A in the cat’s dish, then you rotate it towards the East (picture the document sitting on a compass rose), and then you output it to output.pdf.  It might sound strange, but cat actually doesn’t represent a feline; it’s short for concatenate, or join-files-together.  Here, you’re joining A (rotated to the E) with nothing, so it’s just A.  You can also rotate a document to the West (W), or to the South (S).
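
    One caveat about the rotation letters: the commands in this post use the single-letter codes (E, S, W), but as far as I know, more recent pdftk releases expect the direction spelled out (east, south, west), so check “man pdftk” on your system if the short form is rejected.  Here is a small sketch that prints the command in either spelling.  The function name rotate_cmd and its "short"/"long" switch are hypothetical, purely for illustration, and it only prints the command rather than running it.

```shell
# Sketch: print a pdftk rotate command in either rotation spelling.
# rotate_cmd and the short/long style names are invented for this example.

rotate_cmd() {
    local src="$1" dst="$2" dir="$3" style="${4:-short}"
    local short

    case "$dir" in
        east)  short=E ;;
        south) short=S ;;
        west)  short=W ;;
        *) echo "unknown direction: $dir" >&2; return 1 ;;
    esac

    if [ "$style" = long ]; then
        # Assumed spelling for newer pdftk versions: Aeast, Asouth, Awest
        printf 'pdftk A=%s cat A%s output %s\n' "$src" "$dir" "$dst"
    else
        # Single-letter spelling, as used throughout this post: AE, AS, AW
        printf 'pdftk A=%s cat A%s output %s\n' "$src" "$short" "$dst"
    fi
}
```

    For example, rotate_cmd input.pdf output.pdf east prints the exact command from the scenario above.  File names containing spaces would need quoting before you paste the result into a terminal.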

    Scenario: You need to upload a bunch of certificates, but can only upload one file.  Let’s use that cat!

    pdftk A=input1.pdf B=input2.pdf C=input3.pdf cat A B C output output.pdf

    Now cat makes sense!  Basically, you’re just joining A, B, and C into one file, output.pdf.

    You can have as many input files as you need (I’ve gone all the way to L).  If you get strange errors, and you’re pretty sure that the command was correct, then one of your PDFs might be malformed.  Try removing one input file at a time from the command until it works; then you know which one was the culprit.  If you are joining huge documents, you might run into out-of-memory errors: just join them into two bunches, and then join the two bunches with another command.
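
    Hunting for the malformed file can be sped up a little by asking pdftk to read each input on its own.  This is a sketch, not a guarantee: the function name find_bad_pdfs is my own, and it uses pdftk's dump_data operation simply as a "can you open this at all?" test.  A file that passes may still cause trouble in a merge, but a file that fails is a good first suspect.

```shell
# Sketch: list the input PDFs that pdftk cannot even read on their own.
# find_bad_pdfs is a made-up helper name for this example.

find_bad_pdfs() {
    local f status=0
    for f in "$@"; do
        # dump_data prints the document's metadata; we only care whether it succeeds
        if ! pdftk "$f" dump_data >/dev/null 2>&1; then
            echo "$f"       # report the file pdftk choked on
            status=1
        fi
    done
    return "$status"        # 0 if every file was readable, 1 otherwise
}
```

    Usage would be along the lines of: find_bad_pdfs input1.pdf input2.pdf input3.pdf.  Any file it prints is worth re-scanning or re-saving before you try the merge again.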

    Scenario: You need to extract some pages of a book to send your writing sample.  For the example, let’s say you need to extract pages 2-10.

    pdftk A=input.pdf cat A2-10 output output.pdf

    That’s it!  Quite simple!  We can make it more complicated… say that, on top of that, you need to rotate the pages 90° to the west (you scanned them the wrong way):

    pdftk A=input.pdf cat A2-10W output output.pdf
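
    Before picking a page range, it helps to know how many pages the file actually has.  The sketch below reads the NumberOfPages line from pdftk's dump_data output and refuses to extract a range that falls outside the document.  The names page_count and extract_pages are hypothetical helpers of mine, not part of pdftk.

```shell
# Sketch: check a page range against the real page count before extracting.
# page_count and extract_pages are invented helper names for this example.

page_count() {
    # dump_data prints a line such as "NumberOfPages: 12"
    pdftk "$1" dump_data | awk '/^NumberOfPages:/ { print $2 }'
}

extract_pages() {
    local src="$1" first="$2" last="$3" dst="$4" total
    total=$(page_count "$src") || return 1
    [ -n "$total" ] || return 1         # no page count found: give up

    if [ "$first" -lt 1 ] || [ "$last" -gt "$total" ] || [ "$first" -gt "$last" ]; then
        echo "pages $first-$last are not within 1-$total" >&2
        return 1
    fi

    # Same form as the command above: handle A, pages first-last
    pdftk A="$src" cat "A$first-$last" output "$dst"
}
```

    For instance, extract_pages input.pdf 2 10 output.pdf does the same job as the extraction command above, but complains if the file has fewer than 10 pages.  If you simply want everything from page 2 onward, pdftk also accepts the keyword end in a range, as in A2-end.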

    Scenario: You need to do all the above!

    pdftk A=writing_sample.pdf B=scanned_Diploma.pdf C=transcript.pdf cat A2-10W BE C output My_Documents.pdf

    This takes your scanned writing_sample.pdf, extracts pages 2-10, rotates them to the West so that your reviewers can read them online without cricking their necks, then rotates your diploma East for the same reason, and tacks on your transcript, all into a single file called My_Documents.pdf that you can upload!

    If My_Documents.pdf is too big, try compressing it using the convert command (above).

    Good Luck with your applications!  What’s your favorite way to handle PDFs?