brew install qpdf
brew install ocrmypdf
This blog post is mainly for myself, but if it helps you, great. In general, the three pesky PDF manipulations I find myself doing are:
- ocr-ing documents
- combining documents fast
- extracting pages from documents
Homebrew - terminal
I’m just assuming you are on a Mac of sorts and have access to the terminal. My favorite terminal implementation is iTerm2. You can customize it by googling for a guide. Here’s a good one.
Make sure you also install Homebrew, the easy package manager. Just copy Homebrew's install command and paste it into the terminal.
Homebrew has three main functions: installing, updating, removing.
For installation, just copy the install command of a package. The packages for this post are qpdf and ocrmypdf.
Some packages are full graphical applications rather than command-line tools; Homebrew calls those ‘casks’. For example, a lightweight PDF reader I like using is called Skim. Its install command is
brew install --cask skim
Every so often you will want to update your installed packages. That command is simply
brew upgrade
To remove packages, just run
brew uninstall <packagename>
brew uninstall --cask <packagename>
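Before going further, it can save some head-scratching to confirm that the tools actually ended up on your PATH. Below is a minimal sketch of such a check, written for bash or sh; `check_tools` is a made-up helper name, not something Homebrew provides.

```shell
# check_tools: hypothetical helper (not part of Homebrew).
# Reports, for each command name given, whether it is on the PATH.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool (try: brew install $tool)"
      missing=1
    fi
  done
  # Exit status 0 if everything was found, 1 otherwise.
  return "$missing"
}
```

Calling `check_tools qpdf ocrmypdf` should then tell you whether both packages from this post are ready to use.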
Great, now we have qpdf, ocrmypdf and maybe skim. Next step: navigate to the folder where your document is located (an absolute path is the safest choice).
cd PATH/TO/FOLDER/JUST/COPY/THIS
You can check the files with the ls command.
ls
OCR-ing text
Very easy; the basic command is below. I often find I have to redo suboptimal previous OCR-ing, which is what the --force-ocr argument is for.
ocrmypdf --force-ocr inputdocument.pdf outputdocument.pdf
Bam, presto!
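If a whole folder needs the same treatment, the command can be wrapped in a loop. This is only a sketch, assuming bash or sh: `ocr_all`, the `_ocr.pdf` suffix and the `DRY_RUN` switch are my own inventions for illustration, not ocrmypdf features.

```shell
# ocr_all: hypothetical helper; runs ocrmypdf on every PDF in the current folder.
# Set DRY_RUN=1 to print the commands instead of running them.
ocr_all() {
  for f in *.pdf; do
    # If the folder has no PDFs, the pattern stays literal; skip it.
    [ -e "$f" ] || continue
    # Skip files produced by an earlier run of this helper.
    case "$f" in *_ocr.pdf) continue ;; esac
    # Write the result next to the original, e.g. scan.pdf -> scan_ocr.pdf.
    ${DRY_RUN:+echo} ocrmypdf --force-ocr "$f" "${f%.pdf}_ocr.pdf"
  done
}
```

Running `DRY_RUN=1 ocr_all` first shows what would happen; plain `ocr_all` then does the actual work and leaves the originals untouched.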
Selecting pages from a file
You can clip page ranges, e.g., pages 1-3, or just single pages, e.g., 1, or a combination. Note: these are the physical page positions in the file, counted from 1, not the page numbers printed on the pages.
qpdf input.pdf --pages . 1-3 -- result.pdf
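Since I can never remember where the dot and the double dash go, the call above can be tucked into a small function. A sketch under assumptions: `extract_pages` and the `DRY_RUN` switch are hypothetical names of mine; the qpdf arguments are the same ones used above.

```shell
# extract_pages: hypothetical wrapper around qpdf's --pages option.
# usage: extract_pages INPUT RANGE OUTPUT   (e.g. extract_pages input.pdf 1-3,7 result.pdf)
# Set DRY_RUN=1 to print the qpdf command instead of running it.
extract_pages() {
  if [ "$#" -ne 3 ]; then
    echo "usage: extract_pages INPUT RANGE OUTPUT" >&2
    return 2
  fi
  # The "." tells qpdf to take the pages from the main input file.
  ${DRY_RUN:+echo} qpdf "$1" --pages . "$2" -- "$3"
}
```

In a qpdf page range, `z` stands for the last page, so `extract_pages input.pdf 2-z rest.pdf` keeps everything from page two onward. If you are unsure how many pages a file has, `qpdf --show-npages input.pdf` prints the count.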
Combining files
To combine, we use similar syntax. Here we combine the first page of file1.pdf with pages one to three and page seven of file2.pdf.
qpdf --empty --pages file1.pdf 1 file2.pdf 1-3,7 -- result.pdf
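Often I just want whole files glued together in order. When no page range follows a file name, qpdf takes all of its pages, so the same syntax covers that case. A minimal sketch, assuming bash or sh; `merge_pdfs` and `DRY_RUN` are hypothetical names introduced here for illustration.

```shell
# merge_pdfs: hypothetical helper; concatenates whole PDFs in the order given.
# usage: merge_pdfs OUTPUT INPUT...   (e.g. merge_pdfs result.pdf file1.pdf file2.pdf)
# Set DRY_RUN=1 to print the qpdf command instead of running it.
merge_pdfs() {
  if [ "$#" -lt 2 ]; then
    echo "usage: merge_pdfs OUTPUT INPUT..." >&2
    return 2
  fi
  merged_file=$1
  shift
  # The remaining arguments are the input files; with no range, all pages are used.
  ${DRY_RUN:+echo} qpdf --empty --pages "$@" -- "$merged_file"
}
```

The output file comes first so that a glob can supply the inputs, e.g. `merge_pdfs result.pdf chapter*.pdf`.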