Velvet Star Monitor


How to find searchable PDFs

Writer Andrew Henderson

I have a folder with many PDFs. Some are no doubt searchable. Can I search for and identify only those which are searchable?

Adobe shows an error message when a PDF is an image, asking whether you want to convert it to searchable text. I do not know whether that check is generic or specific to Adobe. I suppose a more complete question would have been: how do I set a file aside when it turns out to be an image-only PDF? I will read man pdfinfo to see whether anything in there helps.


1 Answer

From inside the folder in question, you can use pdfgrep:

pdfgrep --recursive --count .

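Each output line is a file name followed by a colon and a match count. Files whose line ends in :0 are not searchable (the dot is a regex that matches any character, so any extractable text counts as a match). Also,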

pdfgrep -r -c . | grep -oP ':\d+$' | sed 's/^:0$/Not searchable/;s/^:[1-9][0-9]*$/Searchable/' | sort | uniq -c

will give you some stats about how many are searchable or not.
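To actually set the image-only files aside, as the question asks, the same count output can be filtered and fed to mv. The following is only a sketch: the function names list_unsearchable and set_aside and the ../needs-ocr directory are made up for illustration, and it assumes pdfgrep prints one path:count line per file and that no file name contains a newline.

```shell
#!/usr/bin/env bash

# Read "path:count" lines (the format assumed for `pdfgrep -r -c .`) on stdin
# and print only the paths whose count is exactly 0.
list_unsearchable() {
    # Strip a trailing ":0"; lines ending in any other count are not printed.
    sed -n 's/:0$//p'
}

# Move every path read from stdin into the directory given as the first argument.
set_aside() {
    local dest=$1 path
    mkdir -p -- "$dest"
    while IFS= read -r path; do
        mv -- "$path" "$dest"/
    done
}

# Example invocation (hypothetical destination, kept outside the searched tree):
#   pdfgrep -r -c . | list_unsearchable | set_aside ../needs-ocr
```

The destination is placed outside the current folder so that the recursive search does not walk into it while files are being moved. Files from different subfolders that share a name would overwrite each other in the destination, so run list_unsearchable on its own first to review the list.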
