Using Command Line Switches to Save a PDF as Text - Can it be done?

Is this possible? If so, then does anyone know how to do this?

asked Jul 28, 2009 at 19:09 by bryan beverly

Ugh, please pay more attention to your tagging in the future. Categorize your question. Don't try to summarize it. Each tag should stand on its own.

Commented Jul 28, 2009 at 19:13

Not sure which OS you are running, but there is a tool called "pdftotext" that seems to do what you want. It's available in Linux, but there may be comparable tools for other operating systems.

Commented Jul 28, 2009 at 19:16

I'm sorry, I neglected to mention the operating system. This is Windows. I have heard of this tool; unfortunately buying a solution is not an option - hence we are left with building one. Thanks though!

– bryan beverly Commented Jul 28, 2009 at 19:20

Not sure what "buying" means for you. pdftotext is free.

Commented Jul 28, 2009 at 19:22

Yes - I just checked the approved software list - Python 2.5.2 is approved

– bryan beverly Commented Jul 29, 2009 at 18:33

5 Answers

easy-pdf-parser is an npm package, so you need to install Node.js (and npm) to use it.

It can be used as a command line tool:

    npm install -g easy-pdf-parser
    pdf2text test.pdf > test.txt

This tool sorts text lines by their y coordinates, so it works well in most cases. It also handles Unicode and is cross-platform.
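To illustrate what ordering text by y coordinate involves, here is a minimal Python sketch. It is not the package's actual code: the `(x, y, text)` fragment shape, the `fragments_to_lines` name, and the `y_tolerance` value are all assumptions made for the example. It also assumes the usual PDF convention that y grows upward from the bottom of the page.

```python
from typing import List, Tuple

# Hypothetical fragment shape for this sketch: (x, y, text)
Fragment = Tuple[float, float, str]


def fragments_to_lines(fragments: List[Fragment], y_tolerance: float = 2.0) -> List[str]:
    """Group fragments with nearly equal y into lines, top of page first."""
    # Larger y is higher on the page, so sort by descending y.
    ordered = sorted(fragments, key=lambda frag: -frag[1])

    lines: List[List[Fragment]] = []
    for frag in ordered:
        # Compare against the first fragment of the current line.
        if lines and abs(lines[-1][0][1] - frag[1]) <= y_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])

    # Within each line, read left to right (ascending x).
    return [
        " ".join(text for _, _, text in sorted(line, key=lambda frag: frag[0]))
        for line in lines
    ]
```

For example, `fragments_to_lines([(72.0, 700.0, "Hello"), (300.0, 650.0, "line"), (110.0, 700.5, "world"), (72.0, 650.0, "Second")])` returns `["Hello world", "Second line"]`, even though the fragments arrive out of order.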

answered Jul 14, 2018 at 9:14 by luochen1990

I don't understand why you wouldn't want to use free software (not freeware); pdftotext is the ideal solution. However, if you really want to open and save the PDF in an automated fashion through the Windows GUI, you could use VBScript and the SendKeys method.

Just use pdftotext, though. It would be much more reliable and won't cost you a whole box.
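Since Python comes up in the comments as an approved option, here is a minimal sketch of calling pdftotext from a script. It assumes a `pdftotext` executable is on the PATH and a current Python 3 (not the 2.5.2 mentioned in the thread). The function names `build_pdftotext_command` and `convert` are made up for this example. `-layout` is a real pdftotext option that tries to preserve the physical layout of the page.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_pdftotext_command(
    pdf_path: str, txt_path: Optional[str] = None, layout: bool = False
) -> List[str]:
    """Build the pdftotext argument list without running anything."""
    if txt_path is None:
        # Default output: same name as the PDF, with a .txt suffix.
        txt_path = str(Path(pdf_path).with_suffix(".txt"))
    command = ["pdftotext"]
    if layout:
        command.append("-layout")
    command += [pdf_path, txt_path]
    return command


def convert(pdf_path: str, txt_path: Optional[str] = None, layout: bool = False) -> None:
    """Run pdftotext; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_pdftotext_command(pdf_path, txt_path, layout), check=True)
```

Calling `convert("report.pdf")` would run `pdftotext report.pdf report.txt`. Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.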

answered Jul 28, 2009 at 20:55 by Gareth Davidson

Thanks - I was thinking about doing the 'SENDKEYS' method but wanted to see if there was something quick and dirty. Yes, our environment places tight restrictions on acceptable software. Using Python (which is approved) may also be an option.

– bryan beverly Commented Jul 29, 2009 at 18:32

+1, because one should always search Google with "site:http://stackoverflow.com" to save some 5 minutes of crapware advertisements; clever people like you have already posted the answer.

Commented Jun 7, 2012 at 9:40

The problem with using open source software is that it might not be compatible with your closed-source project. Even if it is, your employer may have a blanket policy that prohibits open source, to avoid misunderstanding the licenses and the legal problems that could follow. I worked for a company that had this policy: although they knew it was almost always fine, they didn't want to take the risk of being forced to release proprietary source code.

Commented Jul 4 at 0:49

Don't use CMD; use AutoIt. It is very easy to do and takes only a few lines:

    ; Open the PDF with its associated viewer (Run only launches executables)
    ShellExecute("file.pdf")
    WinWait("Adobe")
    Send(?) ; whatever commands are necessary to save as text
    Send("")
    Send("!")
answered Jun 25, 2012 at 8:34

I think the VBScript below should do the trick. It takes all .pdf files in a given folder and saves them as .txt files. One major bummer is that it only works while your machine is unlocked, since it uses the SendKeys method. If anyone has a solution that works while a computer is locked, please send it my way!

    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set WshShell = WScript.CreateObject("WScript.Shell")
    objStartFolder = "PATH_OF_ALL_PDFS_YOU_WANT_TO_CONVERT_HERE"
    Set objFolder = objFSO.GetFolder(objStartFolder)
    Set colFiles = objFolder.Files

    For Each objFile In colFiles
        ' GetExtensionName returns the extension without the dot
        extension = LCase(objFSO.GetExtensionName(objFile.Name))
        file = objFSO.GetBaseName(objFile.Name)
        fullname = objFSO.BuildPath(objStartFolder, objFile.Name)
        fullname_txt = objFSO.BuildPath(objStartFolder, file & ".txt")

        If extension = "pdf" And Not objFSO.FileExists(fullname_txt) Then
            WScript.Echo fullname
            ' Open the PDF in its associated viewer
            WshShell.Run """" & fullname & """"
            WScript.Sleep 1000
            ' Drive the viewer's menus: Alt, then f, h, x
            WshShell.SendKeys "%"
            WScript.Sleep 100
            WshShell.SendKeys "f"
            WScript.Sleep 100
            WshShell.SendKeys "h"
            WScript.Sleep 100
            WshShell.SendKeys "x"
            WScript.Sleep 300
            WshShell.SendKeys ""

            ' This step prevents the loop from moving on to the next .pdf
            ' before the conversion to .txt is complete.
            ' i and count are reset for every file.
            i = 0
            count = 0
            Do While i = 0 And count < 100
                On Error Resume Next
                Err.Clear
                Set MyFile = objFSO.OpenTextFile(fullname_txt, 8)
                If Err.Number = 0 Then
                    i = 1
                    MyFile.Close
                End If
                On Error GoTo 0
                count = count + 1
                WScript.Sleep 20000
            Loop
        End If
    Next
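The folder scan and the wait-for-output step in the script above do not depend on the GUI automation, so they can be sketched on their own in Python (mentioned in the comments as an approved option). This is only a sketch under assumptions: it targets Python 3, the names `pending_conversions` and `wait_for_file` are made up for the example, and it does not drive any viewer itself.

```python
import os
import time
from typing import List, Tuple


def pending_conversions(folder: str) -> List[Tuple[str, str]]:
    """Return (pdf_path, txt_path) pairs for PDFs that have no .txt file yet."""
    pairs = []
    for name in sorted(os.listdir(folder)):
        base, ext = os.path.splitext(name)
        if ext.lower() != ".pdf":
            continue  # not a PDF
        txt_path = os.path.join(folder, base + ".txt")
        if not os.path.exists(txt_path):
            pairs.append((os.path.join(folder, name), txt_path))
    return pairs


def wait_for_file(path: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until path exists or timeout seconds pass; True if it appeared."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(interval)
    # One last check in case the file appeared during the final sleep.
    return os.path.exists(path)
```

A caller would loop over `pending_conversions(folder)`, start the conversion for each PDF, and call `wait_for_file(txt_path)` before moving on. Unlike a fixed sleep, the poll returns as soon as the output appears and reports a timeout instead of silently continuing.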