One solution, then, is to start asking for your data in XPS instead of PDF. The downside is that 98.5% of your recipients won’t have a clue how to produce XPS files.
The Microsoft Word Team Can Get It Right
If this truly is a case of Microsoft purposely doing a bad job dealing with Adobe PDF, the animosity is apparently not spread throughout the entire Microsoft Office team. If you copy the data from PDF and paste it to Word, the table appears in a row-and-column format, although the final zero in the grand total wraps in that cell of the table, as shown in Figure 5 (below).
A seemingly ridiculous solution, then, is to copy the PDF, paste to Word, copy from Word, and paste to Excel. The final zero goes back into the grand total cell. The result shown in Figure 6 (below) is passable.
Why Can’t Excel Get This Right?
For answers, I turned to Keith Bradbury of InvestInTech.com. Keith’s company is the inventor of Able2Extract, a program that can take PDF files and turn them back into Excel workbooks, Access tables, Word documents, and more. InvestInTech is the expert in the field of getting data from PDF to Excel or Access; I’ve been using its software to painlessly extract Excel data from PDF files since the early 2000s.
Keith’s team looked at my examples. They pointed out that each Copy (or Ctrl+C) in any program sends several formats of data to the Windows Clipboard such that the receiving program can choose which to use. For example, copy some data from WordPad, and the application sends six versions of that data to the clipboard (text, RTF, UTF8, and so on).
When you copy from Adobe, the program sends two versions of the data, one in plain text and one in rich text format. Keith’s team can only guess, but apparently, Microsoft Word is reading the rich text format from the clipboard and Excel is reading the plain text, with less-than-stellar results.
Use a Third-Party Tool
Let’s be honest. I think a lot of people convert their Excel files to PDF because they don’t want the recipient to be able to reuse the data in Excel. Send me a 450 page document in PDF and they figure I won’t have the time or inclination to take each page from PDF to Word and back to Excel. I actually had this exact problem once and converted the whole document back to an Excel database using Able2Extract. You can test out Able2Extract 7 with its free seven-day trial.
The software is simple to use. Open the PDF file with Able2Extract. The first step is to identify which area(s) you want to import. If the document contains titles or page numbers in the header, you could choose to crop the page to ignore those sections of the document. Able2Extract offers to convert the data to Excel, Word, PowerPoint, Publisher, HTML, or AutoCad, as shown in Figure 7.
The result is shown in Figure 8. Able2Extract preserved the original font colors and numeric formatting. The process was not perfect; the font size changed from 11 point to 9 point and the right-justified headings moved back to the left. However, once I had the data in a table in Excel, it is easy enough to fix those issues.
Able2Extract also offers a pro version that will use OCR to recognize characters in PDF documents that have been created using a scanner. Download a free trial from http://www.investintech.com/.
It really should not be this hard to get data out of a PDF file and back into Excel. If you have a one-page table, the PDF-to-Word-to-Excel solution will work suitably well. If you have a several-page document with many different tables or repeating headers, then going to a third-party solution such as Able2Extract makes sense.
Bill Jelen, a CFO contributing editor, runs MrExcel.com and is the author of 32 books about Microsoft Excel, including Charts & Graphs for Excel 2010. Post questions for future columns in the “Suggest a Topic” section of the Spreadsheet Community Center on the right.