Spreadsheets: Moving Data from PDF to Excel

Converting a PDF "data dump" into Excel or Access is made easier with a third-party tool.


Editor’s Note:

To read more about spreadsheets, share a tip with other readers, or suggest a topic for Bill Jelen to cover in an upcoming column, visit CFO.com’s Spreadsheet Tips page.

Reader Ray S. wins a copy of Excel for the CEO for this week’s question: “I am presented with a download of information in a PDF or Adobe format. This ‘data dump’ needs to be converted into either Excel or Access so as to be analyzed more easily. Can you provide best practices to accomplish this?”

Excel 2007 and Excel 2010 both support saving your Excel files as PDF. A constant frustration is Excel’s inability to bring that data back in from the PDF later. Here is a simple demonstration of the problem:

Figure 1 shows a simple table in Excel. In Excel 2010, use File, Save & Send, Create PDF/XPS Document as shown in Figure 2. The table will be accurately rendered in the PDF document as shown in Figure 3.

 

Fig. 1

Fig. 2

Fig. 3

 

Select the text in the PDF document by using Edit, Select All, or by dragging the mouse. Next, copy it using Edit, Copy, or Ctrl+C. Then switch to Excel and paste. You will find that the original table has unwound into a relatively useless single column, as shown in Figure 4. How can it be that the Microsoft Excel team cannot round-trip a simple table from Excel to an Excel-created PDF and back to Excel? Is this simply the Microsoft Excel team being obstinate?

Fig. 4
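
If you are stuck with a single pasted column like this, a short script can sometimes rebuild the rows. The sketch below is only an illustration, not a method from this column: it assumes the pasted values arrive one cell per line in row-by-row order, that no cell was blank, and that you already know how many columns the original table had. The function names and the sample values are hypothetical.

```python
import csv
import io
from typing import List


def reshape_single_column(values: List[str], n_cols: int) -> List[List[str]]:
    """Regroup a flat list of cell values into rows of n_cols cells.

    Assumes the values are in row-by-row order and that no cell was empty;
    a blank cell in the original table would shift everything after it.
    """
    if n_cols <= 0:
        raise ValueError("n_cols must be positive")
    # Drop blank lines and stray whitespace left over from the paste.
    cells = [v.strip() for v in values if v.strip()]
    if len(cells) % n_cols != 0:
        raise ValueError(
            f"{len(cells)} values do not divide evenly into {n_cols} columns"
        )
    return [cells[i:i + n_cols] for i in range(0, len(cells), n_cols)]


def rows_to_csv_text(rows: List[List[str]]) -> str:
    """Serialize the rebuilt rows as CSV text that Excel can open."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical pasted column: a two-column table read row by row.
pasted = ["Region", "Sales", "East", "100", "West", "250"]
table = reshape_single_column(pasted, n_cols=2)
csv_text = rows_to_csv_text(table)
```

Saving `csv_text` to a file with a .csv extension gives Excel something it can open as a table again. The length check is deliberate: if the count of values does not divide evenly by the column count, the assumptions above do not hold and it is safer to stop than to produce misaligned rows.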

Fails with PDF; Works with XPS

Notice that the Create PDF/XPS command actually offers to create either the market-dominant PDF format or the upstart XPS format. XPS is the new format designed by Microsoft to compete with the PDF format. An unscientific search of Google indicates that PDF has a 98.5% market share compared with 1.5% for XPS. The Excel team smartly makes “PDF” the default choice. However, if you instead save the file as XPS, copy the table from the XPS document, and paste it back into Excel, you will see that the table retains its original shape, font color, and numeric formatting. Could it be that the Excel team is trying to boost the popularity of XPS instead of PDF?

