What is the best way to convert HTML into Excel

15,953

Solution 1

The HTML table is relatively easy. You can download the page, parse the HTML (there are various HTML parsing libraries available), extract the table and convert it into CSV (which Excel can load), or directly create an Excel file, e.g. using Java POI, as suggested above.

The Flash part is significantly harder. There are quite a few tools available to capture flash to an image, you'd need to use one of them. This can be tricky, as Flash might be interactive, so you'd possibly have to remote-control the Flash part so it shows the right image before capturing. Hard to tell without more info.

That said, screen-scraping (which is what you're doing) is always labour-intensive and fragile. You should really push for a better interface to get your data from, it will save loads of hassle in the long run.

Solution 2

Just set the content type of the page to "application/vnd.ms-excel". If the html page is just a table it will open with excel and look perfect. You can even add background colors and font styles.

Try some of these content types

application/excel
application/vnd.ms-excel
application/x-excel
application/x-msexcel

Solution 3

Excel can convert HTML tables by default. The easiest way to force it to do this is to save the HTML file with an XLS extension. Excel will then open the XLS as if it were its native workbook.

Solution 4

I cannot offer any advice on the Flash part, but I have done HTML table to Excel many times. Yes, Excel can open HTML tables but most HTML tables out there have extraneous crap in them that can make it fragile to consistently parse the tables.

CPAN module HTML::TableExtract is a wonderful module that allows you to focus on the non-presentation specific aspects of the table you are trying to extract. Just specify the column headings you are interested in and maybe specify the title or class of the table and you are mostly set. You might have to post process the rows returned a little, but that is considerably easier than dealing with the underlying tag soup in all its glory.

Further, for output to Excel format, stick with Spreadsheet::WriteExcel rather than the OLE interface. That way, you do not depend on having Excel installed for your program to work and things go a little faster.

Make sure you specify the data type of cells if you do not want content to be changed automatically by Excel upon opening the files (another reason I do not like sending around CSV files). Use a configuration file for formatting information so that you can change how the spreadsheet looks without having to change the program.

You can always use Excel's built-in charting facilities to replace the web site graphs.

This combination has enabled me to generate pretty good looking documents comprising several hundreds of megabytes of scraped data (with logos and image links etc) using just a few hundred lines of Perl and a couple of days' work.

Solution 5

There's a very good Java POI api that would let you do that, but it's Java. http://poi.apache.org/

If you're on Win32 you can also use Excel's COM api, there are quite a few tutorials on the net.

Share:
15,953
Admin
Author by

Admin

Updated on June 04, 2022

Comments

  • Admin
    Admin almost 2 years

    I have an HTML page which has a flash chart(FusionCharts) and HTML table. I need to convert this whole thing into Excel. HTML table should be displayed in cells of excel sheet. Flash chart can be displayed as an image.

    Is there any open source API that we could use for achieving this. Could you let me know what are the possible options.

    Can this be done by using javascript alone.

  • darkAsPitch
    darkAsPitch almost 15 years
    wow. I didn't know that. But Excel often sucks for CSV in regions that use ";" as the list separator as opposed to ","...
  • Oskar
    Oskar almost 15 years
    I agree and the list of things that Excel sucks at could go on for a while :) I mainly used this to present reports coming from a server side script so I had the ability to ensure the data format wouldn't hurt Excel.
  • Wouter van Nifterick
    Wouter van Nifterick almost 15 years
    +1 for mentioning regions with ";" separator. In the Netherlands the decimal separator is a comma. When floats have a dot, excel interprets the number as a string, unless I change the regional settings in windows. Generating CSV on the server and expecting it to load in everyone's excel is guaranteed to fail.