Your InputStream was neither an OLE2 stream, nor an OOXML stream

Solution 1

I don't know POI's internal implementation, but my guess is that it needs a seekable stream. The streams returned by servlets (and by networking code in general) aren't seekable.

Try reading the whole contents into memory and then wrapping them in a ByteArrayInputStream:

// Buffer the upload in memory, then hand POI an in-memory stream
byte[] bytes = getBytes(item.openStream());
InputStream stream = new ByteArrayInputStream(bytes);

// Reads the stream to the end and returns its contents as a byte array
public static byte[] getBytes(InputStream is) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    int len;
    byte[] data = new byte[100000];
    while ((len = is.read(data, 0, data.length)) != -1) {
        buffer.write(data, 0, len);
    }

    buffer.flush();
    return buffer.toByteArray();
}
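
To see why the in-memory copy behaves differently from the raw upload stream, here is a minimal, POI-free sketch. It assumes (this is a guess, not something confirmed from POI's source) that format detection wants to look at the first few bytes and then rewind. The class name RewindableStreams and the helper peek are hypothetical, made up for illustration; only the java.io calls are real. A .docx is a ZIP archive, so its first bytes are "PK".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper class, not part of POI or Commons FileUpload.
final class RewindableStreams {

    // Copies the whole stream into memory. Fine for small uploads,
    // but memory use grows with the size of the file.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int len;
        while ((len = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, len);
        }
        return buffer.toByteArray();
    }

    // Reads up to n leading bytes and then rewinds, so the caller
    // still sees the stream from the beginning afterwards.
    // A single read() is enough for ByteArrayInputStream; other
    // streams may return fewer bytes per call.
    static byte[] peek(InputStream in, int n) throws IOException {
        if (!in.markSupported()) {
            throw new IOException("stream does not support mark/reset");
        }
        in.mark(n);
        byte[] header = new byte[n];
        int read = Math.max(in.read(header, 0, n), 0);
        in.reset();
        return Arrays.copyOf(header, read);
    }
}
```

With this, new ByteArrayInputStream(RewindableStreams.readFully(item.openStream())) gives a stream where peek works and the full content is still readable afterwards, which a one-shot network stream does not guarantee.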

Solution 2

The issue is solved by buffering the upload into a ByteArrayInputStream before passing it to POI:

    while (iterator.hasNext()) {  // Apache Commons FileUpload code
      FileItemStream item = iterator.next();
      InputStream stream = item.openStream();
      ByteArrayInputStream bs = new ByteArrayInputStream(IOUtils.toByteArray(stream));
      POITextExtractor extractor = ExtractorFactory.createExtractor(bs);
      System.out.println(extractor.getText());
    }
Author by

user1493834

Updated on August 08, 2022

Comments

  • user1493834
    user1493834 over 1 year

    I am using Apache Commons FileUpload to upload a .docx file in Google App Engine, as explained in the linked "File upload servlet" page. While uploading, I also want to extract the text using the Apache POI libraries.

    If I pass this to the POI API:

     InputStream stream = item.openStream();
    

    I get the below exception:

    java.lang.IllegalArgumentException: Your InputStream was neither an OLE2 stream, nor an OOXML stream
    
    public static String docx2text(InputStream is) throws Exception {
        return ExtractorFactory.createExtractor(is).getText();
    }
    

    I am uploading a valid .docx document. The POI API works fine if I pass a FileInputStream object.

    FileInputStream fs=new FileInputStream(new File("C:\\docs\\mydoc.docx"));
    
  • user1493834
    user1493834 about 10 years
    Yes. I cannot use the FileUpload servlet's stream directly with Apache POI. One could create a temp file from this stream, open a FileInputStream on it, and pass that to Apache POI, but GAE does not allow writing temp files. Lots of restrictions!
  • Peter Knego
    Peter Knego almost 10 years
    So you wrapped it in ByteArrayInputStream like I suggested and then accepted your own answer? Good for you!