Why doesn't more Java code use PipedInputStream / PipedOutputStream?


Solution 1

From the Javadocs:

Typically, data is read from a PipedInputStream object by one thread and data is written to the corresponding PipedOutputStream by some other thread. Attempting to use both objects from a single thread is not recommended, as it may deadlock the thread.

This may partially explain why it is not more commonly used.

I'd assume another reason is that many developers do not understand its purpose / benefit.
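To make the two-thread requirement from the Javadoc concrete, here is a minimal sketch. The class and method names (PipedRoundTrip, roundTrip) are made up for illustration, and it assumes Java 11+ for readAllBytes and String.repeat. One thread writes and closes the PipedOutputStream while the calling thread reads from the connected PipedInputStream:

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class PipedRoundTrip {

    /** Sends text through a pipe: a producer thread writes, the caller reads. */
    static String roundTrip(String text) {
        byte[] payload = text.getBytes(StandardCharsets.UTF_8);
        try (PipedInputStream in = new PipedInputStream()) {
            PipedOutputStream out = new PipedOutputStream(in);
            Thread producer = new Thread(() -> {
                // try-with-resources closes the write end, which is what
                // lets the reader see end-of-stream.
                try (out) {
                    out.write(payload);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            producer.start();
            byte[] received = in.readAllBytes(); // requires Java 9+
            producer.join();
            return new String(received, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Larger than the default 1024-byte pipe buffer, so the writer has
        // to block until the reader drains the buffer.
        String text = "x".repeat(5000); // requires Java 11+
        System.out.println(roundTrip(text).equals(text));
    }
}
```

If the write were done on the reading thread instead, any payload larger than the pipe's buffer would block forever waiting for a reader, which is the deadlock the Javadoc warns about.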

Solution 2

In your example you're creating two threads to do work that could be done by one, and introducing I/O delays into the mix.

Do you have a better example? Or did I just answer your question?


To pull some of the comments (at least my view of them) into the main response:

  • Concurrency introduces complexity into an application. Instead of dealing with a single linear flow of data, you now have to be concerned about sequencing of independent data flows. In some cases, the added complexity may be justified, particularly if you can leverage multiple cores/CPUs to do CPU-intensive work.
  • If you are in a situation where you can benefit from concurrent operations, there's usually a better way to coordinate the flow of data between threads. For example, passing objects between threads using a concurrent queue, rather than wrapping the piped streams in object streams.
  • A piped stream may be a good solution when you have multiple threads performing text processing, in the style of a Unix pipeline (e.g., grep | sort).
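As a rough illustration of the concurrent-queue alternative mentioned in the list above, here is a small sketch. The names QueueHandoff, upperCaseAll, and the END sentinel are hypothetical and chosen only for this example. A producer thread hands String objects to a consumer through a bounded BlockingQueue, with no streams or serialization involved:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class QueueHandoff {

    // Sentinel marking end of data; compared by identity, so a real
    // element with the same text is not mistaken for it.
    private static final String END = new String("END");

    /** Producer thread enqueues the lines; the caller consumes and upper-cases them. */
    static List<String> upperCaseAll(List<String> lines) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(16);
        Thread producer = new Thread(() -> {
            try {
                for (String line : lines) {
                    queue.put(line); // blocks while the queue is full
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        List<String> result = new ArrayList<>();
        try {
            for (String item = queue.take(); item != END; item = queue.take()) {
                result.add(item.toUpperCase(Locale.ROOT));
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result;
    }
}
```

The bounded queue gives the same back-pressure a pipe's buffer does, but the unit of transfer is a whole object rather than a run of bytes.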

In the specific example, the piped stream allows use of an existing RequestEntity implementation class provided by HttpClient. I believe that a better solution is to create a new implementation class, as below, because the example is ultimately a sequential operation that cannot benefit from the complexity and overhead of a concurrent implementation. While I show the RequestEntity as an anonymous class, reusability would indicate that it should be a first-class class.

post.setRequestEntity(new RequestEntity()
{
    public long getContentLength()
    {
        return -1; // negative value: content length is not known in advance
    }

    public String getContentType()
    {
        return "text/xml";
    }

    public boolean isRepeatable()
    {
        return false;
    }

    public void writeRequest(OutputStream out) throws IOException
    {
        output.setByteStream(out);
        serializer.write(doc, output);
    }
});

Solution 3

I too only discovered the PipedInputStream/PipedOutputStream classes recently.

I am developing an Eclipse plug-in that needs to execute commands on a remote server via SSH. I am using JSch, and its Channel API reads from an input stream and writes to an output stream. But I need to feed commands through the input stream and read the responses from an output stream. That's where PipedInputStream/PipedOutputStream comes in.

import java.io.PipedInputStream;
import java.io.PipedOutputStream;

import com.jcraft.jsch.Channel;

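Channel channel; // opened elsewhere from a JSch Session
PipedInputStream channelInputStream = new PipedInputStream();
PipedOutputStream channelOutputStream = new PipedOutputStream();

// The channel reads commands from a pipe fed by channelOutputStream and
// writes responses into a pipe drained by channelInputStream.
channel.setInputStream(new PipedInputStream(this.channelOutputStream));
channel.setOutputStream(new PipedOutputStream(this.channelInputStream));
channel.connect();

// Write commands to channelOutputStream
// Read responses from channelInputStream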

channel.disconnect();

Solution 4

Also, back to the original example: no, it does not exactly minimize memory usage either. The DOM tree still gets built and in-memory buffering still takes place; that is better than keeping full byte-array replicas, but not by much. The buffering in this case will also be slower, and an extra thread is created, because you cannot safely use a PipedInputStream/PipedOutputStream pair from within a single thread.

Sometimes PipedXxxStreams are useful, but the reason they are not used more is that quite often they are not the right solution. They are OK for inter-thread communication, and that's where I have used them, for what it's worth. It's just that there aren't that many use cases for this, given how SOA pushes most such boundaries to be between services instead of between threads.

Solution 5

Here's a use case where pipes make sense:

Suppose you have a third party lib, such as an xslt mapper or crypto lib that has an interface like this: doSomething(inputStream, outputStream). And you do not want to buffer the result before sending over the wire. Apache and other clients disallow direct access to the wire outputstream. Closest you can get is obtaining the outputstream - at an offset, after headers are written - in a request entity object. But since this is under the hood, it's still not enough to pass an inputstream and outputstream to the third party lib. Pipes are a good solution to this problem.
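A sketch of that adaptation, under stated assumptions: StreamTransformer below is a hypothetical stand-in for the third-party doSomething(inputStream, outputStream) call, and PipeAdapter, asInputStream, and demo are names invented for this example. The library call runs on a worker thread and writes into a PipedOutputStream, while the caller gets back the connected PipedInputStream and can hand it to whatever API insists on an InputStream:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class PipeAdapter {

    /** Hypothetical stand-in for a third-party doSomething(inputStream, outputStream). */
    interface StreamTransformer {
        void doSomething(InputStream in, OutputStream out) throws IOException;
    }

    /** Runs the transformer on a worker thread and exposes its output as an InputStream. */
    static InputStream asInputStream(StreamTransformer lib, InputStream source) {
        try {
            PipedInputStream readEnd = new PipedInputStream();
            PipedOutputStream writeEnd = new PipedOutputStream(readEnd);
            Thread worker = new Thread(() -> {
                // Closing the write end (even after a failure) is what
                // lets the reader see end-of-stream.
                try (writeEnd) {
                    lib.doSomething(source, writeEnd);
                } catch (IOException e) {
                    // Sketch only: a failure simply ends the stream early.
                    // Real code should report the error to the reader.
                }
            }, "pipe-adapter");
            worker.setDaemon(true);
            worker.start();
            return readEnd;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Example use with a toy ASCII-only upper-casing transformer. */
    static String demo(String asciiText) {
        StreamTransformer upperCase = (in, out) -> {
            int b;
            while ((b = in.read()) != -1) {
                out.write(Character.toUpperCase(b));
            }
        };
        InputStream source =
                new ByteArrayInputStream(asciiText.getBytes(StandardCharsets.US_ASCII));
        try (InputStream result = asInputStream(upperCase, source)) {
            return new String(result.readAllBytes(), StandardCharsets.US_ASCII); // Java 9+
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The transformed data is never held in full: at most one pipe buffer's worth sits in memory between the library and the consumer, at the cost of the extra thread.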

Incidentally, I wrote an inversion of Apache's HTTP Client API [PipedApacheClientOutputStream] which provides an OutputStream interface for HTTP POST using Apache Commons HTTP Client 4.3.4. This is an example where Piped Streams might make sense.

Steven Huwig
Author by

Steven Huwig

Currently a software developer at Root Insurance Company.

Updated on July 08, 2022

Comments

  • Steven Huwig
    Steven Huwig almost 2 years

    I've discovered this idiom recently, and I am wondering if there is something I am missing. I've never seen it used. Nearly all Java code I've worked with in the wild favors slurping data into a string or buffer, rather than something like this example (using HttpClient and XML APIs for example):

        final LSOutput output; // XML stuff initialized elsewhere
        final LSSerializer serializer;
        final Document doc;
        // ...
        PostMethod post; // HttpClient post request
        final PipedOutputStream source = new PipedOutputStream();
        PipedInputStream sink = new PipedInputStream(source);
        // ...
        executor.execute(new Runnable() {
                public void run() {
                    output.setByteStream(source);
                    serializer.write(doc, output);
                    try {
                        source.close();
                    } catch (IOException e) {
                        throw new RuntimeException(e);
                    }
                }});
    
        post.setRequestEntity(new InputStreamRequestEntity(sink));
        int status = httpClient.executeMethod(post);
    

    That code uses a Unix-piping style technique to prevent multiple copies of the XML data being kept in memory. It uses the HTTP Post output stream and the DOM Load/Save API to serialize an XML Document as the content of the HTTP request. As far as I can tell it minimizes the use of memory with very little extra code (just the few lines for Runnable, PipedInputStream, and PipedOutputStream).

    So, what's wrong with this idiom? If there's nothing wrong with this idiom, why haven't I seen it?

    EDIT: to clarify, PipedInputStream and PipedOutputStream replace the boilerplate buffer-by-buffer copy that shows up everywhere, and they also allow you to process incoming data concurrently with writing out the processed data. They don't use OS pipes.

  • Steven Huwig
    Steven Huwig over 15 years
    That's an example I have handy. What IO delays are being introduced? PipedInputStreams and PipedOutputStreams are memory buffers.
  • kdgregory
    kdgregory over 15 years
    They may be memory buffers, but they use the underlying pipe implementation, which is a kernel I/O operation.
  • Steven Huwig
    Steven Huwig over 15 years
    Not according to the source they don't.
  • kdgregory
    kdgregory over 15 years
    As for your example: I haven't used HttpClient, but I would expect an alternate method to get access to the request body as an OutputStream. Perhaps not, although are you sure that PostMethod doesn't buffer its content in memory (in which case you don't gain anything)
  • Steven Huwig
    Steven Huwig over 15 years
    PostMethod can buffer or not, depending on whether the method has been configured to chunk the enclosed entity. By default it chunks when the content length is not set. It'd be more helpful if you assumed I had already read the APIs and source in question when you answer.
  • kdgregory
    kdgregory over 15 years
    Re Java piped streams: I learned something, and am somewhat disappointed. I always assumed that those classes used the pipe(2) syscall.
  • kdgregory
    kdgregory over 15 years
    Sorry to offend, but you asked the question "why isn't this more common," not "in this particular case, is there a reason this technique isn't used." And in the general case, you're creating a second thread to handle a sequential operation.
  • kdgregory
    kdgregory over 15 years
    Actually, in this case it is a sequential operation. One thread is writing the XML to a stream, the other thread is writing a stream to a stream. In this specific case, there's one unnecessary stream. That's not to say that the technique is not useful in some cases (to be contd)
  • kdgregory
    kdgregory over 15 years
    The case in which piping from one thread to another is useful is when there is significant text-level processing that will happen in each stage (ie, something similar to a Unix pipeline). In that case, you can (1) logically partition the operations, and (2) benefit from multi-core architectures.
  • Steven Huwig
    Steven Huwig over 15 years
    This "unnecessary stream" is for using the HttpClient API, which requires an InputStream for request entities.
  • Steven Huwig
    Steven Huwig over 15 years
    There is significant text-level processing that will happen in the InputStreamRequestEntity -- namely chunking.
  • Cristopher Van Paul
    Cristopher Van Paul over 15 years
    Why don't you do "text-level processing" in a FilterWriter or a FilterOutputStream ?
  • Steven Huwig
    Steven Huwig over 15 years
    That code is part of the HttpClient API, which requires an InputStream.
  • Steven Huwig
    Steven Huwig over 15 years
    @kdgregory: your code appears to be an unnecessary class. Why is an unnecessary class preferable to concurrency?
  • John Gardner
    John Gardner over 15 years
    Sadly concurrency is overused where it isn't needed, and underused where it was needed... oops! :)
  • matt b
    matt b over 15 years
    @iny, I'd argue that most developers aren't writing concurrent code. Maybe it's running in a concurrent environment, but I think that it is a minority of developers who deal every day with multithreading (and this is probably a good thing)
  • kdgregory
    kdgregory over 15 years
    1 - because concurrency increases complexity, and 2 - because it is a piece of reusable functionality (one that should probably get back into HttpClient)
  • Steven Huwig
    Steven Huwig over 15 years
    I found that both ends need to be closed.
  • Vishy
    Vishy over 15 years
    What do you see as "unbounded memory consumption"? I have been developing networking solutions for trading systems for six years and I have never come across this problem.
  • Steven Huwig
    Steven Huwig almost 15 years
    Do your trading systems handle single messages with gigs of payload without running out of space? If so then they have bounded memory consumption; otherwise they have unbounded memory consumption. (Not that I would expect trading systems to do anything but reject messages over a certain size, but believe it or not that's not the case in every domain.)
  • Vishy
    Vishy almost 15 years
    It is true that trading messages are typically small as latency is important. They can add up fairly quickly and we end up with 10s of gigs of data in memory. However, I am not sure how this is relevant. The solution posted will not help you deal with very large messages as far as I can see, in fact instead of having one copy of the message passed around you will end up with two copies (as the writer cannot complete serialization of a large message and discard the original until the reader has almost read/rebuilt the copy)
  • Steven Huwig
    Steven Huwig over 14 years
    The reader can be streaming to the server as far as I can tell, with content-encoding: chunked. There doesn't need to be a second copy constructed (in this process, anyway).
  • Raedwald
    Raedwald about 12 years
    "many developers do not understand its purpose / benefit": probably those developers who have not previously used Unix, and therefore have not been exposed to the usefulness of the pipes-and-filters design pattern.
  • Dean Hiller
    Dean Hiller about 12 years
    javadoc said piped streams could get deadlock on one thread???? (which sucks as I want to use something exactly like this with no extra thread).....does that actually work or do you get deadlock?
  • Chris Mountford
    Chris Mountford over 9 years
    @JohnGardner agreed. All Java/JVM developers should read Java Concurrency In Practice (the bullet train book). It is the best written book on this critical topic (Doug Lea would agree) and it explains many of the problems due to underspecified concurrency attributes in most java code - including the JDK. It helps you solve these problems and also understand what to decide on and declare in your own APIs.
  • Robert Christian
    Robert Christian about 8 years
    @stevenhuwig - re "Do your trading systems handle single messages with gigs of payload without running out of space?" I think this is a non issue as long as the client code can get a handle on an inputstream that is outside of memory eg FileInputStream.
  • Robert Christian
    Robert Christian about 8 years
    Here's a use case. Suppose you have a third party lib, such as an xslt mapper or crypto lib that has an interface like this: doSomething(inputStream, outputStream). And you do not want to buffer the result before sending over the wire. Apache and other clients disallow direct access to the wire outputstream. Closest you can get is obtaining the outputstream - at an offset, after headers are written - in a request entity object. But since this is under the hood, it's still not enough to pass an inputstream and outputstream to the third party lib. Pipes are a good solution to this problem.
  • Vishy
    Vishy about 8 years
    @RobertChristian good point, if you have to use an API which is not really fit for purpose, you need to use the API you have. +1
  • Paul Draper
    Paul Draper over 7 years
    To add to this, a thread requires additional memory and introduces context switching. It's only worth it when you get real benefit from streaming. Handling 1kb of data this way would be a step backwards.
  • Archie
    Archie about 6 years
    I wrote a properly behaving replacement because I couldn't deal with the stupidity of the JDK version. See github.com/archiecobbs/dellroad-stuff/blob/master/…