June 3, 1999

Question

David Vriend noted a problem in Example 3-3 of Java I/O, StreamCopier, as well as several similar examples from that book. The copy() method attempts to synchronize on the input and output streams to "not allow other threads to read from the input or write to the output while copying is taking place". Here's the relevant method:

  public static void copy(InputStream in, OutputStream out) 
   throws IOException {

    // do not allow other threads to read from the
    // input or write to the output while copying is
    // taking place
    
    synchronized (in) {
      synchronized (out) {

        byte[] buffer = new byte[256];
        while (true) {
          int bytesRead = in.read(buffer);
          if (bytesRead == -1) break;
          out.write(buffer, 0, bytesRead);
        }
      }
    }
  }
However, this only helps if the other threads using those streams are also kind enough to synchronize them. In the general case, that seems unlikely. The question is this: is there any way to guarantee thread safety in a method like this when:

  1. You're trying to write a library routine to be used by many different programmers in their own programs so you can't count on the rest of the program outside this utility class being written in a thread safe fashion.

  2. You have not written the underlying classes that need to be thread safe (InputStream and OutputStream in this example) so you can't add synchronization directly to them.

  3. Wrapping the unsynchronized classes in a synchronized class is insufficient because the underlying unsynchronized class may still be exposed to other classes and threads.

Note that although the specific instance of this question deals with streams, the actual question is really more about threading. Since anyone answering this question probably already has a copy of Java I/O, I'll send out a free copy of XML: Extensible Markup Language for the best answer.

Answer

There were several thoughful answers to this question, but ultimately the answer is no. This does seem to be a design flaw in Java. There is simply no way to guarantee exclusive access to an object your own code did not create. Apparently other languages do exist that do solve this problem. In Java, however, the only thing you can do is fully document the intended use of the class and hope programmers read and understand the documentation. What's especially pernicious is that the behavior in a multithreaded environment of most of the Sun supplied classes in the java packages is undocumented. This has gotten a little better in JDK 1.2, but not enough.

Several people argued that the question was silly; that it was ridiculous to attempt to guarantee behavior in the face of unknown objects produced by unknown programmers. The argument essentially went that classes only exist as part of a system and that you can only guarantee thread safety by considering the entire system. The problem with this attitude is that it runs completely counter to the alleged benefits of data encapsulation and code reuse that object oriented programming is supposed to provide. If you can only write safe code by writing every line of code in the system, then we might as well go back to C and Pascal and forget all the lessons learned in the last twenty years.

More pointedly, any solution that requires complete knowledge of all code in the entire program is extremely difficult to maintain in a multi-programmer environment. It may be possible with small teams. It's completely infeasible for large teams. And it's absolutely impossible for anyone trying to write a class library to be used by many different programmers on many different projects.

All the answers were quite well though out this time around. You'll find them on the complete question page. I think the best answer came from Michael Brundage who gets a copy of XML: Extensible Markup Language as a prize just as soon as he sends me his snail mail address. For the rest of you, I'll have a new question for you soon which delves into some undocumented behavior in the InputStream class.

Michael Brundage

Subject: ThreadSafeStreams
Date: Mon, 7 Jun 1999 14:41:22 -0700
From: Michael Brundage brundage@ipac.caltech.edu

Hi Elliotte,

Saw your Cafe au Lait question regarding thread-safety and streams, and thought I would toss in a few cents:

The problem actually has nothing to do with streams or I/O at all. Rather, it's a fundamental problem with the thread/synchronization model chosen for Java. Monitors (as opposed to, say, concurrent sequential programming) just don't give the developer an opportunity to determine thread safety within the context of a single method. The developer is forced to look at the global picture to determine whether deadlocks or other threading problems might occur.

I once had a really informative conversation with Frode Odegard about this topic. I've attached a post he made to the LA Java Users' Group mailing list (including some useful references) to the end of this message for your reading pleasure.

A classic example is the thread-safe library class

public class Safe {
protected int x, y;
public synchronized void setXY(int x, int y) {
    this.x = x;
    this.y = y;
}
}

You see something like this in almost every library in existence: The class is meant to be extended, so the members are made protected. They can only be accessed through thread-safe methods, so everything's okay, right?

Wrong -- there is no way to prevent a developer from coming along and writing:

public class Scary extends Safe {
public void mySetXY(int x, int y) {
    this.x = x;
    this.y = y;
}
}

Now, calls to setXY() and mySetXY() can be interleaved, resulting is unexpected results. There is no way, as long as there are protected/public fields or protected/public unsynchronized methods, to prevent developers from coming along and shooting themselves in the foot with a subclass that bypasses synchronization.

(Btw, you can simplify this example by using just one member of type double; however, I've heard that there are plans to make assignment to doubles atomic in the future, and in any case no JVM I've seen treats it as nonatomic, .)

One can avoid this particular case by never declaring anything that could result in a thread-safety problem in subclasses or elswhere protected or public. The example above could be repaired using:

public class Safe {
private int x,y;
public synchronized void setXY(int x, int y) {
    this.x = x;
    this.y = y;
}
}

Now any subclass that wants to change the value of x or y is required to go through the setXY() method on Safe. As long as every field is always private and every public and protected method is synchronized, your object can be threadsafe (which says nothing about liveness issues like deadlock, unfortunately). You could even imagine an automated tool to check your source code for this kind of potential threading problem.

There are some other areas where subclassing is problematic with threading, such as mixing static and instance methods (which synchronize differently), and deadlock is particularly thorny with subclasses. Doug Lea gives several examples in his book, Concurrent Programming in Java.

The example I gave above demonstrates how to settle the subclassing problem when the lock is required on the instance -- just always require subclasses to go through your (synchronized) methods on the superclass to do any modifications to fields in the superclass. However, the example you gave on Cafe au Lait is trickier, because the library class is not one you wrote yourself and two locks need to be acquired instead of one. Even for this trickier problem, the same idea of containment applies: You want to prevent other methods from modifying the objects you're using. The only way to guarantee that other developers don't bypass your synchronization is to prevent all other kinds of unsynchronized access.

Because the standard Java I/O classes are inherently un-thread-safe, this means you cannot allow external code to get ahold of the original stream objects, only thread-safe wrapped versions of the original stream. In particular, any stream class you write with the intention of making it thread-safe cannot have a constructor taking a generic stream (InputStream, OutputStream) since the calling code could then modify the stream it passed in the ctor in an un-thread-safe way. This restriction alone makes it almost impossible to guarantee thread-safety when working with the pre-existing java.io stream classes, since they are designed to be attached to other streams through the ctor.

Although this demonstrates that it is not possible to accept a (potentially unsafe) stream from the user's code, it is still possible to create a safe bridge by putting a stream factory into the library, wrapping the stream in an opaque way for its travels through the user's code, and then extracting the original stream again in the library code. By preventing the developer from ever getting at the underlying (unsafe) instance, you can guarantee that the developer's code never interferes with your library's synchronization.

If your library resides in a package that you control, say my.library, then you could do the following:

package my.library;

import java.io.*;

public interface SafeStream {
}

final class MySafeInputStream implements SafeStream {   // package-local
private InputStream stream;
MySafeInputStream(InputStream s) {
     stream = s;
}
InputStream getInputStream() {
     return stream;
}
}

final class MySafeOutputStream implements SafeStream {  // package-local
private OutputStream stream;
MySafeOutputStream(OutputStream s) {
     stream = s;
}
OutputStream getOutputStream() {
     return stream;
}
}

/*
this implementation is just a sketch, to give the idea; it won't compile 
as-is
because (among other things) it's missing error-handling.  You'd also
rewrite parts of it for better efficiency, etc.

The main point of this class is that (together with the opaque handlers
MySafeInputStream and MySafeOutputStream) it prevents the underlying
InputStream or OutputStream instances from "leaking" to external
(untrusted) code.
*/
public final class StreamFactory {
     public static SafeStream createInputStream(Class type) {
          return new MySafeInputStream(type.forInstance());
     }
     public static SafeStream createInputStream(Class type, SafeStream 
inner) {
          Constructor ctor = type.getConstructor(new Class[] { 
InputStream.class });
          return new MySafeInputStream(ctor.newInstance(new Object[] { 
inner }));
     }
     public static SafeStream createOutputStream(Class type) {
          return new MySafeOutputStream(type.forInstance());
     }
     public static SafeStream createOutputStream(Class type, SafeStream 
inner) {
          Constructor ctor = type.getConstructor(new Class[] { 
OutputStream.class });
          return new MySafeOutputStream(ctor.newInstance(new Object[] { 
inner }));
     }
     public SafeStream createFileInputStream(String file) {
          return new MyInputStream(new File(file).getInputStream());
     }
}

public final class SafeStreamUtils {
     public static copy(SafeStream safeIn, SafeStream safeTo) throws 
IOException {
          InputStream in = ((MySafeInputStream)safeIn).getInputStream();
          OutputStream out = 
((MySafeOutputStream)safeIn).getOutputStream();
          // now copy(in, to), exactly as before
     }
}

You can always replace the one interface SafeStream with two, SafeInputStream and SafeOutputStream, and personally I would also add methods to these interfaces so that the user can work with them directly (but in a thread-safe way, mind you) instead of going through a global "Utils" object (which is so no-OO).

This factory/bridging mechanism works to solve the "guaranteed threadsafe" problem in general, not just for streams, although it also brings up issues of its own (mainly when trying to get two independent libraries to work well together). It can even be made to work better with custom stream subclasses by modifying the factory to accept arbitrary constructor method descriptors, although that opens you up to the possibility that the developer retains a reference to some underlying object (like another stream) and uses it to bypass your synchronization checks.

Michael Lawley

To: elharo@ibiblio.org
Subject: ThreadSafeStreams
Mime-Version: 1.0
Date: Wed, 09 Jun 1999 17:03:32 +1000
From: Michael Lawley lawley@dstc.edu.au

Hi Elliotte,

If Thread.suspend() were not deprecated, then one could even imagine a solution that involved attempting to suspend all other threads for the duration (not very pretty, and you'd have to contend with ThreadGroup security issues).

My only other suggestion is a real hack - grab the class files for the underlying implementation, hack them to change the class names, then wrap them with a new implementation using the original class names and arrange for the resulting classes to appear at the beginning of the classpath (or even the bootclasspath).

OTOH I wonder if this is really a problem in practice? Other than stdin, stdout and stderr, how often does one have multiple reader/ writer threads on a stream that one doesn't otherwise have control over either all the methods or the creation of the stream?

Irum Parvez

Date: Sat, 12 Jun 1999 18:23:08 -0400
From: Irum Parvez irum@sprint.ca
Organization: Irum Parvez CA
To: elharo@ibiblio.org
Subject: ThreadSafeStreams

Hi Rusty

One of the difficulties with streams is that you are often using a chain of them. Witch stream (of the chain) do you synchronize on? - the first or the last(one of the middle streams?). Let me define: last-The node stream that the filter streams use.(There is one of these in a chain) first-A filter stream that you are using middle-A filter stream that other filter streams often use(delegate to)

It would be very nice if Sun would add to the Stream interface (API) It would be very nice if java.io.InputStream (& OutputStream) would contain an abstract method called getNode(). The FilterStreams would delegate this to the Node Stream that would implement the method (returning itself - this). This would allow you to just synchronize on the node stream of (a chain)and eliminate the overhead and risk of multiple locks.You would to just synchronize on the object returned by getNode().

What do you think?

P.S. I think your question could be phrased a little better:

//------------------------------------------------------------------------------
3.Wrapping the unsynchronized classes in a synchronized class is insufficient 
because the underlying unsynchronized class may still be >exposed to other classes 
and threads. 

>'unsynchronized classes' - Java mutex locks synchronize on objects - not classes.

I prefer to use the term delicate data or delicate object rather than unsynchronized class.

There are few ways to protect delicate data from being corrupted.

A java primitive variable that is shared between threads can be protected by declaring it as volitile(so it not stored in a register).

A delicate object can only be protected (from coruption) by encapsulation and synchronization:

  1. You must make the member variable refering to the delecate data private.
  2. You must synchronize all critical sections - Only access the member variable in synchronized blocks (or methods).

NOTE:Protecting public delicate data is a waist of time.You must both synchronize and encasulate.

Michael Peter

Date: Thu, 03 Jun 1999 18:51:36 +0200
From: Michael Peter <Michael.Peter@stud.informatik.uni-erlangen.de>
X-Accept-Language: en
MIME-Version: 1.0
To: elharo@ibiblio.org
Subject: ThreadSafeStreams
Status:   

Hi!

Since you have no control over the classes you wish to synchronize, it is not possible to use them directly. In your example your StreamCopier class tries to use synchronized(in) and synchronized(out) to make the method thread safe. This is of no use, since the classes themselves are not synchronized and synchronization only takes places within synchronized blocks. Even if they were synchronized, your method could possibly create a problem if the classes use another object for synchronization like in the following example:

public class foo {
	private Object lock = new Object();
	private int someVariable;

	public int someSynchronizedMethod() {
		synchronized( lock ) {
		// lock is used instead of this to synchronize
			return someVariable;
		}
	}
}

Since you cannot access the lock, you cannot synchronize from the outside.

There is however a way to provide synchronization by using a wrapper. You write that wrapping the class is insufficient, but it is only if you don't create the class in your wrapper, but accept a previously created object. As for StreamCopier the following works:

public class SynchronizedInputStream extends InputStream {
	
	private InputStream in;
	
	public SynchronizedInputStream() {
		throw new IllegalArgumentException( "Use createXXXInputStream to
create a synchronized input stream" );
	}

	private SynchronizedInputStream( InputStream in ) {
		this.in = in;
	}
		
	public InputStream createFileInputStream( File f ) {
		return new SynchronizedInputStream( new FileInputStream( f ) );
	}

	public InputStream createByteArrayInputStream( byte[] buf ) {
		return new SynchronizedInputStream( new ByteArrayInputStream( buf ) );
	}
	
	/* ... You have to write a method for every InputStream type you want
to use */

	/* Wrap all InputStream methods */
	
	public int available() {
		synchronized( this ) {
			return in.available();
		}
	}

	public boolean markSupported() {
		synchronized( this ) {
			return in.markSupported();
		}
	}

	/* Do this for all remaining methods */

}

This method might not be very efficient if multiple InputStreams are chained, but I think there is some room for optimisation, like checking if a class is already a instance of SynchronizedInputStream and then synchronizing only once. This is left as an exercise to the reader. (I always wanted to say this!)

The copy method from StreamCopier would then look like this:

  public static void copy(SynchronizedInputStream in, SynchronizedOutputStream out)     
 throws IOException {

                           // do not allow other threads to read from the
                           // input or write to the output while copying is
                           // taking place
                           
                           synchronized (in) {
                             synchronized (out) {

                               byte[] buffer = new byte[256];
                               while (true) {
                                 int bytesRead = in.read(buffer);
                                 if (bytesRead == -1) break;
                                 out.write(buffer, 0, bytesRead);
                               }
                             }
                           }
                         }

Greg Guerin

Date: Thu, 3 Jun 1999 16:41:04 -0700
To: elharo@ibiblio.org
From: Greg Guerin <glguerin@amug.org>
Subject: ThreadSafeStreams

Hi,

Good Q o'Week...

I think that parts of the question as stated are misleading (as in "leading to incorrect, unsupported, or erroneous conclusions or constraints":

>However, this only helps if the other threads using those streams >are also kind enough to synchronize them. In the general case, that >seems unlikely.

It better not be -- those other threads don't magically pop themselves into existence executing whatever code they feel like. Those other threads are spawned by *MY* code, and presumably execute the run() method *I* designate for them. Kindness is not the question: "The question is which is to be master -- that's all".

>1.You're trying to write a library routine to be used by many different >programmers in their own programs so you can't count on the rest of the >program outside this utility class being written in a thread safe fashion.

Fine, that's what synchronized methods are for. If you have encapsulate an entire object so only one thread can use it over a longer period of time than a single method-call, that's what wait/notify are for. See more under 3 below.

>2.You have not written the underlying classes that need to be thread safe >(InputStream and OutputStream in this example) so you can't add >synchronization directly to them.

True, but that doesn't mean I can't sub-class them and only give a SynchronizedInputStream and/or a SynchronizedOutputStream to the threads that need to coordinate their I/O. Either I write the threads to do that themselves, or I pass args to them that guarantee the needed coordination. A Thread is a thread of execution, not a monolithic self-contained self-determined omnipotent blob of code with only a few puppet-strings emerging from it. An ActiveX control or a Java Applet may fit that mold, but not a Thread.

>3.Wrapping the unsynchronized classes in a synchronized class is insufficient >because the underlying unsynchronized class may still be exposed to other >classes and threads.

So what if the *CLASS* is exposed? The threads are either:

  1. Only using the synchronized sub-class instances I'm giving to them, or
  2. Only creating synchronized sub-class instances according to code I wrote.

If I don't have that level of control over my threads, then I have a bigger problem than merely coordinating some I/O streams.

This situation is no different than classes that take InputStream arguments, but happily work with FileInputStream, PipeInputStream, FilterInputStream, StrongCryptoInputStream, or MyBeamedInFromMarsInputStream. Indeed, a sub-class of FilterInputStream and FilterOutputStream with synchronization added to its methods should handily solve the problem, as I understand it.

If you need thread-safe access to an entire object over multiple method-calls, then a centralized arbitrator operating with wait/notify will be needed. You call the arbitrator to get the object, you use it until you don't need it any more, then you return the object to the arbitrator so it can hand it out again. Threads that fail to return an object to the arbitrator are defective -- they are resource-tyrants. If your question is asking "How can an arbitrator wrest contol of a resource back from a resource-tyrant?" the answer is "you can't". If that's a problem, then either don't write resource-tyrant threads, or don't share resources with a resource-tyrannical thread.

I think your StreamCopier.copy() example is kind of an odd case. First, I don't think I'd call it copy() -- seems to me that expand() or concatenate() might make more sense. Second, I probably wouldn't have structured it as a static method, but as an instance-method of a class that either expanded a given InputStream arg to an instance-variable OutputStream (guaranteeing the expansion would occur without intrusion), or a class constructed with several InputStream sources that guaranteed reading would occur in sequence (concatenation). The former makes more sense to me for your "copy" feature. The latter is essentially a java.io.SequenceInputStream. Either way, I think the StreamCopier is just a poorly-designed class for its intended purpose (judging by its copy() method alone, since I haven't read your book).

Since I haven't read your new book yet, I don't know to what use StreamCopier is being put. If it's guaranteeing non-intrusion of other writes, then a sub-classed FilterOutputStream could easily fill the bill. If it's concatenation, then SequenceInputStream might work, though I doubt its Enumeration is guaranteed thread-safe, so you might need a custom-made InputStream sub-class with thread-safety and concatenation.

Finally, here's a common idiom, overextended to make a point, that shares certain characteristics with the problem as you posed it. Consider:

  OutputStream out1 = mySocket.getOutputStream();
  OutputStream out2 = new BufferedOutputStream( out1 );
  OutputStream out3 = new CRLFingFilterOutputStream( out2 );
  OutputStream out4 = new BufferedOutputStream( out3 );

Here we have 4 different OutputStreams, and none of them are thread-safe. A single thread that writes to out1, out2, out3, and out4 at different points is going to screw things up badly, due to buffering, filtering, etc. OH NO, HOW DO WE PREVENT THAT?! Simple -- you create the thread's code to only write on out4. It can't mangle what it can't touch.

The same principle applies to the thread-safety issue with StreamCopier. If it's reading and writing to a synchronized stream, then each buffer-full that it processes has guaranteed sequential integrity. If it needs to guarantee sequential integrity of an entire InputStream onto an OutputStream, then it needs to participate in the Designated Sharing Ritual just like every other thread that has access to either stream. If a thread can't play by those rules, then you shouldn't let it play with other threads who are willing to share their toys and play nicely.

Sorry if that went too long.

-- GG

p.s. I don't have either of your Java I/O or XML books, but even if I don't win, you could just leave the choice up to the winner -- if unsure of customer's needs, ask customer. ;-)


[ Cafe au Lait | Books | Trade Shows | Links | FAQ | Tutorial | User Groups ]

Copyright 1999 Elliotte Rusty Harold
elharo@metalab.unc.edu
Last Modified June 18, 1999