Concurrency and Multi-threaded Programming in Java – Part 1

I am giving a talk at the office this week about some of the idiosyncrasies of concurrency in Java. So I thought it would be nice to just expand that discussion into a couple of blog posts. This first post will go over the basic constructs for synchronization and concurrency as well as how to implement them in Java, in the second post I will talk about the java.util.concurrent library provided in Java 5+ and talk about the most common error resulting made using concurrency, and in the final post I will discuss advanced concurrency issues specifically related to the Java programming language.

What is Concurrency?

Concurrency in programming is the idea of breaking up computation into pieces that can be executed simultaneously. How this actually occurs depends on the underlying hardware but for our purposes we will assume that execution occurs in parallel. One of these pieces of computation is called a Thread. Concurrency becomes especially interesting and complex when the threads interact with each other. This can generally be through shared access to the same data, direct action on each other’s fields or shared need for limited resources. When this is the case it becomes necessary to synchronize the actions of individual threads.

Terminology

There is a lot of terminology involved with concurrency so I am going to go ahead and define the terms I am going to use and what I mean by them. Many of these terms have multiple meanings but unless otherwise stated I am referring to the following meanings.

Lock: A lock is what a thread gains when it enters a block of code (or gains access to a resource) that is synchronized. In general, when one thread has a lock, all other threads are blocked from accessing that resource, object or block of code.

Critical path/section: A critical path is a piece of code that is synchronized.

Condition Variable: A condition variable is a variable that is associated with a lock.

Mutex: A mutex (mutual exclusion) is a construct for creating a lock and is generally used interchangeably with lock.

Semaphore: A semaphore is a mutex that supports the use of a counter to allow multiple threads access to the same block or resource. (A semaphore with a counter of 1 is equivalent to a mutex)

Barrier: A barrier is a holding zone for multiple threads. It acts as a gate that only releases threads when a certain number have reached that point.

Monitor: A monitor is a term that has an inconsistent meaning and I will avoid using it but if you see it elsewhere it will generally be referring to a lock or an implementation of a locking mechanism.

Concurrency in Java

Java is especially interesting because, unlike many programming languages, it has support for multi-threaded execution and synchronization built directly into the language. All objects in Java implicitly have a single condition variable used for locking associated with it. This variable is accessed through the wait() / notify() methods. Prior to Java 5 this was the primary way synchronization was achieved. Concurrency is a very abstract topic so let’s just dive into an example that uses multiple threads.

This first example demonstrates how multiple threads work independently. Here we create three simple thread objects that simply print the given character 100 times.

public class MultiThreadExample {

    public static void main(String args[]) {

        //Create 3 Thread objects
        Thread aThrd = new PrintThread("a");
        Thread bThrd = new PrintThread("b");
        Thread cThrd = new PrintThread("c");

        //Start the threads
        aThrd.start();
        bThrd.start();
        cThrd.start();

        //You might expect this code to simply print 100 a's followed by b's
        //and c's but instead you see the values are interweaved and likely
        //to be different each time you run this file.
    }

    static class PrintThread extends Thread {

        String val;

        public PrintThread(String val){ this.val = val;}

        //Our run method here simply prints the val 100 times
        @Override
        public void run() {
            for(int i=0; i<100; i++){
                System.out.print(val);
            }
        }
    }
}

Example 1 output:

andrew@andrew-Inspiron-1520:~/dev$ java MultiThreadExample
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

As you can see from running the last example, what we get is our three threads executing in parallel and as a result printing to the console in no specific order. The threads share access to the standard out. We could modify the code to guarantee one thread completes it’s use of the standard out prior to another beginning. That is boring though so let’s instead look at a slightly more complicated example.

In this example we are going to build a Queue implemented with a linked list that is thread safe. Before we do so, let’s look at what could go wrong if we didn’t. Below is the implementation of our queue. Take a second to try and figure out where a problem might arise with using a Linked List if we didn’t include the synchronized keyword.

public class QueueExample {

    class Node {
        public Object data;
        public Node next;

        public Node(Object data){this.data = data;}
    }

    private Node head, tail;

    public synchronized void enqueue(Object data){
        if (null == head){
            head = new Node(data);
            tail = head;
        }else{
            tail.next = new Node(data);
            tail = tail.next;
        }
    }

    public synchronized Object dequeue(){
        Object result = null;
        if(null != head){
            result = head.data;
            front = front.next;
        }
        return result;
    }
}

Figure it out? That’s right the problem occurs when two threads add an element at the same times. Since the act of adding an element is not an atomic operation, there is room for an inconsistent state to develop. Consider the following execution given our queue q and two threads t1, t2. Let’s also assume q already contains elements and the tail is called x.

Thread 1 execution Thread 2 execution
t1 calls q.enqueue(y) T2 calls q.enqueue(z)
(x) tail.next is set to a new Node with data = y ….
…. (x) tail.next is set to a new Node with data = z
tail is set to y ….
…. tail is set to null

So what does this leave us with once both threads have finished executing? A broken queue is what. The former end of the list X is now pointing to Z as it’s next node when it should be Y; X is completely lost with nothing referencing it at all; and our queue thinks a null node is the tail element.

Why does that magic word ‘synchronized’ fix the problem? ‘synchronized’ is the Java keyword for requiring a lock or mutex on the object to use it’s method. It means that in order to enter that method a thread must first obtain a lock. Since only one thread can have a lock on that object at a time, our problem is fixed because an element can only be added or removed when no other thread is acting on the queue. So in our execution example above t2 would simply wait for t1 to exit the method and be notified that it was free before it began executing the enqueue() call. To further use our terminology, we would say that the enqueue method was our critical path because it required a lock to use.

What if we have a much more complex method and only a small part of it is a critical path? Naturally we don’t want to synchronize the entire method because it would waste cycles waiting and reduce the level of concurrency. Furthermore, what if we don’t need a lock on the entire object whose method we are using but rather just the small critical path? We instead lock only on what is absolutely required and we use a separate object. An example might be a method that modifies a user’s account given some identifying information about the user. We don’t need to lock on any computation to get the account, but instead only the portion that is making changes to the account.

public void modifyAccount(Object userId){
    //...
    //perform complex steps to get userAccount object
    //...
    synchronized(userAccount){
        //execute critical path
        //make changes to account
    }
    //...
    //finish computation
}

Note that we are locking on the userAccount object and not the object whose method we are executing. Often times the object that you use to lock a critical path is insignificant and used purely to construct the mutex with it’s implicit condition variable.

So now we know how to lock critical code blocks, but what if we have a limited resource that has a finite quantity available. A simple mutex won’t be sufficient because it doesn’t allow us to make full use of our resources. Here is where we use a semaphore and take advantage of the count variable. Below is a simple example implementation of a Semaphore class.

public class Semaphore {

    private int permitCount;

    public Semaphore(int permitCount){
        this.permitCount = permitCount;
    }

    /**
     * Get one resource lock
     */
    public synchronized void acquire(){
        try{
            while(permitCount == 0)
                wait();
            permitCount--;
        }catch(InterruptedException ie){System.err.println(ie.getMessage());}
    }

    /**
     * Release numPermits resource locks
     */
    public synchronized void release(){
        permitCount++;
        notify();
    }
}

We have now successfully created a basic semaphore. It is essentially just a mutex that allows multiple threads to have a lock on the resource. Some good examples of when this is useful is when you have a finite number of database connection available or you have a pool of threads handling messages as they arrive. There are a couple of things worth noting about our implementation that is different from what will be found in the Java 5 library. In our implementation we make no guarantee that threads will receive resources in the order they are requested and we allow threads to release resources independent of whether or not they had acquired one to begin with.

The last basic concurrency construct I am going to discuss is the barrier. A barrier is very simple and is quite often overlooked because its functionality is easily achieved in a variety of ways. It is essentially exactly what it sounds like. An object that holds up a set of threads until a certain number have arrive, then releases them. It can be thought of as the reverse of a semaphore. Below is a simple implementation:

public class Barrier {

    private int numMembers, generation = 0, waitingCount = 0;

    public Barrier(int numMembers){this.numMembers = numMembers;}

    public synchronized void await() throws InterruptedException {
        int currentGeneration = generation;
        waitingCount++;
        if(waitingCount == numMembers){
            waitingCount = 0;
            generation++;
            notifyAll();
        }else{
            while(generation == currentGeneration){
                wait();
            }
        }
    }
}

This simple construct simply forces the first numMembers – 1 threads to wait and then releases them all once enough have arrived. The barrier is then reset and available for use again. The barrier is a very straight forward construct that can be used for a variety of things.

To wrap up we have now been introduced to the basic terminology related to synchronization and concurrency as well as looked at how to implement the basic constructs in the Java programming language. You should hopefully now feel comfortable using threads to facilitate simple computing in parallel. In the next post, I will discuss the most common errors that result from incorrect synchronization, introduce the java.util.concurrent library that provides complete implementation for all of the constructs we created above as well as some more advanced tools. Finally in a third post I will discuss advanced concurrency issues specifically related to the Java programming language.

I am by no means an expert so I welcome any comments, corrections or requests for additional information in this series. I think this is an important and poorly understood topic and I would like to provide a strong introduction here.

Share or else...
  • Twitter
  • Facebook
  • RSS
  • HackerNews
  • Reddit

6 thoughts on “Concurrency and Multi-threaded Programming in Java – Part 1

  1. Thomas Nagel

    Maybe you could add another point:
    if there are goups or pairs of mutually exclusive methods then the “synchronized” keyword might as well – and imho even better – added into the methods, like this:

    public sychronized void producer () {
    sychronized(this) {
    // access resource: create object
    }
    }

    and

    public sychronized void consumer(){
    synchronized(this) {
    // access resource: consume object
    }
    }

    This makes sure not only can always only one Thread enter such a method, but it also waits for a co-Thread to leave the other paired methods.

    regards
    Thomas

    Reply
  2. Pingback: 多執行緒簡介 – Concurrency and Multi-threaded Programming in Java – Part 1 « OOXX

  3. Pingback: Concurrency and Multi-threaded Programming in Java – Part 1

  4. Pingback: An Open Forum » Blog Archive » Concurrency and Multi-Threaded Programming in Java – Part 2

  5. Matt H

    I thought adding “synchronized” to a method was equivalent to wrapping the entire method in synchronized(this) {} ?

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>