In Java concurrency, ForkJoinPool is a work-stealing, multi-threaded execution framework that works well when tasks are unevenly sized. This article assumes you are already familiar with Java Multithreading Approaches.
A ForkJoinPool implements ExecutorService, but it differs from other ExecutorService implementations (I'll refer to them as 'traditional') mainly by employing work-stealing: all threads in the pool attempt to find and execute subtasks created by other active tasks. This automatically balances the task load between threads, whereas a traditional ThreadPoolExecutor has no such load-balancing mechanism. When a worker thread runs out of tasks of its own, it steals queued tasks from workers that are still busy; if every worker is busy, newly submitted tasks wait in a queue until a thread becomes free.
ForkJoinPool supports the divide-and-conquer style of algorithm, in which a central ForkJoinPool executes branching ForkJoinTasks. A ForkJoinTask is a thread-like entity, but it is much lighter weight than a normal thread. Huge numbers of tasks and subtasks may be hosted by a small number of actual threads in a ForkJoinPool. Because ForkJoinPool is an ExecutorService, its logic is a kind of 'submit a callable' approach to multithreaded programming.
If so, how does ForkJoinPool differ from the traditional 'submit a callable' approach introduced in Java 5?
To divvy up a bigger task into smaller ones, you extend RecursiveTask and implement its compute() method. Inside compute(), you divide and conquer, then return the result after join. In RecursiveTask, compute() plays a role similar to the run() method of Thread/Runnable and the call() method of the Callable interface. For example, the following example recursively executes subtasks to calculate a Fibonacci number:
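A minimal sketch of such a Fibonacci task might look like this. The class names FibonacciDemo and FibonacciTask and the helper fib() are illustrative assumptions, and the one-task-per-call recursion is chosen for clarity rather than speed.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class FibonacciDemo {

    // Hypothetical task: computes the n-th Fibonacci number (fib(0) = 0, fib(1) = 1).
    static class FibonacciTask extends RecursiveTask<Integer> {
        private final int n;

        FibonacciTask(int n) {
            this.n = n;
        }

        @Override
        protected Integer compute() {
            if (n <= 1) {
                return n; // ending condition of the recursion
            }
            FibonacciTask left = new FibonacciTask(n - 1);
            left.fork(); // schedule the first subtask asynchronously
            FibonacciTask right = new FibonacciTask(n - 2);
            // Run the second subtask in the current thread, then wait for the first.
            return right.compute() + left.join();
        }
    }

    // Hypothetical helper: runs the task in a fresh pool and returns the result.
    static int fib(int n) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new FibonacciTask(n));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(fib(10)); // prints 55
    }
}
```

Forking one half and computing the other half directly keeps the current worker busy instead of leaving it idle while both subtasks run elsewhere.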
 compute(): The main computation performed by this task. You must define this method, but in general you should not call it directly. Implement compute() as if it were a recursive function that has an ending condition. The compute() of a RecursiveTask returns a V, the type parameter of the declared RecursiveTask<V>.
 invoke(): Performs the given ForkJoinTask, returning its result, a V, upon completion. This V is what the compute() method of the RecursiveTask returns. Usually, more tasks are invoked from within compute(). A ForkJoinTask is a thread-like entity that implements Future, so it can be thought of as a lightweight form of Future.
Let's zero in on the compute() method. It is where you divvy up bigger tasks into smaller ones and invoke each of them:
 fork() allows a new ForkJoinTask (worker1) to be launched from an existing one.
 join() allows a ForkJoinTask (existing one) to wait for the completion of another one (worker1).
 When a task calls the invokeAll() method, it waits until the tasks passed to this method have finished.
 The value from a subtask is obtained with the get() method from the Future interface (join() returns the same value without declaring checked exceptions).
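The four calls above can be sketched together as follows. This is an illustration under assumptions: the class names ForkJoinCallsDemo, RangeSum, ForkJoinParent and InvokeAllParent are hypothetical, and the work (summing 1 to 100 in two halves) is only a placeholder.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class ForkJoinCallsDemo {

    // Hypothetical leaf task: sums the integers in [from, to].
    static class RangeSum extends RecursiveTask<Long> {
        private final int from;
        private final int to;

        RangeSum(int from, int to) {
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            long sum = 0;
            for (int i = from; i <= to; i++) {
                sum += i;
            }
            return sum;
        }
    }

    // Variant 1: fork() launches worker1, join() waits for it.
    static class ForkJoinParent extends RecursiveTask<Long> {
        @Override
        protected Long compute() {
            RangeSum worker1 = new RangeSum(1, 50);
            RangeSum worker2 = new RangeSum(51, 100);
            worker1.fork();                  // launched from the existing task
            long second = worker2.compute(); // done in the current thread
            return worker1.join() + second;  // wait for worker1 to complete
        }
    }

    // Variant 2: invokeAll() runs both subtasks and returns when both are done;
    // get() from the Future interface then reads each result.
    static class InvokeAllParent extends RecursiveTask<Long> {
        @Override
        protected Long compute() {
            RangeSum worker1 = new RangeSum(1, 50);
            RangeSum worker2 = new RangeSum(51, 100);
            invokeAll(worker1, worker2);
            try {
                return worker1.get() + worker2.get();
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    static long viaForkJoin() {
        return ForkJoinPool.commonPool().invoke(new ForkJoinParent());
    }

    static long viaInvokeAll() {
        return ForkJoinPool.commonPool().invoke(new InvokeAllParent());
    }

    public static void main(String[] args) {
        System.out.println(viaForkJoin());  // prints 5050
        System.out.println(viaInvokeAll()); // prints 5050
    }
}
```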
performance comparison - ForkJoinPool vs. ThreadPoolExecutor
With a workload that is unevenly distributed among tasks/threads, the ForkJoinPool achieves better results, while the traditional ExecutorService suffers under the uneven distribution. However, if the tasks are broken up into subtasks that are too small, ForkJoinPool performance suffers too, because the overhead of creating and scheduling each task outweighs the work it does. (see this benchmark)
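One common way to avoid overly small subtasks is a sequential cutoff: below a certain size, the task stops splitting and does the work directly. The sketch below illustrates the idea; the names ThresholdDemo and ArraySum are hypothetical, and the THRESHOLD value of 10,000 is an assumption that would need to be tuned by measurement.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class ThresholdDemo {

    // Assumed cutoff: ranges at or below this size are summed sequentially.
    static final int THRESHOLD = 10_000;

    // Hypothetical task: sums data[lo] .. data[hi - 1].
    static class ArraySum extends RecursiveTask<Long> {
        private final long[] data;
        private final int lo;
        private final int hi;

        ArraySum(long[] data, int lo, int hi) {
            this.data = data;
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        protected Long compute() {
            if (hi - lo <= THRESHOLD) {
                // Small enough: splitting further would cost more than it saves.
                long sum = 0;
                for (int i = lo; i < hi; i++) {
                    sum += data[i];
                }
                return sum;
            }
            int mid = (lo + hi) >>> 1;
            ArraySum left = new ArraySum(data, lo, mid);
            ArraySum right = new ArraySum(data, mid, hi);
            left.fork();
            long rightSum = right.compute();
            return left.join() + rightSum;
        }
    }

    static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new ArraySum(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(sum(data)); // prints 500000500000
    }
}
```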
The Fork/Join library introduced in Java 7 extends the Java concurrency package (ExecutorService) introduced in Java 5 with support for multicore hardware parallelism. The class ForkJoinPool implements the Executor and ExecutorService interfaces. It isn't intended to replace the older Java concurrency classes (e.g. the other ExecutorService implementations); instead it complements them for dealing with uneven task distributions.
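Because ForkJoinPool implements ExecutorService, it can also be used through that interface in the Java 5 'submit a callable' style. A minimal sketch, assuming a hypothetical class ExecutorServiceViewDemo, a parallelism of 4, and a trivial placeholder job:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;

class ExecutorServiceViewDemo {

    // Hypothetical helper: submits a Callable to a ForkJoinPool that is
    // referenced only through the ExecutorService interface.
    static int runThroughExecutorService() {
        ExecutorService executor = new ForkJoinPool(4); // parallelism of 4 is an assumption
        try {
            Callable<Integer> job = () -> 6 * 7; // placeholder work
            Future<Integer> future = executor.submit(job);
            return future.get(); // blocks until the job has finished
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runThroughExecutorService()); // prints 42
    }
}
```

Used this way, the pool behaves like any other executor; the work-stealing benefit shows up mainly when the submitted work forks subtasks of its own.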
what to read next