Suck It Dry - Tuning Parallel Execution

Home Page

Back to Papers

Please note this page has moved to http://oracledoug.com/px2.html. You will be redirected in 5 seconds but it would be better to update your bookmarks ;-)

Architecture

Single-threaded Full Table Scan

First, let’s look at the default Single-threaded architecture.

Figure 1 - Standard Single-threaded Architecture using Dedicated Server Process

This should be very familiar. The User Process (on the client or server) submits a SELECT statement that requires a full table scan of the EMP table and the Dedicated Server Process is responsible for retrieving the results and returning them to the User Process.

Parallel Full Table Scan - Degree 2

Let’s look at how things change when we enable Parallel Execution.

Figure 2 – Parallel Full Table Scan – Degree 2

This time, the server is going to process the query in parallel as a result of the optimizer hint. When the server sees that the requested Degree of Parallelism (DOP) for the emp table is two the dedicated server process becomes the Query Coordinator. It makes a request for two PX slaves and, if it’s able to acquire them, it will divide all of the blocks that it would have had to scan in the emp table into two equal ranges. Then it will send a SQL statement similar to the following to each of the slave processes.


SELECT /*+ Q1000 NO_EXPAND ROWID(A1) */ A1."EMPNO",A1."ENAME",A1."JOB",
  		A1."MGR",A1."HIREDATE",A1."SAL",A1."COMM",A1."DEPTNO"
FROM "EMP" PX_GRANULE(0, BLOCK_RANGE, DYNAMIC)  A1

Note that Oracle 10g changes this approach slightly, so that the SQL statement associated with the slave processes will be the same as the Query Co-ordinator, i.e.


SELECT /*+ parallel (emp,2) */ 
*
FROM EMP

Although this makes things a little easier to follow, it’s more difficult to get at the detail of what various px slaves are doing because you can’t use the SQL statement to differentiate them or find the cost for a given slave’s SQL statement.

As the data is retrieved from the emp table, it will be returned to the query co-ordinator which will, in turn, return the data to the user process. The way that all of the data is moved between the processes is using areas of memory called parallel execution message buffers or table queues. These can be stored in either the Shared Pool or Large Pool.

Parallel Full Table Scan with Sort

Figure 3 –Parallel Full Table Scan with Sort – Degree 2

The first thing to note is that there is no PARALLEL hint in the query and yet Oracle chooses to use Parallel Execution to process it. The reason is that the emp table itself also has a parallel DEGREE setting which allows us to specify whether Oracle should attempt to parallelise operations against that table and in this case, it’s been set to 2 (Note that the use of PX implies the use of the cost-based optimiser, so you should make sure that you have appropriate statistics on the object too) :-


ALTER TABLE emp PARALLEL 2;
SELECT table_name, degree
FROM user_tables
WHERE table_name = 'EMP';
TABLE_NAME                     DEGREE
------------------------------ ----------
EMP                                     2

Hold on a minute! We requested a DOP of two and yet there are four PX slaves being used to process our request. This is because Oracle will often use two sets of PX slaves for a specific action. The first set produces rows (and are known as producers) and the second set (called consumers) consumes the rows produced by the producers. So, in this case, Oracle can see that we are going to have to perform a sort because the NAME column isn’t indexed, so it requests 4 PX slaves – two sets of two. The first set are responsible for scanning the EMP table and the second set for sorting the data as it’s delivered by the producers. As you can see, though, the type of operation defines the way the workload is distributed between the slaves in each set. For the full table scan, it’s based on block ranges. For the sort, the QC process acts as a Ranger and divides up the sort activity by calculating the correct range of values for each slave to sort so that they’ll process a reasonably equal number of rows. The sort slaves will receive statements similar to the following.


SELECT A1.C0 C0,A1.C1 C1,A1.C2 C2,A1.C3 C3,A1.C4 C4,A1.C5 C5,A1.C6 C6,A1.C7 C7
FROM :Q1000 A1 ORDER BY A1.C0

There are a couple of very important things to note :-

Each PX slave in a given set must be able to communicate with all of the slaves in the other set so as the DOP increases, the number of connections will increase rapidly. As Jonathan Lewis points out in his article The Parallel Query Option in Real Life, the number of inter-slave communication paths is DOP-squared. Even with a DOP of two, you can see this means four inter-slave connections, a DOP of four would need 16 connections and so on.
The maximum number of processes required for a simple query is 2 x DOP plus the Query Co-ordinator. However, if there are multiple DFOs in the plan then additional PX slaves may be acquired and slave sets may be re-used. For example, Jonathan Lewis sent me an example statement that will fire up 34 slaves on Oracle 10g and 22 slaves on 9i, so things are more complicated than they might appear! (I plan to write a seperate paper on this so I can cover the subject properly.

Technical Papers Utilities and Scripts Book Reviews Links
My Resume Fun & Games Email Home