Jeff's Strongbox: Outer Joins no Data Integrator

Implementando um outer join no Oracle

Outer joins são aquelas consultas baseadas em duas tabelas (digamos, A e B) em que o resultado apresenta todas as linhas da tabela A, mesmo que estas não possuam correspondência em B (+), que nesse caso, é expandida com nulos . Ex:

select departments.department_id, count(employees.employee_id)
from hr.departments departments, hr.employees employees
where departments.department_id = employees.department_id (+)
group by departments.department_id

O resultado dessa consulta é:

Observe, na imagem, que todos os departamentos da tabela departments foram exibidos mesmo quando não houve nenhum registro correspondente na tabela employees. Nesses casos a função count retornou o número 0, que foi exibido nas linhas correspondentes a esses departamentos.

Outro exemplo parecido:

select departments.department_id, count(employees.employee_id)
from hr.departments departments, hr.employees employees
where departments.department_id(+) = employees.department_id
group by departments.department_id

O resultado, desta vez, é:

No exemplo acima, havia um registro na tabela employees cujo departamento não existia na tabela departments. A query então retorna todos os registros de funcionários, inclusive aquele. Porém, não encontrando correspondência na tabela departments, retorna um null.

Implementando um outer join no Data Integrator

Para implementar o primeiro exemplo no Data Integrator, preencha a aba “Where” com o texto abaixo e configure a aba “Outer Join” conforme a figura a seguir:

where departments.department_id = employees.department_id

Em outras palavras, após definir uma query comum, preencha a aba “Outer Join” com o nome das duas tabelas, sendo que a tabela que teria o sinal (+) no Oracle deve ficar no campo “Inner Source”.

Considerações sobre performance:

Data Integrator Performance Optimization Manual, pagina 28:

Data Integrator implements joins as nested loop joins. The source with the
higher join rank or the one specified as an outer source becomes an outer loop.
If all the ranks are equal or not set (defaults to 0), then Data Integrator picks an
inner and outer source at random. During job execution, Data Integrator reads
the source in the outer loop one time; and reads the source in the inner loop for
each row in the outer loop. Performance improves when fewer rows are read.

Wikipedia (http://en.wikipedia.org/wiki/Join_(SQL)):

Nested loops
Main article: Nested loop join
This is the simplest join algorithm. For each tuple in the outer join relation, the entire inner join relation is scanned, and any tuples that match the join condition are added to the result set. Naturally, this algorithm performs poorly if either the inner or outer join relation is very large. The performance though can be enhanced if the inner relation has an index on joining column.

A refinement to this technique is called "block nested loops" (BNL): for every block in the outer relation, the entire inner relation is scanned. For each match between the current inner tuple and one of the tuples in the current block of the outer relation, a tuple is added to the join result set. This variant means that more computation is done for each tuple of the inner relation, but far fewer scans of the inner relation are required.

Jeff

Jeff's Strongbox

quarta-feira, 2 de maio de 2007

Outer Joins no Data Integrator

Nenhum comentário:

Arquivo do blog