public class UtilizedColumnsAnalyzer
extends Object
Finds all utilized columns in the query. Utilized columns are those that would have an "impact" on the query's results.
For example, consider the query:
SELECT nationkey FROM (SELECT * FROM nation WHERE name = 'USA')
Even though all the columns in table nation are referenced by the query (in the SELECT * part), only the columns
"name" (used by the WHERE filter) and "nationkey" (the selected output) have an "impact" on the query's results.
The high-level algorithm works as follows:
1. Find all fields referenced in all clauses of the outermost SELECT query, and add them to an explore list.
2. For each field reference F in the explore list, find its referenced relation R.
3. If R is a SELECT query:
   a. Find the SELECT item expression that F references. Add all fields referenced by that expression to the explore list.
   b. Add all fields referenced by every other clause of the SELECT query to the explore list.
4. Otherwise (R is not a SELECT query):
   a. Add F's referenced field to a referenced fields list.
   b. For each child of R, find the corresponding child of F, and add it to the explore list.
5. Repeat from step 2 for all fields in the explore list, until all have been resolved to a base table relation.
At the end of this algorithm, the referenced fields list contains all the columns referenced by the query that impact the output.
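Step 3a is where fields that do not impact the output are pruned: only the SELECT item that F references is explored, so fields used solely by the other SELECT items of R are never added to the explore list.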
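The worklist described above can be illustrated with a minimal sketch. This is not the actual implementation of UtilizedColumnsAnalyzer: the types Relation, Table, Select and Field, and the methods utilizedColumns and exampleQuery, are hypothetical names invented for this illustration. The toy model assumes only base tables and SELECT queries, so step 4b (relations with children) is omitted. The column list for nation is an assumption based on the TPC-H schema. It requires Java 16 or later.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model; none of these types belong to the real analyzer.
final class UtilizedColumnsSketch {
    interface Relation {}

    // A base table with a fixed list of columns.
    record Table(String name, List<String> columns) implements Relation {}

    // A field reference F: the column called `name` of `relation`.
    record Field(Relation relation, String name) {}

    // A SELECT query. `items` maps each output column to the fields its
    // expression reads; `otherClauses` holds fields read by WHERE and so on.
    record Select(Map<String, List<Field>> items, List<Field> otherClauses)
            implements Relation {}

    static Set<String> utilizedColumns(Select outermost) {
        Deque<Field> explore = new ArrayDeque<>();
        Set<Field> seen = new HashSet<>();          // avoids re-exploring a field
        Set<String> referenced = new TreeSet<>();   // sorted for stable output

        // Step 1: every field read by any clause of the outermost query.
        outermost.items().values().forEach(explore::addAll);
        explore.addAll(outermost.otherClauses());

        // Step 5: loop until the explore list is exhausted.
        while (!explore.isEmpty()) {
            Field f = explore.poll();
            if (!seen.add(f)) {
                continue;
            }
            Relation r = f.relation();              // Step 2
            if (r instanceof Select s) {
                // Step 3a: follow only the SELECT item that F references.
                explore.addAll(s.items().getOrDefault(f.name(), List.of()));
                // Step 3b: every other clause always affects the result.
                explore.addAll(s.otherClauses());
            } else if (r instanceof Table t) {
                // Step 4a: F resolved to a base table column.
                referenced.add(t.name() + "." + f.name());
            }
        }
        return referenced;
    }

    // Builds: SELECT nationkey FROM (SELECT * FROM nation WHERE name = 'USA')
    static Select exampleQuery() {
        Table nation = new Table("nation",
                List.of("nationkey", "name", "regionkey", "comment"));
        Map<String, List<Field>> star = new HashMap<>();
        for (String column : nation.columns()) {    // expansion of SELECT *
            star.put(column, List.of(new Field(nation, column)));
        }
        Select inner = new Select(star, List.of(new Field(nation, "name")));
        return new Select(
                Map.of("nationkey", List.of(new Field(inner, "nationkey"))),
                List.of());
    }

    public static void main(String[] args) {
        // prints [nation.name, nation.nationkey]
        System.out.println(utilizedColumns(exampleQuery()));
    }
}
```

In this sketch, "regionkey" and "comment" never reach the referenced set because step 3a looks up only the "nationkey" item of the inner query, while "name" is picked up through the WHERE clause in step 3b.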