Override SemanticAnalyzer.analyzeInternal to handle CTAS caching and INSERT updates.
Unified views: For CTAS and INSERT INTO/OVERWRITE, the generated Shark query plan is the same as the one that would be created if the target table were not cached. Disk-to-memory loading is done by a SparkLoadTask that executes _after_ all other tasks (SparkTask, Hive MoveTasks) have finished. For INSERT INTO, the SparkLoadTask uses a path filter, built from a snapshot of the table/partition data directory taken in genMapRedTasks(), to determine which new files should be loaded into the cache. For CTAS, no path filter is used: everything in the data directory is loaded into the cache.
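The snapshot-based path filter for INSERT INTO can be sketched roughly as follows. This is a simplified illustration, not Shark's SparkLoadTask code: the object `SnapshotFilterSketch`, the function `newFilesFilter`, and the example paths are all hypothetical, and plain strings stand in for Hadoop paths.

```scala
// Hypothetical sketch: plain strings stand in for Hadoop Path objects.
object SnapshotFilterSketch {
  /** Builds a predicate that accepts only paths absent from the snapshot. */
  def newFilesFilter(snapshot: Set[String]): String => Boolean =
    path => !snapshot.contains(path)

  def main(args: Array[String]): Unit = {
    // Files in the table directory when the snapshot was taken,
    // i.e. before the INSERT INTO wrote anything.
    val snapshot = Set("/warehouse/t/part-00000", "/warehouse/t/part-00001")

    // Files in the same directory after the other tasks have finished.
    val afterInsert = Seq(
      "/warehouse/t/part-00000",
      "/warehouse/t/part-00001",
      "/warehouse/t/part-00002")

    // Only files written by the INSERT INTO pass the filter.
    val toLoad = afterInsert.filter(newFilesFilter(snapshot))
    println(toLoad.mkString(","))
  }
}
```

For CTAS the equivalent of this filter would accept every path, since the table did not exist before the statement ran.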
Non-unified views (i.e., the cached table content is memory-only): The query plan's FileSinkOperator is replaced by a MemoryStoreSinkOperator. The MemoryStoreSinkOperator creates a new table (or partition) entry in the Shark metastore for CTAS, and creates UnionRDDs for INSERT INTO commands.
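As a rough illustration of the sink replacement, the sketch below rewrites a toy operator tree so that every file sink becomes a memory-store sink. The `Op` hierarchy and `replaceSinks` are hypothetical stand-ins and do not mirror the real Hive or Shark operator classes, which are wired differently.

```scala
// Hypothetical sketch: a toy operator tree, not the Hive/Shark operator classes.
object SinkReplacementSketch {
  sealed trait Op
  case object TableScan extends Op
  case class Select(input: Op) extends Op
  case class FileSink(input: Op) extends Op
  case class MemoryStoreSink(input: Op, table: String) extends Op

  /** Replaces every FileSink in the tree with a MemoryStoreSink for `table`. */
  def replaceSinks(op: Op, table: String): Op = op match {
    case FileSink(in)           => MemoryStoreSink(replaceSinks(in, table), table)
    case MemoryStoreSink(in, t) => MemoryStoreSink(replaceSinks(in, table), t)
    case Select(in)             => Select(replaceSinks(in, table))
    case TableScan              => TableScan
  }

  def main(args: Array[String]): Unit = {
    val plan = FileSink(Select(TableScan))
    println(replaceSinks(plan, "cached_t"))
  }
}
```

The rest of the tree is left untouched, so only where the output goes changes.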
Generate tasks for executing the query, including the SparkTask to do the select, the MoveTask for updates, and the DDLTask for CTAS.
Used by the driver to get the result schema.
Shark's version of Hive's SemanticAnalyzer. In SemanticAnalyzer, genMapRedTasks() splits the query plan into multiple stages because of MapReduce. We want our query plan to stay intact as a single tree. Since genMapRedTasks() is private, we have to override analyzeInternal() to use our own genMapRedTasks().
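The reason for overriding the public entry point can be shown with a minimal sketch. A private method cannot be overridden, so the subclass overrides the caller and routes it to its own task generator. `BaseAnalyzer`, `SingleTreeAnalyzer`, `genSparkTasks`, and the string-based "plan" are hypothetical stand-ins, not the real Hive or Shark classes.

```scala
// Hypothetical sketch: stand-in classes, with a string playing the role of the plan.
object OverrideSketch {
  class BaseAnalyzer {
    // Private, so subclasses can neither call nor override it.
    private def genMapRedTasks(plan: String): Seq[String] =
      plan.split("->").toSeq.map(stage => s"MapRedTask($stage)")

    def analyzeInternal(plan: String): Seq[String] = genMapRedTasks(plan)
  }

  class SingleTreeAnalyzer extends BaseAnalyzer {
    // The subclass's own task generation keeps the plan as a single unit.
    private def genSparkTasks(plan: String): Seq[String] =
      Seq(s"SparkTask($plan)")

    // Overriding the caller is the only way to swap out the private step.
    override def analyzeInternal(plan: String): Seq[String] = genSparkTasks(plan)
  }

  def main(args: Array[String]): Unit = {
    val plan = "scan->filter->sink"
    println(new BaseAnalyzer().analyzeInternal(plan))
    println(new SingleTreeAnalyzer().analyzeInternal(plan))
  }
}
```

The cost of this approach is that the subclass has to reproduce whatever else the original entry point does besides task generation.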