Hey Jody, thanks for stopping by.
The culprit was a nested IF statement on a time dimension that was dynamically flagging a column (Y/N) in an attribute view. Once joined to millions of rows in the fact table, this of course got expensive very quickly, and was even able to bring the DB to its knees for about 15 minutes, as described above.
Once I recognized what the developer had done, we materialized the logic in the underlying table (populated through a stored procedure) and the difference was incredible: something like 20s on 200m rows vs. 15 minutes at 100% CPU before. After applying some filters to help narrow the join, I got it down to about 3s, which was great.
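To make the before/after concrete, here is a minimal sketch of the idea using Python's built-in sqlite3 as a stand-in for HANA. Everything in it is an assumption for illustration: the table names (dim_time, fact_sales), the columns, and the flag rule are made up, and the real logic lived in an attribute view and a stored procedure rather than plain SQL like this. It only shows that the flag can be computed once and stored, so the join reads a plain column instead of evaluating nested logic per row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical time dimension and fact table (names are illustrative only).
cur.execute(
    "CREATE TABLE dim_time ("
    "date_key INTEGER PRIMARY KEY, fiscal_year INTEGER, fiscal_period INTEGER)"
)
cur.execute("CREATE TABLE fact_sales (date_key INTEGER, amount REAL)")
cur.executemany(
    "INSERT INTO dim_time VALUES (?, ?, ?)",
    [(20120101, 2012, 1), (20120401, 2012, 4), (20110101, 2011, 1)],
)
cur.executemany(
    "INSERT INTO fact_sales VALUES (?, ?)",
    [(20120101, 10.0), (20120401, 20.0), (20110101, 5.0), (20120101, 7.5)],
)

# Made-up nested rule standing in for the original nested IF.
FLAG_RULE = (
    "CASE WHEN {p}fiscal_year = 2012 "
    "THEN CASE WHEN {p}fiscal_period <= 3 THEN 'Y' ELSE 'N' END "
    "ELSE 'N' END"
)

# Before: the flag is evaluated at query time, as part of the join.
dynamic_sql = (
    "SELECT " + FLAG_RULE.format(p="d.") + " AS flag_dyn, SUM(f.amount) "
    "FROM fact_sales f JOIN dim_time d ON d.date_key = f.date_key "
    "GROUP BY flag_dyn ORDER BY flag_dyn"
)
dynamic_rows = cur.execute(dynamic_sql).fetchall()

# After: the flag is computed once and stored on the dimension table
# (in the real case this was populated by a stored procedure).
cur.execute("ALTER TABLE dim_time ADD COLUMN in_scope TEXT")
cur.execute("UPDATE dim_time SET in_scope = " + FLAG_RULE.format(p=""))

materialized_sql = (
    "SELECT d.in_scope, SUM(f.amount) "
    "FROM fact_sales f JOIN dim_time d ON d.date_key = f.date_key "
    "GROUP BY d.in_scope ORDER BY d.in_scope"
)
materialized_rows = cur.execute(materialized_sql).fetchall()

# Both queries return the same aggregates; only where the logic runs differs.
print(dynamic_rows)
print(materialized_rows)
```

The toy data is far too small to show any timing difference; the point is only that the two queries are equivalent, so the expensive expression can move from query time to load time.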
The problem (for me at least) is that it was not obvious from analyzing the VizPlan where the expense was coming from. All I could see was an extra step in the calc engine (ceGNavAggregationPop) and a long-running ceOlapSearchPop step, example here (not the super long-running example, however). It seems that with this SQL syntax we always invoke the calc engine, no matter what.
By trial and error, I was able to isolate the problem column and remove it from execution, and accordingly the VizPlan dropped that extra step and looked more like this.
As I have seen you state a few times, the cost of calculated columns can be immense, and this is a very serious example of that! In general, however, would you say that these are really only expensive when there is expression logic or string manipulation? From what I have seen, as long as the calculations involve only measures, the processing seems to stay within the OLAP engine. Do you have any VizPlan examples where it was obvious to you that a calculated column (string or logic) was causing the issue, or how do you typically diagnose that?
Another interesting observation regarding Explorer queries to HANA: any time there is a variable built on a model, you have the option to choose a value for this variable during Explorer InfoSpace design time (through the 'validate' button). One would assume that this would affect the WHERE clause (which it does), but the added side effect is that each time the InfoSpace or ViewSet is executed, a SELECT DISTINCT <variable_column> FROM <model> is also executed, even though the value is already selected. This seems redundant and adds useless overhead to the processing. The problem becomes even more pronounced when creating ViewSets, as the same DISTINCT is issued for each component in a ViewSet (maybe 4-7 times) and adds no value. So we are in the process of analyzing how to achieve similar functionality without using a variable.
Always something interesting to figure out these days!
Regards,
Justin