Friday, September 10, 2010

Data Warehousing Workbench

Data Warehousing Workbench Modeling:

Introduction to Query Performance Optimization:

The project has set a minimum acceptable run time for queries, and some
queries violate this threshold. Our job is to improve these slow-running
queries, and we want to learn how to accomplish this goal.

Query and Web Performance Factors

Technical/Basis team Responsibilities:

1. Server CPU, memory, and disk sizing
2. Application server system settings (memory, buffers)
3. OLAP cache
4. BI Accelerator
5. DB statistics
6. Client hardware

BI Back-end Team Responsibilities:
1. Data model
2. Virtual key figures/characteristics
3. Read mode

BI Front-end Responsibilities:

1. Query definition (incl. OLAP features)
2. Pre-calculated web templates
3. BEx vs. Web
4. ODBO/third-party tools

1. Examine BI Statistics for system load and runtime information for the
query in question. BI Statistics keeps detailed data about processing times.
2. With information from BI Statistics, review the query design for
violations of good practices.
3. Consider precalculation/information broadcasting (useful for many
reports targeted to end users).
4. With information from BI Statistics, consider use of the BI Accelerator,
if it has been installed on the system.
5. With information from BI Statistics, consider the creation of aggregates,
assuming the BI Accelerator is not an option.
6. With information from BI Statistics and other information on table
sizes (RSRV) and SQL access times (ST01), consider remodeling the star
schema or adding indexes to operational data stores to improve the design.
7. Work with the Basis team to evaluate cache options, system load, and
other system settings that could be affecting performance.

There are many tools we can use to determine the root cause of performance
problems, including the query monitor (transaction RSRT). The query
monitor is a simple way to run and debug a query in the back end. By
running the query with this tool, we eliminate many other issues, such
as web page design and workbook customizing, from the core issue of
query performance.

Creating aggregates for the Cost Center Accounting InfoCube will improve
the performance of queries that use data from this InfoCube. Our BI project
team needs to define aggregates for this InfoCube because query performance
is less than acceptable. The team also wants to compare the advantages
of optimizing system performance for the end user against any associated
disadvantages and costs.

An aggregate is a summarized view of the data in an InfoCube. In an
aggregate, the data set of an InfoCube is stored redundantly and
persistently in its own InfoCube-like structure on the database. An
aggregate on an InfoCube is similar to an index on a database table.
Although technically very different, they both improve system performance
without the need for user intervention. The only way a user would know
that an aggregate was built for an InfoCube they were querying would be
by noticing that the query ran faster. Aggregates can be created only
for InfoCubes (not virtual InfoCubes or DataStore objects). Aggregates
are formed on an InfoCube by selecting specific objects connected to
it, such as:

Characteristics in dimension tables
Hierarchies associated with the InfoCube
Navigational attributes associated with the InfoCube

Example for Aggregates:

Although the data is already stored in a summarized star schema structure,
the InfoCube must contain the level of detail (granularity) needed to
support even the most detailed user requirement. For example, if 1% of
the queries need the DAY level of detail, but the other 99% just need
MONTH, the time dimension and the connected fact table must be designed
to support the DAY level of detail. This causes many more records to be
stored in the InfoCube than would be necessary to support just the MONTH
requirement. The end result might be that 99% of the queries run slower
than they otherwise could, which could result in queries not meeting
our response-time goals.

Using aggregates reduces the volume of data that is accessed for each
query, because the data is stored in a separate aggregate InfoCube that
contains a limited set of characteristics from the connected InfoCube.
This improves the read performance of queries whose details are supported
by an aggregate. Within the aggregate, data is automatically summarized
by selecting a limited number of characteristics and their navigational
attributes or available hierarchies.
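The idea can be illustrated outside of BW with a minimal Python sketch
(the fact rows, cost center names, and the `build_month_aggregate` helper
are all hypothetical, not part of any SAP API): day-level fact rows are
condensed into a month-level aggregate, so a MONTH query reads fewer rows.

```python
from collections import defaultdict

# Hypothetical day-level fact rows: (day, cost_center, amount).
fact_rows = [
    ("2010-09-01", "CC100", 50.0),
    ("2010-09-02", "CC100", 70.0),
    ("2010-09-15", "CC200", 20.0),
    ("2010-10-01", "CC100", 30.0),
]

def build_month_aggregate(rows):
    """Summarize day-level facts to (month, cost_center) level, the way
    an aggregate stores a coarser, redundant copy of the InfoCube data."""
    totals = defaultdict(float)
    for day, cost_center, amount in rows:
        month = day[:7]  # keep "YYYY-MM", drop the day
        totals[(month, cost_center)] += amount
    return dict(totals)

aggregate = build_month_aggregate(fact_rows)
# A month-level query now reads 3 aggregate rows instead of 4 fact rows;
# on a real InfoCube the reduction is typically orders of magnitude.
```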

If the navigation state of a query is supported by an aggregate, the OLAP
processor will use it. If more than one aggregate meets the need, the
OLAP processor chooses the most efficient one.

Aggregates are particularly useful in the following cases:

1. Executing and navigating queries leads to delays with groups of queries.
Aggregates can be designed to improve the speed of a group of queries.
2. We want to speed up the execution and navigation of a specific query.
This is not as desirable as the reason above, unless the query is used
very often.
3. We frequently use navigational attributes in queries. Navigational
attributes are inherently slower, as they are one table further away
from the fact table and hence require an additional table join to retrieve
the navigational attribute values. The run time of queries with navigational
attributes is drastically improved when they are in an aggregate.
4. We want to speed up reporting on characteristic hierarchies by
aggregating specific hierarchy levels. An InfoCube that has salaries for
100,000 employees, for example, could have an aggregate using the
organization structure all the way down to supervisor, with far fewer
records than the employee-level detail.

An aggregate can be built from the characteristics, navigational attributes,
and hierarchies of an InfoCube. Both time-dependent navigational attributes
and time-dependent hierarchies can be used in aggregates, as well as
time-independent ones. However, the use of time-dependent objects makes
it harder for queries to utilize the aggregate.

Aggregates are defined as follows:

* Inclusive: the characteristic is used in the summarization.
' ' Exclusive: the characteristic is not used in the summarization.
F Inclusive with a fixed value: the characteristic is summarized for a
specific fixed value.
H Hierarchy level: the characteristic is summarized for a specific
hierarchy level.

We can combine several characteristics into an aggregate. Combinations
of the restrictions, such as fixed values or hierarchy levels, are also
possible. For example, we can have a 'country = Germany & all customers
& all materials' aggregate.
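The combined definition above can be simulated with a short Python sketch
(the fact rows and the `build_aggregate` helper are hypothetical, not SAP
code): country is fixed to one value, customer is inclusive (grouped),
and material is exclusive (summed away).

```python
from collections import defaultdict

# Hypothetical fact rows: (country, customer, material, revenue).
facts = [
    ("Germany", "C1", "M1", 100.0),
    ("Germany", "C1", "M2", 40.0),
    ("Germany", "C2", "M1", 60.0),
    ("France",  "C1", "M1", 80.0),
]

def build_aggregate(rows, fixed_country):
    """'country = fixed value, customer inclusive, material exclusive':
    keep only the fixed country, group by customer, sum over material."""
    agg = defaultdict(float)
    for country, customer, material, revenue in rows:
        if country != fixed_country:
            continue  # F: fixed-value restriction on country
        agg[customer] += revenue  # ' ': material is summarized away
    return dict(agg)

germany_by_customer = build_aggregate(facts, "Germany")
# {'C1': 140.0, 'C2': 60.0} -- France rows never enter this aggregate.
```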

Exception aggregation means that a key figure behaves differently when
analyzed against different characteristics. For example, an inventory
balance can be added across customers, but not across time. When time is
on the report, the result value for the inventory is the balance on the
last day; it should not add up the inventory on each day to make a total
inventory. If you have these types of key figures in your InfoCube, you
need to include the exception characteristic in the aggregate design; in
this case, DAY must have * set in the aggregate.
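The inventory example works out as in this small Python sketch (the
balance rows and the `total_inventory` helper are hypothetical): sum
across customers, but take the last day's value over time instead of
adding every day.

```python
# Hypothetical daily inventory balances: (day, customer, balance).
balances = [
    ("2010-09-01", "C1", 10.0),
    ("2010-09-02", "C1", 12.0),
    ("2010-09-01", "C2", 5.0),
    ("2010-09-02", "C2", 7.0),
]

def total_inventory(rows):
    """SUM across customers, LAST over time (exception aggregation):
    keep only each customer's most recent balance, then add those up."""
    last_per_customer = {}
    for day, customer, balance in sorted(rows):  # sorted puts days in order
        last_per_customer[customer] = balance    # later days overwrite earlier
    return sum(last_per_customer.values())

# 12.0 + 7.0 = 19.0, not the naive 10+12+5+7 = 34.0 you would get by
# summing over DAY as well.
```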

Aggregates have technical properties, content properties, and status
properties. This information is stored in the table RSDDAGGRDIR and is
displayed in the GUI.

Before they can be used in reporting, aggregates must be active and filled
with data. If we want to prevent the use of one or more aggregates, there
are three options available:

Switch off the aggregate: the aggregate is still filled with data, but
is not used by the OLAP processor. We still need to roll up new data.
Deactivate the aggregate: the aggregate data is deleted, but the aggregate
definition remains.
Delete the aggregate: the aggregate data and the definition of the
aggregate are deleted.

An InfoCube can contain more than one aggregate, but usually only one
aggregate can be used in each query step. If several aggregates are
created for an InfoCube, they are checked by the OLAP processor when a
query is run. During this check, the OLAP processor reads the InfoCube's
aggregates and selects the one that is most suitable for the query.
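A plausible way to picture that selection (the aggregate names, row counts,
and `pick_aggregate` helper below are invented for illustration; the real
OLAP processor's logic is more involved): take the smallest aggregate that
contains every characteristic the query step needs, and fall back to the
InfoCube itself when none fits.

```python
# Hypothetical aggregates: name -> (characteristics contained, row count).
aggregates = {
    "AGG_MONTH":      ({"month", "cost_center"}, 1_000),
    "AGG_MONTH_CUST": ({"month", "cost_center", "customer"}, 50_000),
}

def pick_aggregate(query_chars):
    """Return the smallest aggregate covering every characteristic the
    query step needs, or None (read the InfoCube) if none qualifies."""
    candidates = [
        (rows, name)
        for name, (chars, rows) in aggregates.items()
        if query_chars <= chars  # subset test: aggregate covers the query
    ]
    return min(candidates)[1] if candidates else None

# pick_aggregate({"month"})              -> "AGG_MONTH" (smallest cover)
# pick_aggregate({"month", "customer"})  -> "AGG_MONTH_CUST"
# pick_aggregate({"day"})                -> None (only the InfoCube has DAY)
```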

In situations where more than one SQL statement is created, it is possible
for the OLAP processor to select data from more than one InfoCube. One
example of this is with MultiProvider-based queries. Another situation
where this happens is if you have two restricted key figures, for example,
one 'sales for Germany' and the other 'sales for year 2006'.

If 14 or fewer characteristics are included in the aggregate, the BI
system does not create real dimensions; line item dimensions are created
instead. In the case of a line item dimension, the dimension table is
eliminated, and the characteristic InfoObject's SID is written directly
to the fact table. When this happens, the aggregate is called a flat
aggregate.

The characteristics for data package and time are the exceptions to this
rule. These characteristics are each stored in a dimension, as they were
on the original InfoCube.

Even though new data has been loaded into an InfoCube, the data is not
available to any query until it has been moved (rolled up) into all the
aggregates. This ensures a consistent (if older) response whether a query
uses the InfoCube or one of its aggregates.
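The consistency guarantee can be sketched with a roll-up marker (the
request IDs and the `readable_requests` helper are hypothetical): queries
only see requests that have already been rolled up into every aggregate.

```python
# Hypothetical requests loaded into the InfoCube, in load order.
loaded_requests = ["REQ1", "REQ2", "REQ3"]
rolled_up_through = "REQ2"  # roll-up into all aggregates done up to here

def readable_requests(loaded, rolled_up):
    """Queries read only requests already rolled up into all aggregates,
    so the InfoCube and its aggregates return the same (if older) data."""
    visible = []
    for request in loaded:
        visible.append(request)
        if request == rolled_up:
            break  # later requests (REQ3) are loaded but not yet visible
    return visible

# readable_requests(loaded_requests, rolled_up_through)
# -> ["REQ1", "REQ2"]; REQ3 becomes visible only after the next roll-up.
```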

Be particularly careful when creating an aggregate containing a navigational
attribute. This type of aggregate can be costly to maintain, as any change
in the value of the attribute might cause an adjustment or complete
rebuild of all the aggregates.

A change run is usually executed after master data is loaded via a
process chain.

Process Chains and Aggregates:

A process chain is a sequence of processes scheduled in the background
and waiting to be triggered by a specific event.

By using process chains, we can:

Use event-controlled processing to automate complex processes in the
BI system.
Use network graphics to visualize the processes.
Control and monitor the processes centrally.
