The hottest multidimensional data model and the im

  • Detail

The realization of multidimensional data model and OLAP

in recent years, with the wide application of network technology and mathematical analysis in the banking industry, western commercial banks began to widely adopt demographic and geographical statistics theory, and use data mining and business intelligence technologies to deal with the problems of bank data integration and analysis across time, space, departments and products, gradually realizing the cross sale of financial products and financial services in 2016, To retain quality customers. At present, many domestic banks have also carried out the construction of business intelligence (BI) systems to varying degrees. From the perspective of information management, business intelligence is the further development and improvement of the application of decision support technology in commercial banks, and OLAP (on-line analytical processing) technology is one of the key technologies of BI. Since express, the first prototype tool of OLAP, was released in 1970, and Edward Kuder (DD), the father of relational database, mathematician and computer scientist, systematically put forward the concept of OLAP and the 12 criteria of OLAP in 1993, OLAP technology and products have made great progress, and its connotation and extension have also changed to some extent. But its essential feature is still that it organizes and stores data based on multidimensional data model to meet the rapid response and interactive operation to user requests

in the process of the rise and development of OLAP Technology in China, people still have different understandings of some basic concepts. For example, the relationship between OLAP and multidimensional data model, the relationship between multidimensional data model and multidimensional database (MDD), the difference between MOLAP (multidimensional OLAP) and HoLap (hybrid OLAP), and whether multidimensional database and multidimensional OLAP are completely consistent need to be further clarified

I. multidimensional data model and related concepts

data model generally has two levels: concept layer (logic layer) and physical layer. Logical data model abstracts the internal laws of the real world from the conceptual point of view, such as business process, data architecture, etc; The physical data model focuses on the specific implementation in a specific environment, such as efficiency, security and so on

multidimensional data model is a logical concept, which mainly solves how to quickly query and display a large amount of data from multiple angles, so as to obtain information and knowledge conducive to management decision-making. The application fields of multidimensional data model mainly include data warehouse, OLAP and data mining. Among them, multidimensional structure is the core of OLAP

the multidimensional data model conceptually treats information as a cube by introducing the concepts of dimension, dimension hierarchy and measurement. Figure 1 shows a data cube

Figure 1 a data cube

1 Cube: describes an object in three or more dimensions, each perpendicular to each other. Data measures occur at the intersection of dimensions, and all parts of the data space have the same dimension attributes

2. Dimension: it refers to the specific angle from which people observe data. It is a kind of attribute when considering problems. The set of attributes constitutes a dimension (such as time dimension, organization dimension, etc.)

3. Dimension hierarchy: the same dimension can also have various description aspects with different levels of detail (for example, the time dimension can include year, quarter, month, ten day, date, etc.)

4. Dimension attribute: a value of a dimension, which is the description of the position of a data item in a dimension (for example, "a year, a month, a day" is the description of the position in the time dimension)

5. Measure: a cell in a cube that holds data

olap's basic multidimensional analysis operations include roll up, drill down, slice, dice, and pivot

drilling includes drill down and drill up (roll up) operations. The drilling depth corresponds to the level divided by the dimension. The roll up operation aggregates on the data cube through the dimension specification; The tripping operation is the reverse of the roll up operation, from less detailed data to more detailed data

slice and slice are used to measure the distribution of data in the remaining dimensions after selecting values in some dimensions. In the multi-dimensional data structure, the required data can be obtained by cutting blocks according to two-dimensional and two-dimensional. For example, the sales situation of each city and each product can be obtained by cutting blocks and slices in the "organization, product and time" three-dimensional cube

rotation (rotation axis) is to change the direction of the dimension, that is, rearrange the placement of the dimension in the table (such as row column exchange), and obtain data from different perspectives through rotation

2. Physical implementation of multidimensional data model

olap multidimensional data model can be implemented in many ways, including multidimensional database using array, relational database and the combination of the two. People usually call them MOLAP, ROLAP and HoLap. However, MOLAP is easily misunderstood. After all, according to the multidimensional concept of OLAP, ROLAP is also a way of organizing multidimensional data

1. Multidimensional online analytical processing (multidimensional database management system)

multidimensional online analytical processing strictly follows the definition of Kuder, and establishes a multidimensional database to store the data of the online analytical system. It takes the multidimensional data organization as the core, that is, multidimensional online analytical processing uses multidimensional arrays to store data

when using multidimensional database to store OLAP data, it is not necessary to convert the concepts of dimension, layer division and cube in the multidimensional data model into other physical models, because multidimensional array (matrix) can well reflect the characteristics of the multidimensional data model

for Figure 1, a three-dimensional array matrix (7, 6, 3) can be defined to reflect the dimension, attribute and dimension quantity of the cube. The number of dimensions in the array corresponds to the number of dimensions of the cube, the value of each dimension in the array corresponds to the number of attributes of each dimension in the cube, and the 126 intersections of the array correspond to cells in the cube for storing data

the advantage of using array to realize multidimensional data model lies in the fast access to data, but it will also bring redundancy of storage space, that is, the problem of sparse matrix, which leads to great demand for storage space. For example, a cube structure defined in Figure 2 may take 10446.3 billion values when defined with an array. However, in fact, not every day, every operating institution in different regions and different special merchants will generate transactions with different currencies and different card types. Compared with the relational database management system, only when a transaction does occur will records be left in the corresponding tables

Figure 2 the problem of sparse matrix in a cube

in order to solve the problem of sparse matrix, some products have proposed strategies of sparse dimension and density dimension. The index block is generated from the sparse dimension, and the data block is formed from the density dimension. Only when the combination of sparse dimensions occurs for the first time, the index block is created, and then the data block is created

Figure 3 shows the relationship between data blocks and index blocks. The introduction of sparse dimension and density dimension reduces the storage redundancy of cube to some extent. In addition, the storage space of data block can be reduced through data compression technology

Figure 3 sparse dimension and density dimension

2 Relational online analytical processing (relational database management system)

rolap takes the relational database as the core and uses the relational structure to represent and store multidimensional data. The multidimensional structure of the multidimensional database is divided into two types of tables: one is the fact table, which is used to store data and dimensional keywords; The other is the dimension table, which uses at least one table for each dimension to store dimension description information such as dimension hierarchy and member category. Dimension tables and fact tables are linked by primary and external keywords to form a "star pattern". For dimensions with complex hierarchies, in order to avoid excessive storage space occupied by redundant data, multiple tables can be used to describe them. This extension of the star pattern is called "snowflake pattern". This multi-dimensional data representation enables users to understand these data in a simpler way, increases query efficiency, and requires less storage space for massive data

Figure 4 is a star schema, in which there is a fact table and an unormalized dimension table

Figure 4 star architecture

fact table has the following characteristics: ① release a large number of data columns of the pendulum, and the storage capacity can reach TBT; ② Mainly numerical information, only a few text or multimedia information; ③ There are external keywords connected with dimension tables; ④ Static data and aggregated data

The information in the

dimension table is the corresponding description of the fact table, such as product features, sales time, customer account number, etc. Through the dimension table, the complex description is divided into several small parts, such as the sales volume at a certain time point, so as to reduce the scanning of the fact table and realize the optimization query. It mainly has the following characteristics: ① the number of records is small, and there may be only thousands or tens of thousands of records; ② Mostly written materials; ③ Information has a hierarchical structure; ④ There is only one primary key (primary key or dimension key); ⑤ Information can be modified


snowflake architecture is a variation of the star architecture. It normalizes the dimension tables under the star architecture, so that they can express richer information and make information processing more flexible

3. Hybrid online analytical processing

hybrid online analytical processing uses multidimensional online analytical processing technology to store upper level summary data and relational online analytical processing to store detailed data, that is, the lower level is relational and the upper level is multidimensional matrix. This method has better flexibility

there are other ways to implement OLAP, such as providing a special SQL server and providing special support for SQL queries for some storage modes (such as star and snowflake)

III. comparison and selection of storage modes

the advantage of multidimensional online analytical processing is not only that it can clearly express multidimensional concepts, but also that it has a very high comprehensive speed. In the relational database management system, if you want to get the total sales volume in a certain region, you can only search the records one by one, while the plastic granulator finds the records that meet the conditions and adds the data; In multidimensional database, data can be accumulated directly by row or column, and its statistical speed is much faster than that of relational database management system. The more records in the database, the more obvious the effect. However, for multidimensional online analytical processing, with the increase of dimensions and dimension members, the storage space may explode

there is no size limit on the storage space of relational OLAP. The existing relational database technology can be used. The detailed data and summary data can be stored through SQL. The existing relational databases have made many optimizations for OLAP, including parallel storage, parallel query, parallel data management, cost based query optimization, bitmap index, OLAP extension of SQL, etc., which greatly improve the access efficiency of relational online analytical processing. In comparison, relational online analytical processing technology has greater scalability

see Table 1 for the comparison between multidimensional OLAP and relational OLAP. Table 1 Comparison between relational OLAP and multidimensional OLAP

data warehouse projects in the banking industry have the characteristics of ultra massive data, so the selection of OLAP products and their storage modes is particularly important

during the project implementation, the selection of OLAP products and storage modes should consider the size of enterprise data, data processing process and access

Copyright © 2011 JIN SHI