How Does Hadoop Fit with Your Business Intelligence Strategy?

January 31, 2013. Many people equate Big Data with Hadoop. However, not every business intelligence solution supports Hadoop or is designed to integrate with it. (The article was published in English.)
Over the past 20 years, a number of different data structures and technologies have been introduced to increase performance or enable a BI capability; many of these are self-service oriented, and they all deliver different levels of capabilities depending on the problem they are intended to solve.

For example, the decision to move and transform operational data to an operational data store (ODS), to an enterprise data warehouse (EDW) or to some variation of OLAP is often made to improve performance or to make the data easier for business people to consume, particularly for interactive analysis. Business rules are needed to interpret data and to enable BI capabilities such as drill up/drill down. The more business rules are built into the data store, the less modelling effort is needed between the curated data and the BI deliverable.
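To make the role of business rules concrete, here is a minimal Python sketch of a drill up over a geography hierarchy. The store IDs, cities, regions, sales figures and the `roll_up` helper are all hypothetical and exist only for illustration; in practice such a hierarchy would usually live in the data store or in the BI model rather than in application code.

```python
from collections import defaultdict

# Hypothetical business rule: each store belongs to a city and a region.
HIERARCHY = {
    "S1": {"city": "Austin", "region": "South"},
    "S2": {"city": "Dallas", "region": "South"},
    "S3": {"city": "Boston", "region": "East"},
}

# Hypothetical detail-level facts: (store_id, sale_amount).
SALES = [("S1", 100.0), ("S2", 50.0), ("S3", 70.0), ("S1", 30.0)]


def roll_up(sales, hierarchy, level):
    """Total sales at the requested level: 'store', 'city' or 'region'."""
    totals = defaultdict(float)
    for store_id, amount in sales:
        # Without the hierarchy there is no way to know which city or
        # region a store rolls up to; this mapping is the business rule.
        key = store_id if level == "store" else hierarchy[store_id][level]
        totals[key] += amount
    return dict(totals)


if __name__ == "__main__":
    # Drill up from store to city to region.
    for level in ("store", "city", "region"):
        print(level, roll_up(SALES, HIERARCHY, level))
```

If the hierarchy is already stored with the data, as in a typical OLAP cube, the BI model can reuse it directly. If it is not, someone has to supply it in the modelling layer, and that is the extra effort the paragraph above refers to.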

Figure 1 - The relative BI modelling effort needed for an ODS, EDW and OLAP data store.

Hadoop is another data storage choice in this technology continuum. The Hadoop Distributed File System (HDFS) or Hive is often used to store transactional data in its “raw state.” The MapReduce processing supported by these Hadoop frameworks can deliver great performance, but it does not offer the specialized query optimization that mature relational database technologies do. At this time, improving query performance requires acquiring query accelerators or writing code. In other words, retrieving a list of transactions for specific dates, geographies and so forth may be fast and simple, but aggregate-oriented calculations, such as average same-store sales or sales per square foot, will likely require programming skills to obtain the desired performance.
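The difference between a simple retrieval and an aggregate calculation can be sketched in plain Python. The example below only imitates the map, shuffle and reduce phases in memory; it does not use Hadoop or any Hadoop API. The transactions, store floor areas and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw transactions: (store_id, sale_date, amount).
TRANSACTIONS = [
    ("S1", "2013-01-05", 120.0),
    ("S2", "2013-01-05", 80.0),
    ("S1", "2013-01-06", 180.0),
    ("S2", "2013-01-07", 40.0),
]

# Hypothetical reference data: store floor area in square feet.
STORE_AREA_SQFT = {"S1": 1500.0, "S2": 400.0}


def transactions_on(records, sale_date):
    """Simple retrieval: a filter over the raw records."""
    return [r for r in records if r[1] == sale_date]


def map_phase(records):
    """Map: emit one (store_id, amount) pair per transaction."""
    for store_id, _sale_date, amount in records:
        yield store_id, amount


def shuffle(pairs):
    """Shuffle: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, areas):
    """Reduce: total sales per store divided by the store's floor area."""
    return {store: sum(amounts) / areas[store]
            for store, amounts in groups.items()}


def sales_per_sqft(records, areas):
    return reduce_phase(shuffle(map_phase(records)), areas)


if __name__ == "__main__":
    print(transactions_on(TRANSACTIONS, "2013-01-05"))
    print(sales_per_sqft(TRANSACTIONS, STORE_AREA_SQFT))
```

The filter is a single pass over the records. The aggregate needs custom map and reduce logic plus a join against reference data, which is the kind of hand-written code a relational optimizer would otherwise plan automatically.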

Because Hadoop processing is batch-oriented, Hadoop-based data tends to be limited to reporting capabilities in a business intelligence application. Good interactive performance may be achieved in specific areas, but performance for general ad hoc queries may not be satisfactory because of the overhead of setting up processing jobs. Contributions to the Apache open source project, such as Impala, establish a starting point for delivering better interactive performance, but this technology needs to evolve and mature before broad adoption is feasible.

Systems optimized for interactive analytics are recommended when data is analyzed frequently or delivered to interactive dashboards. The diagram below extends the previous diagram to show where Hadoop-based data fits in the data store continuum.

Figure 2 - Inclusion of Hadoop to illustrate the relative effort to describe data in a BI application

In conclusion, the key question isn’t “Does your BI tool support [my Hadoop technology]?” It really needs to be “What is the best way to leverage a Hadoop infrastructure with my BI tool?”

Source: ibmbigdatahub.com