So what is Big Data?

21 November 2012
You hear this term everywhere now: from colleagues in different departments, from management, and in the press. Yet you still do not know what lies behind the notion of Big Data? SAP, which has strong expertise in this area, describes in detail where such massive volumes of information come from, where problems with managing them can arise, which key tools are worth using, and why Big Data management skills are becoming critically important. (The material was published in English.)
Big data is the term that market researchers have adopted to refer to what Gartner describes as “high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.” The big question on the minds of IT specialists and managers is: What challenges does big data pose? And where exactly does it come from?

According to predictions made by network specialist Cisco in May 2012, the volume of Internet data will quadruple between 2011 and 2016 to 1.3 zettabytes, or 1,300,000,000,000,000,000,000 bytes, per year. In the same period, Cisco says, the number of Internet-connected devices will double to 19 billion. These will be used by 3.4 billion people, almost half of the global population.

But where do these huge volumes of data come from? Some of it originates from conventional transactions. Another source is WLAN data traffic, which, according to Cisco, will account for about half of all data traffic by 2016. In Germany, where every member of the population will be using five Internet-connected devices by 2016, mobile data traffic is set to increase 21-fold between 2011 and 2016, from 18 to 394 petabytes (PB), or 394,000,000,000,000,000 bytes, per month. By these calculations, mobile data traffic will grow three times faster than fixed data traffic over the five-year period. Moreover, video will make up 63% of mobile traffic by 2016, compared with its current share of 44%. Faster broadband connections and suitably powerful end devices, such as surveillance cameras, will foster this development.
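The byte counts and growth factors quoted above can be checked with a few lines of arithmetic. The sketch below assumes decimal (SI) prefixes, which is what the written-out figures imply, and assumes steady compound growth when deriving a yearly rate; the variable names are illustrative only.

```python
# Decimal (SI) prefixes, as implied by the written-out byte counts in the text.
PETABYTE = 10**15
ZETTABYTE = 10**21

# 1.3 zettabytes per year, kept as an exact integer to avoid float rounding.
internet_bytes_per_year = 13 * ZETTABYTE // 10

# German mobile data traffic per month: 2011 versus the 2016 forecast.
mobile_2011 = 18 * PETABYTE
mobile_2016 = 394 * PETABYTE

# Total growth over the five years, and the yearly rate it would imply
# IF growth were a steady compound rate (an assumption, not a Cisco figure).
growth_factor = mobile_2016 / mobile_2011
annual_growth = growth_factor ** (1 / 5) - 1

print(f"{internet_bytes_per_year:,} bytes per year")
print(f"{mobile_2016:,} bytes per month in 2016")
print(f"growth factor: {growth_factor:.1f}x, about {annual_growth:.0%} per year")
```

The ratio of 394 to 18 comes out at just under 22, in line with the roughly 21-fold increase quoted for Germany.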

Where the data comes from

Mobile devices, WLAN, social networks, sensors, and machines – they all generate the kind of mass data that market researchers refer to as big data. But, depending on where the data comes from, its characteristics can vary significantly. This is an important point to bear in mind if you want to get information out of data and turn that information into insight.

According to Gartner, the volume of data traffic is growing by 59% every year. “Today’s information-management disciplines and technologies are no match for this pace of growth,” says Mark Beyer, Research Vice President at Gartner. “Information managers need to completely rethink their approach to data processing by planning for all dimensions of information management.”

Herein lies the problem

While big data certainly presents a problem in terms of storage and analysis, the actual problem, according to Gartner, lies in spotting meaningful patterns within the data that can help companies make better decisions.

The search for meaningful data is hampered by the way in which the data is structured, because this causes difficulties for existing IT systems. Relational databases (RDBMS), which support virtually all core processes, are very good at storing structured transaction data in rows and columns and giving easy access to it. This is because transaction data consists chiefly of data in fields that each have a single data attribute such as a numeric or alphanumeric value. Often the data even describes itself by means of so-called “metadata”.

But where does a relational database store an e-mail that consists only of a header and body text? And how does it store – not to mention analyze – a tweet or Facebook message? Clearly, either traditional databases require new tools for analyzing data with multiple structures or users need to deploy other databases that are better suited to the job at hand.
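The contrast between structured transaction data and free text can be made concrete with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for a relational database; the table names, columns, and sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Structured transaction data: one typed attribute per field.
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("Alice", 120.0), ("Bob", 80.5), ("Alice", 42.0)],
)

# The schema acts as metadata: the database can describe its own columns.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(orders)")]

# Aggregating structured data takes a single query.
totals = dict(
    conn.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer")
)

# An e-mail fits only as a header field plus one opaque text field.
conn.execute("CREATE TABLE emails (subject TEXT, body TEXT)")
conn.execute(
    "INSERT INTO emails VALUES (?, ?)",
    ("Delivery", "My order arrived late and the box was damaged."),
)

# Without extra tooling, plain SQL offers little beyond substring matching.
complaints = conn.execute(
    "SELECT COUNT(*) FROM emails WHERE body LIKE '%damaged%'"
).fetchone()[0]

print(columns)
print(totals)
print(complaints)
```

The order table can be summed and grouped directly, whereas the meaning of the e-mail body stays locked inside a single text field. That gap is what the additional tools or alternative databases mentioned above are meant to close.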

Available tools

When it comes to data that already has a degree of structure, the tools are already available. Call-center records, for example, consist of standardized forms that are filled out by call-center agents. These have a prescribed structure that is relatively simple to search through. Web shops, on the other hand, use tools that log users’ mouse-clicks as they browse a web page and create what is known as a “clickstream”. Large companies have been storing clickstreams in data warehouses for years and analyzing them in the hope of recognizing the kinds of patterns that Gartner is referring to.

The results of these analyses give customer-facing departments useful information about how they could improve their advertising, marketing, sales campaigns, and even their product development. This is because the logged mouse-clicks usually provide a very clear picture of where users’ preferences lie and which products or product features do not interest them at all. This kind of analysis is at its most valuable when it reveals new trends that initially appear as statistical outliers. These give companies the potential to develop innovative products that could transform them into trend-setting market leaders.
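As a rough illustration of clickstream analysis, the sketch below counts clicks per page and lists the rarely clicked ones. The log format and sample events are hypothetical; a real web shop would log far more detail and analyze it in a data warehouse rather than in memory.

```python
from collections import Counter

# Hypothetical clickstream: (session id, page clicked), in the order logged.
clickstream = [
    ("s1", "/shoes"), ("s1", "/shoes/red"), ("s2", "/shoes"),
    ("s2", "/hats"), ("s3", "/shoes"), ("s3", "/shoes/red"),
    ("s4", "/scarves"),
]

# Count clicks per page to see where users' preferences lie.
clicks_per_page = Counter(page for _, page in clickstream)

# Pages ranked from most to least clicked.
ranking = clicks_per_page.most_common()

# Pages clicked only once: either of no interest or an early sign of a trend.
rare_pages = sorted(page for page, n in clicks_per_page.items() if n == 1)

print(ranking[0])
print(rare_pages)
```

Telling a genuine new trend apart from noise among the rare pages takes more than a count, which is why outlier analysis on real data volumes is the hard part.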


Why handling big data will be a core skill

Whatever an enterprise’s big data plans are, they should definitely be long-term ones. “The ability to handle extremely large data volumes,” predicts Yvonne Genovese, Vice President and analyst at Gartner, “will become a core skill in businesses and organizations. Increasingly, they will be looking to use new forms of information – such as text, context, and social media – to identify decision-supporting patterns. This is what Gartner calls a Pattern-Based Strategy.”

This strategy, Genovese continues, is a major driving force behind the big data trend. It relies on using the full range of dimensions in the search for meaningful patterns, and its results provide the basis for modeling new business solutions that allow companies to adapt to changing market conditions. “The cycle of searching, modeling, and adapting can be completed in various media, such as a social media analysis or in context-oriented calculation models.”

“Worldwide, companies invested 3.38 billion euros in big data projects and services in 2011,” reports Steve Janata, a consultant with Experton Group. The market for new solutions will grow by 36% per year between 2011 and 2016 and, in Germany alone, some 350 million euros will be invested in big data in 2012. According to Experton, this makes the market for big data one of the fastest-growing segments in the IT industry and a driving force in the IT economy as a whole.
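To get a feel for what 36% annual growth means, the sketch below compounds the 2011 worldwide figure forward to 2016. Applying the growth rate to the 3.38 billion euro base is an assumption made here for illustration; the text does not state the projected 2016 value itself.

```python
# Figures taken from the text; compounding from the 2011 base is an assumption.
invested_2011 = 3.38   # billion euros, worldwide, 2011
annual_growth = 0.36   # 36% per year, 2011 to 2016

# Project each year by compounding the previous year's value.
projection = {2011: invested_2011}
for year in range(2012, 2017):
    projection[year] = projection[year - 1] * (1 + annual_growth)

for year, value in projection.items():
    print(f"{year}: {value:.2f} billion euros")
```

Under that assumption, the market would reach a little under 16 billion euros in 2016, more than four and a half times its 2011 size.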


Source: sap.info