  • Big Data

    Tim Kannegieter

    Introduction

    One definition of big data is that the amount of data collected is sufficiently large to allow the development of insights that would be impossible with smaller data collections. Another definition is that big data cannot be dealt with by traditional data analytics technologies: if the questions being asked of the data cannot be easily answered by traditional technologies, then it is big data.

    The primary purpose of big data is to create data-based products, whereas the primary purpose of traditional analytics is internal decision support. One way of looking at the difference between big data and traditional analytics is shown in the table below. In summary, big data is very large, unstructured and fast-moving compared with the data handled by traditional analytics, which calls for a different approach.

    DM6.thumb.png.d700e9a7097c27be5fd253b1ea4f66c4.png

    To analyse information, present it in a meaningful way and visualise it, an enterprise needs to collect the data held in its legacy, CRM and ERP systems, together with data from third-party solutions and applications, and store it in a data warehouse. A simplified diagram of a typical data warehouse is shown below (excluding ETL software, business intelligence, dashboards and advanced analytic tools).

    DM7.thumb.png.02af572eb01c97cb70b0d39abcbe853e.png

    Diagram courtesy of Arthur Baoustanos, aib Consulting Services

    Big data requires a different storage and aggregation approach. Information from emails, documents, weblogs, social media sources, images and videos is collected in one storage system, or platform. One commonly used open source platform is Hadoop, which stores data in the Hadoop Distributed File System (HDFS). A Hadoop environment is extremely useful for storing and retrieving very large amounts of data. If it is necessary to join databases or different datasets, other tools, such as in-memory computing tools, will be needed to provide the necessary computing power.
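
    As a rough illustration of what joining different datasets in memory involves, the following Python sketch performs a simple hash join between two small collections. It does not use Hadoop or any specific in-memory computing product; the function name `hash_join`, the field names and the sample records are all hypothetical and chosen only for this example.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Inner-join two lists of dict rows on `key` using an in-memory index."""
    # Build an index over one dataset so each lookup is a dictionary access
    # rather than a scan of the whole dataset.
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)

    joined = []
    for row in left:
        # Rows with no match in the index are dropped (inner join).
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


# Hypothetical sample data: weblog entries and CRM records sharing a user_id.
weblog = [
    {"user_id": 1, "page": "/home"},
    {"user_id": 2, "page": "/pricing"},
    {"user_id": 3, "page": "/docs"},
]
crm = [
    {"user_id": 1, "segment": "enterprise"},
    {"user_id": 2, "segment": "trial"},
]

joined = hash_join(weblog, crm, "user_id")
```

    The index has to fit in memory, which is one reason joins over very large datasets call for additional computing power.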

    Other technologies used for storing and processing big data are shown in the diagram below.

    DM8.png.616ace6ddacefe3c6f74793e633ee944.png

    Diagram courtesy of Arthur Baoustanos, aib Consulting Services

    These technologies are used to create a big data environment as shown in the following diagram.

    DM9.thumb.png.6ba3f4c160b7a1e32fb05e6fc6431a35.png

    Diagram courtesy of Arthur Baoustanos, aib Consulting Services

    Big data is stored by combining the traditional data warehouse with the big data environment as shown below.

    DM11.png.2d02913e9d8b261a1f88ef67ac9af890.png

    Diagram courtesy of Arthur Baoustanos, aib Consulting Services

    Aggregation vs correlation

    Much of the focus when analysing Big Data is on aggregation, that is, how to reduce the size of the data.

    Data correlation, or relating seemingly unrelated data through other data, is challenging with Big Data in an unaggregated form. In a multi-dimensional data space with many attributes and a large volume of data, starting from the wrong hypothesis will lead to the wrong conclusion.

    User interaction with Big Data is through summaries or aggregations. For example, the data from a group of sensors can be characterised in terms of one-minute, daily, weekly or yearly summaries. In many IoT applications, users do not need to see the source data.
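
    A minimal sketch of this kind of time-based summary is shown below in Python. It assumes readings arrive as (timestamp in seconds, value) pairs; the function name `summarise`, the choice of summary statistics and the sample readings are illustrative assumptions rather than part of any particular IoT platform.

```python
from collections import defaultdict
from statistics import mean


def summarise(readings, bucket_seconds=60):
    """Group (timestamp_seconds, value) readings into fixed-width time buckets.

    Returns a dict mapping each bucket start time to its summary statistics.
    """
    buckets = defaultdict(list)
    for ts, value in readings:
        # Round the timestamp down to the start of its bucket.
        bucket_start = int(ts // bucket_seconds) * bucket_seconds
        buckets[bucket_start].append(value)

    return {
        start: {
            "count": len(vals),
            "min": min(vals),
            "max": max(vals),
            "mean": mean(vals),
        }
        for start, vals in sorted(buckets.items())
    }


# Hypothetical sensor readings: (seconds since start, temperature).
readings = [(0, 20.0), (30, 22.0), (61, 21.0), (119, 23.0), (120, 30.0)]

minute_summaries = summarise(readings, bucket_seconds=60)
daily_summaries = summarise(readings, bucket_seconds=24 * 60 * 60)
```

    The same function produces daily or weekly summaries by changing `bucket_seconds`, so users can work with a handful of summary rows instead of the source data.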

    There are other ways of aggregating Big Data. For example, anomaly detection is an aggregation approach because it takes in a lot of data and produces very few results. Some machine learning algorithms can also be thought of as aggregations, as they follow a similar approach.
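
    To show how anomaly detection reduces many readings to very few results, here is a small Python sketch using a z-score threshold. The z-score method, the threshold of 3 standard deviations and the name `find_anomalies` are assumptions made for this example; the source does not prescribe a particular detection technique.

```python
from statistics import mean, pstdev


def find_anomalies(values, threshold=3.0):
    """Return (index, value) pairs lying more than `threshold`
    population standard deviations away from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return []
    return [
        (i, v)
        for i, v in enumerate(values)
        if abs(v - mu) / sigma > threshold
    ]


# Hypothetical input: 99 normal readings and one spike.
values = [10.0] * 99 + [100.0]
anomalies = find_anomalies(values)
```

    Here one hundred input values are reduced to a single flagged reading, which is the sense in which anomaly detection acts as an aggregation.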

    Solutions have been developed that analyse source data on insertion and instantaneously stream aggregations of Big Data to users as micro- and macro-summaries, which are useful for real-time monitoring and decision support systems.
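
    The idea of aggregating on insertion can be sketched in Python as follows. This is only an illustration under stated assumptions: the class names `RunningSummary` and `StreamAggregator` are hypothetical, "micro-summaries" are taken to mean per-minute buckets and the "macro-summary" a running total over all data, and no specific commercial solution is being described.

```python
from dataclasses import dataclass


@dataclass
class RunningSummary:
    """Summary statistics updated incrementally, without storing raw values."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value):
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def mean(self):
        return self.total / self.count if self.count else None


class StreamAggregator:
    """Updates micro (per-bucket) and macro (overall) summaries on each insert."""

    def __init__(self, micro_seconds=60):
        self.micro_seconds = micro_seconds
        self.micro = {}                # bucket start time -> RunningSummary
        self.macro = RunningSummary()  # summary over everything seen so far

    def insert(self, ts, value):
        bucket = int(ts // self.micro_seconds) * self.micro_seconds
        self.micro.setdefault(bucket, RunningSummary()).add(value)
        self.macro.add(value)


# Hypothetical stream of (seconds since start, value) readings.
agg = StreamAggregator()
for ts, value in [(5, 1.0), (50, 3.0), (70, 8.0)]:
    agg.insert(ts, value)
```

    Because each insert touches only two small summary objects, the summaries are up to date as soon as the data arrives and the source readings never need to be rescanned.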

    Sources: The information on this page has been sourced primarily from the following:
