We define Big Data as an evolving term that describes any voluminous amount of structured, semi-structured and unstructured data that has the potential to be mined for information and converted to knowledge.
In 2012, Gartner updated its definition of Big Data as follows: “Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.” Some organizations add two further Vs, “Veracity” and “Value”, to this description.
Rembrandt offers the following Services related to the Big Data Track:
- Big Data Readiness Assessment
- Big Data Strategy & Roadmap
- Big Data Architecture
- Big Data Implementation
Big Data Readiness Assessment: A one-day corporate assessment seminar that prepares and educates clients in the design, architectures and key concepts needed to embark on a Big Data initiative.
Big Data Strategy & Roadmap: In-depth consulting services to help define the Big Data strategy and roadmap to incrementally build the architecture. We build on our experience in developing Department and Enterprise Big Data Frameworks for Pharmaceutical, Consumer Goods, Banking, Brokerage, Insurance and Information Services clients.
Big Data Architecture: Selecting the Big Data architecture that best fits the specifics of an organization’s situation.
Big Data Implementation: A multi-person, time-and-materials project to implement the Big Data architecture defined in the architecture deliverable. Rembrandt will supply a project manager, architect, modeler and developers skilled in Hadoop and MapReduce. We will also supply visualization and statistical personnel skilled in the appropriate toolset and predictive analytics.
Gartner distinguishes Big Data from Business Intelligence by the data involved and how it is used:
- Business Intelligence applies descriptive statistics to data with high information density to measure things, detect trends, etc.
- Big Data applies inductive statistics and concepts from nonlinear system identification to infer laws (regressions, nonlinear relationships and causal effects) from large sets of data with low information density, in order to reveal relationships and dependencies and to predict outcomes and behaviors.
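The contrast between describing data and inferring a law from it can be illustrated with a minimal sketch. The monthly sales figures and the helper name fit_line below are hypothetical, chosen only for illustration, and a simple least-squares line stands in for the far richer models a real Big Data project would use.

```python
from statistics import mean

# Hypothetical monthly sales figures (illustrative data only).
months = [1, 2, 3, 4, 5]
sales = [10.0, 12.0, 14.0, 16.0, 18.0]

# Descriptive statistics: summarize what was actually observed.
average_sales = mean(sales)


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar


# Inductive statistics: infer a relationship from the data,
# then use it to predict a month that has not been observed yet.
slope, intercept = fit_line(months, sales)
forecast_month_6 = slope * 6 + intercept

print(f"average so far: {average_sales}")
print(f"forecast for month 6: {forecast_month_6}")
```

The first figure only measures the past, while the second extrapolates from an inferred relationship, which is the distinction drawn in the list above.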
Currently, Big Data can be described by the following characteristics: Volume, Variety, Velocity, Variability, Veracity and Complexity. In 2004, Google published a paper on a process called MapReduce, which is built on a parallel, distributed processing architecture.
Rembrandt has built MapReduce architectures for clients in Banking, Brokerage, Telecommunications and Pharmaceuticals.
The MapReduce framework provides a parallel processing model and an associated implementation for processing huge amounts of data. With MapReduce, queries are split, distributed across parallel nodes and processed in parallel (the Map step). The results are then gathered and delivered (the Reduce step). The framework proved very successful, so others wanted to replicate the algorithm; an implementation of the MapReduce framework was adopted by an Apache open source project named Hadoop.
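The Map and Reduce steps described above can be sketched in a few lines of plain Python. This is a single-process illustration of the programming model only, not Hadoop's API: the function names (map_step, shuffle, reduce_step, word_count) and the sample text are assumptions made for the example, and the chunks are processed one after another rather than on separate nodes.

```python
from collections import defaultdict
from itertools import chain


def map_step(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the emitted values by key so each key is reduced once."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_step(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    # On a real cluster each chunk would be mapped on a different node;
    # here the chunks are simply processed sequentially.
    mapped = chain.from_iterable(map_step(chunk) for chunk in chunks)
    grouped = shuffle(mapped)
    return dict(reduce_step(key, values) for key, values in grouped.items())


# Hypothetical input, already split into two chunks.
counts = word_count(["big data big", "data mining"])
print(counts)
```

Because each call to map_step depends only on its own chunk, the Map work can be spread across many machines; only the grouping step needs to bring values for the same key together before they are reduced.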
Big Data Technologies
Big Data requires exceptional technologies to efficiently process large quantities of data within tolerable elapsed times. Rembrandt suggests that suitable technologies include A/B testing, crowdsourcing, data fusion and integration, genetic algorithms, machine learning, natural language processing, signal processing, simulation, time series analysis and visualization.
Multidimensional Big Data can also be represented as tensors, which can be more efficiently handled by tensor-based computation, such as multilinear subspace learning.
Additional technologies being applied to Big Data include massively parallel processing (MPP) databases, search-based applications, data mining, distributed file systems, distributed databases, cloud-based infrastructure (applications, storage and computing resources) and the Internet.