Jan 17 2019
Believe it or not, business intelligence is an extremely old field of study. The term first appeared in 1865, in an account of a banker who used information about the outcome of battles to make strategic business decisions. Hans Peter Luhn of IBM published one of the earliest articles about business intelligence in 1958. Forrester Research defines business intelligence as “a set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information used to enable more effective strategic, tactical, and operational insights and decision-making.” The global market for business intelligence is expected to reach a value of $29.48 billion by 2022 (Stratistics MRC, 2017).
Real-time analytics is one of the fastest-growing fields in business analytics. Previous generations of business intelligence analytics, such as data warehousing and data mining, focused primarily on analyzing historical data offline. Real-time analytics instead analyzes data as soon as it is available. This allows business decisions to be made more quickly, in reaction to changing business conditions, and allows new opportunities to be exploited before competitors get to them. Most real-time business analytics solutions strive for sub-second latencies and extremely high availability (four nines, or 99.99% uptime, or better). Typical users include financial services, governments, retail/e-commerce, and high-tech marketing.
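To put "four nines" in perspective, here is a small sketch of how much downtime each availability level allows per year. The function name and the 365-day year are our own assumptions for illustration, not part of any vendor's service-level definition.

```python
# Yearly downtime budget for an availability target expressed in "nines".
# Assumes a 365-day (non-leap) year; the names here are illustrative only.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(nines: int) -> float:
    """Return the allowed downtime, in minutes per year, for N nines."""
    unavailability = 10 ** (-nines)  # e.g. 4 nines -> 0.0001 (0.01%)
    return MINUTES_PER_YEAR * unavailability


for n in (3, 4, 5):
    print(f"{n} nines: {downtime_minutes_per_year(n):.2f} minutes/year")
```

Under these assumptions, four nines leaves a budget of roughly 52.6 minutes of downtime per year, and every additional nine cuts that budget by a factor of ten.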
Some of the more popular real-time analytics applications include MongoDB, Redis, MemSQL, SAP HANA, and Aerospike, nearly all of which are in-memory databases. In each case, significant design energy went into making the application scale well, balancing loads across multiple servers to increase performance and availability. Beyond that, each company has taken a different approach: SAP HANA uses a column-oriented data storage structure, Redis uses a key-value approach, and MemSQL uses lock-free data structures and just-in-time compilation to speed processing. While these techniques tend to maintain real-time performance as the data set scales, they also result in very large compute clusters, which increase both the cost and the energy usage of these applications. Add the choice between object-based and file-based storage, and the challenges multiply. Our next blog post will look at some of these issues and how to address them effectively.