What you need to finally harness the value of your Hadoop data lake

Can you have too much of a good thing? If you’re not making the best use of it, definitely!

I have met many data technicians who built a Hadoop data lake because their companies had outgrown their Oracle, SQL Server and other databases. They don't want to lose any of their data, so they build a Hadoop infrastructure and dump it all in there. But then the question becomes: how do I derive value from that data?

Many smart organizations using the Hadoop ecosystem try Hive, Impala or Spark SQL to query the data lake. Many find these tools painfully limited and slow, and often wait days or weeks for an analysis to complete.

So teams put workarounds in place to give BI analysts the interactivity and performance they want. Some move data to an external data mart or cache it in memory in the application. Either way, analysts end up working with sampled or summarized data and lose the value of granular detail in the process.

Why Hadoop data lakes need a BI Consumption Layer

We first recognized the need for a BI Consumption Layer when one of our customers, a top 10 US bank, explained how Kyvos fits into its big data architecture. Like many Hadoop users, the bank was struggling to get the performance from Hadoop that its users demanded. It was really important for this bank (as it is for many of our customers) that its BI analysts be able to access and analyze big data in a self-service way, using tools they are already familiar with.

We loved the concept of the BI Consumption Layer because it succinctly describes an important aspect of our solution: making data lakes more consumable for the business analyst community.

The BI Consumption Layer, a key component of Kyvos 2.0, sits between Hadoop data lakes and analysts, giving analysts interactive access to big data with sub-second response times through their preferred BI tools.

From the perspective of business analysts, this changes everything. There's no more waiting for queries to return and no more asking IT to run reports. And because analysts access data directly in Hadoop, there's no duplication of data and no loss of granularity: if it's in the data lake, and they have the right credentials, they can see it. This is what BI on Hadoop should be all about.

Once we establish that all analysts can use the data lake and derive value from it, whether they use Excel, Tableau or other BI tools, the question becomes: how secure is the data lake, and who can access what data? Security and authentication are critical when dealing with so much sensitive business data. Here, Kyvos provides organizations with strong authentication and fine-grained access control at the row and column level.
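To make the idea of row- and column-level access control concrete, here is a minimal, generic sketch in Python. It is not Kyvos code or the Kyvos API. The Policy class, the apply_policy function and the sample trade records are all hypothetical names and data invented for illustration, assuming a policy is simply a set of visible columns plus a rule deciding which rows a user may see.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

Row = Dict[str, object]


@dataclass(frozen=True)
class Policy:
    """Hypothetical per-user policy (illustrative only, not a Kyvos API)."""
    allowed_columns: FrozenSet[str]      # column-level rule
    row_filter: Callable[[Row], bool]    # row-level rule: True means visible


def apply_policy(rows: List[Row], policy: Policy) -> List[Row]:
    """Return only the rows and columns the policy allows."""
    visible = []
    for row in rows:
        # Row-level check: skip rows the user is not entitled to see.
        if not policy.row_filter(row):
            continue
        # Column-level check: drop any column outside the allowed set.
        visible.append(
            {name: value for name, value in row.items()
             if name in policy.allowed_columns}
        )
    return visible


# Made-up sample data for the sketch.
trades = [
    {"desk": "rates", "trader": "alice", "notional": 5_000_000},
    {"desk": "fx", "trader": "bob", "notional": 2_000_000},
]

# An analyst who may see only the rates desk, and never trader names.
rates_analyst = Policy(
    allowed_columns=frozenset({"desk", "notional"}),
    row_filter=lambda row: row["desk"] == "rates",
)

result = apply_policy(trades, rates_analyst)
print(result)
```

In this sketch the analyst sees one row (the rates desk) with the trader column removed. A real deployment would enforce such rules inside the query layer rather than in application code, so that every BI tool gets the same filtered view.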

Transformative effects

Under the hood, our engineers have applied a number of patent-pending innovations (such as multi-dimensional OLAP on Hadoop) to make analytical processing on the Hadoop cluster highly performant. The outcomes are transformative, not just for data technicians struggling with Hadoop or the business analysts they support, but for the enterprise as it becomes a data-driven business. Consider these three examples:

  • A global investment bank limited its daily risk exposure with a consolidated view of risk across all asset classes and the ability to drill down to trade transactions
  • A telecom company negotiated programming cost savings by analyzing viewer behavior on 14 TB of set-top box data
  • An international airline now optimizes seat revenue by understanding how passengers decide to upgrade to premium economy seats

In the end, this is the tremendous value that organizations seek from big data. But until now, it was mostly trapped beneath the surface of the data lake. The Kyvos BI Consumption Layer lets you finally harness the potential of big data.

