Can you have too much of a good thing? If you’re not making the best use of it, definitely!
I have met many data technicians who have created a Hadoop data lake because their companies have outgrown their existing Oracle, SQL Server, and other databases. They don't want to lose any of their data, so they build a Hadoop infrastructure and dump it all in there. But then the question becomes: how do I derive value from that data?
Many smart organizations in the Hadoop ecosystem turn to Hive, Impala, or Spark SQL to query the data lake. And many find these tools painfully limited and slow, often waiting days or weeks for their analyses to complete.