The architecture principles that drove the creation of the data-lake paradigm were:
These principles give rise to the data-lake pattern, in which considerable investment by web-scale companies (Google, Amazon, Alibaba, Facebook, etc.) continues to accrue new insights from users' click-streams; this in turn led to the widespread adoption of Hadoop and its many derivatives as a data-storage pattern.
The allegory of a lake is appealing because lakes store water for later use, but it also implies an effortless natural process rather than the effort and cost of building a reservoir. The data-sewer is the anti-pattern of the data-lake: you really do not know whether you've built a lake or a sewer until you try to accrue value from it.
The reason for this post is to highlight three traits that lead to data-sewers rather than data-lakes:
The architecture mistake is to see Hadoop as a paradigm shift in technology rather than as a (potentially) cheaper data warehouse. When cloud providers offer hybrid solutions that combine traditional MPP databases (SQL Server PDW, Oracle Exadata, etc.) with Hadoop/Spark/Kafka integration, and block storage replacing HDFS, it is not unreasonable for business sponsors to question whether all the effort was a waste of time.
Microsoft Synapse is one example of a technology advance obsoleting chief data offices.