Architecting the Modern Data Lake for AI/ML
The Open Table Formats (OTFs) designed by Netflix (Apache Iceberg), Uber (Apache Hudi), and Databricks (Delta Lake) have made it possible to build a cloud-native data infrastructure capable of supporting all the requirements of AI/ML. Such an infrastructure can hold the data needed for every model type and scale out as capacity requirements change. This session presents a reference architecture for building an AI/ML data infrastructure and shows how it supports MLOps, distributed training, and the advanced data manipulation techniques made possible by OTF-based data storage. The talk also covers why enterprise data is key to all Generative AI efforts, why a modern data infrastructure matters, and why high-performance data is the key factor in all of these efforts.