Create a proof of concept (POC) for a data lakehouse solution using:
1. HDFS for storage
2. Delta Lake as the table format, Dremio to query Delta Lake, and Dremio-Superset integration
It should have the following features:
1. ACID support
2. Data governance features
3. Data/metadata discovery using Amundsen
4. Data catalog with Apache Atlas, with Amundsen integrated with Atlas
5. Provision for masking/encrypting data based on the role of the user accessing it
6. Security controls and auditing
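
To make feature 5 concrete, here is a minimal Python sketch of role-based masking logic. It only illustrates the idea and is not how Dremio or any other product in the stack implements it. The policy table, role names (admin, analyst), column names and the mask_record helper are all hypothetical. Note that hashing here is one-way pseudonymization, not encryption.

```python
import hashlib

# Hypothetical policy: column -> {role -> action}. Roles not listed for a
# protected column fall back to DEFAULT_ACTION.
MASKING_POLICY = {
    "email": {"admin": "clear", "analyst": "hash"},
    "ssn": {"admin": "clear", "analyst": "redact"},
}
DEFAULT_ACTION = "redact"


def apply_action(value, action):
    """Return the value as seen under the given masking action."""
    if action == "clear":
        return value
    if action == "hash":
        # One-way SHA-256 digest: equal inputs stay joinable, raw value is hidden.
        return hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    # Any other action (including "redact") hides the value completely.
    return "****"


def mask_record(record, role):
    """Return a copy of record with protected columns masked for role."""
    masked = {}
    for column, value in record.items():
        if column in MASKING_POLICY:
            action = MASKING_POLICY[column].get(role, DEFAULT_ACTION)
            masked[column] = apply_action(value, action)
        else:
            # Columns without a policy entry are passed through unchanged.
            masked[column] = value
    return masked
```

For example, mask_record({"id": 1, "email": "a@example.com", "ssn": "123-45-6789"}, "analyst") keeps id, replaces email with a 64-character digest and replaces ssn with "****". In the POC itself the same decision would be enforced by the query or governance layer rather than in application code.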
Deliverables:
1. A demo showcasing all of the above requirements
2. Setup instructions/scripts to install the POC environment
3. Any scripts used to configure the products in the solution
4. An explanation of each solution component used, including its configuration
5. The Spark job code used to showcase one end-to-end scenario
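
As a starting point for the Spark job, the following PySpark sketch shows one possible end-to-end scenario on a Delta table stored in HDFS: an initial write, an atomic upsert (MERGE), and a time-travel read of the first version. The HDFS URI, table path, column names and function names are assumptions for illustration. It also assumes pyspark and the delta-spark Python package are installed and that the Delta Lake JARs matching the Spark version are on the classpath (for example via --packages).

```python
# Assumed HDFS location; replace the namenode host/port and path with the POC's own.
TABLE_PATH = "hdfs://namenode:8020/lakehouse/customers"


def delta_conf():
    """Spark settings that enable Delta Lake SQL extensions and catalog."""
    return {
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog":
            "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    }


def build_session(app_name="lakehouse-poc"):
    """Create a SparkSession configured for Delta Lake."""
    from pyspark.sql import SparkSession  # imported lazily; requires pyspark

    builder = SparkSession.builder.appName(app_name)
    for key, value in delta_conf().items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def run_scenario(spark, path=TABLE_PATH):
    """Write, upsert and time-travel on one Delta table; returns two DataFrames."""
    from delta.tables import DeltaTable  # requires the delta-spark package

    # Version 0: initial load, committed atomically to the Delta log.
    initial = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])
    initial.write.format("delta").mode("overwrite").save(path)

    # Version 1: upsert in a single ACID transaction (update id 2, insert id 3).
    updates = spark.createDataFrame([(2, "robert"), (3, "carol")], ["id", "name"])
    (
        DeltaTable.forPath(spark, path).alias("t")
        .merge(updates.alias("u"), "t.id = u.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )

    # Read the latest state and the state before the merge (time travel).
    current = spark.read.format("delta").load(path)
    first_version = spark.read.format("delta").option("versionAsOf", 0).load(path)
    return current, first_version
```

A driver script would call spark = build_session(), then current, first_version = run_scenario(spark), and show both DataFrames to demonstrate that the merge is visible in the latest version while version 0 remains readable. The same table path can then be registered as a source in Dremio for the query and Superset parts of the demo.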