Data Lake
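A centralized storage repository, typically built on cloud object storage such as Azure Data Lake Storage (ADLS), Amazon S3, or Google Cloud Storage (GCS), that holds raw structured, semi-structured, and unstructured data at scale. It serves as the foundation for analytics, ML, and data warehousing on Databricks.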
Delta Lake
An open-source storage layer that adds ACID transactions, schema enforcement, and time travel (querying earlier versions of a table) to data stored in the Data Lake. It turns a basic lake into a Lakehouse, combining the low-cost, flexible storage of a data lake with the reliability and query performance of a data warehouse.
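To make the three features above concrete, here is a minimal in-memory sketch in plain Python. It is a toy model, not the Delta Lake API: the class name `ToyVersionedTable`, its methods, and the version numbering (version 0 is the empty table) are all hypothetical and chosen only for illustration. It shows a batch being validated against a schema, committed all-or-nothing, and read back as of an earlier version.

```python
# Toy, in-memory model of a versioned table. NOT the Delta Lake API;
# every name here is hypothetical and exists only for illustration.
from dataclasses import dataclass, field


@dataclass
class ToyVersionedTable:
    schema: dict  # column name -> expected Python type
    _log: list = field(default_factory=list)  # one entry per committed batch

    def append(self, rows):
        """Validate the whole batch, then commit it as one new version."""
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"columns {sorted(row)} do not match schema")
            for col, expected in self.schema.items():
                if not isinstance(row[col], expected):
                    raise TypeError(f"{col!r} expects {expected.__name__}")
        # Reached only if every row passed, so a bad batch leaves no partial write.
        self._log.append([dict(r) for r in rows])
        return len(self._log)  # version number created by this commit

    def read(self, version=None):
        """Return rows as of `version` (latest if omitted); 0 is the empty table."""
        latest = len(self._log)
        version = latest if version is None else version
        if not 0 <= version <= latest:
            raise ValueError(f"unknown version {version}")
        return [row for batch in self._log[:version] for row in batch]


table = ToyVersionedTable(schema={"id": int, "name": str})
table.append([{"id": 1, "name": "a"}])  # creates version 1
table.append([{"id": 2, "name": "b"}])  # creates version 2

try:
    # Second row has the wrong type for "id", so the whole batch is rejected.
    table.append([{"id": 3, "name": "c"}, {"id": "oops", "name": "d"}])
except TypeError:
    pass

print(len(table.read()))           # row count at the latest version
print(len(table.read(version=1)))  # row count as of version 1
```

In a real Delta table the same ideas are backed by a transaction log stored next to the data files rather than a Python list, and reads of older versions go through the engine's time travel syntax.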
Unity Catalog
Databricks’ unified governance layer, providing centralized data access control, lineage, and auditing across workspaces. It manages permissions on tables, columns, and other data assets, which supports compliance and consistent data governance.
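As a rough illustration of centralized permissions, the sketch below models grants in plain Python. It is a toy, not the Unity Catalog API: the principals (`analysts`, `engineers`), the object names (`sales.raw.orders`), and the `has_privilege` helper are hypothetical. It assumes the `catalog.schema.table` naming hierarchy and that a grant on a parent object also applies to its children; it ignores the separate usage privileges that real access to a catalog or schema also requires.

```python
# Toy model of hierarchical grants. NOT the Unity Catalog API;
# principals, object names, and this helper are hypothetical.
grants = {
    ("analysts", "sales"): {"SELECT"},                        # catalog-level grant
    ("engineers", "sales.raw.orders"): {"SELECT", "MODIFY"},  # table-level grant
}


def has_privilege(principal, privilege, securable):
    """Check the object itself, then each parent (schema, catalog) for a grant."""
    parts = securable.split(".")
    for depth in range(len(parts), 0, -1):
        scope = ".".join(parts[:depth])
        if privilege in grants.get((principal, scope), set()):
            return True
    return False


# The catalog-level grant reaches the table below it.
print(has_privilege("analysts", "SELECT", "sales.raw.orders"))
# A table-level grant does not reach a sibling table.
print(has_privilege("engineers", "SELECT", "sales.raw.customers"))
```

Keeping all grants in one place, instead of per workspace, is what lets the same rule apply wherever the table is queried.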
Data Intelligence
The Databricks Data Intelligence Platform integrates AI, ML, and analytics on top of unified data. It supports intelligent data discovery, semantic understanding of that data, and AI-assisted development through tools such as Databricks Assistant.
Roles in the Databricks Ecosystem
- Data Engineer – Builds and optimizes ETL/ELT pipelines, manages Delta tables, and ensures data quality and performance.
- Data Analyst – Uses Databricks SQL (formerly SQL Analytics) and notebooks for querying, dashboarding, and reporting.
- Data Scientist – Develops ML models in Python or R on shared datasets, using MLflow to track experiments and manage models.
- Data Steward / Admin – Manages Unity Catalog, governance, and access control.
- ML Engineer / Architect – Designs scalable ML pipelines and integrates AI workloads within the Lakehouse.