1.1 Data Modeling Fundamentals

1.1.1 Introduction and Course Overview

This course covers how to model, transform, and serve data for both analytics and machine learning workloads.

| Week | Topic | Focus |
| --- | --- | --- |
| Week 1 | Batch data modeling | Normalization, star schemas, and modeling approaches |
| Week 2 | Data modeling for ML | Tabular and unstructured data preparation |
| Week 3 | Transformation deep-dive | Distributed processing with Hadoop, Spark, and AWS services |
| Week 4 | Serving data | End-to-end pipeline for analytics and ML |

1.1.2 Data Modeling Concepts

Data modeling is the practice of choosing a coherent data structure that aligns with business goals and logic. It has traditionally been used to structure data in warehouses and relational databases. During the data lake 1.0 era, modeling was often ignored, which led to "data swamps." The recommended approach today is target data modeling: modeling data for specific business domains.

A data model organizes and standardizes data in a precise, structured representation to enable and guide human and machine behavior, inform decision making, and facilitate actions. For tabular data, this means deciding which tables make up the model, how they relate to each other, and which columns to include.
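As a rough illustration of those three decisions (which tables, how they relate, which columns), the sketch below describes a small tabular model in plain Python. The `customers` and `orders` tables, their columns, and the `Table`, `Relationship`, and `validate_model` names are hypothetical and exist only for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    """One table in the model: its name and the columns it keeps."""
    name: str
    columns: tuple[str, ...]


@dataclass(frozen=True)
class Relationship:
    """A link from a column in a child table to a column in a parent table."""
    child: str
    child_column: str
    parent: str
    parent_column: str


# Hypothetical model: which tables exist and which columns each one includes.
TABLES = {
    "customers": Table("customers", ("customer_id", "name", "country")),
    "orders": Table("orders", ("order_id", "customer_id", "order_date")),
}

# How the tables relate to each other.
RELATIONSHIPS = [
    Relationship("orders", "customer_id", "customers", "customer_id"),
]


def validate_model(tables, relationships):
    """Return a list of problems: relationships that point at missing tables or columns."""
    problems = []
    for rel in relationships:
        ends = ((rel.child, rel.child_column), (rel.parent, rel.parent_column))
        for table_name, column in ends:
            table = tables.get(table_name)
            if table is None:
                problems.append(f"unknown table: {table_name}")
            elif column not in table.columns:
                problems.append(f"unknown column: {table_name}.{column}")
    return problems
```

Calling `validate_model(TABLES, RELATIONSHIPS)` returns an empty list for this model. A relationship that names a table or column missing from `TABLES` shows up as a problem string instead.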


Good vs. Bad Data Models

| Good Data Models | Bad Data Models |
| --- | --- |
| Reflect business goals and logic while incorporating business rules | Don't reflect how the business operates |
| Ensure compliance with operational and legal requirements | Create more problems than they solve |
| Outline relationships between business processes | Provide stakeholders with inaccurate information |
| Serve as a communication tool, creating a shared language | Generate confusion rather than clarity |

Building a Good Data Model

  1. Identify business goals and stakeholder needs
  2. Define system requirements
  3. Choose tools and technologies
  4. Build, evaluate, iterate, and evolve

1.1.3 Conceptual, Logical, and Physical Data Models

Data models exist at three levels of abstraction, each adding more implementation detail.

| Level | Description | Contains |
| --- | --- | --- |
| Conceptual | High-level business entities and relationships, visualized with an Entity-Relationship Diagram | Entities, relationships, cardinality notation |
| Logical | Adds implementation details to the conceptual model | Data types, primary keys, foreign keys |
| Physical | Specifies how the logical model is implemented in a specific DBMS | Configuration, storage approach, partitioning, replication |
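
To make the step from conceptual to logical to physical more concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The order and order detail entities echo the examples used in these notes, but the column names, types, and the `build_database` helper are assumptions made for illustration. SQLite is used only because it ships with Python. Physical choices such as partitioning and replication depend on the target DBMS and are not shown.

```python
import sqlite3

# Conceptual level: entities and one relationship, with no implementation detail.
CONCEPTUAL = {
    "entities": ["order", "order_detail"],
    "relationships": [("order", "has one or many", "order_detail")],
}

# Logical level: data types, primary keys, and foreign keys (hypothetical columns).
LOGICAL_DDL = """
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    order_date  TEXT NOT NULL
);
CREATE TABLE order_details (
    order_detail_id INTEGER PRIMARY KEY,
    order_id        INTEGER NOT NULL REFERENCES orders(order_id),
    quantity        INTEGER NOT NULL
);
"""


def build_database(path=":memory:"):
    """Physical level (SQLite only): pick a storage location and engine settings."""
    conn = sqlite3.connect(path)
    # SQLite enforces foreign keys only when this pragma is enabled on the connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(LOGICAL_DDL)
    return conn
```

With foreign keys enabled, inserting an `order_details` row whose `order_id` has no matching row in `orders` raises `sqlite3.IntegrityError`. That is the logical model's foreign key being enforced by the physical database.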

Entity-Relationship Cardinality

ER diagrams use notation to express the nature of relationships between entities:

Notation   Meaning            Example
||         One and only one   Each order detail belongs to exactly one order
|O         Zero or one        A customer may or may not have a profile
|{         One or many        Each order has one or many order details
O{         Zero or many       Each product appears in zero or many order details

The symbol at each end of the relationship line describes the cardinality from that entity's perspective.
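
One way to read the four notations is as a minimum and maximum number of related rows. The sketch below encodes that reading and checks sample data against it. The `CARDINALITY` mapping, the `check_cardinality` helper, and the sample keys are all hypothetical and serve only to illustrate the idea.

```python
from collections import Counter

# Each notation read as (minimum, maximum) related rows; None means no upper bound.
CARDINALITY = {
    "||": (1, 1),     # one and only one
    "|O": (0, 1),     # zero or one
    "|{": (1, None),  # one or many
    "O{": (0, None),  # zero or many
}


def check_cardinality(parent_keys, child_foreign_keys, notation):
    """Return the parent keys whose number of child rows violates the notation."""
    minimum, maximum = CARDINALITY[notation]
    counts = Counter(child_foreign_keys)
    violations = []
    for key in parent_keys:
        n = counts[key]  # Counter returns 0 for keys with no child rows
        if n < minimum or (maximum is not None and n > maximum):
            violations.append(key)
    return violations


# Hypothetical sample: order 3 has no order details.
order_ids = [1, 2, 3]
order_detail_order_ids = [1, 1, 2]

# "Each order has one or many order details" fails for order 3.
print(check_cardinality(order_ids, order_detail_order_ids, "|{"))  # prints [3]
# "Zero or many" accepts the same data.
print(check_cardinality(order_ids, order_detail_order_ids, "O{"))  # prints []
```

A check like this looks at the relationship from the parent side only. The symbol at the other end of the line would need its own check from the child entity's perspective.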