Key concepts and technologies covered across all four courses.
Batch Processing
Data Engineering Lifecycle
Data Pipeline
Ingestion
Serving
Stream Processing
Transformation
Undercurrents
ACID Compliance
Amazon DynamoDB
IoT Devices
Logs
NoSQL Databases
OLTP
Relational Databases
REST API
Source Systems
Structured Data
Unstructured Data
Airbyte
Amazon Data Firehose
Amazon Kinesis Data Streams
Amazon MSK
AWS DMS
Batch Ingestion
Change Data Capture (CDC)
Dead Letter Queue
Debezium
Event Streaming Platform
Fivetran
JDBC/ODBC
Message Queue
Stream Ingestion
Amazon EBS
Amazon EFS
Amazon RDS
Amazon S3
Block Storage
Cassandra
Column-Oriented Storage
Distributed Storage Systems
File Storage
HDFS
In-Memory Storage
Memcached
Object Storage
Redis
Row-Oriented Storage
Storage Tiers
Apache Hudi
Apache Iceberg
AWS Lake Formation
Data Lake
Data Lakehouse
Data Mart
Data Warehouse
Delta Lake
Medallion Architecture
Open Table Formats
Schema Evolution
Schema-on-Read
Schema-on-Write
Separation of Storage and Compute
Aggregate Queries
Amazon Athena
B-Tree Index
Common Table Expressions (CTEs)
Distribution Styles
Exactly-Once Semantics
EXPLAIN
Hash Join
Index
Joins
Massively Parallel Processing (MPP)
Partition Pruning
Query Optimizer
Redshift Spectrum
Sort Key
SQL
Streaming Queries
VACUUM
Watermark (Streaming)
Window Functions
Windowing (Tumbling, Sliding, Session)
Conformed Dimension
Data Vault
Denormalization
Dimension Table
Entity-Relationship Diagram
Fact Table
Grain
Inmon Approach
Kimball Approach
Normalization
One Big Table (OBT)
Primary Key / Foreign Key
Slowly Changing Dimension (SCD)
Star Schema
Surrogate Key
Apache Hadoop
Apache Spark
Backfill
dbt
ELT
ETL
Feature Engineering
Idempotency
MapReduce
PySpark
Reverse ETL
Spark DataFrames
Spark Structured Streaming
User-Defined Functions (UDFs)
Amazon QuickSight
Amazon SageMaker
Business Intelligence
Embedded Analytics
Materialized View
Metabase
Operational Analytics
Semantic Layer
View
Amazon MWAA
Apache Airflow
Dagster
Directed Acyclic Graph (DAG)
Mage
Prefect
TaskFlow API
XComs
Amazon CloudWatch
CI/CD
Data Contract
Data Governance
Data Lineage
Data Observability
Data Quality
DataOps
Great Expectations
Incident Response
Infrastructure as Code (IaC)
Terraform
Amazon EC2
Amazon EMR
Amazon Neptune
Amazon Redshift
Amazon SQS
Amazon VPC
Apache Flink
Apache Kafka
AWS CloudFormation
AWS Glue
AWS IAM
AWS Lambda
AWS Well-Architected Framework
Boto3
Databricks
Glue Data Catalog
Google BigQuery
MySQL
Neo4j
Oracle
PostgreSQL
Snowflake
Apache Avro
Apache Parquet
Compression
CSV
JSON
Serialization
XML
Availability Zones
CAP Theorem
Conway's Law
Data Architecture
Data Mesh
FinOps
GDPR
Kappa Architecture
Lambda Architecture
Loosely Coupled Systems
Monolith vs. Modular
Partitioning (Sharding)
Principle of Least Privilege
Replication
Serverless
Shared Responsibility Model
Total Cost of Ownership (TCO)