2.2 ETL vs. ELT and REST APIs

2.2.1 ETL vs. ELT

The order in which you extract, transform, and load data has significant implications for pipeline speed, flexibility, and data quality.

[Figure: ETL vs. ELT comparison diagram]

ETL (Extract-Transform-Load) extracts raw data from the source, transforms it in a staging area, then loads the transformed data into the target destination.

ELT (Extract-Load-Transform) extracts raw data and loads it directly into a cloud data warehouse (e.g., Redshift, Snowflake), then transforms it within the warehouse. Because the raw data is retained, new or revised transformations can be applied later.
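
The difference in ordering can be sketched in a few lines. This is a minimal illustration, not a production pipeline: an in-memory SQLite database stands in for the warehouse, and the sample records, table names (orders_raw, orders_clean), and function names are all assumptions made for the example.

```python
import sqlite3

# Hypothetical raw records as they might arrive from a source system.
RAW_ORDERS = [
    {"id": 1, "amount": "19.99", "country": "us"},
    {"id": 2, "amount": "5.00", "country": "DE"},
    {"id": 3, "amount": "not-a-number", "country": "fr"},
]


def is_number(text):
    """Return True if the text can be parsed as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def run_etl(conn):
    """ETL: clean rows in application code (the staging step), then load."""
    conn.execute("CREATE TABLE orders_clean (id INTEGER, amount REAL, country TEXT)")
    cleaned = [
        (r["id"], float(r["amount"]), r["country"].upper())
        for r in RAW_ORDERS
        if is_number(r["amount"])
    ]
    conn.executemany("INSERT INTO orders_clean VALUES (?, ?, ?)", cleaned)


def run_elt(conn):
    """ELT: load raw rows untouched, then transform with SQL in the warehouse."""
    conn.execute("CREATE TABLE orders_raw (id INTEGER, amount TEXT, country TEXT)")
    conn.executemany(
        "INSERT INTO orders_raw VALUES (:id, :amount, :country)", RAW_ORDERS
    )
    # The transformation runs inside the database and can be rewritten later,
    # because orders_raw still holds every original row.
    conn.execute(
        """
        CREATE TABLE orders_clean AS
        SELECT id, CAST(amount AS REAL) AS amount, UPPER(country) AS country
        FROM orders_raw
        WHERE amount GLOB '[0-9]*'
        """
    )


def fetch_clean(conn):
    """Read back the transformed rows in a stable order."""
    return conn.execute(
        "SELECT id, amount, country FROM orders_clean ORDER BY id"
    ).fetchall()


etl_conn = sqlite3.connect(":memory:")
run_etl(etl_conn)

elt_conn = sqlite3.connect(":memory:")
run_elt(elt_conn)

print(fetch_clean(etl_conn))
print(fetch_clean(elt_conn))
```

Both paths end with the same cleaned table. The practical difference is that the ELT database also keeps orders_raw, including the malformed row, so a different transformation can be applied later without re-extracting from the source.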



Advantages of ELT

  • Faster implementation and data availability
  • More flexibility in transformations
  • Suitable for semi-structured/unstructured data (e.g., JSON, text, images)

Downsides of ELT

The main risk is creating an "EL pipeline" where no transformation ever happens, turning your data warehouse into a data swamp.


Comparison of ETL vs. ELT

Feature               | ETL                                                 | ELT
----------------------|-----------------------------------------------------|------------------------------------------------------
History               | Developed in the 80s/90s when storage was expensive | Gained popularity in the cloud era
Transformation Timing | Before loading                                      | After loading
Load Time             | Longer                                              | Faster
Flexibility           | Structured data only                                | Structured, semi-structured, and unstructured data
Scalability           | Manual effort required for scaling                  | Uses cloud warehouse power for large-scale processing
Data Quality          | Ensures data quality before loading                 | Requires transformations after loading

2.2.2 REST API

APIs are a fundamental ingestion mechanism. Jeff Bezos famously enforced API-based communication within Amazon, a mandate that laid the foundation for AWS.

What is an API? A set of rules and specifications for programmatic communication between applications, typically including metadata, documentation, authentication, and error handling.

A REST (Representational State Transfer) API is the most common type of API and uses HTTP as its communication protocol.



HTTP Request Types

Method | Purpose
-------|---------------------------
GET    | Retrieve a resource
POST   | Create a resource
PUT    | Update/replace a resource
DELETE | Remove a resource
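
As a rough sketch of how the four methods map onto requests, the snippet below builds one request per method using Python's standard urllib.request module. The endpoint https://api.example.com/orders, the resource ID 42, and the helper build_request are hypothetical and chosen only for illustration; the requests are constructed but not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com/orders"  # hypothetical endpoint


def build_request(method, resource_id=None, payload=None):
    """Build (but do not send) an HTTP request for a hypothetical orders API."""
    url = BASE_URL if resource_id is None else f"{BASE_URL}/{resource_id}"
    headers = {"Accept": "application/json"}
    data = None
    if payload is not None:
        # Request bodies are sent as JSON-encoded bytes.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


requests_to_send = [
    build_request("GET", resource_id=42),                    # retrieve order 42
    build_request("POST", payload={"item": "book", "quantity": 1}),  # create an order
    build_request("PUT", resource_id=42,
                  payload={"item": "book", "quantity": 2}),  # replace order 42
    build_request("DELETE", resource_id=42),                 # remove order 42
]

for req in requests_to_send:
    print(req.get_method(), req.full_url)
```

Against a real API, each request would be sent with urllib.request.urlopen(req), typically with an authentication header added and the HTTP status code checked for errors.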