Study Notes 4.2.1 | 4.2.2: DBT Project Setup

Introduction to DBT Projects DBT (Data Build Tool) is a framework that helps transform data in your data warehouse by enabling analytics engineers to work more effectively with SQL. Here are comprehensive notes on setting up a DBT project based on the provided transcripts. Project Structure Every DBT project has a standardized structure: Standard folders with specific naming conventions Global configuration files Models for your data transformations Two Setup Methods Method A: Using DBT Cloud (BigQuery) Project Initialization: DBT Cloud IDE guides you through setup process Connection to git repository (GitHub in this course) is required IDE creates necessary folder structure and configuration files Configuration: Project name defined in project YAML file Connection to BigQuery configured under account settings Subdirectory can be defined if creating project in existing repository Workflow in DBT Cloud: Branch management through the cloud interface Work from a development branch, not main Initial commit creates basic "Hello DBT" project structure Method B: Using DBT Core Locally (Postgres) Prerequisites: Empty repository to store your project Local database (Postgres in this example) Appropriate adapter installed (postgres-adapter or bigquery-adapter) Profile Configuration: Create profiles.yml file at ~/.dbt/profiles.yml Define database connections (user, password, database, schema) Can define multiple targets (dev, prod) and even multiple database types Project Initialization: Run dbt init command in your repository folder Select database type when prompted Name your project (e.g., "taxi_rides_new_york") Key Configuration Files dbt_project.yml: Project name definition Profile selection (must match profiles.yml) Global configurations Default materializations (table/view) for models Custom variables profiles.yml (for local setup): Located outside project at ~/.dbt/profiles.yml Contains database connection details Allows defining multiple targets and databases Sets number of threads for parallel execution Initial Project Structure After initialization, your project contains: models/ directory with example models analyses/ directory for non-materialized SQL tests/ directory for data quality tests macros/ directory for reusable SQL snippets seeds/ directory for CSV data Configuration files Validation and Testing Run dbt debug to verify your connection settings Run dbt deps to install dependencies Run dbt build to verify the entire project works Development Workflow Create a development branch (don't work directly on main) Initialize project structure Configure project and connection settings Start building models Commit changes to repository Notes on Model Materialization The project.yml file allows you to set default materializations: Tables: Persistent and fully materialized Views: Virtual tables that run their query on access These can be applied globally or to specific directories within your models. Connecting to Data Warehouse For BigQuery: Configure through DBT Cloud account settings For Postgres: Define connection in profiles.yml with credentials Common Commands dbt init: Start a new project dbt debug: Test connections and project configuration dbt deps: Install dependencies dbt build: Run all models, tests, and snapshots These notes cover the fundamentals of setting up a DBT project using either the cloud interface with BigQuery or locally with Postgres.

Feb 25, 2025 - 22:06

Study Notes 4.2.1 | 4.2.2: DBT Project Setup

Introduction to DBT Projects

DBT (Data Build Tool) is a framework that helps transform data in your data warehouse by enabling analytics engineers to work more effectively with SQL. Here are comprehensive notes on setting up a DBT project based on the provided transcripts.

Project Structure

Every DBT project has a standardized structure:

Standard folders with specific naming conventions
Global configuration files
Models for your data transformations

Two Setup Methods

Method A: Using DBT Cloud (BigQuery)

Project Initialization:
- DBT Cloud IDE guides you through setup process
- Connection to git repository (GitHub in this course) is required
- IDE creates necessary folder structure and configuration files
Configuration:
- Project name defined in project YAML file
- Connection to BigQuery configured under account settings
- Subdirectory can be defined if creating project in existing repository
Workflow in DBT Cloud:
- Branch management through the cloud interface
- Work from a development branch, not main
- Initial commit creates basic "Hello DBT" project structure

Method B: Using DBT Core Locally (Postgres)

Prerequisites:
- Empty repository to store your project
- Local database (Postgres in this example)
- Appropriate adapter installed (postgres-adapter or bigquery-adapter)
Profile Configuration:
- Create profiles.yml file at ~/.dbt/profiles.yml
- Define database connections (user, password, database, schema)
- Can define multiple targets (dev, prod) and even multiple database types
Project Initialization:
- Run dbt init command in your repository folder
- Select database type when prompted
- Name your project (e.g., "taxi_rides_new_york")

Key Configuration Files

dbt_project.yml:
- Project name definition
- Profile selection (must match profiles.yml)
- Global configurations
- Default materializations (table/view) for models
- Custom variables
profiles.yml (for local setup):
- Located outside project at ~/.dbt/profiles.yml
- Contains database connection details
- Allows defining multiple targets and databases
- Sets number of threads for parallel execution