Study Notes 4.2.1 | 4.2.2: DBT Project Setup
Introduction to DBT Projects DBT (Data Build Tool) is a framework that helps transform data in your data warehouse by enabling analytics engineers to work more effectively with SQL. Here are comprehensive notes on setting up a DBT project based on the provided transcripts. Project Structure Every DBT project has a standardized structure: Standard folders with specific naming conventions Global configuration files Models for your data transformations Two Setup Methods Method A: Using DBT Cloud (BigQuery) Project Initialization: DBT Cloud IDE guides you through setup process Connection to git repository (GitHub in this course) is required IDE creates necessary folder structure and configuration files Configuration: Project name defined in project YAML file Connection to BigQuery configured under account settings Subdirectory can be defined if creating project in existing repository Workflow in DBT Cloud: Branch management through the cloud interface Work from a development branch, not main Initial commit creates basic "Hello DBT" project structure Method B: Using DBT Core Locally (Postgres) Prerequisites: Empty repository to store your project Local database (Postgres in this example) Appropriate adapter installed (postgres-adapter or bigquery-adapter) Profile Configuration: Create profiles.yml file at ~/.dbt/profiles.yml Define database connections (user, password, database, schema) Can define multiple targets (dev, prod) and even multiple database types Project Initialization: Run dbt init command in your repository folder Select database type when prompted Name your project (e.g., "taxi_rides_new_york") Key Configuration Files dbt_project.yml: Project name definition Profile selection (must match profiles.yml) Global configurations Default materializations (table/view) for models Custom variables profiles.yml (for local setup): Located outside project at ~/.dbt/profiles.yml Contains database connection details Allows defining multiple targets and databases Sets number of threads for parallel execution Initial Project Structure After initialization, your project contains: models/ directory with example models analyses/ directory for non-materialized SQL tests/ directory for data quality tests macros/ directory for reusable SQL snippets seeds/ directory for CSV data Configuration files Validation and Testing Run dbt debug to verify your connection settings Run dbt deps to install dependencies Run dbt build to verify the entire project works Development Workflow Create a development branch (don't work directly on main) Initialize project structure Configure project and connection settings Start building models Commit changes to repository Notes on Model Materialization The project.yml file allows you to set default materializations: Tables: Persistent and fully materialized Views: Virtual tables that run their query on access These can be applied globally or to specific directories within your models. Connecting to Data Warehouse For BigQuery: Configure through DBT Cloud account settings For Postgres: Define connection in profiles.yml with credentials Common Commands dbt init: Start a new project dbt debug: Test connections and project configuration dbt deps: Install dependencies dbt build: Run all models, tests, and snapshots These notes cover the fundamentals of setting up a DBT project using either the cloud interface with BigQuery or locally with Postgres.

Introduction to DBT Projects
DBT (Data Build Tool) is a framework that helps transform data in your data warehouse by enabling analytics engineers to work more effectively with SQL. Here are comprehensive notes on setting up a DBT project based on the provided transcripts.
Project Structure
Every DBT project has a standardized structure:
- Standard folders with specific naming conventions
- Global configuration files
- Models for your data transformations
Two Setup Methods
Method A: Using DBT Cloud (BigQuery)
-
Project Initialization:
- DBT Cloud IDE guides you through setup process
- Connection to git repository (GitHub in this course) is required
- IDE creates necessary folder structure and configuration files
-
Configuration:
- Project name defined in project YAML file
- Connection to BigQuery configured under account settings
- Subdirectory can be defined if creating project in existing repository
-
Workflow in DBT Cloud:
- Branch management through the cloud interface
- Work from a development branch, not main
- Initial commit creates basic "Hello DBT" project structure
Method B: Using DBT Core Locally (Postgres)
-
Prerequisites:
- Empty repository to store your project
- Local database (Postgres in this example)
- Appropriate adapter installed (postgres-adapter or bigquery-adapter)
-
Profile Configuration:
- Create
profiles.yml
file at~/.dbt/profiles.yml
- Define database connections (user, password, database, schema)
- Can define multiple targets (dev, prod) and even multiple database types
- Create
-
Project Initialization:
- Run
dbt init
command in your repository folder - Select database type when prompted
- Name your project (e.g., "taxi_rides_new_york")
- Run
Key Configuration Files
-
dbt_project.yml:
- Project name definition
- Profile selection (must match profiles.yml)
- Global configurations
- Default materializations (table/view) for models
- Custom variables
-
profiles.yml (for local setup):
- Located outside project at
~/.dbt/profiles.yml
- Contains database connection details
- Allows defining multiple targets and databases
- Sets number of threads for parallel execution
- Located outside project at
Initial Project Structure
After initialization, your project contains:
-
models/
directory with example models -
analyses/
directory for non-materialized SQL -
tests/
directory for data quality tests -
macros/
directory for reusable SQL snippets -
seeds/
directory for CSV data - Configuration files
Validation and Testing
- Run
dbt debug
to verify your connection settings - Run
dbt deps
to install dependencies - Run
dbt build
to verify the entire project works
Development Workflow
- Create a development branch (don't work directly on main)
- Initialize project structure
- Configure project and connection settings
- Start building models
- Commit changes to repository
Notes on Model Materialization
The project.yml file allows you to set default materializations:
- Tables: Persistent and fully materialized
- Views: Virtual tables that run their query on access
These can be applied globally or to specific directories within your models.
Connecting to Data Warehouse
- For BigQuery: Configure through DBT Cloud account settings
- For Postgres: Define connection in profiles.yml with credentials
Common Commands
-
dbt init
: Start a new project -
dbt debug
: Test connections and project configuration -
dbt deps
: Install dependencies -
dbt build
: Run all models, tests, and snapshots
These notes cover the fundamentals of setting up a DBT project using either the cloud interface with BigQuery or locally with Postgres.