Debugging Complex SQL Queries: A Structured Logging Approach

TLDR:

  • Use jsonb_build_object (together with jsonb_agg) to convert relations (tables or CTE/WITH clauses) to JSON.
  • Write a helper JavaScript function that abstracts the approach.

Background and problem

Imagine you are working on something exciting, colorful, and wonderful like a warehouse management system.

As the system evolves, new features, requirements, and processing logic keep expanding, and your SQL queries grow more complex, so you resort to WITH clauses to keep everything organized and maintainable. It works great.

However, when it comes to debugging intermediate steps, states, and variables, SQL offers no obvious equivalent of the structured logging libraries you would reach for in the application layer, such as JavaScript's Pino or Golang's slog.
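
For comparison, a minimal sketch of what that looks like in the application layer with Pino (step1 here is a hypothetical intermediate result, invented for illustration):

const pino = require('pino');
const logger = pino();

// Hypothetical intermediate state we want to inspect.
const step1 = [{ product: 'Sugar', quantity: 20 }];
logger.info({ step1 }, 'intermediate state');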

The idea

Postgres has advanced JSON support, so why not do something like the following?

JSON.stringify({
  data1: data1,
  data2: data2
});

In SQL, that would translate to:

SELECT json_build_object(
  'data1', data1,
  'data2', data2
)
FROM data1, data2;

Or, using scalar subqueries to avoid the cross join (which would otherwise multiply the result rows) and JSONB instead of JSON (supposedly both are better for performance):

SELECT jsonb_build_object(
  'data1', (SELECT jsonb_agg(data1) FROM data1),
  'data2', (SELECT jsonb_agg(data2) FROM data2)
);
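
To see the shape of the payload, here is a self-contained sketch; data1 and data2 are throwaway CTEs invented purely for illustration:

WITH data1(id, name) AS (VALUES (1, 'a')),
     data2(id, qty)  AS (VALUES (7, 3))
SELECT jsonb_build_object(
    'data1', (SELECT jsonb_agg(data1) FROM data1),
    'data2', (SELECT jsonb_agg(data2) FROM data2)
);
-- => {"data1": [{"id": 1, "name": "a"}], "data2": [{"id": 7, "qty": 3}]}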

Example

Imagine your project manager assigns the following user story to you:

User Story: Inventory Reorder Alert

As a warehouse manager,
I need a report identifying products requiring immediate reorder based on low total stock quantities across all locations.

Requirements:

  • Use the provided sample data (locations: S1, S2, S3; products and quantities).
  • Calculate the total quantity per product across all locations.
  • Flag products as Reorder if total stock is below 50 units; otherwise, mark as Sufficient.
  • Display only products needing reorder in the final output.

Acceptance Criteria:

  • Output columns: product, stock_status.
  • Only include rows where stock_status is Reorder.

You could solve it using a single SQL query.

Notice that we are using CTEs and that, for debugging purposes, we are only interested in the intermediate steps step1 and step2.

-- Define raw data using the VALUES clause.
WITH raw_data(location, product, quantity) AS (
    VALUES
        ('S1', 'Cocoa beans', 40),
        ('S2', 'Cocoa beans', 10),
        ('S2', 'Sugar', 20),
        ('S3', 'Vanilla', 75)
),
-- Aggregation: Get total sum per product.
step1 AS (
    SELECT product,
           SUM(quantity) AS quantity
    FROM raw_data
    GROUP BY product
),
-- Transformation: Add a stock status based on total quantity.
step2 AS (
    SELECT
         product,
         quantity,
        CASE
            WHEN quantity < 50 THEN 'Reorder'
            ELSE 'Sufficient'
        END AS stock_status
    FROM step1
),
-- Filter
step3 AS (
    SELECT
        product,
        stock_status
    FROM step2
    WHERE stock_status = 'Reorder'
),
logged_steps AS (
    SELECT jsonb_build_object(
        'step1', (SELECT jsonb_agg(step1) FROM step1),
        'step2', (SELECT jsonb_agg(step2) FROM step2)
    ) AS json_payload
)
SELECT *
FROM step3, logged_steps;
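
With the sample data, step3 contains a single row: Cocoa beans totals exactly 50 units (40 + 10), which is not below 50, so only Sugar is flagged. The cross join against the single-row logged_steps CTE then attaches the debug payload to that result, roughly:

 product | stock_status | json_payload
---------+--------------+--------------------------------------------------------
 Sugar   | Reorder      | {"step1": [... 3 rows ...], "step2": [... 3 rows ...]}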

Better DX

We can reuse this logic with the help of query builders like Knex and its modify method.

Helper:

// Attach a logged_steps CTE that bundles the given relations into a single
// JSONB payload, and cross-join it onto the query being built.
// Assumes a module-scope knex instance. Relation names are interpolated
// directly into raw SQL, so only pass trusted, developer-chosen identifiers.
function withLoggedSteps(qb, ...relations) {
  qb.crossJoin('logged_steps');
  qb.select('logged_steps.json_payload');
  const pairs = relations
    .map(rel => `'${rel}', (SELECT jsonb_agg(${rel}) FROM ${rel})`)
    .join(', ');
  qb.with(
    'logged_steps',
    knex.raw(`SELECT jsonb_build_object(${pairs}) AS json_payload`)
  );
}

Usage:

knex
  .select('step3.*')
  // The '...' bodies are placeholders for the CTE definitions shown in the
  // SQL above, passed via knex.raw(...).
  .with('generated_data', '(VALUES ...)')
  .with('step1', '...')
  .with('step2', '...')
  .with('step3', '...')
  .from('step3')
  .modify(withLoggedSteps, 'step1', 'step2')
  .then(rows => console.log(rows));
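
Note that the cross join repeats json_payload on every result row. With typical Postgres drivers, jsonb columns arrive already parsed as plain objects, so the intermediate steps can be inspected directly (a sketch):

// Inside the .then handler above:
const { json_payload } = rows[0];
console.log(json_payload.step1); // e.g. [{ product: 'Cocoa beans', quantity: 50 }, ...]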

Interactive demo

See: Live demo

Using:

  • PGLite in-memory embedded Postgres database.
  • Knex and a PGLite "driver" (https://pglite.dev/docs/orm-support#knex-js, https://github.com/czeidler/knex-pglite)
  • Some REPL library (vue-live)

Performance Disclaimer

Let's be honest - in my day-to-day development, I'm typically working with small, filtered datasets.
Most of the time, I'm debugging queries for a specific product or a narrow time range.

So while these JSON logging techniques work wonderfully in my development environment, I haven't extensively tested them on massive datasets that might process millions of rows.

When you're in development mode, logging overhead barely matters.
We're there to understand why our queries behave strangely and to figure out where a complex query goes wrong.

I am aware of alternative approaches like using RAISE NOTICE or inserting intermediate results into temporary tables. I have not tested them.
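
For reference, the RAISE NOTICE route would look roughly like this; treat it as an untested sketch with a made-up intermediate relation:

DO $$
DECLARE
    payload jsonb;
BEGIN
    -- Aggregate an intermediate relation and emit it on the notice channel.
    SELECT jsonb_agg(t) INTO payload
    FROM (VALUES ('Sugar', 20)) AS t(product, quantity);
    RAISE NOTICE 'step1: %', payload;
END $$;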

Conclusion

And there you have it: JSON structured logging in SQL, thanks to Postgres's JSON features.

Live demo
