Turning Parquet File into a Queryable RESTful API with DuckDB, Quarkus & Kotlin

Parquet files are a powerhouse for storing large, columnar datasets in big data workflows. Traditionally, querying them requires bigdata frameworks like Apache Spark or Hadoop, which bring hefty setup and resource demands. But what if you could skip the heavy lifting and build a lightweight RESTful API instead? Enter DuckDB — a nimble embedded SQL engine that queries Parquet files directly. In this article, I’ll show you how to pair it with *Quarkus *(a cloud-native Java framework) and *Kotlin *(for its clean, safe syntax) to expose Parquet data over HTTP. We’ll use Jdbi’s bindMap to bind REST request bodies, Quarkus’s config for query templates, and add periodic loading of Parquet files from S3. Let’s craft this efficient API! Why This Stack? Here’s a quick look at our tools: Parquet: A columnar file format optimized for analytics, widely used in big data ecosystems DuckDB: An embedded SQL engine that queries Parquet files directly—no servers or clusters needed Quarkus: A fast, lightweight framework built for modern apps, with excellent REST support Kotlin: A concise, null-safe language that makes coding with Quarkus a joy. Our goal Query Parquet data with DuckDB, serve it via Quarkus REST endpoints, and manage query templates with Quarkus’s config—all in Kotlin, without the baggage of traditional big data frameworks. The Big Picture The flow is sleek: Quarkus periodically downloads Parquet files from S3 based on a configurable path (To add this in next article) DuckDB queries these files directly using SQL that configed in yaml. Quarkus serves the results as REST endpoints, with Jdbi binding JSON request bodies to query templates from the config. This delivers a dynamic, lightweight API for Parquet data over HTTP. Let’s Build It Step 1: Set Up with Quarkus Initializr Head to code.quarkus.io to scaffold your project: Artifact: parquet-rest Build Tool: Maven Java Version: 21 Extensions: REST REST Jackson Kotlin (quarkus-kotlin) Generate, download, and unzip Add DuckDB and Jdbi to pom.xml: org.duckdb duckdb_jdbc 1.2.0 org.jdbi jdbi3-kotlin 3.48.0 Step 2: ConfigMapping for Query Templates In this demo, we use userdata.parquet from timestored and put the file to src/test Define query templates in application.properties Quarkus's config mappings is possible to group multiple configuration properties in a single interface that share the same prefix. rest.queries.get-all=SELECT * FROM 'src/test/userdata.parquet' rest.queries.findUsersByLastName=SELECT * FROM 'src/test/userdata.parquet' where last_name = :lastName We can Load them with a config class import io.smallrye.config.ConfigMapping import io.smallrye.config.WithName @ConfigMapping(prefix = "rest") interface QueryConfig { @get:WithName("queries") val queries: Map } Here, @get:WithName("queries") turns the queries() method into a field-like queries property in Kotlin, making access clean and intuitive: config.queries["get-all"] instead of config.queries(). This is a standout feature of Quarkus’s ConfigMapping with Kotlin! Step 3: Querying Parquet with DuckDB and Jdbi Create a service to query the Parquet file: import jakarta.enterprise.context.ApplicationScoped import org.jdbi.v3.core.Jdbi import org.jdbi.v3.core.kotlin.withHandleUnchecked @ApplicationScoped class ParquetService( val config: QueryConfig, ) { private val jdbi = Jdbi.create("jdbc:duckdb:") fun executeQuery( queryName: String, parameters: Map, ): List { val query = config.queries[queryName] ?: throw IllegalArgumentException("Query with name '$queryName' not found in configuration") return jdbi.withHandleUnchecked { handle -> handle .createQuery(query) .bindMap(parameters) .mapToMap() .list() } } } Here, withHandleUnchecked simplifies the code by throwing unchecked exceptions (like SQLException) directly, avoiding the need to declare a checked exception type. It’s a Kotlin-friendly tweak from jdbi3-kotlin, keeping things concise. Step 4: REST Endpoint Expose a POST endpoint: import jakarta.ws.rs.Consumes import jakarta.ws.rs.POST import jakarta.ws.rs.Path import jakarta.ws.rs.Produces import jakarta.ws.rs.core.MediaType import org.jboss.resteasy.reactive.RestPath @Path("/query") class QueryResource( val parquetService: ParquetService, ) { @POST @Path("/{queryName}") @Consumes(MediaType.APPLICATION_JSON) @Produces(MediaType.APPLICATION_JSON) fun queryWithParams( @RestPath queryName: String, body: Map, ) = parquetService.executeQuery(queryName, body) } Step 5: Testing with Res

Feb 23, 2025 - 17:20
 0
Turning Parquet File into a Queryable RESTful API with DuckDB, Quarkus & Kotlin

Parquet files are a powerhouse for storing large, columnar datasets in big data workflows. Traditionally, querying them requires bigdata frameworks like Apache Spark or Hadoop, which bring hefty setup and resource demands. But what if you could skip the heavy lifting and build a lightweight RESTful API instead? Enter DuckDB — a nimble embedded SQL engine that queries Parquet files directly. In this article, I’ll show you how to pair it with *Quarkus *(a cloud-native Java framework) and *Kotlin *(for its clean, safe syntax) to expose Parquet data over HTTP. We’ll use Jdbi’s bindMap to bind REST request bodies, Quarkus’s config for query templates, and add periodic loading of Parquet files from S3. Let’s craft this efficient API!

Why This Stack?

Here’s a quick look at our tools:

  • Parquet: A columnar file format optimized for analytics, widely used in big data ecosystems
  • DuckDB: An embedded SQL engine that queries Parquet files directly—no servers or clusters needed
  • Quarkus: A fast, lightweight framework built for modern apps, with excellent REST support
  • Kotlin: A concise, null-safe language that makes coding with Quarkus a joy.

Our goal

Query Parquet data with DuckDB, serve it via Quarkus REST endpoints, and manage query templates with Quarkus’s config—all in Kotlin, without the baggage of traditional big data frameworks.

The Big Picture

The flow is sleek:

  1. Quarkus periodically downloads Parquet files from S3 based on a configurable path (To add this in next article)
  2. DuckDB queries these files directly using SQL that configed in yaml.
  3. Quarkus serves the results as REST endpoints, with Jdbi binding JSON request bodies to query templates from the config.

This delivers a dynamic, lightweight API for Parquet data over HTTP.

Let’s Build It

Step 1: Set Up with Quarkus Initializr

Image description

Head to code.quarkus.io to scaffold your project:

Artifact: parquet-rest
Build Tool: Maven
Java Version: 21
Extensions:

  • REST
  • REST Jackson
  • Kotlin (quarkus-kotlin)

Generate, download, and unzip

Add DuckDB and Jdbi to pom.xml:

        
        
            org.duckdb
            duckdb_jdbc
            1.2.0
        

        
        
            org.jdbi
            jdbi3-kotlin
            3.48.0
        

Step 2: ConfigMapping for Query Templates

In this demo, we use userdata.parquet from timestored and put the file to src/test

Image description

Define query templates in application.properties

Quarkus's config mappings is possible to group multiple configuration properties in a single interface that share the same prefix.

rest.queries.get-all=SELECT * FROM 'src/test/userdata.parquet'
rest.queries.findUsersByLastName=SELECT * FROM 'src/test/userdata.parquet' where last_name = :lastName

We can Load them with a config class

import io.smallrye.config.ConfigMapping
import io.smallrye.config.WithName

@ConfigMapping(prefix = "rest")
interface QueryConfig {
    @get:WithName("queries")
    val queries: Map<String, String>
}

Here, @get:WithName("queries") turns the queries() method into a field-like queries property in Kotlin, making access clean and intuitive: config.queries["get-all"] instead of config.queries(). This is a standout feature of Quarkus’s ConfigMapping with Kotlin!

Step 3: Querying Parquet with DuckDB and Jdbi

Create a service to query the Parquet file:

import jakarta.enterprise.context.ApplicationScoped
import org.jdbi.v3.core.Jdbi
import org.jdbi.v3.core.kotlin.withHandleUnchecked

@ApplicationScoped
class ParquetService(
    val config: QueryConfig,
) {
    private val jdbi = Jdbi.create("jdbc:duckdb:")

    fun executeQuery(
        queryName: String,
        parameters: Map<String, Any>,
    ): List<Map<String, Any>> {
        val query =
            config.queries[queryName]
                ?: throw IllegalArgumentException("Query with name '$queryName' not found in configuration")

        return jdbi.withHandleUnchecked { handle ->
            handle
                .createQuery(query)
                .bindMap(parameters)
                .mapToMap()
                .list()
        }
    }
}

Here, withHandleUnchecked simplifies the code by throwing unchecked exceptions (like SQLException) directly, avoiding the need to declare a checked exception type. It’s a Kotlin-friendly tweak from jdbi3-kotlin, keeping things concise.

Step 4: REST Endpoint

Expose a POST endpoint:

import jakarta.ws.rs.Consumes
import jakarta.ws.rs.POST
import jakarta.ws.rs.Path
import jakarta.ws.rs.Produces
import jakarta.ws.rs.core.MediaType
import org.jboss.resteasy.reactive.RestPath

@Path("/query")
class QueryResource(
    val parquetService: ParquetService,
) {
    @POST
    @Path("/{queryName}")
    @Consumes(MediaType.APPLICATION_JSON)
    @Produces(MediaType.APPLICATION_JSON)
    fun queryWithParams(
        @RestPath queryName: String,
        body: Map<String, Any>,
    ) = parquetService.executeQuery(queryName, body)
}

Step 5: Testing with RestAssured-Kotlin

Add a test in 'src/test/kotlin'

@QuarkusTest
class QueryResourceTest {
    @Test
    fun testQueryEndpoint() {
        Given {
            body("{}")
            header("Content-Type", "application/json")
        } When {
            post("/query/get-all")
        } Then {
            statusCode(200)
        }
    }
}

Here, restassured kotlin-extensions DSL made RESTful test more readable in Given-When-Then tyle

Testing It Out

Run the app:

./mvnw quarkus:dev

Test with http test

### GET All
POST http://localhost:8080/query/get-all
Content-Type: application/json

{}

### filter-by-column
POST http://localhost:8080/query/findUsersByLastName
Content-Type: application/json

{
  "lastName": "Mason"
}

Test with http test

Why This Shines

This setup skips big data frameworks for a lightweight win. DuckDB queries Parquet natively, Quarkus handles REST and scheduling, and Jdbi3-Kotlin’s withHandleUnchecked keeps queries sleek. ConfigMapping with @get:WithName makes config.queries a joy, and RestAssured ensures it all works reliably.

Wrapping Up

We’ve built a RESTful API for Parquet files with DuckDB, Quarkus, and Kotlin, dodging heavy frameworks. With Jdbi3-Kotlin, ConfigMapping’s field-like access, and RestAssured tests, it’s robust and simple. Try it with your own data—tweak the queries and share your thoughts in the comments!

The full project can be found on my github.