RavenDB 7.0 Released: Snowflake & data warehouse integration
RavenDB 7.0 adds Snowflake integration to the set of ETL targets it supports.
Snowflake is a data warehouse solution, designed for analytics and data at scale. RavenDB is aimed at transactional scenarios and has a really good story around data distribution and wide geographical deployments.
You can check out the documentation to read the details about how you can use this integration to push data from RavenDB to your Snowflake database. In this post, I want to introduce one usage scenario for such integration.
RavenDB is commonly deployed on the edge, running on site in grocery stores, restaurants’ self-serve kiosks, supermarket checkout counters, etc. Such environments have to be tough and resilient to errors, network problems, mishandling, and much more.
We had to field support calls in the style of “there is ketchup all over the database”, for example.
In such environments, you must operate completely independently of the cloud, both because of latency and performance issues and because you have to keep running even when the Internet is down. RavenDB does very well in such scenarios because of its internal architecture, its ability to run in a multi-master configuration, and its replication capabilities.
From a business perspective, the ability to push data to the edge and operate independently of any other resource is critical. At the same time, it presents a significant challenge, since you lose the overall view of what is going on.
RavenDB’s Snowflake integration is there to bridge this gap. The idea is that you define Snowflake ETL processes that push the data from all of your branches to a single shared Snowflake account. Your headquarters can then run queries, analyse the data, and in general have near real-time analytics, without burdening the branches with managing the data remotely.
The Grocery Store Scenario
In our grocery store, we manage the store using RavenDB as the backend, with documents such as this one to record sales:
{
    "Items": [
        {
            "ProductId": "P123",
            "ProductName": "Milk",
            "QuantitySold": 5
        },
        {
            "ProductId": "P456",
            "ProductName": "Bread",
            "QuantitySold": 2
        },
        {
            "ProductId": "P789",
            "ProductName": "Eggs",
            "QuantitySold": 10
        }
    ],
    "Timestamp": "2025-02-28T12:00:00Z",
    "@metadata": {
        "@collection": "Orders"
    }
}
And this document to record other inventory updates in the store:
{
    "ProductId": "P123",
    "ProductName": "Milk",
    "QuantityDiscarded": 3,
    "Reason": "Spoilage",
    "Timestamp": "2025-02-28T14:00:00Z",
    "@metadata": {
        "@collection": "DiscardedInventory"
    }
}
These documents are repeated many times for each store, recording the movement of inventory, tracking sales, etc. Now we want to share those details with headquarters.
There are two ways to do that. One is to use the Snowflake ETL to push the data itself to the HQ’s Snowflake account. You can see an example of that when we push the raw data to Snowflake in this article.
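For illustration, a raw-data push from the Orders collection might use a transform script along these lines. This is only a sketch: the ORDER_LINES table name and its columns are assumptions, not part of the article, and the actual script depends on how you model your Snowflake tables.
// Sketch of a raw-data Snowflake ETL transform for the Orders collection
for (var i = 0; i < doc.Items.length; i++) {
    var item = doc.Items[i];

    // One row per order line; table and column names are assumptions
    loadToOrderLines({
        PRODUCT_ID: item.ProductId,
        PRODUCT_NAME: item.ProductName,
        QUANTITY_SOLD: item.QuantitySold,
        SOLD_AT: doc.Timestamp
    });
}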
The other way is to make use of RavenDB’s map/reduce capabilities to do some of the work in each store and only push the summary data to Snowflake. This can be done in order to reduce the load on Snowflake if you don’t need a granular level of data for analytics.
Here is an example of such a map/reduce index:
// Index Name: ProductConsumptionSummary
// Output Collection: ProductsConsumption
// Each order holds an Items array, so we emit one entry per line item
map('Orders', function(order) {
    return order.Items.map(item => ({
        ProductId: item.ProductId,
        TotalSold: item.QuantitySold,
        TotalDiscarded: 0,
        Date: dateOnly(order.Timestamp)
    }));
});

map('DiscardedInventory', function(discard) {
    return {
        ProductId: discard.ProductId,
        TotalSold: 0,
        TotalDiscarded: discard.QuantityDiscarded,
        Date: dateOnly(discard.Timestamp)
    };
});

// Group by product and day, then sum the sold and discarded quantities
groupBy(x => ({ ProductId: x.ProductId, Date: x.Date }))
    .aggregate(g => ({
        ProductId: g.key.ProductId,
        Date: g.key.Date,
        TotalSold: g.values.reduce((c, val) => c + val.TotalSold, 0),
        TotalDiscarded: g.values.reduce((c, val) => c + val.TotalDiscarded, 0)
    }));
This index will output its documents to the artificial collection: ProductsConsumption.
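Given the sample documents above, a summary document in that collection would look roughly like this (the exact document shape may vary slightly):
{
    "ProductId": "P123",
    "Date": "2025-02-28",
    "TotalSold": 5,
    "TotalDiscarded": 3,
    "@metadata": {
        "@collection": "ProductsConsumption"
    }
}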
We can then define a Snowflake ETL task that would push that to Snowflake, like so:
loadToProductsConsumption({
    PRODUCT_ID: doc.ProductId,
    STORE_ID: load('config/store').StoreId,
    TOTAL_SOLD: doc.TotalSold,
    TOTAL_DISCARDED: doc.TotalDiscarded,
    DATE: doc.Date
});
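Note the load('config/store') call: it reads a per-store configuration document so that every row pushed to Snowflake is tagged with the branch it came from. A hypothetical example of such a document (both the document ID and the value are made up for illustration):
{
    "StoreId": "stores/001"
}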
With that in place, each branch pushes details about its sales, discarded inventory, etc., to the shared Snowflake account, and headquarters can run its queries and get a near real-time view of what is going on globally.
You can read more about Snowflake ETL here. The full docs, including all the details on how to set it up properly, are here.