
Pandas on Snowflake vs Snowpark DataFrame vs Snowpark pandas

Cristian Scutaru
7 min read · Sep 4, 2024

Let’s face it, it’s easy to get lost. So follow me here, and let’s clarify the main differences between these APIs and libraries…


In the meantime, send me an email if you’d like me to let you know when my new hands-on masterclass, “10+ Popular DataFrame Libraries for Data Science”, is published in just a few days.

I’ll talk in detail about the Pandas DataFrame API and R data frames, Apache Arrow and DuckDB in-memory analytics, the Polars and Dask DataFrame APIs, the Apache Spark and Snowpark DataFrame APIs, BigQuery’s BigFrames API, .NET alternatives (including LINQ and Deedle), and more…

Bringing the “compute” closer to data

The “compute” in Snowflake means its SQL engine, running in what Snowflake calls “virtual warehouses”: essentially EC2 virtual machines in the AWS cloud, or their equivalent in Azure or GCP.

Whenever you run a SQL query, it executes on an active virtual warehouse, and this is how Snowflake, the company, makes money. Their main interest, then, is to get you to bring even more code…

Written by Cristian Scutaru

World-class expert in Snowflake Data Cloud. Former Snowflake "Data Superhero" and SnowPro SME (Subject Matter Expert). 7x SnowPro certification exams.
