Why Polars Destroys Pandas in All Possible Ways for Data Scientists

One of the first Data Science libraries, Pandas, has been improving the lives of many developers across the globe, but Polars shows it is time to move on.

Published in Towards AI, Aug 8, 2024 (5 min read)

Pandas needs no introduction, but this article will dive deep into the question of why Polars is better than Pandas (even the author of Pandas agrees).

You might be aware of some basics like the memory and speed improvements, but why do they exist? How does Polars do its magic to achieve such high speeds and lower memory usage?

This article will provide all the reasons why Polars has an advantage over Pandas as well as what it is lacking in comparison (for now).

Let’s jump right into it!

Clean API

There are so many tricks and hacks in Pandas that even experienced developers are probably unaware of some of them. Daily usage is no different: if I gave you a piece of Pandas code like data.iloc[:, 2:] >= 4, and assuming you don’t have hyperthymesia, you would probably not know at a glance what it does. (It takes every row, keeps the columns from the third one onward, and checks element-wise whether each value is at least 4.) Developers do use Google and AI bots to produce code and do not know everything off the top of their heads, but the point here is different.

The functions that the library provides should be straightforward, clear and dedicated to one use.

That is what Polars provides, with its excellent documentation, clear function names, and an overall feeling of stability.
Its expressive API is one of the best parts of the library. It offers such a different way of thinking about data that moving from one framework to the other takes real mental effort and shifts your mindset completely.

Speed and memory optimization

There are multiple reasons for the speed and memory gains, and the two main ones are Apache Arrow and Rust. Arrow is a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations.
Pandas struggles to use Arrow efficiently because of its legacy code and the data type extensions internal to the library. Polars works with the Arrow format out of the box and hence achieves much higher speeds.

Polars’ underlying code is implemented in Rust, a compiled language, whereas Python is interpreted, which gives Polars a further speed advantage. That is not the only benefit: Rust also handles memory safety and concurrency better.

Production code

A great API brings us to the question of whether either library should be used in production, which is another advantage for Polars. Pandas is not stable enough to be used in production, as has been shown for years and discussed in the community. Its many changes and underlying legacy code cause so many pain points that it is not worth going with Pandas.

Dependencies

I want to point out some of the advantages of Pandas as well. One of them is its dependencies, which in this case are a double-edged sword.
Although they give us integration with libraries like Seaborn and matplotlib to achieve even better results, they also tie us to Pandas, and sometimes we can’t move away from the library.

As mentioned, Polars primarily depends on the Arrow data format, which provides a high-performance, in-memory columnar data structure. This reduced dependency chain contributes to Polars’ overall performance and flexibility, as it avoids potential compatibility issues and overhead associated with managing multiple external libraries.

Community

The dependency problem will be solved as the community moves in this direction of clean code and efficiency, but that takes time. Community size is another advantage for Pandas, simply because it has existed for so long.

With an increasing number of developers and data scientists adopting Polars for their projects, the ecosystem is expanding at an accelerated pace. While Pandas has a significant head start, the momentum behind Polars suggests that it will quickly close the gap in community size, resources, and available tools, positioning itself as a strong competitor in the data manipulation landscape. Polars is not there yet, but this time we are going in the right direction.

Switching from Pandas to Polars

Transitioning from Pandas to Polars can be a smooth process for many users due to the similar DataFrame structure and familiar Python syntax. While there are differences in API and functionality, Polars’ performance benefits, especially for large datasets, often outweigh the initial learning curve. Many everyday Pandas operations have direct equivalents in Polars, and the growing community provides ample resources and support to aid in the migration. However, for complex workflows heavily reliant on Pandas-specific features, a gradual adoption approach or hybrid use of both libraries might be necessary.

Conclusion

Starting your Data Science journey with Polars can be a good choice, but you will discover that many Stack Overflow questions and discussion forums are still focused on Pandas. Getting the right mindset from the get-go is vital, and that is why Polars can be very beneficial as a starting point later on.

Switching from Pandas to Polars is also a great option, so going with Polars right now would benefit both the project and the developers working on the code.

That is all for today! If you have any questions, please send them my way!


Written by Lazar Gugleta
