How AI Is Rewriting the Day-to-Day of Data Scientists

In my past articles, I have explored and compared many AI tools, for example, Google’s Data Science Agent, ChatGPT vs. Claude vs. Gemini for Data Science, DeepSeek V3, etc. However, this is only a small subset of all the AI tools available for data science. Just to name a few that I use at work:

  • OpenAI API: I use it to categorize and summarize customer feedback and surface product pain points (see my tutorial article, and the sketch after this list).
  • ChatGPT and Gemini: They help me draft Slack messages and emails, write analysis reports, and even performance reviews.
  • Glean AI: I use Glean AI to quickly find answers across internal documentation and communications.
  • Cursor and Copilot: I enjoy just pressing tab-tab to auto-complete code and comments.
  • Hex Magic: I use Hex for collaborative data notebooks at work. They also offer a feature called Hex Magic to write code and fix bugs using conversational AI.
  • Snowflake Cortex: Cortex AI allows users to call LLM endpoints and build RAG and Text-to-SQL services using data in Snowflake.
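
To make the first tool on this list concrete, here is a minimal sketch of the kind of feedback-categorization call I described above, using the OpenAI Python SDK. The category list and model name are hypothetical placeholders, and the sketch assumes an OPENAI_API_KEY is set in the environment; treat it as an illustration rather than a drop-in implementation.

```python
# Minimal sketch: label one piece of customer feedback with a category and a
# one-line summary. The categories below are hypothetical placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CATEGORIES = ["pricing", "performance", "usability", "feature request", "other"]

def categorize_feedback(feedback: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat-capable model works for this sketch
        temperature=0,        # keep labels stable across reruns
        messages=[
            {
                "role": "system",
                "content": (
                    "You label customer feedback. Reply with exactly one "
                    f"category from {CATEGORIES}, then a one-line summary, "
                    "separated by ' | '."
                ),
            },
            {"role": "user", "content": feedback},
        ],
    )
    return response.choices[0].message.content

print(categorize_feedback("The dashboard takes forever to load since the last update."))
# e.g. "performance | Dashboard load times regressed after the latest update."
```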

I am sure you can add a lot more to this list, and new AI tools are being launched every day. It is almost impossible to compile a complete list at this point. Therefore, in this article, I want to take one step back and focus on a bigger question: what do we really need as data professionals, and how can AI help?

In the sections below, I will focus on two main directions: eliminating low-value tasks and accelerating high-value work.


1. Eliminating Low-Value Tasks

I became a data scientist because I truly enjoy uncovering business insights from complex data and driving business decisions. However, having worked in the industry for over seven years now, I have to admit that not all the work is as exciting as I had hoped. Before we ever get to advanced analyses or machine learning models, many unavoidable low-value work streams fill our days, and in many cases it is because we don’t have the right tooling to empower our stakeholders for self-serve analytics. Let’s look at where we are today and the ideal state:

Current state: We work as data interpreters and gatekeepers (sometimes “SQL monkeys”)

  • Simple data pull requests come to me and my team on Slack every week: “What was the GMV last month?” “Can you pull the list of customers who meet these criteria?” “Can you help me fill in this number on the deck that I need to present tomorrow?”
  • BI tools do not support self-service use cases well. We adopted BI tools like Looker and Tableau so stakeholders can explore the data and monitor the metrics easily. But in reality, there is always a trade-off between simplicity and self-serve flexibility. If we keep a dashboard simple with only a few metrics, it can fulfill only a few use cases. If we instead make the tool highly customizable, with the freedom to explore metrics and the underlying data, stakeholders may find it confusing and lack the confidence to use it; in the worst case, the data gets pulled and interpreted the wrong way.
  • Documentation is sparse or outdated. This is a common situation with different possible causes: maybe we move fast and focus on delivering results, or there are no solid data documentation and governance policies in place. As a result, tribal knowledge becomes the bottleneck for people outside the data team to use the data.

Ideal state: Empower stakeholders to self-serve so we can minimize low-value work

  • Stakeholders can do simple data pulls and answer basic data questions easily and confidently.
  • Data teams spend less time on repetitive reporting or one-off basic queries.
  • Dashboards are discoverable, interpretable, and actionable without hand-holding.

So, to get closer to the ideal state, what role can AI play here? From what I have observed, these are the common directions AI tools are taking to close the gap:

  1. Query data with natural language (Text-to-SQL): One way to lower the technical barrier is to enable stakeholders to query the data in natural language. There are lots of Text-to-SQL efforts in the industry (see the sketch after these examples):
    • Snowflake, for example, has made a lot of progress on Text-to-SQL models and has started integrating the capability into its product.
    • Many companies (including mine) have also explored in-house Text-to-SQL solutions. For example, Uber shared their journey with QueryGPT to make data querying more accessible for their Operations team. The article explains in detail how Uber designed a multi-agent architecture for query generation, and it also surfaces the major challenges in this area, including accurately interpreting user intent, handling large table schemas, and avoiding hallucinations.
    • Honestly, the bar for making Text-to-SQL work is very high, because the queries have to be accurate. Even if the tool fails just once, it can ruin stakeholders’ trust, and eventually they will come back to you to validate the queries (then you need to read and rewrite the queries, which almost doubles the work).
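
To show the core idea these tools build on, here is a minimal single-shot Text-to-SQL sketch. Production systems like Uber’s QueryGPT layer agents, schema retrieval, and validation on top of this; the schema, model name, and prompt below are my own illustrative assumptions.

```python
# Minimal single-shot Text-to-SQL sketch: give the model a schema and a
# plain-English question, get a SQL draft back. Assumes OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()

SCHEMA = """
orders(order_id INT, customer_id INT, gmv NUMERIC, created_at DATE)
customers(customer_id INT, region TEXT, signup_date DATE)
"""  # hypothetical tables; in practice this comes from warehouse metadata

def draft_sql(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[
            {
                "role": "system",
                "content": (
                    "Write a single ANSI SQL query for the schema below. "
                    "Return only the SQL, with no explanation.\n" + SCHEMA
                ),
            },
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

sql = draft_sql("What was total GMV last month, broken down by region?")
print(sql)  # review (or at least dry-run with EXPLAIN) before executing
```

Even in this toy form, the trust problem above is visible: nothing guarantees the generated query is correct, which is why systems like QueryGPT invest in multi-agent designs, intent checks, and validation before any SQL reaches a stakeholder.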