2 Exploring Data with AI
This chapter takes you from a raw spreadsheet to a written recommendation with charts in about ten minutes, using nothing but plain-English conversation. The example is a spreadsheet, but the same approach works for other data sources; Chapter 4 covers databases.
2.1 The scenario
You are a regional VP at a retail company. Q4 numbers just came in. The question on the table: which regions should get additional marketing budget, and which product lines should be expanded or cut?
The data is the Kaggle “Sample Superstore” dataset: roughly 10,000 rows of order data across four U.S. regions, with columns for sales, profit, discount, product category, and customer segment. You can download it here: superstore.csv.
An analyst would take half a day to work through this. With a coding agent, you can do it in ten minutes.
2.2 Step 1: Open and orient
Place the CSV file in a folder and open your coding agent in that folder. Then ask:
Describe the data in this folder.
The agent reads the CSV into a dataframe and reports back: roughly 10,000 rows, 21 columns, four regions, three product categories, a date range from 2014 to 2017, and total sales of roughly $2.3 million with an overall profit margin around 12%. It may also flag surprises, such as sub-categories that are losing money or a disproportionate concentration of sales in one region.
This orienting step catches formatting issues (dates parsed as strings, currencies with dollar signs, encoding problems) and surfaces the structure of the data before you start asking pointed questions.
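If you ask the agent to show its work for this step, the code might resemble the sketch below. It is an illustration under assumptions, not the agent's actual output: the column names (`Order Date`, `Region`, `Sales`, `Profit`) are assumed to match the file, the `orient` helper is hypothetical, and the three sample rows are made up so the sketch runs without the download.

```python
import io

import pandas as pd

# Made-up rows standing in for superstore.csv (illustrative values only).
# The dollar signs mimic the kind of formatting problem this step catches.
SAMPLE_CSV = """Order Date,Region,Category,Sub-Category,Sales,Profit,Discount
2016-11-08,South,Furniture,Bookcases,$261.96,$41.91,0.0
2016-06-12,West,Office Supplies,Labels,$14.62,$6.87,0.0
2015-10-11,South,Furniture,Tables,$957.58,-$383.03,0.45
"""


def orient(df: pd.DataFrame) -> dict:
    """Clean the obvious formatting issues and return a short summary."""
    df = df.copy()
    # Dates usually arrive as strings; convert them to real timestamps.
    df["Order Date"] = pd.to_datetime(df["Order Date"])
    # Strip dollar signs and thousands separators, then convert to numbers.
    for col in ("Sales", "Profit"):
        cleaned = df[col].astype(str).str.replace(r"[$,]", "", regex=True)
        df[col] = pd.to_numeric(cleaned)
    total_sales = df["Sales"].sum()
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "regions": sorted(df["Region"].unique()),
        "date_range": (df["Order Date"].min(), df["Order Date"].max()),
        "total_sales": total_sales,
        # Overall margin: total profit divided by total sales.
        "profit_margin": df["Profit"].sum() / total_sales,
    }


# With the real file you would call pd.read_csv("superstore.csv") instead.
summary = orient(pd.read_csv(io.StringIO(SAMPLE_CSV)))
for key, value in summary.items():
    print(f"{key}: {value}")
```

The point of the sketch is the order of operations: fix types first, then summarize. A margin computed on dollar-sign strings would fail outright, which is exactly the kind of issue the orienting step surfaces early.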
2.3 Step 2: Ask decision-relevant questions
Frame every question around the decision, not around the data. For the budget decision:
Which region has the highest profit margin, and which has the lowest?
Show quarterly sales trends by region. Are any accelerating or declining?
Which sub-categories are losing money? How much?
Each of these questions connects directly to the budget allocation decision. The first identifies where margin is strongest. The second reveals momentum — a region with declining sales may not deserve more investment regardless of its current margin. The third pinpoints money-losers that could be cut.
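Each of the three questions maps to a short grouping operation. The sketch below shows one plausible translation, assuming the column names `Order Date`, `Region`, `Sub-Category`, `Sales`, and `Profit`; the function names are hypothetical and the six rows are made up for illustration.

```python
import pandas as pd

# Made-up rows for illustration; the real file has roughly 10,000.
orders = pd.DataFrame({
    "Order Date": pd.to_datetime(["2017-01-15", "2017-02-20", "2017-04-10",
                                  "2017-05-05", "2017-07-01", "2017-08-12"]),
    "Region": ["East", "West", "East", "West", "East", "West"],
    "Sub-Category": ["Tables", "Paper", "Paper", "Tables", "Binders", "Binders"],
    "Sales": [500.0, 100.0, 200.0, 400.0, 300.0, 250.0],
    "Profit": [-100.0, 40.0, 60.0, -50.0, 30.0, 50.0],
})


def margin_by_region(df: pd.DataFrame) -> pd.Series:
    """Profit margin per region (total profit / total sales), highest first."""
    totals = df.groupby("Region")[["Sales", "Profit"]].sum()
    return (totals["Profit"] / totals["Sales"]).sort_values(ascending=False)


def quarterly_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Sales per quarter, one column per region."""
    quarter = df["Order Date"].dt.to_period("Q")
    return df.groupby([quarter, "Region"])["Sales"].sum().unstack("Region")


def money_losers(df: pd.DataFrame) -> pd.Series:
    """Sub-categories with negative total profit, worst first."""
    profit = df.groupby("Sub-Category")["Profit"].sum()
    return profit[profit < 0].sort_values()


print(margin_by_region(orders))
print(quarterly_sales(orders))
print(money_losers(orders))
```

Note one design choice: the margin here is total profit divided by total sales, not the average of per-row margins. The two can differ noticeably when order sizes vary, so it is worth asking the agent which one it used.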
2.4 Step 3: Visualize the tradeoffs
Ask the agent to create charts that make the decision visual.
“Quarterly profit by region” calls for a line chart, because you are looking at trends over time. “Profit by sub-category, sorted” calls for a bar chart, because you are comparing categories. “Discount rate vs. profit margin” calls for a scatter plot, because you are looking for a relationship between two continuous variables.
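The pairing of question and chart type can be sketched with matplotlib as follows. This is an illustration only: the column names are assumed, `draw_tradeoff_charts` is a hypothetical helper, and the data is made up so the example runs on its own.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows for illustration only.
orders = pd.DataFrame({
    "Order Date": pd.to_datetime(["2017-01-15", "2017-02-20", "2017-04-10",
                                  "2017-05-05", "2017-07-01", "2017-08-12"]),
    "Region": ["East", "West", "East", "West", "East", "West"],
    "Sub-Category": ["Tables", "Paper", "Paper", "Tables", "Binders", "Binders"],
    "Sales": [500.0, 100.0, 200.0, 400.0, 300.0, 250.0],
    "Profit": [-100.0, 40.0, 60.0, -50.0, 30.0, 50.0],
    "Discount": [0.4, 0.0, 0.0, 0.3, 0.2, 0.1],
})


def draw_tradeoff_charts(df: pd.DataFrame):
    fig, (ax_line, ax_bar, ax_scatter) = plt.subplots(1, 3, figsize=(15, 4))

    # Trend over time: line chart, one line per region.
    quarter = df["Order Date"].dt.to_period("Q").astype(str)
    by_quarter = df.groupby([quarter, "Region"])["Profit"].sum().unstack("Region")
    by_quarter.plot(ax=ax_line, marker="o")
    ax_line.set_title("Quarterly profit by region")

    # Comparison across categories: sorted bar chart.
    by_subcat = df.groupby("Sub-Category")["Profit"].sum().sort_values()
    ax_bar.barh(by_subcat.index, by_subcat.values)
    ax_bar.set_title("Profit by sub-category, sorted")

    # Relationship between two continuous variables: scatter plot.
    ax_scatter.scatter(df["Discount"], df["Profit"] / df["Sales"])
    ax_scatter.set_xlabel("Discount rate")
    ax_scatter.set_ylabel("Profit margin")
    ax_scatter.set_title("Discount rate vs. profit margin")

    fig.tight_layout()
    return fig


fig = draw_tradeoff_charts(orders)
# fig.savefig("tradeoffs.png", dpi=150)  # uncomment to write the image to disk
```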
Once you have a chart, iterate. Ask the agent to highlight money-losers in red, add a break-even line, or annotate the December spike. Each follow-up takes seconds. If an annotation overlaps a data point, you can ask the agent to view its own chart and fix the layout — it can see images and adjust the code accordingly.
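A follow-up such as "highlight money-losers in red and add a break-even line" typically changes only a few lines of plotting code. The sketch below shows what that change could look like; the profit figures are made up and the label text is a placeholder.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up profit totals by sub-category, sorted so the worst comes first.
profit = pd.Series({"Tables": -150.0, "Binders": 80.0, "Paper": 100.0}).sort_values()

fig, ax = plt.subplots(figsize=(6, 3))

# Follow-up 1: color money-losers red, everything else blue.
colors = ["red" if value < 0 else "steelblue" for value in profit]
ax.barh(profit.index, profit.values, color=colors)

# Follow-up 2: dashed vertical line at zero profit (break-even).
ax.axvline(0, color="black", linewidth=1, linestyle="--")

# Follow-up 3: label the worst performer just right of the break-even line.
ax.annotate("largest loss", xy=(0, 0), xytext=(6, 0),
            textcoords="offset points", va="center")

ax.set_xlabel("Total profit")
fig.tight_layout()
```

Because each request touches so little code, iterating on a chart is cheap. The slow part of charting is usually deciding what to emphasize, and that decision stays with you.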
2.5 Step 4: Build the recommendation
The final step is synthesis. Ask the agent to turn the analysis into a recommendation.
Which two regions should get increased marketing budget and why?
Which sub-categories should we discontinue? Quantify the savings.
Write a one-page executive memo with the recommendation and supporting charts.
In ten minutes you have gone from a raw CSV to a written recommendation backed by evidence. No code was written by you. No analyst was involved. No one waited three days for a dashboard.
2.6 What just happened
Behind the scenes, the agent loaded the CSV into a Python dataframe, wrote and ran code for each question, rendered the charts, and synthesized the findings into a memo. You never see the code unless you ask for it.
2.7 Exercises
This is the core exercise for Chapter 2. Download superstore.csv and place it in a folder. Open your coding agent in that folder and work through the four steps described above.
First, ask the agent to describe the data. Confirm the row count, column names, date range, and financial summary.
Second, ask three decision-relevant questions tied to the marketing budget allocation decision.
Third, create at least one line chart (for trends), one bar chart (for comparisons), and one scatter plot (for relationships). Iterate on at least one chart by adding annotations or improving the layout.
Fourth, ask the agent to write a one-page executive memo with the recommendation and supporting charts.
Bring your dataset and memo to Chapter 3, where you will use them in the thinking-partner exercises.
Your colleague has made several claims about the Superstore data. Use your coding agent to verify or debunk each one.
Claim 1: Discounts over 30% always increase total profit.
Claim 2: The East region has the best profit margins.
Claim 3: Office Supplies is the most profitable category.
Claim 4: Q4 is always the best quarter.
For each claim, ask the agent to test it, show the evidence, and write one sentence stating whether the claim is confirmed or debunked, with the supporting number.
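To give a sense of what "test it" means in practice, here is one way the first claim could be checked. The sketch assumes `Discount` is stored as a fraction (0.30 for 30%) and reads "always" strictly: a single loss-making order above the threshold is enough to debunk it. The helper name and the six rows are made up.

```python
import pandas as pd

# Made-up rows for illustration only.
orders = pd.DataFrame({
    "Discount": [0.0, 0.2, 0.4, 0.5, 0.1, 0.7],
    "Profit": [50.0, 20.0, -30.0, -80.0, 35.0, -10.0],
})


def profit_by_discount_band(df: pd.DataFrame, threshold: float = 0.30) -> pd.DataFrame:
    """Order count, total profit, and mean profit above and below the threshold."""
    labels = {True: f"over {threshold:.0%}", False: f"up to {threshold:.0%}"}
    band = (df["Discount"] > threshold).map(labels)
    return df.groupby(band)["Profit"].agg(["count", "sum", "mean"])


bands = profit_by_discount_band(orders)
print(bands)

# Strict reading of "always": does any heavily discounted order lose money?
heavy = orders[orders["Discount"] > 0.30]
any_loss = bool((heavy["Profit"] < 0).any())
print("Claim debunked" if any_loss else "Claim not contradicted by this data")
```

Whatever the result on the real file, the one-sentence verdict should quote a number from a table like `bands`, for example the total profit of the heavily discounted orders.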
You are an HR director at a mid-size company. Employee turnover has been rising and the CEO wants to know why people are leaving and what to do about it.
Download employee-attrition.csv and place it in a folder. Open your coding agent and work through the same four steps from this chapter.
First, ask the agent to describe the data. How many employees are in the dataset? What is the overall attrition rate? What departments and job roles are represented?
Second, ask decision-relevant questions: Which departments have the highest attrition? Does overtime correlate with leaving? How does monthly income differ between employees who stay and those who leave? Are employees with longer commutes more likely to leave?
Third, create at least three charts: a bar chart comparing attrition rates by department, a box plot of monthly income for stayers vs. leavers, and a chart of your choice that reveals a pattern you find interesting.
Fourth, ask the agent to write a one-page memo recommending three retention interventions, backed by the data.
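If you want to check the agent's answers to the second step, the underlying calculations are short. The sketch below assumes columns named `Attrition` (with values "Yes" and "No"), `Department`, `OverTime`, and `MonthlyIncome`; confirm the actual names in employee-attrition.csv before relying on it. The eight rows are made up.

```python
import pandas as pd

# Made-up rows for illustration only.
staff = pd.DataFrame({
    "Attrition": ["Yes", "No", "No", "Yes", "No", "No", "Yes", "No"],
    "Department": ["Sales", "Sales", "R&D", "R&D", "R&D", "HR", "Sales", "R&D"],
    "OverTime": ["Yes", "No", "No", "Yes", "No", "No", "Yes", "Yes"],
    "MonthlyIncome": [3000, 5200, 6100, 2800, 7000, 4500, 3300, 6400],
})

# True for employees who left; the mean of a boolean column is a rate.
left = staff["Attrition"].eq("Yes")

overall_rate = left.mean()
rate_by_department = left.groupby(staff["Department"]).mean().sort_values(ascending=False)
rate_by_overtime = left.groupby(staff["OverTime"]).mean()
median_income = staff.groupby("Attrition")["MonthlyIncome"].median()

print(f"Overall attrition rate: {overall_rate:.1%}")
print(rate_by_department)
print(rate_by_overtime)
print(median_income)
```

A difference in rates between groups shows association, not cause. Keep that in mind when turning these numbers into the retention interventions in the memo.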
Your VP of People Operations has made several claims about the employee attrition data. Use your coding agent to verify or debunk each one.
Claim 1: Employees who work overtime leave at more than double the rate of those who don’t.
Claim 2: Job satisfaction has no meaningful relationship with attrition.
Claim 3: Younger employees (under 30) are the most likely to leave.
Claim 4: Employees who haven’t been promoted in the last two years leave at higher rates.
For each claim, ask the agent to test it, show the evidence, and write one sentence stating whether the claim is confirmed or debunked, with the supporting number.
Identify a recurring report your team produces — weekly sales, monthly KPIs, quarterly review — and try to automate it.
Place the source data in a folder and open your coding agent. Ask it to produce the full report: charts, tables, and a narrative summary. Then compare: how long does this report normally take to produce? How long did AI take? What is missing from the AI version?