How to Automate Customer Feedback Categorization with Python and LLMs
The real-world scenario
Imagine you are a Lead Developer or Data Analyst at a fast-growing startup. Every Monday morning, you face a data dump containing thousands of unstructured customer support tickets, tweets, and emails. Manually reading these to separate critical bugs from minor feature requests is a bottleneck that delays product iterations. It is like trying to sort a mountain of mixed LEGO bricks by hand while more bricks are being poured on your head. This script acts as an automated sorting machine, reading every entry and filing it into the correct category with high precision.
The solution
We leverage Python and the OpenAI API with Structured Outputs. Instead of receiving messy text responses, we force the LLM to adhere to a specific Pydantic schema. We use pathlib for robust file system navigation to scan a directory of feedback files, process them through the model, and aggregate the results into a structured report.json.
Prerequisites
Ensure you have Python 3.10 or higher installed. You will need an OpenAI API key. Install the necessary libraries using the following command:
pip install openai pydantic python-dotenv
The code
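The script loads your API key from the environment via python-dotenv, so create a `.env` file next to the script before running it. A minimal example (the key value is a placeholder):

```
OPENAI_API_KEY=sk-your-key-here
```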
"""
-----------------------------------------------------------------------
Authors: Sharanam & Vaishali Shah
Recipe: Automated Feedback Classifier
Intent: Use LLMs to convert unstructured text files into structured JSON data.
-----------------------------------------------------------------------
"""
import json
from pathlib import Path
from typing import List
from pydantic import BaseModel
from openai import OpenAI
from dotenv import load_dotenv
# Load environment variables from .env file
load_dotenv()
# Define the expected structure of our analysis
class FeedbackAnalysis(BaseModel):
sentiment: str
category: str
priority: int
summary: str
class FeedbackProcessor:
def __init__(self):
self.client = OpenAI()
self.input_dir = Path("feedback_input")
self.output_file = Path("analysis_report.json")
# Create input directory if it does not exist
self.input_dir.mkdir(exist_ok=True)
def analyze_text(self, text: str) -> FeedbackAnalysis:
"""Sends text to LLM and returns structured feedback."""
response = self.client.beta.chat.completions.parse(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": "Analyze the customer feedback and categorize it accurately."},
{"role": "user", "content": text},
],
response_format=FeedbackAnalysis,
)
return response.choices[0].message.parsed
def process_all_files(self):
"""Iterates through text files and aggregates results."""
results = []
# Use pathlib to find all .txt files
files = list(self.input_dir.glob("*.txt"))
if not files:
print(f"No files found in {self.input_dir}. Please add .txt files to process.")
return
for file_path in files:
print(f"Processing: {file_path.name}")
content = file_path.read_text(encoding="utf-8")
analysis = self.analyze_text(content)
# Combine original metadata with LLM analysis
results.append({
"file_name": file_path.name,
"analysis": analysis.model_dump()
})
# Save the aggregated data to a JSON file
self.output_file.write_text(json.dumps(results, indent=4), encoding="utf-8")
print(f"Success! Report generated at {self.output_file}")
if __name__ == "__main__":
processor = FeedbackProcessor()
processor.process_all_files()
Code walkthrough
The script begins by importing Path from pathlib, which provides an object-oriented approach to handling file paths, making the code cross-platform compatible without manual slash adjustments. We define a FeedbackAnalysis class using Pydantic. This class acts as a contract; it tells the OpenAI API exactly which fields we need (sentiment, category, priority, and summary) and their data types.
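To see the contract in action on its own, here is a small sketch (with hypothetical payloads) showing how Pydantic validates JSON against the FeedbackAnalysis model and rejects anything that does not match:

```python
from pydantic import BaseModel, ValidationError


class FeedbackAnalysis(BaseModel):
    sentiment: str
    category: str
    priority: int
    summary: str


# A well-formed payload is parsed into a typed object
ok = FeedbackAnalysis.model_validate_json(
    '{"sentiment": "Negative", "category": "Bug Report", '
    '"priority": 1, "summary": "Password reset fails on mobile."}'
)
print(ok.priority)  # 1

# A payload missing fields raises a ValidationError instead of
# silently producing malformed data
try:
    FeedbackAnalysis.model_validate_json('{"sentiment": "Negative"}')
except ValidationError as exc:
    print(len(exc.errors()))  # 3 (category, priority, summary are missing)
```

This strictness is exactly what makes the schema useful as a contract: malformed model output fails loudly at the boundary rather than corrupting your report downstream.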
The FeedbackProcessor class handles the orchestration. Inside analyze_text, we use the beta.chat.completions.parse method. This is the modern way to handle LLM outputs because it eliminates the need for complex regex or error-prone json.loads() calls. The API returns a validated object that matches our FeedbackAnalysis model.
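For contrast, here is a sketch of the brittle approach that structured outputs replace: manually parsing a free-text model reply with json.loads(). The parse_reply helper and its inputs are hypothetical, but the failure mode they illustrate is the common one.

```python
import json


def parse_reply(raw_reply: str) -> dict:
    """Fragile manual parsing: any deviation from pure JSON breaks it."""
    try:
        return json.loads(raw_reply)
    except json.JSONDecodeError:
        # Models often wrap JSON in markdown fences or add commentary,
        # forcing cleanup hacks like stripping fences before retrying.
        cleaned = raw_reply.strip().removeprefix("```json").removesuffix("```")
        return json.loads(cleaned)


print(parse_reply('{"sentiment": "Negative"}'))
print(parse_reply('```json\n{"priority": 2}\n```'))
```

With beta.chat.completions.parse, none of this cleanup code exists; the SDK guarantees the response conforms to the schema or raises an error.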
The process_all_files method uses the glob function to find every .txt file in the feedback_input directory. It reads the raw text, passes it to the AI, and collects the structured objects into a list. Finally, we use write_text to save the entire dataset into analysis_report.json, ready for a dashboard or a database import.
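The file-handling half of that loop can be exercised without any API calls. This sketch builds a throwaway feedback_input directory in a temp folder, globs it, and writes a report the same way process_all_files does (file names and contents are hypothetical):

```python
import json
import tempfile
from pathlib import Path

# Hypothetical mini run: two feedback files in a scratch directory
with tempfile.TemporaryDirectory() as tmp:
    input_dir = Path(tmp) / "feedback_input"
    input_dir.mkdir()
    (input_dir / "ticket_001.txt").write_text("App crashes on login.")
    (input_dir / "tweet_99.txt").write_text("Love the new dashboard!")

    # glob returns matching Paths; sorted() gives a stable order
    names = sorted(p.name for p in input_dir.glob("*.txt"))
    print(names)  # ['ticket_001.txt', 'tweet_99.txt']

    # Aggregate and persist, mirroring process_all_files
    report = [{"file_name": n} for n in names]
    out = Path(tmp) / "analysis_report.json"
    out.write_text(json.dumps(report, indent=4), encoding="utf-8")
    print(json.loads(out.read_text())[0]["file_name"])  # ticket_001.txt
```

Note that glob does not guarantee ordering across platforms, so sorting the results is a cheap way to make repeated runs deterministic.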
Sample output
When you run the script, the terminal will display the progress for each file found in the feedback_input folder:
Processing: ticket_001.txt
Processing: ticket_002.txt
Processing: tweet_99.txt
Success! Report generated at analysis_report.json
The resulting analysis_report.json will look like this:
[
    {
        "file_name": "ticket_001.txt",
        "analysis": {
            "sentiment": "Negative",
            "category": "Bug Report",
            "priority": 1,
            "summary": "User unable to reset password via the mobile app."
        }
    }
]
Conclusion
Automating data categorization with Python and LLMs removes the human fatigue factor from high-volume operations. By using Structured Outputs, you transform an unpredictable AI into a reliable data pipeline component. This script can be scheduled as a cron job or integrated into a CI/CD pipeline to monitor user sentiment in real-time, ensuring that critical issues never slip through the cracks again.
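As one way to schedule it, a crontab entry could trigger the run every Monday morning before the team reviews the backlog (the paths below are placeholders; adjust them to your environment):

```
# Run the classifier every Monday at 07:00
0 7 * * 1 cd /path/to/project && /usr/bin/python3 feedback_classifier.py >> classifier.log 2>&1
```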
🚀 Don’t Just Learn AI & LLMs — Master It.
This tutorial was just the tip of the iceberg. To truly advance your career and build professional-grade systems, you need the full architectural blueprint.
My book, Large Language Models Crash Course, takes you from “making it work” to “making it scale.” I cover advanced patterns, real-world case studies, and the industry best practices that senior engineers use daily.