How to Automate Customer Feedback Categorization with Python and LLMs

The real-world scenario

Imagine you are a Lead Developer or Data Analyst at a fast-growing startup. Every Monday morning, you face a data dump containing thousands of unstructured customer support tickets, tweets, and emails. Manually reading them to separate critical bugs from minor feature requests is a bottleneck that delays product iteration. It is like sorting a mountain of mixed LEGO bricks by hand while more bricks are poured on top. This script acts as an automated sorting machine, reading every entry and filing it into the correct category with high precision.

The solution

We leverage Python and the OpenAI API with Structured Outputs. Instead of parsing messy free-text responses, we force the LLM to adhere to a specific Pydantic schema. We use pathlib for robust file-system navigation: the script scans a directory of feedback files, processes each one through the model, and aggregates the results into a structured analysis_report.json.

Prerequisites

Ensure you have Python 3.10 or higher installed. You will need an OpenAI API key. Install the necessary libraries using the following command:

pip install openai pydantic python-dotenv
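The OpenAI client reads your API key from the environment, and python-dotenv loads it from a local .env file placed next to the script. A minimal .env sketch (the key value is a placeholder; never commit this file):

```
OPENAI_API_KEY=sk-...your-key-here...
```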

The code


"""
-----------------------------------------------------------------------
Authors: Sharanam & Vaishali Shah
Recipe: Automated Feedback Classifier
Intent: Use LLMs to convert unstructured text files into structured JSON data.
-----------------------------------------------------------------------
"""

import json
from pathlib import Path
from pydantic import BaseModel
from openai import OpenAI
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Define the expected structure of our analysis
class FeedbackAnalysis(BaseModel):
    sentiment: str
    category: str
    priority: int
    summary: str

class FeedbackProcessor:
    def __init__(self):
        self.client = OpenAI()
        self.input_dir = Path("feedback_input")
        self.output_file = Path("analysis_report.json")
        
        # Create input directory if it does not exist
        self.input_dir.mkdir(exist_ok=True)

    def analyze_text(self, text: str) -> FeedbackAnalysis:
        """Sends text to LLM and returns structured feedback."""
        response = self.client.beta.chat.completions.parse(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": "Analyze the customer feedback and categorize it accurately."},
                {"role": "user", "content": text},
            ],
            response_format=FeedbackAnalysis,
        )
        parsed = response.choices[0].message.parsed
        if parsed is None:
            # The model can refuse or fail to produce schema-compliant output
            raise ValueError("No structured output returned for this input.")
        return parsed

    def process_all_files(self):
        """Iterates through text files and aggregates results."""
        results = []
        
        # Use pathlib to find all .txt files
        files = list(self.input_dir.glob("*.txt"))
        
        if not files:
            print(f"No files found in {self.input_dir}. Please add .txt files to process.")
            return

        for file_path in files:
            print(f"Processing: {file_path.name}")
            content = file_path.read_text(encoding="utf-8")
            
            analysis = self.analyze_text(content)
            
            # Combine original metadata with LLM analysis
            results.append({
                "file_name": file_path.name,
                "analysis": analysis.model_dump()
            })

        # Save the aggregated data to a JSON file
        self.output_file.write_text(json.dumps(results, indent=4), encoding="utf-8")
        print(f"Success! Report generated at {self.output_file}")

if __name__ == "__main__":
    processor = FeedbackProcessor()
    processor.process_all_files()

Code walkthrough

The script begins by importing Path from pathlib, which provides an object-oriented approach to handling file paths, making the code cross-platform compatible without manual slash adjustments. We define a FeedbackAnalysis class using Pydantic. This class acts as a contract; it tells the OpenAI API exactly which fields we need (sentiment, category, priority, and summary) and their data types.
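The "contract" idea can be seen in isolation: Pydantic rejects any payload whose fields or types do not match the declaration. A standalone sketch (not part of the recipe, and no API call involved):

```python
from pydantic import BaseModel, ValidationError

class FeedbackAnalysis(BaseModel):
    sentiment: str
    category: str
    priority: int
    summary: str

# Valid payload: every field matches its declared type
ok = FeedbackAnalysis(
    sentiment="Negative",
    category="Bug Report",
    priority=1,
    summary="Password reset fails on mobile.",
)
print(ok.priority)  # 1

# Invalid payload: "high" cannot be coerced to int, so validation fails
try:
    FeedbackAnalysis(sentiment="Neutral", category="Other",
                     priority="high", summary="...")
except ValidationError:
    print("rejected")
```

This same validation runs under the hood when the API response is parsed against the model.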

The FeedbackProcessor class handles the orchestration. Inside analyze_text, we use the beta.chat.completions.parse method. This is the modern way to handle LLM outputs because it eliminates the need for complex regex or error-prone json.loads() calls. The API returns a validated object that matches our FeedbackAnalysis model.
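For contrast, here is the fragile pattern that parse replaces: prompting the model to "reply in JSON" and hand-parsing the result. The raw_reply string below is a stand-in for a model response; real replies often include exactly this kind of surrounding prose, which breaks a naive json.loads():

```python
import json

# A typical "mostly JSON" reply from a model prompted without Structured Outputs
raw_reply = 'Sure! Here is the analysis:\n{"sentiment": "Negative", "priority": 1}'

try:
    data = json.loads(raw_reply)  # fails: the reply is not pure JSON
except json.JSONDecodeError:
    # Common workaround: slice out the first {...} span and hope for the best
    start, end = raw_reply.find("{"), raw_reply.rfind("}") + 1
    data = json.loads(raw_reply[start:end])

print(data["priority"])  # 1
```

Structured Outputs removes both the try/except dance and the risk that a field is missing or has the wrong type.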

The process_all_files method uses the glob function to find every .txt file in the feedback_input directory. It reads the raw text, passes it to the AI, and collects the structured objects into a list. Finally, we use write_text to save the entire dataset into analysis_report.json, ready for a dashboard or a database import.
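The pathlib half of that flow can be exercised without any API calls. This sketch builds a throwaway directory to show that glob("*.txt") filters by extension and that write_text() serializes the aggregate in one call:

```python
import json
import tempfile
from pathlib import Path

# Throwaway directory so the sketch is self-contained
root = Path(tempfile.mkdtemp())
(root / "ticket_001.txt").write_text("App crashes on login.", encoding="utf-8")
(root / "notes.md").write_text("ignored", encoding="utf-8")

# glob("*.txt") matches only the .txt file, just like process_all_files
names = [p.name for p in sorted(root.glob("*.txt"))]
print(names)  # ['ticket_001.txt']

# write_text saves the whole report in one call, no open()/close() needed
report = root / "analysis_report.json"
report.write_text(json.dumps([{"file_name": n} for n in names], indent=4),
                  encoding="utf-8")
print(json.loads(report.read_text(encoding="utf-8"))[0]["file_name"])
```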

Sample output

When you run the script, the terminal will display the progress for each file found in the feedback_input folder:


Processing: ticket_001.txt
Processing: ticket_002.txt
Processing: tweet_99.txt
Success! Report generated at analysis_report.json

The resulting analysis_report.json will look like this:


[
    {
        "file_name": "ticket_001.txt",
        "analysis": {
            "sentiment": "Negative",
            "category": "Bug Report",
            "priority": 1,
            "summary": "User unable to reset password via the mobile app."
        }
    }
]

Conclusion

Automating data categorization with Python and LLMs removes the human fatigue factor from high-volume operations. By using Structured Outputs, you transform an unpredictable AI into a reliable data pipeline component. This script can be scheduled as a cron job or integrated into a CI/CD pipeline to monitor user sentiment in real-time, ensuring that critical issues never slip through the cracks again.
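As one scheduling option, a crontab entry can run the classifier before your Monday triage. The paths below are illustrative placeholders; adjust them to wherever your script and interpreter live:

```
# Every Monday at 08:00, run the classifier from the project directory
0 8 * * 1 cd /opt/feedback && /usr/bin/python3 feedback_processor.py
```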


🚀 Don’t Just Learn AI & LLMs — Master It.

This tutorial was just the tip of the iceberg. To truly advance your career and build professional-grade systems, you need the full architectural blueprint.

My book, Large Language Models Crash Course, takes you from “making it work” to “making it scale.” I cover advanced patterns, real-world case studies, and the industry best practices that senior engineers use daily.


📖 Grab Your Copy Now →