How to Automate Legacy PHP Code Audits with Python

The real-world scenario

Imagine you are a DevOps engineer or a senior developer tasked with migrating a legacy PHP 5.6 application to PHP 8.2. Manually opening hundreds of files to find removed functions like mysql_connect or insecure patterns is a recipe for burnout. It is like trying to find every expired item in a massive warehouse using nothing but a flashlight. This Python script acts as an automated health inspector, scanning your entire codebase and generating a report of the lines that need attention.

The solution

We use a Python script that relies on the pathlib library for file system navigation and the re module for pattern matching. The script recursively crawls through a directory, identifies .php files, and flags lines that use specific deprecated functions or risky constructs, writing the results to an audit_report.csv file.

Prerequisites

  • Install Python 3.8 or higher.
  • No external libraries are required; we use built-in modules for maximum compatibility.

The code


"""
-----------------------------------------------------------------------
Authors: Sharanam & Vaishali Shah
Recipe: PHP Legacy Auditor
Intent: Scan PHP projects for deprecated functions and security risks.
-----------------------------------------------------------------------
"""
import re
import csv
from pathlib import Path

def audit_php_project(target_dir, report_name):
    # Define patterns to look for (e.g., deprecated mysql functions or eval)
    patterns = {
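        'mysql_extension': r'mysql_\w+',      # any mysql_* name (extension removed in PHP 7.0)
        'eval_usage': r'eval\s*\(',           # eval( with optional whitespace
        'short_tags': r'<\?(\s|=)',           # <? or <?= but not <?php
        'var_dump_leftover': r'var_dump\s*\(' # debugging output left in the code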
    }

    results = []
    base_path = Path(target_dir)

    if not base_path.exists():
        print(f'Error: The path {target_dir} does not exist.')
        return

    # Recursively find all PHP files
    for php_file in base_path.rglob('*.php'):
        try:
            content = php_file.read_text(encoding='utf-8', errors='ignore')
            lines = content.splitlines()

            for line_num, line in enumerate(lines, 1):
                for issue_type, pattern in patterns.items():
                    if re.search(pattern, line):
                        results.append({
                            'file': str(php_file.relative_to(base_path)),
                            'line': line_num,
                            'issue': issue_type,
                            'snippet': line.strip()[:50]
                        })
        except Exception as e:
            print(f'Could not read {php_file}: {e}')

    # Write findings to CSV
    output_file = Path(report_name)
    headers = ['file', 'line', 'issue', 'snippet']
    
    with output_file.open('w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=headers)
        writer.writeheader()
        writer.writerows(results)

    print(f'Audit complete. Found {len(results)} issues. Report saved to: {report_name}')

if __name__ == '__main__':
    # Set the directory of your PHP project here
    project_path = './my_php_app' 
    report_filename = 'audit_report.csv'
    audit_php_project(project_path, report_filename)

Code walkthrough

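The script begins by importing re for regular expression matching, csv for writing the report, and pathlib for cross-platform path handling. We define a dictionary named patterns where the keys are the names of the issues and the values are the regex strings used to catch them. For example, mysql_\w+ matches any identifier starting with mysql_, which catches calls to the old mysql extension that was removed in PHP 7.0. Because it matches on the name alone, it will also flag variables such as $mysql_host, so expect a few false positives.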
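To see how these patterns behave, here is a minimal sketch that runs them against a few PHP lines. The sample lines and the issues_for helper are invented for illustration and are not part of the script above.

```python
import re

# Patterns from the walkthrough, with the regex escapes written out.
patterns = {
    'mysql_extension': r'mysql_\w+',
    'eval_usage': r'eval\s*\(',
    'short_tags': r'<\?(\s|=)',
    'var_dump_leftover': r'var_dump\s*\(',
}

# Hypothetical PHP lines, made up for this demonstration.
sample_lines = [
    '$conn = mysql_connect($host, $user);',
    'eval ($payload);',
    '<? echo "Hello World";',
    '<?php echo "Hello World";',
]

def issues_for(line):
    """Return the names of all patterns that match the given line."""
    return [name for name, pattern in patterns.items() if re.search(pattern, line)]

for line in sample_lines:
    print(issues_for(line), line)
```

The last sample line produces an empty list: the short_tags pattern requires whitespace or = right after <?, so a full <?php opening tag is not flagged.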

The rglob method is the engine of the script; it performs a recursive search for every file ending in .php within the target_dir. As each file is opened, the script iterates through its lines and checks them against our pattern list. If a match is found, the file name, line number, and a snippet of the code are stored in a list of dictionaries.
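The recursive search can be tried in isolation. The sketch below builds a throwaway directory tree and lists what rglob finds; the directory layout, file names, and the find_php_files helper are assumptions made up for the demo.

```python
import tempfile
from pathlib import Path

def find_php_files(root):
    """Return sorted POSIX-style relative paths of every .php file under root."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob('*.php')
        if p.is_file()
    )

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    # Hypothetical project layout: one file at the top, one nested two levels down.
    (base / 'lib' / 'db').mkdir(parents=True)
    (base / 'index.php').write_text('<?php echo 1;', encoding='utf-8')
    (base / 'lib' / 'db' / 'conn.php').write_text('<?php', encoding='utf-8')
    (base / 'README.md').write_text('not PHP', encoding='utf-8')

    found = find_php_files(base)
    print(found)
```

Only the two .php files are returned, regardless of how deep they sit. Legacy projects sometimes keep PHP code in files with other extensions such as .inc or .phtml; if yours does, those would need their own rglob calls.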

Finally, the csv.DictWriter takes that list and transforms it into a structured audit_report.csv. This allows you to open the results in Excel or Google Sheets to prioritize your refactoring efforts.
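If you would rather triage from the terminal before opening a spreadsheet, the report can be read back with csv.DictReader. This is a rough sketch: the count_issues helper and the in-memory sample rows are hypothetical, and it assumes the report uses the four column names written by the script.

```python
import csv
import io
from collections import Counter

def count_issues(csv_file):
    """Count rows per issue type in a report with the audit_report.csv columns."""
    reader = csv.DictReader(csv_file)
    return Counter(row['issue'] for row in reader)

# Hypothetical report contents standing in for a real audit_report.csv.
sample_report = io.StringIO(
    'file,line,issue,snippet\n'
    'db.php,45,mysql_extension,"$conn = mysql_connect($host, $user);"\n'
    'db.php,60,mysql_extension,mysql_close($conn);\n'
    'utils.php,102,var_dump_leftover,var_dump($user_data);\n'
)

counts = count_issues(sample_report)
for issue, total in counts.most_common():
    print(f'{issue}: {total}')
```

To run it against a real report, pass an open file instead of the StringIO object, for example open('audit_report.csv', newline='', encoding='utf-8').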

Sample output

When you execute the script in your terminal, you will see a summary message like the following:


$ python php_auditor.py
Audit complete. Found 12 issues. Report saved to: audit_report.csv

The raw audit_report.csv will contain rows like these (the csv module quotes any snippet that contains a comma or a double quote):


file,line,issue,snippet
index.php,12,short_tags,"<? echo ""Hello World"";"
db.php,45,mysql_extension,"$conn = mysql_connect($host, $user);"
utils.php,102,var_dump_leftover,var_dump($user_data);

Conclusion

Automating the discovery phase of a code migration saves hours of manual labor and makes it far less likely that a removed or deprecated function slips through to production. Line-by-line regex matching is a first pass rather than a parser, so treat the report as a checklist to review. By using Python to audit PHP, you leverage the best of both worlds: Python’s automation strengths and PHP’s web-focused logic. You can easily extend the patterns dictionary to search for specific API keys, hardcoded passwords, or custom internal functions that need to be retired.
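As one possible way to extend the dictionary, the sketch below adds two extra patterns. Both are assumptions for illustration: the hardcoded_password regex is a rough heuristic, and old_db_query is a made-up name standing in for an internal function you want to retire.

```python
import re

# Hypothetical extra patterns; the names and regexes are illustrative only.
extra_patterns = {
    # A variable whose name contains password/passwd/pwd, assigned a quoted literal.
    'hardcoded_password': r'''\$\w*(password|passwd|pwd)\w*\s*=\s*['"][^'"]+['"]''',
    # Calls to an invented internal helper named old_db_query.
    'legacy_helper': r'\bold_db_query\s*\(',
}

# Compile once, case-insensitively, so $DB_PASSWORD is caught as well.
COMPILED = {name: re.compile(p, re.IGNORECASE) for name, p in extra_patterns.items()}

def extra_issues(line):
    """Return the names of the extra patterns that match the given line."""
    return [name for name, rx in COMPILED.items() if rx.search(line)]

print(extra_issues('$db_password = "s3cret";'))
print(extra_issues("$password = $_POST['password'];"))
print(extra_issues('$rows = old_db_query($sql);'))
```

The second line is not flagged because the value comes from a variable rather than a quoted literal. To use these in the auditor, merge them into the existing dictionary, for example with patterns.update(extra_patterns).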

