How to automate Java stack trace extraction from application logs

The real-world scenario

Imagine a production environment where a **Spring Boot** microservice is intermittently failing. The **logs** directory is flooded with gigabytes of data. For a **DevOps Engineer** or **Site Reliability Engineer**, manually scrolling through these files to find the root cause of an **Internal Server Error** is like trying to find a specific grain of sand on a beach. You need to group these errors to see whether a specific **NullPointerException** is happening 10 times or 10,000 times. The script in this recipe scans those log files, extracts each stack trace, and counts how often each distinct trace occurs.

The solution

We use Python to read the log file line by line and apply regular expressions (**regex**) to each line. The script identifies the start of a Java exception (typically a fully qualified class name ending in **Exception** or **Error**, followed by a colon) and captures the subsequent **at** frames, **Caused by** lines, and **... N more** markers. Because paths are handled through the **pathlib** library, the script works the same way on **Windows**, **Linux**, or **macOS**.
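As a rough illustration of the two kinds of pattern described above, the sketch below classifies individual lines as a header, a trace continuation, or ordinary log output. The pattern names, the `classify` helper, and the sample lines are assumptions made for this example; tune the expressions to your own log format.

```python
import re

# Hypothetical patterns for illustration only.
# Header: a dotted class name ending in Exception or Error, then a colon.
HEADER = re.compile(r'^[A-Za-z0-9._$]+(?:Exception|Error):')
# Continuation: indented "at ..." frame, "Caused by:" line, or "... N more" marker.
CONTINUATION = re.compile(r'^(?:\s+at\s+|\s*Caused\s+by:|\s+\.\.\.\s+\d+\s+more)')

def classify(line):
    """Return 'header', 'trace' or 'other' for a single log line."""
    if HEADER.match(line):
        return 'header'
    if CONTINUATION.match(line):
        return 'trace'
    return 'other'

samples = [
    'java.lang.NullPointerException: Cannot invoke method',
    '\tat com.example.Service.process(Service.java:45)',
    'Caused by: java.lang.IllegalStateException: bad state',
    '\t... 12 more',
    'INFO 10:00:10 - Heartbeat okay',
]
for s in samples:
    print(f'{classify(s):7} {s!r}')
```

Anchoring both patterns at the start of the line keeps an ordinary message that merely mentions an exception name from being mistaken for a header.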

Prerequisites

To run this script, you only need a standard installation of Python 3.8 or higher. No external libraries are required, so it can run in restricted production environments where installing packages is not an option.

Install **Python 3.8+**
Ensure you have read access to the **logs** folder
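Both prerequisites can be checked up front. The snippet below is a minimal sketch; the `check_prerequisites` helper and the `logs` directory name are assumptions for illustration, not part of the recipe.

```python
import os
import sys
from pathlib import Path

def check_prerequisites(log_dir):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append('Python 3.8 or higher is required')
    path = Path(log_dir)
    if not path.is_dir():
        problems.append(f'{path} is not a directory')
    elif not os.access(path, os.R_OK):
        problems.append(f'No read access to {path}')
    return problems

# 'logs' is a placeholder; point this at your real log directory.
print(check_prerequisites('logs') or 'All prerequisites met')
```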

The code


"""
-----------------------------------------------------------------------
Authors: Sharanam & Vaishali Shah
Recipe: Java Stack Trace Parser
Intent: Extract and group unique Java exceptions from log files into JSON
-----------------------------------------------------------------------
"""
import re
import json
from pathlib import Path
from collections import defaultdict

def parse_java_logs(log_path, output_json):
    # Regex to identify the start of a stack trace
    # Matches patterns like: com.example.MyException: error message
    exception_pattern = re.compile(r'^([a-zA-Z0-9._$]+(?:Exception|Error):.*)')
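    # Matches continuation lines: indented "at ..." frames, "Caused by:" lines
    # (which Java prints without indentation), and "... N more" markers
    trace_line_pattern = re.compile(r'^(?:\s+at\s+|\s*Caused\s+by:|\s+\.\.\.\s+\d+\s+more).*')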

    extracted_errors = defaultdict(int)
    current_trace = []
    
    log_file = Path(log_path)
    if not log_file.exists():
        print(f'Error: The file {log_file} does not exist.')
        return

    with log_file.open('r', encoding='utf-8') as f:
        for line in f:
            # Drop the trailing newline but keep leading indentation intact
            line = line.rstrip('\n')

            # Check if line is the start of an exception
            if exception_pattern.match(line):
                if current_trace:
                    extracted_errors['\n'.join(current_trace)] += 1
                current_trace = [line]
            # Check if line is part of an ongoing stack trace
            elif trace_line_pattern.match(line) and current_trace:
                current_trace.append(line)
            else:
                # If we hit a normal log line, save the previous trace and reset
                if current_trace:
                    extracted_errors['\n'.join(current_trace)] += 1
                    current_trace = []

    # Final trace capture if the file ends with one
    if current_trace:
        extracted_errors['\n'.join(current_trace)] += 1

    # Format findings into a list of dictionaries
    report = [
        {'exception': trace, 'occurrence_count': count}
        for trace, count in extracted_errors.items()
    ]

    # Write to JSON file
    output_path = Path(output_json)
    with output_path.open('w', encoding='utf-8') as out:
        json.dump(report, out, indent=4)

    print(f'Analysis complete. Unique exceptions found: {len(report)}')
    print(f'Report saved to: {output_path.absolute()}')

if __name__ == '__main__':
    # Target your Java log file here
    TARGET_LOG = 'application.log'
    OUTPUT_FILE = 'error_report.json'
    
    # Create a dummy log file for demonstration if it doesn't exist
    if not Path(TARGET_LOG).exists():
        with open(TARGET_LOG, 'w', encoding='utf-8') as f:
            f.write('INFO 10:00:01 - App started\n')
            f.write('ERROR 10:00:05 - Failed to process request\n')
            f.write('java.lang.NullPointerException: Cannot invoke method\n')
            f.write('\tat com.example.Service.process(Service.java:45)\n')
            f.write('\tat com.example.Controller.handle(Controller.java:12)\n')
            f.write('INFO 10:00:10 - Heartbeat okay\n')
    
    parse_java_logs(TARGET_LOG, OUTPUT_FILE)

Code walkthrough

The logic follows a state-machine approach. We use **Path** from **pathlib** to open the file, and the **with** block guarantees it is closed afterwards. As we iterate through the file line by line, the script looks for a specific signature at the start of a line: a dotted class name ending in **Exception** or **Error**, followed by a colon. This is the **header** of our stack trace.

Once a header is found, the script enters a collection state. Every subsequent line that is an indented **at** frame, a **Caused by** line, or a **... N more** marker is appended to the **current_trace** list. If the script encounters a line that does not match this pattern, such as a standard **INFO** log, it treats the stack trace as finished. It then joins the collected lines into a single string and stores it in a **dictionary**, using the trace as a key so that duplicates are counted automatically. Finally, the data is serialized into **JSON** for easy ingestion into other tools.
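The counting step can be looked at in isolation. The sketch below assumes the traces have already been collected as lists of lines (the `traces` sample is made up); joining each list into one string gives a hashable key, and a `defaultdict(int)` tallies the repeats.

```python
from collections import defaultdict

# Made-up sample: three collected traces, two of them identical.
traces = [
    ['java.lang.NullPointerException: x', '\tat com.example.A.run(A.java:1)'],
    ['java.io.IOException: disk full', '\tat com.example.B.save(B.java:9)'],
    ['java.lang.NullPointerException: x', '\tat com.example.A.run(A.java:1)'],
]

counts = defaultdict(int)
for trace_lines in traces:
    # The joined text is the dictionary key; missing keys start at 0.
    counts['\n'.join(trace_lines)] += 1

for trace, count in counts.items():
    print(count, trace.splitlines()[0])
```

Because the key is the full trace text, two traces that differ only in their message or in a line number are counted as separate entries.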

Sample output

When you run the script against a standard Java log, you will see the following confirmation in your **terminal**:


Analysis complete. Unique exceptions found: 1
Report saved to: /Users/sharanam/projects/error_report.json

The resulting **error_report.json** file will look like this:


[
    {
        "exception": "java.lang.NullPointerException: Cannot invoke method\n\tat com.example.Service.process(Service.java:45)\n\tat com.example.Controller.handle(Controller.java:12)",
        "occurrence_count": 1
    }
]
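To see which errors dominate, the report can be loaded back and sorted by count. This is a sketch that assumes the **error_report.json** layout shown above; the `top_exceptions` helper name and its `limit` parameter are hypothetical.

```python
import json
from pathlib import Path

def top_exceptions(report_path, limit=5):
    """Return (count, first line of trace) pairs, most frequent first."""
    entries = json.loads(Path(report_path).read_text(encoding='utf-8'))
    ranked = sorted(entries, key=lambda e: e['occurrence_count'], reverse=True)
    return [
        (e['occurrence_count'], e['exception'].splitlines()[0])
        for e in ranked[:limit]
    ]

if __name__ == '__main__':
    report = Path('error_report.json')
    # Only print a ranking if the report has already been generated.
    if report.exists():
        for count, header in top_exceptions(report):
            print(f'{count:>6}  {header}')
```

Showing only the first line of each trace keeps the ranking readable; the full trace is still available in the JSON file.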

Conclusion

Automating the extraction of errors from large text files is a fundamental skill for maintaining high-availability Java systems. By converting raw, unstructured text into a structured format, you can easily identify the most frequent bugs and prioritize your development efforts. This Python script provides a lightweight, robust foundation for your automated monitoring pipeline.


How to Automate Java Stack Trace Extraction with Python

Published by admin on February 20, 2026
