Skip to content

Fix extended stack trace (i.e., %xEx) rendering performance regression #3123

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Nov 2, 2024

Conversation

alan0428a
Copy link
Contributor

@alan0428a alan0428a commented Oct 26, 2024

@alan0428a
Copy link
Contributor Author

alan0428a commented Oct 26, 2024

@ppkarwasz @vy
baseline
2.24.1 baseline: around 150MB

2 25 0-SNAPSHOT 2.25.0 SNAPSHOT: around 2.5GB initial-fix Initial fix ea9944f reduce to around 300 MB

Will continue to check for improvement

@vy vy self-assigned this Oct 27, 2024
@vy vy added performance Issues or PRs that affect performance, throughput, latency, etc. layouts Affects one or more Layout plugins labels Oct 27, 2024
@jengebr
Copy link
Contributor

jengebr commented Oct 28, 2024

My benchmark shows that these changes are ~10% slower than 2.24.1, and use 3x the heap. That is far better than previous 2.25.0, which was more than 2x slower and used ~15x the heap.

What level of performance regression is acceptable?

Copy link

github-actions bot commented Oct 28, 2024

Job Requested goals Build Tool Version Build Outcome Build Scan®
build-macos-latest clean install 3.9.8 Build Scan PUBLISHED
build-ubuntu-latest clean install 3.9.8 Build Scan PUBLISHED
build-windows-latest clean install 3.9.8 Build Scan PUBLISHED
Generated by gradle/develocity-actions

@ppkarwasz
Copy link
Contributor

My benchmark shows that these changes are ~10% slower than 2.24.1, and use 3x the heap. That is far better than previous 2.25.0, which was more than 2x slower and used ~15x the heap.

What level of performance regression is acceptable?

They seem more than acceptable to me.

These are figures using synchronous logging, right? They seem more than acceptable to me. For asynchronous loggers, the performance regression is probably higher, since the getCurrentStackTrace() trick only gives a meaningful result if it is called near to the point, where the exception was generated.

Regardless of the results of asynchronous loggers, I think that the throwable chapter of 2.25.0 is almost over. I think that we only need to:

  • Replace %xEx with the much improved %ex as default throwable formatter.
  • Remove all the calls to LogEvent.getThrownProxy() from the codebase. They are useless and expensive.

If users complain about the asynchronous performance of %xEx, we could think about introducing an option that collects the packaging information on the calling thread and passes it in the LogEvent.

@vy vy changed the title Fix %xEx Performance regression Fix extended stack trace (i.e., %xEx) rendering performance regression Oct 31, 2024
@vy
Copy link
Member

vy commented Oct 31, 2024

I am still observing a ~10x performance regression. I use the following crude script:

import org.apache.logging.log4j.core.LogEvent;
import org.apache.logging.log4j.core.impl.Log4jLogEvent;
import org.apache.logging.log4j.core.pattern.ExtendedThrowablePatternConverter;

import java.net.Socket;

public class Log4j2Issue3123 {

    public static void main(String[] args) {
        final Exception error = createException();
        final LogEvent logEvent = Log4jLogEvent.newBuilder().setThrown(error).build();
        final ExtendedThrowablePatternConverter converter = ExtendedThrowablePatternConverter.newInstance(null, null);
        long startInstantNanos = System.nanoTime();
        final StringBuilder buffer = new StringBuilder();
        for (int i = 0; i < 5_000_000; i++) {
            buffer.setLength(0);
            converter.format(logEvent, buffer);
        }
        double durationSeconds = (System.nanoTime() - startInstantNanos) / 1e9;
        System.out.format("took %.3f s%n", durationSeconds);
        // 2.24.1 - 1.753 s
        // 2.25.0 - 11.281 s
    }

    private static Exception createException() {
        try {
            new Socket("localhost", -1);
            throw new IllegalStateException("should not have reached here");
        } catch (final Exception error) {
            return error;
        }
    }

}

@vy
Copy link
Member

vy commented Oct 31, 2024

My benchmark shows that ...

@jengebr, mind sharing the details about your benchmark, please?

@jengebr
Copy link
Contributor

jengebr commented Nov 1, 2024

Sure - my benchmark is attached. Be sure to use -Xmx3g -Xms3g -XX:+UseParallelGC.

I vary the scenario by changing the classpath to include the desired log4j version.

log4j_benchmark.zip

@jengebr
Copy link
Contributor

jengebr commented Nov 1, 2024

@vy I confirm your results; profiling shows your script is bottlenecked on the String.format calls which are no longer called by my benchmark (after the Lazy objects were used).

@alan0428a
Copy link
Contributor Author

alan0428a commented Nov 2, 2024

@vy @jengebr
I also can confirm the regression by running vy's script.
But in my profiling, the String.format() still takes a lot of time after we wrap it with Lazy object.
So I replace it with 3fa242a. This can reduce the time for render part. The total time now reduce to around 9 seconds on my PC.

3fa242a_improvement

@alan0428a
Copy link
Contributor Author

alan0428a commented Nov 2, 2024

And I noticed that compare to 2.24.1 baseline, the context creation part 2.25.0 is much more slower.
In 2.24.1, context creation is done in Log4jLogEvent.getThrownProxy(). And in 2.25.0, it's done in ThrowableExtendedStackTraceRenderer#createContext. And the same code takes way more time in 2.25.0.

2 24 1_baseline 2.24.1 2 25 0 2.25.0 commit 3fa242a

@alan0428a
Copy link
Contributor Author

alan0428a commented Nov 2, 2024

Oh...I think I know the reason. The test script is actually not fair. The 2.24.1 will store the ThrowableProxy in the Log4jLogEvent, so context creation only called once. But in the new implementation, the context creation gets called every time.
If we move this line final LogEvent logEvent = Log4jLogEvent.newBuilder().setThrown(error).build(); into the for-loop in the script. I can get similar CPU time result compare to 2.24.1. But the memory allocation is 2x.

Screenshot 2024-11-02 at 2 52 07 PM Screenshot 2024-11-02 at 2 52 42 PM

@alan0428a
Copy link
Contributor Author

alan0428a commented Nov 2, 2024

I found that current way to load class wastes 4GB more memory, compare to old approach(simple if-else way) according to the profiler. I guess it's the overhead of Stream API.
Maybe it's better to switch back to old way?

@alan0428a
Copy link
Contributor Author

@vy
Copy link
Member

vy commented Nov 2, 2024

in my profiling, the String.format() still takes a lot of time after we wrap it with Lazy object.
...
The test script is actually not fair.
...
I found that current way to load class wastes 4GB more memory, compare to old approach (simple if-else way)

Awesome findings @alan0428a! 🤩
Thanks so much for helping with the troubleshooting! 🙇
I will incorporate the changes you suggested and re-run the benchmark.

@vy
Copy link
Member

vy commented Nov 2, 2024

Also the current way of using Throwable as a key for the map in metadata might be memory expensive?

@alan0428a, what do you mean by this? What would be your suggestion instead?

@vy
Copy link
Member

vy commented Nov 2, 2024

I can confirm that with the ClassResourceInfo and ThrowableExtendedStackTraceRenderer#loadClass() fixes @alan0428a tipped, 2.25.0-SNAPSHOT shows an (almost) identical performance to 2.24.1. 🥳 I think the performance regression hunt is over. Let's wrap up this work.

  • Replace %xEx with the much improved %ex as default throwable formatter.
  • Remove all the calls to LogEvent.getThrownProxy() from the codebase. They are useless and expensive.

@ppkarwasz, agreed. I will implement the changes you suggested.

@vy vy merged commit 70f058d into apache:2.x Nov 2, 2024
9 checks passed
vy added a commit that referenced this pull request Nov 2, 2024
@alan0428a alan0428a deleted the fix/issue-3106 branch November 3, 2024 04:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
layouts Affects one or more Layout plugins performance Issues or PRs that affect performance, throughput, latency, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants