Fix #23224: Optimize simple tuple extraction #23373
Conversation
Split some changes into a separate PR #23380 to diagnose the errors.
Force-pushed from 423dbad to 545dd69.
Added a bytecode test to ensure there is no tuple creation in the generated code.
Is there any evidence that this is actually better? It should be a straightforward case of a non-escaping allocation for the JVM to optimize away. The Scala.js optimizer, for example, routinely stack-allocates the produced tuple.
If this doesn't actually improve run time, we're introducing some complexity in the compiler for no reason.
If it does actually improve performance, why do it only for tuple extraction, and not for extraction of other case classes?
@@ -0,0 +1,43 @@
class Test:
What is the purpose of this test? As a `pos` test, it doesn't test any rewrite rule.
It just tests and documents the different desugaring cases, varying the number of variables in the pattern.
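For illustration only, a sketch of the kind of cases such a pos test might exercise (this is not the actual test file from the PR; names and types are made up):

```scala
class Test:
  def f(p: (Int, String), t: (Int, Int, Int)): Unit =
    val (a, b) = p            // two simple variables: eligible for the optimized desugaring
    val (c, _) = p            // a wildcard still counts as "simple"
    val (x, y, z) = t         // three variables
    val (n, (m, s)) = (1, p)  // nested pattern: not a simple extraction
    ()
```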
""".stripMargin | ||
checkBCode(code) { dir => | ||
val c = loadClassNode(dir.lookupName("C.class", directory = false).input) | ||
assertNoInvoke(getMethod(c, "f1"), "scala/Tuple2$", "apply") // no Tuple2.apply call |
We should also make sure that there is no `new Tuple2`.
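For example, a hedged sketch of such a check against the repackaged ASM tree API (`scala.tools.asm`) used by the backend tests; `assertNoNew` is a hypothetical helper, not an existing one:

```scala
import scala.jdk.CollectionConverters._
import scala.tools.asm.Opcodes
import scala.tools.asm.tree.{MethodNode, TypeInsnNode}

// Hypothetical helper: fail if the method allocates an instance of the given class.
def assertNoNew(method: MethodNode, internalName: String): Unit =
  val news = method.instructions.iterator.asScala.collect {
    case insn: TypeInsnNode if insn.getOpcode == Opcodes.NEW && insn.desc == internalName => insn
  }
  assert(news.isEmpty, s"unexpected `new $internalName` in ${method.name}")

// Usage next to the existing assertion:
//   assertNoNew(getMethod(c, "f1"), "scala/Tuple2")
```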
@@ -2816,6 +2816,7 @@ class Typer(@constructorOnly nestingLevel: Int = 0) extends Namer
  if isStableIdentifierOrLiteral || isNamedTuplePattern then pt
  else if isWildcardStarArg(body1)
    || pt == defn.ImplicitScrutineeTypeRef
    || pt.isBottomType
Could you comment on why the bottom-related changes are necessary? I find them suspicious. This change should be nothing but an optimization, so it shouldn't change typing.
We have tests like `val (a, b) = ???`. Since we now bind the tuple pattern to a variable, I want to give the bind variable a tuple type instead of a bottom type.
Is that necessary for correctness? There is really no point in improving the code for such an extraction. By definition, the code will throw before it gets to the tuple extraction.
Absolutely, the JVM can often optimize such cases when it proves that a tuple is pure and the temporary object doesn't escape. As a compiler, we should still try our best to generate efficient, non-redundant code. Consider extracting values from a tuple that contains both an Int and a String: previously this required unboxing the Int twice, boxing it again, and creating a throwaway tuple, all for no benefit. This PR doesn't add extra complexity; instead, it removes unnecessary work introduced by an earlier design decision. I chose to optimize tuple handling because tuples are widely used to return multiple values, as well as in the compiler itself (see the regex search results over the codebase). Ideally, we would extend this optimization to other data structures, but that is hard to do before typing.
Not really. Our job is to generate code that the next compiler in line will be able to optimize. If you're generating assembly, you want to generate code that the processor will be happy about (no unpredictable branches, for example). If you're generating JVM bytecode, you want to generate code that the JVM will be happy about. There is a whole chain of compilers to think about.
It generates code that is simpler, but the compiler code is definitely more complex: the increase in lines of code in the compiler is clear, and these are new code paths that behave in a special way in special situations. If we are making the compiler more complex but are not measurably improving performance, it's a net loss.
We could maybe benchmark on Scala Native? This might be easier to do. And if we get a net win there it would be a justification.
In a chain of compilers, I think each stage should generate code that makes sense, on a best-effort basis. We should not generate a bunch of nonsensical allocations and boxing and rely on a black box to "optimize" every mistake away. So this is a matter of principle for me, and this PR partially fixes the problem; it's not about how many lines are added to the codebase.
We could benchmark this using JMH to monitor the number of tuple allocations and memory usage, if someone can help?
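For reference, a minimal JMH sketch of the kind of benchmark being suggested here (run via sbt-jmh; the benchmark shape and class name are assumptions, not part of this PR):

```scala
import java.util.concurrent.TimeUnit
import org.openjdk.jmh.annotations._

@State(Scope.Thread)
@BenchmarkMode(Array(Mode.AverageTime))
@OutputTimeUnit(TimeUnit.NANOSECONDS)
class TupleExtractionBench:
  var x: Int = 12345
  var s: String = "payload"

  @Benchmark
  def simpleExtraction: Int =
    // The code path this PR changes: a pattern with only simple variables.
    val (n, label) = (x * 2, s)
    n + label.length
```

Running it with JMH's `gc` profiler (`-prof gc`) reports the allocation rate, which is where a removed tuple allocation would show up.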
My main issue with JMH is that JVM optimization is holistic: it might be that we see no win on some code bases and big wins on others, and we have no insight why. So instead of relying on JMH we should probably directly look at what kind of assembly the JIT produces for this. That's why I think Scala Native would be much easier to evaluate.
- Martin
OK, I was able to write some recursive code with tuples to stress-test this, and my PR shows a measurable performance improvement.

```scala
// P(n) = P(n-2) + P(n-3)
def padovan(n: Int, v: (Int, Int, Int) = (1, 0, 0)): Int =
  val (v_n_3, v_n_2, v_n_1) = v
  if n == 0 then v_n_3
  else padovan(n - 1, (v_n_2, v_n_1, v_n_2 + v_n_3))

@main def Test =
  val n = 1000000000
  val i = padovan(n)
  println(s"padovan of $n is $i")
```

We can ignore the result because of overflow. Without this PR: ~7.8s. The results are from many runs.
@odersky @sjrd @noti0na1 With openjdk-23.0.1, comparing the two functions below:

```scala
object JitTest:
  def f2(x0: Int, x1: Int, x2: Int, x3: Int): Int =
    val (y0, y1, y2, y3) = (x0 * 2, x1 * 3, x2 * 5, x3 * 7)
    y0 + y1 + y2 + y3

  def f4(x0: Int, x1: Int, x2: Int, x3: Int): Int =
    val y0 = x0 * 2
    val y1 = x1 * 3
    val y2 = x2 * 5
    val y3 = x3 * 7
    y0 + y1 + y2 + y3
```

HotSpot (C2) output after millions of runs, for `f2`:
And for `f4`:

It's not even close. In `f2` we allocate, box, unbox; it's all over the place. You just can't rely on the JIT to clean this up.
Another case is
Fix #23224:
This PR optimizes simple tuple extraction by avoiding unnecessary tuple allocations and refines the typing of bind patterns for named tuples.
`makePatDef` is changed to reduce tuple creation when a pattern uses only simple variables or wildcards; the before/after sketch below illustrates the idea.
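A hedged sketch of the desugaring change for a mixed `Int`/`String` tuple (the wrapper functions and the name `x1` are illustrative; the compiler's actual trees use synthetic names and may differ):

```scala
// Source: a "simple" extraction, i.e. only variables or wildcards in the pattern.
def source(expr: (Int, String)): Int =
  val (n, s) = expr
  n + s.length

// Before this PR (roughly): the pattern match rebuilds the tuple, so the Int is
// boxed into a fresh Tuple2 that is immediately taken apart again.
def before(expr: (Int, String)): Int =
  val x1 = expr match
    case (n, s) => (n, s)
  val n = x1._1
  val s = x1._2
  n + s.length

// After this PR (roughly): bind the scrutinee once and select its fields directly;
// no extra tuple is created for the pattern.
def after(expr: (Int, String)): Int =
  val x1 = expr
  val n = x1._1
  val s = x1._2
  n + s.length
```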
Also in genBCode now:
I used the regular expression `val\s*\(\s*[a-zA-Z_]\w*(\s*,\s*[a-zA-Z_]\w*)*\s*\)\s*=` to search the compiler sources and found 400+ places with simple tuple extraction like this.
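If it helps, a small Scala sketch of how such a count could be reproduced over a source tree (the path handling is illustrative, not how the number above was obtained):

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

@main def countSimpleTupleExtractions(root: String): Unit =
  // Same regular expression as above, applied to every .scala file under `root`.
  val pattern = raw"val\s*\(\s*[a-zA-Z_]\w*(\s*,\s*[a-zA-Z_]\w*)*\s*\)\s*=".r
  val hits = Files.walk(Path.of(root)).iterator.asScala
    .filter(_.toString.endsWith(".scala"))
    .map(p => pattern.findAllIn(Files.readString(p)).size)
    .sum
  println(s"simple tuple extractions found: $hits")
```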