Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid reusing temps whose refs might be captured #76009

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jjonescz
Copy link
Member

@jjonescz jjonescz commented Nov 21, 2024

Fixes #67435.

The idea is to have a "heuristic" to detect whether a method might capture the references passed to it. If such method call is detected, and a temporary reference is being emitted, we lift the temp to live for the whole block instead of just the expression.

The block lifetime is enough - ref safety analysis already checks refs to rvalues cannot escape blocks.

The heuristic is implemented by CodeGenerator.MightEscapeTemporaryRefs. It runs on the lowered nodes (because it's the emit layer which decides to emit a temporary). It might have false positives (some calls like M(rvalue, out _) might be marked by the heuristic as dangerous but they are not), but it shouldn't have false negatives.

Without a heuristic, we would need to avoid reusing many more temps, which would be a regression (at least in IL size). But perhaps that's negligible and it would be better to avoid this complexity? I'm not sure.

@dotnet-issue-labeler dotnet-issue-labeler bot added Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead labels Nov 21, 2024
@jjonescz jjonescz marked this pull request as ready for review November 21, 2024 18:32
@jjonescz jjonescz requested a review from a team as a code owner November 21, 2024 18:32
// At the same time, these expression locals might be lifted to the containing block
// (to avoid reusing them if they might be captured by a ref struct).
// Then we use this map to keep track of the redeclared locals.
private Dictionary<LocalDefinition, LocalDefinition>? _redeclaredLocals;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private Dictionary<LocalDefinition, LocalDefinition>? _redeclaredLocals;

It is not obvious what do map to what here. Could you, please, elaborate and add a comment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It maps from the old LocalDefinition to the new LocalDefinition (which is redeclaring/shadowing the old one). Does that make sense?

I will try to explain it in the comment. Thanks.

@@ -887,7 +887,7 @@ public BoundCall Call(BoundExpression? receiver, MethodSymbol method, ImmutableA
return new BoundCall(
Syntax, receiver, initialBindingReceiverIsSubjectToCloning: ThreeState.Unknown, method, args,
argumentNamesOpt: default(ImmutableArray<String?>), argumentRefKindsOpt: refKinds, isDelegateCall: false, expanded: false, invokedAsExtensionMethod: false,
argsToParamsOpt: ImmutableArray<int>.Empty, defaultArguments: default(BitVector), resultKind: LookupResultKind.Viable, type: method.ReturnType)
argsToParamsOpt: default, defaultArguments: default(BitVector), resultKind: LookupResultKind.Viable, type: method.ReturnType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

argsToParamsOpt: default,

Why is this change necessary?

Copy link
Member Author

@jjonescz jjonescz Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the utility Binder.GetCorrespondingParameter that's used in this PR as part of CodeGenerator.MightEscapeTemporaryRefs would fail on an assert - it expects argsToParamsOpt to be either default or matching the number of arguments:

Debug.Assert(argumentOrdinal < argsToParamsOpt.Length);

Which seems like a reasonable invariant which this callsite was violating.

@@ -673,6 +673,8 @@ private void EmitBlock(BoundBlock block)
{
EmitUninstrumentedBlock(block);
}

ReleaseBlockTemps();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReleaseBlockTemps();

This feels fragile because we cannot assume that this block necessary maps to one syntactically present in source, which was used for "ref safety analysis already checks refs to rvalues cannot escape blocks". We still might not be done with that block.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we cannot assume that this block necessary maps to one syntactically present in source

Why cannot we assume that? To me it seems we don't synthesize BoundBlocks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why cannot we assume that? To me it seems we don't synthesize BoundBlocks.

Because we shouldn't be assuming that. There is nothing wrong in introducing a bound block.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is nothing wrong in introducing a bound block.

This comment suggests otherwise:

BoundBlock specify SCOPE (visibility) of a variable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment suggests otherwise: ...

I am failing to guess what part of the comment you find relevant. Could you quote that part, etc.?

For example, SyntheticBoundNodeFactory has a bunch of helpers to synthesize blocks and they are used for one reason or the other. There is also a way to synthesize a block manually. So, even if you haven't found an example, it doesn't mean there isn't one already, or that one won't be introduced in the future.

Copy link
Member Author

@jjonescz jjonescz Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was referring to this line: "BoundBlock specify SCOPE (visibility) of a variable" but that doesn't imply it's wrong to synthesize blocks, you're right.

Do you have suggestions how to best solve this?

I'm thinking if we can assume blocks are never lost during lowering, we could perhaps make sure synthesized blocks are marked as such (e.g., via WasCompilerGenerated; or maybe we could mark user-defined blocks instead) and then we could call ReleaseBlockTemps only for non-compiler-generated blocks.

Copy link
Contributor

@AlekseyTs AlekseyTs Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have suggestions how to best solve this?

I am leaning towards a simpler and a more conservative approach - simply dropping "might escape" temps from _expressionTemps as EmitAssignmentValue does, or not adding them to that list, whichever is more natural from the implementation perspective.

@@ -64,6 +64,23 @@ public override bool Equals(object? obj)
// maps local identities to locals.
private Dictionary<ILocalSymbolInternal, LocalDefinition>? _localMap;

// The lowered tree might define the same local symbol
// in multiple sequences that are part of one expression, for example:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in multiple sequences that are part of one expression, for example:

This doesn't sound right. I think we should fix this, a local symbol should belong to exactly one scope in the bound tree. I assume this is a pre-existing condition, i.e. it is not introduced by this change. If so, I would prefer the fix to go into a separate PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I will look into that.

Just to confirm: I think this "invalid" tree shape is introduced somewhere in lowering simply by reusing the same node twice, e.g., something like _factory.Call(arguments: [node, node]), and that node happens to contain a BoundSequence which then causes the local to be declared twice in the tree. So are you saying we should never do that (reuse the same node) and if we do it somewhere, it's a bug (hopefully a rare one)?

Copy link
Contributor

@AlekseyTs AlekseyTs Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, we can reuse bound nodes as long as that doesn't violate some invariants. For example, we reuse some primitive nodes: for constants, for local references (BoundLocal), etc. I consider "a local symbol used in a tree should belong to exactly one scope in the tree" as one of the invariants that is violated in the specific scenario.


for (var arg = 0; arg < arguments.Length; arg++)
{
var parameter = Binder.GetCorrespondingParameter(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Binder.GetCorrespondingParameter(

I think post lowering phase all the parameters must be in the right order, no one is going to reorder them (lowering does that). Therefore, we probably do not need to use this helper.

arguments,
method.Parameters,
call.ArgumentRefKindsOpt,
mightEscapeTemporaryRefs: MightEscapeTemporaryRefs(call, used: useKind != UseKind.Unused, receiverAddressKind: null));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mightEscapeTemporaryRefs: MightEscapeTemporaryRefs(call, used: useKind != UseKind.Unused, receiverAddressKind: null)

Instead of threading an extra parameter through many layers, consider simply truncating _expressionTemps after the EmitArguments call as EmitAssignmentValue does.

@@ -1754,7 +1767,11 @@ private void EmitInstanceCallExpression(BoundCall call, UseKind useKind)
}
}

emitArgumentsAndCallEpilogue(call, callKind, receiverUseKind);
emitArgumentsAndCallEpilogue(call, callKind, receiverUseKind,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

emitArgumentsAndCallEpilogue

Similar suggestion about truncating the list vs. passing a parameter.

expression.Arguments,
constructor.Parameters,
expression.ArgumentRefKindsOpt,
mightEscapeTemporaryRefs: MightEscapeTemporaryRefs(expression, used));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mightEscapeTemporaryRefs: MightEscapeTemporaryRefs(expression, used)

Similar suggestion about truncating the list vs. passing a parameter.

@@ -1640,13 +1639,11 @@ private void RewriteArgumentsForComCall(

actualArguments[argIndex] = new BoundSequence(
argument.Syntax,
locals: ImmutableArray<LocalSymbol>.Empty,
sideEffects: ImmutableArray.Create<BoundExpression>(boundAssignmentToTemp),
locals: [boundTemp.LocalSymbol],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

locals: [boundTemp.LocalSymbol],

Strictly speaking this is wrong. It looks like we are going to pass a reference to a local that goes out of scope. I know, we probably have done things like that in other places and EmitSequenceAddress compensates for situations like that by not releasing the locals. But that is likely to lead to a worse IL. I suggest reverting the change.

ImmutableArray<ParameterSymbol> parameters,
ImmutableArray<BoundExpression> arguments,
ImmutableArray<int> argsToParamsOpt,
bool expanded)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last two parameters should not be relevant post lowering.

{
readonlyRefs++;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

What are the scenarios when we can fail to get a parameter and why they are not interesting?

expanded: false);
}

private static bool MightEscapeTemporaryRefs(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MightEscapeTemporaryRefs

I will let Ref-Safety experts to make a call on completeness/correctness of the code in this method. However, it would be good to add comments about explaining the logic.

@AlekseyTs
Copy link
Contributor

Done with review pass (commit 1), tests are not looked at.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead
Projects
None yet
2 participants