Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change Function.prototype.toString() to be ES6 compatible #1139

Merged
merged 4 commits into from
Dec 5, 2016

Conversation

svaarala
Copy link
Owner

@svaarala svaarala commented Dec 3, 2016

ES5.1 requirement is that Function.prototype.toString() result must be parseable as a FunctionDeclaration with no further requirements (it doesn't have to compile into an equivalent function, for instance).

ES6 changes this so that the source code must either parse to an equivalent function, or must else cause a SyntaxError in parsing. See comments starting from: #1135 (comment).

  • Change the .toString() behavior
  • Update testcases
  • Check Emscripten compatibility
  • Migration notes
  • Releases entry

@svaarala svaarala added the es2015 label Dec 3, 2016
@svaarala svaarala added this to the v2.0.0 milestone Dec 3, 2016
@svaarala
Copy link
Owner Author

svaarala commented Dec 4, 2016

Here's the current behavior in this branch, and a comparison to V8.

For a native function:

duk> String(Math.cos)
= "function cos() { [native code] }"

> String(Math.cos)
'function cos() { [native code] }'

So output for non-bound native functions is now the same.

For Ecmascript functions:

duk> String(function foo(a,b) { return a+b; })
= "function foo() { [ecmascript code] }"

> String(function foo(a,b) { return a+b; })
'function foo(a,b) { return a+b; }'

Here there's a natural difference as V8 provides the source. Duktape output won't eval() parse anymore, as required by ES6.

Bound functions behave differently:

duk> String(Math.cos.bind())
= "function bound cos() { [bound code] }"

> String(Math.cos.bind())
'function () { [native code] }'

The main difference is that V8 looks up the final non-bound function and shows its type (native or Ecmascript). For some reason V8 also drops the function name, showing neither "cos" (the target function's .name) nor "bound cos" (the bound function's .name).

The current Duktape output re: function names is "broken" in the sense that invalid function names (like a space in the bound function case) are allowed. But that's actually OK because it just causes a SyntaxError earlier than the function body.

I think I'll change the bound function behavior so that the native/ecmascript type would be shown and the bound status of the function would be lost. However, with that behavior it'd make most sense to show the target function's (non-bound function, that is) .name in the result. In other words, String(Math.cos) and String(Math.cos.bind()) would have the same output.

@svaarala
Copy link
Owner Author

svaarala commented Dec 4, 2016

V8 behavior for bound Ecmascript function is quite interesting and probably leaks a bit of the internals:

> func = function foo(a,b) { return a+b; };
[Function: foo]
> String(func)
'function foo(a,b) { return a+b; }'
> String(func.bind(null, 1))
'function () { [native code] }'

@svaarala
Copy link
Owner Author

svaarala commented Dec 4, 2016

There are some suggestions here (linked by @fatcerberus earlier): http://www.2ality.com/2016/08/function-prototype-tostring.html.

Following that would mean that a function's name would be omitted unless it was an intrinsic, so that:

> (function foo() {}).bind(null).toString()
'function () { [native code] }'

There's no internal distinction between "intrinsic" names and other names now, so this approach isn't possible right now.

@fatcerberus
Copy link
Contributor

The way I read that proposal it seemed like it was just bound functions and native functions for which the name would be omitted.

@svaarala
Copy link
Owner Author

svaarala commented Dec 5, 2016

Sure - but only because the proposal calls for actual source code to be returned for Ecmascript functions?

If that's not implemented, then something else must happen.

@svaarala
Copy link
Owner Author

svaarala commented Dec 5, 2016

So if you take these two:

Whenever possible – source code: If a function was created via ECMAScript source code, toString() must return that source code. In ES2016, whether to do so is left up to engines.

Since that's not possible, at least right now:

Otherwise – standardized placeholder: In ES2016, if toString() could not (or would not) create syntactically valid ECMAScript code, it had to return a string for which eval() throws a SyntaxError. In other words, eval() must not be able to parse the string. This requirement was forward-incompatible – whatever string you come up with, you can never be completely sure that a future version of ECMAScript doesn’t make it syntactically valid. In contrast, the proposal standardizes a placeholder: a function whose body is { [native code] }. Details are explained in the next section.

So I would read this to mean Ecmascript functions would also use { [native code] } if you weren't able to produce the source, so that a standardized placeholder is used.

The proposal of course has no bearing re: compliance, and I'd like to see directly from the function .toString() whether it's a native or a script function, even if source code wasn't available.

I'm not sure about bound functions though. If I had to choose, the current Duktape behavior is worse compared to looking up the target function and showing its type (hiding the bound status of the initial argument). Ideally both would be visible, but that's extra code and maybe not worth it.

A hybrid option would be to show the initial function's name but look up the native/Ecmascript type from the non-bound target. That way e.g. Math.cos.bind() would produce:

function bound cos() { [native code] }

which simultaneously indicates it's a bound function (via the required ES6 name), but also shows that the non-bound target is a native function. The downside is that the name looks funny (with the space) but that has no practical compliance impact because the whole thing is only required to eval() to SyntaxError. It doesn't matter that the SyntaxError comes from the function name rather than the function body.

@svaarala
Copy link
Owner Author

svaarala commented Dec 5, 2016

Specifically what I don't like about the V8 behavior is this:

> (function foo() {}).bind(null).toString()
'function () { [native code] }'

The name is hidden, and "native code" sounds wrong from an Ecmascript target function.

I would Duktape to show at the very least:

function () { [ecmascript code] }

or more preferably include the name of the target:

function foo() { [ecmascript code] }

or maybe the name of the bound function (hybrid approach described above):

function bound foo() { [ecmascript code] }

@fatcerberus
Copy link
Contributor

The intent of the proposal, at least the way I read it, is that Ecmascript functions MUST carry source code. Hence the bit that "in ES2016, whether to do so was left up to engines".

Like you said though, the proposal has no bearing on compliance, and I agree that it's weird to call an Ecmascript function "native code". Anyway for ES6/7 compliance, the [ecmascript code] should be enough as all the spec requires right now is that it cause a SyntaxError.

@svaarala
Copy link
Owner Author

svaarala commented Dec 5, 2016

The intent of the proposal, at least the way I read it, is that Ecmascript functions MUST carry source code. Hence the bit that "in ES2016, whether to do so was left up to engines".

Yes, I understood that :) But if you don't do that, the proposal still tries to say that "In contrast, the proposal standardizes a placeholder: a function whose body is { [native code] }.".

This is used for bound functions, even if they are Ecmascript functions. This is actually exactly what V8 does.

So, all I was saying is that this same approach could be used for bound functions in Duktape (or perhaps even Ecmascript functions). But that doesn't make much sense to me.

@svaarala
Copy link
Owner Author

svaarala commented Dec 5, 2016

Re: that proposal document, note that it discusses two separate things. The first is a concrete proposal on what to return in each case. I'm not talking about that.

The second is about how the ES6 requirement of returning something that's a SyntaxError is actually rather ill-defined because there's no way to know what syntax is added down the line. So they propose a standard placeholder. And that placeholder is used whenever you don't produce an actual source. This placeholder still has relevance even if you don't follow the other parts of the proposal because of the forward-compatibility issue.

But in practical terms [native code], [ecmascript code], [bound code], and (currently) [lightfunc code] are similar enough to very likely be forward compatible enough. It's at least unlikely Duktape will accidentally change its own behavior without taking any future syntax into account :-) But it would theoretically matter if the .toString() output was parsed between versions or between engines.

@fatcerberus
Copy link
Contributor

Ah, I see what you mean now, sorry :)

Regarding V8's behavior for bound functions, I assume it doesn't return source code because that wouldn't meet the requirement of parsing to "a functionally equivalent object", so it just falls back on returning a SyntaxError-inducing stub.

I also remember reading somewhere in the specification that you're not supposed to reveal the target of a bound function, but I don't recall where I saw that now.

@svaarala
Copy link
Owner Author

svaarala commented Dec 5, 2016

I also remember reading somewhere in the specification that you're not supposed to reveal the target of a bound function, but I don't recall where I saw that now.

I don't recall offhand either - that might make sense for the actual function reference. But since the bound function .name already reveals the target's name it'd be quite odd if the target type (native or not) would be off limits.

String output must either (a) eval() back as an equivalent object, OR
(b) eval() to a SyntaxError.

Previous Duktape output, conformant to ES5/ES5.1, matched FunctionDeclaration
syntax but didn't parse back as an equivalent function.
@svaarala
Copy link
Owner Author

svaarala commented Dec 5, 2016

I'll merge this with the behavior above, specifically for bound functions:

function bound cos() { [bound code] }

That will fulfill the ES6 requirement and is analogous to what exists in master currently. This can be improved in follow-ups if there's a good, clear idea for e.g. bound functions.

@svaarala
Copy link
Owner Author

svaarala commented Dec 5, 2016

Emscripten compatibility isn't affected by the change.

@fatcerberus
Copy link
Contributor

In regards to real-world usage of .toString() expecting to get actual function source code, I mentioned a few days ago that AngularJS did this, but didn't have a citation at the time beyond a passing mention in a SO question. Well, here's an article talking about that:
http://perfectionkills.com/state-of-function-decompilation-in-javascript/

@svaarala
Copy link
Owner Author

svaarala commented Dec 5, 2016

There are indeed real world code bases relying on function .toString() - I just find that quite awkward and prone to break when new language features are introduced.

But that's a topic for another pull: eventually Duktape should provide function source code too, but doing so is more than just tacking the compiler input as a property because of the variety of different syntaxes where functions may come from (object literal getters/setters, Function constructor, etc). So to make the source actually work, some processing is needed. That's the main reason I haven't yet implemented it.

Also for low memory targets it'd be nice if the source code could be compressed using some simple text-aware algorithm. Maybe bit packed like the built-in strings. It could then be expanded on request.

@fatcerberus
Copy link
Contributor

Interestingly, as mentioned in the comments of that article, SpiderMonkey used to reconstruct functions by decompiling then bytecode, but around FF 17 that was changed to just store the original source verbatim. I think if anything were done in Duktape towards supporting this, that latter method would make most sense.

On the topic of this pull: It is apparently common to stringify bound functions as though they were native (the article mentions that too). I prefer [bound code] personally, and from a practical standpoint, if [native code] will remain a SyntaxError going forward, substituting any other word in place of "native" will almost certainly remain so as well.

@svaarala
Copy link
Owner Author

svaarala commented Dec 5, 2016

[...] around FF 17 that was changed to just store the original source verbatim. I think if anything were done in Duktape towards supporting this, that latter method would make most sense.

I never considered anything else than using the original source - doing otherwise would be a huge footprint impact because bytecode-to-source translation would be very expensive.

But the thing is, you can't just use the source because of the syntactic variants, so some processing is needed. It's not a huge undertaking though, just manufacturing a function expression around a plain body (which is already done to some extent by the Function constructor).

@svaarala
Copy link
Owner Author

svaarala commented Dec 5, 2016

Opened #1147 for discussing the source code issue; it's out of scope for this pull.

@svaarala svaarala merged commit c4d7dbd into master Dec 5, 2016
@svaarala svaarala deleted the es6-function-tostring branch December 5, 2016 17:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants