Numbat lsp #538

Draft · irevoire wants to merge 4 commits into master
Conversation

irevoire
Contributor

Hey @sharkdp,

Contributing to the Numbat standard library was such a pain that I started working on an LSP.
Currently it only outputs some errors, and it's still very buggy; most of the code I wrote lives in numbat-lsp/src/main.rs. Almost everything else comes from this repo, which I cloned.

[Screen recording: Screen.Recording.2024-08-15.at.11.31.47.mov]

Adding support for completion + gotodefinition doesn't seem impossible.
Maybe we could even push further and get the types to work.

I was wondering if this was the kind of stuff you would be willing to accept in the future?


Currently, I’m hitting this bug a lot: ebkalderon/tower-lsp#413
I’m not sure, but maybe tower-lsp isn’t really production-ready and we’ll need to move away from it? 😩

@sharkdp
Owner

sharkdp commented Aug 15, 2024

Awesome stuff 😄. LSP is definitely something I'd like to look into. I only had a very brief look into how this works, and it seems like you run numbat::Context::interpret* and then collect errors from there. Is this a sane approach to building a language server? What if there are side effects? I would imagine a language server only runs the frontend of the compiler (tokenizer, parser, typechecker), but not the backend (code generation and execution), where no errors can occur.
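To make the frontend-only idea concrete, here is a minimal, hypothetical sketch: diagnostics come out of analysis passes alone, and nothing is ever executed, so no side effects are possible. The names (`Diagnostic`, `check`, the unit list) are illustrative and are not Numbat's real API; a real server would chain the actual tokenizer, parser, and typechecker the same way.

```rust
// Hypothetical sketch: collect diagnostics from frontend passes only,
// never running the bytecode VM. Not Numbat's actual API.

#[derive(Debug, PartialEq)]
struct Diagnostic {
    offset: usize,
    message: String,
}

// Toy "name resolution" pass: flag identifiers that are not known units.
// A real server would run tokenizer -> parser -> typechecker instead.
fn check(source: &str, known_units: &[&str]) -> Vec<Diagnostic> {
    let mut diags = Vec::new();
    let mut offset = 0;
    for word in source.split_whitespace() {
        // Locate the word's byte offset so the diagnostic carries a span.
        offset = source[offset..].find(word).map(|i| i + offset).unwrap_or(offset);
        if word.chars().all(|c| c.is_alphabetic()) && !known_units.contains(&word) {
            diags.push(Diagnostic {
                offset,
                message: format!("unknown unit `{word}`"),
            });
        }
        offset += word.len();
    }
    diags // no evaluation happened, so no side effects are possible
}

fn main() {
    let diags = check("2 meter + 3 foo", &["meter", "second"]);
    assert_eq!(diags.len(), 1);
    assert_eq!(diags[0].message, "unknown unit `foo`");
}
```

The point of the sketch is only the shape of the pipeline: analysis in, diagnostics out, no execution stage anywhere.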

@RossSmyth
Contributor

Just popping in because this is a topic I'm familiar with.

> Is this a sane approach to building a language server?

It's not too bad. rust-analyzer gets most of its diagnostics from running rustc upon saving the file. There are some diagnostics baked in, though, and it does contain a front-end and part of the middle end for semantic analysis. The semantic analysis it implements enables "go to definition", semantic token highlighting, completion, and "lightbulb" refactors, among other things.

So in reality having a front-end is pretty much required.

The front-end should have a lossless view of the syntax. In practice this means building a concrete syntax tree (CST) rather than an abstract syntax tree, so that refactors don't clobber trivia. This is also great for formatters: rustfmt hits this limitation and has been considering switching to a CST.
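The lossless property is easy to state precisely: concatenating every token's text plus its attached trivia must reproduce the source byte-for-byte. A tiny sketch (this is an illustration of the idea, not rust-analyzer's or Numbat's actual representation):

```rust
// Sketch of losslessness: each token keeps the trivia (whitespace,
// comments) that followed it, so the tree round-trips the source exactly.

#[derive(Debug)]
struct GreenToken {
    text: String,   // the token text itself
    trivia: String, // whitespace/comments that followed it in the source
}

// A lossless "expression": just a flat token list for this sketch.
struct LosslessExpr {
    leading_trivia: String,
    tokens: Vec<GreenToken>,
}

impl LosslessExpr {
    // Rebuild the exact source text, trivia included.
    fn to_source(&self) -> String {
        let mut out = self.leading_trivia.clone();
        for t in &self.tokens {
            out.push_str(&t.text);
            out.push_str(&t.trivia);
        }
        out
    }
}

fn main() {
    // "  2 * (3 + 4)  " with all spacing and parens preserved.
    let expr = LosslessExpr {
        leading_trivia: "  ".into(),
        tokens: vec![
            GreenToken { text: "2".into(), trivia: " ".into() },
            GreenToken { text: "*".into(), trivia: " ".into() },
            GreenToken { text: "(".into(), trivia: "".into() },
            GreenToken { text: "3".into(), trivia: " ".into() },
            GreenToken { text: "+".into(), trivia: " ".into() },
            GreenToken { text: "4".into(), trivia: "".into() },
            GreenToken { text: ")".into(), trivia: "  ".into() },
        ],
    };
    // A typical AST would have dropped the spaces and the parens.
    assert_eq!(expr.to_source(), "  2 * (3 + 4)  ");
}
```

A refactor that edits one token can then re-emit everything else untouched, which is exactly what a formatter or "lightbulb" fix needs.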

Other useful posts:
https://rust-analyzer.github.io/blog/2020/09/28/how-to-make-a-light-bulb.html
https://rust-analyzer.github.io/blog/2019/11/13/find-usages.html

@sharkdp
Owner

sharkdp commented Aug 29, 2024

> It's not too bad. Rust-Analyzer gets most of its diagnostics from running rustc upon saving the file.

Ok, but the rustc backend doesn't execute the code. numbat::Context::interpret runs the whole compiler and the execution on the bytecode VM. This is what I meant by: "is this a sane approach …".

> So in reality having a front-end is pretty much required.

I have no issue with that. I would also imagine the Numbat LSP to reuse the tokenizer, parser, and semantic analysis stages (prefix handling, name resolution, type checker) of the compiler.

> The front-end should have a lossless view of the syntax, what this really ends up meaning is having a concrete syntax tree rather than an abstract syntax tree so that refactors don't clobber trivia. This is also great for formatters and rustfmt hits this limitation, and rustfmt has been considering switching to a CST.

That's something we definitely don't have at the moment. We lose all whitespace information (and things like parens in expressions, see also #102) during parsing.

@RossSmyth
Contributor

I would definitely look into generating CST structures with ungrammar; I find that construction pretty valuable for a parser, since it doesn't throw away information that is useful for LSP features and error reporting.

Another thing would be making the lexer infallible, in a similar vein to how your parser already is. The most common way of doing that is adding an error token that can be emitted for unrecognized input. The type signature of the lexer then becomes fn scan(&mut self) -> Vec<Token>. Adding a debug check that all tokens are contiguous is also a good idea for testing. Here's an example of how I've done it in the past:
https://github.com/RossSmyth/meowfile/blob/2a89f23f1a34a27b3275de9d004113deed3f6932/crates/lex/src/lib.rs#L16-L19
