What is the status of per-chunk caching? Is it supported/planned? #48
No, there is no per-chunk caching, since that's not really practical across languages. In the future, it may be possible to have per-chunk caching for some languages, as long as there is pre-existing software that can manage the caching. I believe knitr has per-chunk caching for R, and possibly for Julia and Python; there might be a way to leverage some existing solutions.
I am working on a project called Pylightnix which in theory should handle the required caching. If you don't mind, I could try to add this feature to Codebraid. I plan to find the place where Python blocks are executed and attempt to wrap it in Pylightnix "stages".
I looked at Pylightnix, and was also wondering about dealing with global state. You can basically think of the code chunks as a list of strings, with each string being the code from a code chunk. If you can come up with a function that takes such a list of strings and executes them with caching, then that function could be incorporated into Codebraid. If you want to try to implement caching, I'd suggest working on a function that operates on a list like this first, before trying to build something within Codebraid itself. (Also, I'm working on adding new features to Codebraid that involve a lot of modifications, so the existing code is about to change significantly.) There are a few ways that we might get some caching without full per-chunk caching. Let me know if any of these are of interest for what you are doing.
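The suggested interface (a function taking a list of code-chunk strings and executing them with caching) could be sketched as follows. This is a hypothetical illustration, not Codebraid code; note that the cache-hit path skips execution entirely, which is only sound when the interpreter state built by the skipped chunk still exists somewhere, i.e. exactly the global-state problem raised above.

```python
import contextlib
import hashlib
import io

def run_chunks(chunks, cache):
    """Execute code chunks in order in a shared namespace, caching stdout.

    Keys are cumulative hashes, so editing chunk i re-runs chunk i and all
    later chunks. CAVEAT: a cache hit skips execution, which is only valid
    if the interpreter state that chunk created still exists (e.g. in a
    persistent background shell) -- the global-state problem in question.
    """
    ns = {}
    h = hashlib.sha256()
    outputs = []
    for code in chunks:
        h.update(code.encode())
        key = h.hexdigest()
        if key not in cache:
            buf = io.StringIO()
            with contextlib.redirect_stdout(buf):
                exec(code, ns)
            cache[key] = buf.getvalue()
        outputs.append(cache[key])
    return outputs

cache = {}
first = run_chunks(["x = 2", "print(x * 3)"], cache)   # executes both chunks
second = run_chunks(["x = 2", "print(x * 3)"], cache)  # served from cache
```

A function with this shape could be developed and tested on plain lists of strings first, as suggested, before any integration work.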
Thanks for your advice. I agree that it would indeed be better for me to do a simplified proof of concept first. I've thought a bit more about the problem: I don't like Jupyter because I think it is too heavy to be manageable. Instead, it may be just fine to open a pipe pointing to a Python shell running in the background and save this pipe as a file. Then I could require users to pass the name of this file as an argument and call it a poor man's serialization of the interpreter state :) The rest of the demo should not be hard. I think we could assume that (a) the lines of code in each chunk are the "prerequisites" for that chunk; (b) the output received from the pipe during the last execution of the chunk is the "artifact" of that chunk that needs to be cached (I'll ignore stderr for simplicity); (c) the job now is to build dependencies between chunks, e.g. by saying that each chunk depends on all previous chunks in the file. That could be a bit fragile, but I think it could work.
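Assumption (c) — each chunk depends on all previous chunks in the file — can be encoded with cumulative hashes, so the first differing key marks that chunk and every successor as stale. A small sketch (not Pylightnix's actual mechanism):

```python
import hashlib

def cumulative_keys(chunks):
    """Key chunk i by a hash of chunks 0..i, encoding assumption (c)."""
    h = hashlib.sha256()
    keys = []
    for code in chunks:
        h.update(code.encode())
        keys.append(h.hexdigest())
    return keys

def first_stale(old_keys, new_keys):
    """Index of the first chunk that must be re-executed."""
    for i, (a, b) in enumerate(zip(old_keys, new_keys)):
        if a != b:
            return i
    # All shared keys match; only appended/removed chunks are stale.
    return min(len(old_keys), len(new_keys))

old = cumulative_keys(["x = 1", "print(x)", "print(x + 1)"])
new = cumulative_keys(["x = 1", "print(2 * x)", "print(x + 1)"])
stale = first_stale(old, new)  # editing chunk 1 invalidates chunks 1 and 2
```

Because the keys are cumulative, chunk 2's key changes even though its text did not, which is what forces the successors to re-run.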
A pipe might work. Saving the pipe and then passing it as an argument for the next document build might not be necessary, though. I'm interested in adding a new mode where Codebraid runs continuously in the background and automatically rebuilds the document under various conditions. For example, when the document is saved it could be rebuilt with all code replaced by the text "waiting for results", and then every 10 seconds it could be rebuilt with whatever code results are available by that time. This would ultimately allow for a (nearly) live preview mode.
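A toy sketch of that rebuild loop, with made-up message strings and a `build` callback standing in for the real rebuild step (the 10-second interval and termination bound are parameters here so the sketch stays testable):

```python
import os
import tempfile
import time

def watch(path, build, interval, ticks):
    """Poll the document's mtime; on change, rebuild immediately with a
    placeholder, otherwise rebuild periodically with results so far."""
    last = None
    for _ in range(ticks):
        mtime = os.path.getmtime(path)
        if mtime != last:
            build("waiting for results")        # save detected
            last = mtime
        else:
            build("results available so far")   # periodic refresh
        time.sleep(interval)

# Usage: record the messages a build callback would receive.
with tempfile.NamedTemporaryFile(suffix=".md", delete=False) as f:
    f.write(b"# document")
calls = []
watch(f.name, calls.append, interval=0.0, ticks=3)
os.unlink(f.name)
```

A real implementation would run this loop indefinitely and use filesystem notifications rather than polling, but the control flow is the same.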
Got it. I'm aware that there are compilers which work this way; some of my colleagues used one for compiling Haskell code in the background. However, I have the impression that a Python environment will never be stable enough to withstand a moderately long editing session: as an example, I have to restart my IPython console from time to time to let it re-load files and fix internal problems with multiple versions of classes. Apart from these doubts, I agree that it could be a nice feature. Meanwhile, I've uploaded a small proof-of-concept application called MDRUN. It processes Markdown documents by sending code sections through the Python interpreter. It runs everything in one pass and uses non-trivial POSIX plumbing to keep the interpreter alive between sessions. It also uses Pylightnix for per-chunk cache management, as planned. On every run the program evaluates only the changed sections and their successors. An example input document is here and there is the result. I'm going to keep the master branch of Pylightnix in a working state for some time, including this sample. Feel free to let me know when/if you think I could help with adding a similar feature to Codebraid.
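The "keep the interpreter alive behind a pipe" idea can be demonstrated in a few lines. This is a self-contained toy, not MDRUN's actual plumbing, and the sentinel strings are invented for the sketch:

```python
import subprocess
import sys
import textwrap

# Worker: a bare Python process that reads chunks delimited by a sentinel,
# exec's each one in a persistent namespace, and marks the end of its output.
WORKER = textwrap.dedent("""
    import sys
    ns = {}
    chunk = []
    while True:
        line = sys.stdin.readline()
        if not line:
            break
        if line.strip() == "#<end>":
            exec("".join(chunk), ns)
            print("#<done>", flush=True)
            chunk = []
        else:
            chunk.append(line)
""")

class PersistentShell:
    def __init__(self):
        self.proc = subprocess.Popen(
            [sys.executable, "-u", "-c", WORKER],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

    def run(self, code):
        """Send one chunk; collect its stdout up to the done marker."""
        self.proc.stdin.write(code + "\n#<end>\n")
        self.proc.stdin.flush()
        out = []
        while True:
            line = self.proc.stdout.readline()
            if not line or line.strip() == "#<done>":
                break
            out.append(line)
        return "".join(out)

shell = PersistentShell()
out1 = shell.run("x = 2")          # defines state, produces no output
out2 = shell.run("print(x * 21)")  # state survived between chunks
shell.proc.stdin.close()
shell.proc.wait()
```

The second chunk sees the `x` defined by the first, which is the property that lets changed sections and their successors be re-evaluated without restarting from scratch.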
The current built-in code execution system is based on templates. The code from the Markdown document is extracted from the Pandoc AST, then inserted into templates to create a source file that is executed. With this approach, adding new code execution features means creating new templates. This isn't ideal for what you need. I've been working for some time on adding support for running code with interactive subprocesses like a Python interactive shell. I'm currently in the midst of modifying the built-in code execution system to add better support for this, as well as some async-related features. Once this is finished, adding new code execution features will be possible by specifying an executable that reads code from stdin (or potentially a file) and writes (properly formatted) code output to stdout (or potentially a file). This should make it straightforward to use a slightly modified version of your MDRUN.py with the built-in code execution system. It will probably be at least a few weeks until the new features are finished; they are part of a larger set of features I've been working on for months. I will try to remember to add a note in this issue when that's available for experimentation. If I don't add a note in the next month or so, you might check back about progress.
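The template approach can be illustrated as follows: chunks are pasted into one source file with stdout delimiters so that a single run's output can be split back into per-chunk results. The delimiter string here is invented for the sketch; Codebraid's real templates are more involved.

```python
import subprocess
import sys

DELIM = "#<chunk-delim>"  # made-up marker for this sketch
chunks = ["x = 2", "print(x * 21)"]

# Template step: prefix every chunk with a line that prints the delimiter.
source = "\n".join(f"print({DELIM!r})\n{code}" for code in chunks)

# Execute the assembled source once, then split stdout back into per-chunk
# output for re-insertion into the document.
run = subprocess.run([sys.executable, "-c", source],
                     capture_output=True, text=True)
per_chunk = run.stdout.split(DELIM + "\n")[1:]
```

One downside is visible even in the toy: the whole source runs in a single batch, which is why per-chunk caching and interactive shells require a different execution path.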
FYI, the request to cache the results was mainly so I could enjoy partial document evaluation. I've now implemented the latter feature as a separate project; see LitREPL. The editor (currently vim) sends the whole document to the backend, which extracts code/result sections using a lightweight parser, pipes the code through the background interpreter (Python and IPython are supported), produces the result, and finally sends the document back to the editor. The communication is performed via Unix pipes, so there is a POSIX-compatible OS requirement for now. I found that the Lark library greatly simplifies the parsing business. With its help, the tool supports both Markdown and LaTeX document formats. Feel free to borrow the code if needed; I've used the same 3-clause BSD license as you do in Codebraid.
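The extraction step that LitREPL does with a Lark grammar can be approximated for the Markdown case with a short regular expression. This is a deliberate simplification: a grammar-based parser is what lets the real tool handle both Markdown and LaTeX robustly.

```python
import re

# Simplified stand-in for the lightweight parser: find fenced Python code
# blocks in a Markdown document. (Real fences can vary in length and
# indentation; a grammar-based parser like Lark handles those cases.)
FENCE = re.compile(r"```python\n(.*?)```", re.DOTALL)

doc = "Intro text.\n```python\nprint(1 + 1)\n```\nClosing text.\n"
code_sections = FENCE.findall(doc)
```

Each extracted section would then be piped through the background interpreter and its output written back into the adjacent result block.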
Hi. I quickly reviewed the documentation but found no clues about per-chunk caching. I suppose it is not supported, is it?