cache and reuse in-flight project configuration data for a while
put a configurable hard limit (e.g., 100?) on the number of users that can be added to the sandbox project, and change the timeout to 5 minutes. This should probably live in the project's configuration.
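A minimal sketch of what that per-project configuration might look like (the names and shape here are hypothetical, not CoCalc's actual settings):

```typescript
// Hypothetical sandbox-project limits; field names are illustrative only.
interface SandboxConfig {
  maxUsers: number; // hard limit on users that can be added to the sandbox project
  timeoutMs: number; // timeout, in milliseconds
}

const DEFAULT_SANDBOX_CONFIG: SandboxConfig = {
  maxUsers: 100, // configurable hard limit (e.g., 100)
  timeoutMs: 5 * 60 * 1000, // 5 minutes
};

console.log(DEFAULT_SANDBOX_CONFIG);
```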
Delaying this to see what we can handle... WHEN I'm not AFK
update the frontend app in prod, since it is slightly old and has a bug I hit as admin while viewing project settings (which I fixed in the code days ago).
update cocalc-docker on arm and x86_64
So... how would this work? Maybe that's all I should do today is think about this?
A big difference from nbviewer is the possibility of sharing (and soon watching). Thus we need a database entry, i.e., a public path. It would also be nice to make things fully work, e.g., if the notebook links to data in the same repo, that link should work too.
To do this, we could make it so when you visit such a URL, it clones or updates the repo to files on the share server, and then just uses that. What does nbviewer do?
Idea - clone to project
We could create a special cocalc project for the purpose of proxying github. It has some id, and in admin settings you enter that id in a box.
When a request for github comes in to the share server, we use slightly different logic to handle it. That involves:
check if the clone exists -- if so, serve it, but also fire off an update (git pull) in the project itself
if the clone doesn't exist, start the project, then exec code to clone from github.
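The two-branch logic above could be sketched as a pure "plan" function (the action names are made up here; the real handler would of course execute them):

```typescript
// Sketch of the share-server request-handling decision for a github URL.
// Action names are hypothetical placeholders, not real CoCalc functions.
type Action = "serve" | "pull-in-background" | "start-project" | "clone";

function githubRequestPlan(cloneExists: boolean): Action[] {
  if (cloneExists) {
    // Serve the cached clone right away, then refresh it asynchronously.
    return ["serve", "pull-in-background"];
  }
  // No clone yet: start the proxy project, clone from github, then serve.
  return ["start-project", "clone", "serve"];
}

console.log(githubRequestPlan(true)); // → [ 'serve', 'pull-in-background' ]
```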
This is nice since it would work the same way in dev/docker/cocalc.com
This is very bad, since it would be easy to hit the project quota, or basically make one project get really large. Also, in kucalc, the flow is (1) clone to the project, then (2) rsync from the project to the share server. That's a major increase in time, and a waste of space.
Idea - clone to share server directly
We still have a project dedicated to this
But in kucalc it's just a placeholder to get the project_id, so we know where to put things on the share server
(In cc-in-cc-dev and cocalc-docker, files do end up in project, since things are the same.)
Cloning is actually done by hub-share in kucalc. In cocalc-docker, the same code is run directly by the share server; we'll need logic to accomplish this and it'll be a pain to develop. But doable and probably not too hard. It means hub-share needs git installed.
Update/clone: (1) first try git pull; (2) if that fails, rm the clone and git clone again.
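The pull-then-clone fallback could be sketched like this; the `run` callback (which executes a shell command and reports success) is injected so the sketch stays testable without git, and is not a real CoCalc API:

```typescript
// Sketch of the update/clone fallback. `run` executes a shell command and
// returns true on success; it's injected here purely for testability.
function updateOrClone(
  repoUrl: string,
  dest: string,
  run: (cmd: string) => boolean
): void {
  // (1) first try to update the existing clone in place
  if (run(`git -C ${dest} pull`)) return;
  // (2) if that fails, remove the clone and clone fresh
  run(`rm -rf ${dest}`);
  run(`git clone ${repoUrl} ${dest}`);
}
```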
I wonder if trying to use git at all is a terrible idea? The repo could be big... and maybe we just want to work with a single file easily.
Pure memory version
when request comes to share server, we find the corresponding raw url (e.g., with a simple transform)
we fetch that raw content into memory, with some limit on size, e.g., get the first xMB and never more.
also do the other things, e.g., directory listing, projects for org, etc., like nbviewer does.
the edit button in cocalc does something different in the case of github: it starts or creates a project, then runs git clone in that project and finally opens the relevant file.
contributing back to upstream as a PR would then be something we can implement later.
If the notebook refers to another file, it could still work via a similar fetch, etc., on that file.
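The "simple transform" from the first step above could look like this for blob URLs; the raw.githubusercontent.com layout is owner/repo/branch/path with no "blob" segment (a sketch handling just this one URL shape, not directory listings etc.):

```typescript
// Transform a github.com blob URL into its raw-content URL.
// Handles only the blob URL shape; other github URL forms need more cases.
function toRawUrl(githubUrl: string): string {
  return githubUrl
    .replace("https://github.com/", "https://raw.githubusercontent.com/")
    .replace("/blob/", "/");
}

console.log(toRawUrl("https://github.com/owner/repo/blob/main/nb.ipynb"));
// → https://raw.githubusercontent.com/owner/repo/main/nb.ipynb
```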
An advantage to all this is that it will be identical in cocalc-docker, dev, kucalc, etc. It also will be optimal in speed, wastes no space, etc. This is clearly the right solution.
I'm going to implement this today.
Rule: we will proxy GitHub URLs if and only if there is an organization with the name github. Obviously, only an admin can create such an organization.
Definition: for a GitHub URL, the public_path_id is the sha1 hash of the organization id and the rest of the GitHub path. The record gets created in the database automatically whenever it is requested.
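That id derivation is easy to sketch with Node's crypto module; the exact concatenation of organization id and path here is a guess:

```typescript
import { createHash } from "crypto";

// Sketch: derive public_path_id as the sha1 of the organization id plus the
// rest of the github path. How the two are actually combined is an assumption.
function githubPublicPathId(orgId: string, githubPath: string): string {
  return createHash("sha1").update(orgId + githubPath).digest("hex");
}

console.log(githubPublicPathId("github", "owner/repo/blob/main/nb.ipynb"));
```

Since the id is deterministic, the same URL always maps to the same record, so "create on first request" is just an upsert.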
I'm causing myself a lot of confusion by trying to design something that solves a very general problem, e.g., proxying for CUP via a non-github approach. I could implement the most straightforward approach for just github as a special case, then once it works and I have experience, rewrite it to be general if a good design emerges.
#next when grabbing content to render, check if the project_id is githubProxyProjectId and if so, get via fetch.
when copying content to file to edit, check if the project_id is githubProxyProjectId and if so, run a git clone command in the user's project instead of copying from the proxy project.
add link to upstream github repo at top.
With this, starring will still work, though we'll need to special case watch. Also, the target will appear in the list of shared files, which is actually pretty cool, and will provide potentially massive SEO value that we just wouldn't get otherwise.