williamstein

GitHub Repository: williamstein/scratch
Path: blob/main/2022-06-27-ws.board

Monday

sandbox todo

  • sage worksheets are totally broken!

    • probably also broken in cocalc-docker, so update those images too

  • #now make it so a click is needed to activate the sandbox or cocalc embed in all cases (we can preload the app, but only open the project it is pointed at when a link is clicked)? Or do nothing.

    • add the sandbox at the top of all the landing pages

    • add a little text about what the sandbox is.

    • re-enable when the above is done.

webassembly python

  • get lvma to build for wasm

Tuesday

Jupyter timeout issues ticket

Try to reproduce in dev project.

I absolutely cannot reproduce this. No clue.

I did make some random, hopefully useful, improvements.

I also wrapped all API calls with reuseInFlight.

Meetings

  • ipywidgets at 9:30am

  • vantage at 10am

  • startup guys from USC at 1pm

Sandbox

reuse in flight the project configuration data for a while
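
A minimal sketch of what "reuse in flight ... for a while" could look like, assuming the configuration is fetched by an async function keyed by a string. The name `reuseInFlightWithTtl`, the `ttlMs` parameter, and the injectable clock are hypothetical and not taken from the cocalc code.

```typescript
// Hypothetical helper: share one promise per key among concurrent callers,
// and keep reusing it until ttlMs has passed since the call started.
type Entry<T> = { promise: Promise<T>; expires: number };

function reuseInFlightWithTtl<T>(
  fn: (key: string) => Promise<T>,
  ttlMs: number,
  now: () => number = Date.now // injectable clock, only to make testing easy
): (key: string) => Promise<T> {
  const cache = new Map<string, Entry<T>>();
  return (key: string): Promise<T> => {
    const hit = cache.get(key);
    if (hit !== undefined && hit.expires > now()) {
      // Still in flight, or resolved recently enough: reuse it.
      return hit.promise;
    }
    const promise = fn(key);
    const entry: Entry<T> = { promise, expires: now() + ttlMs };
    cache.set(key, entry);
    // Do not keep failures around: drop the entry so the next call retries.
    promise.catch(() => {
      if (cache.get(key) === entry) cache.delete(key);
    });
    return promise;
  };
}
```

For example, a function that loads the sandbox project configuration could be wrapped once at startup, so a burst of visitors triggers a single lookup per TTL window.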

put a configurable hard limit (e.g., 100?) on the number of users that can be added to the sandbox project, and change the timeout to 5 minutes. This should probably be in the configuration in the project.

Delaying this to see what we can handle... WHEN I'm not AFK

Nextjs upgrade

https://nextjs.org/blog/next-12-2

upgraded code

build and release

switch share search to sort by stars first, then timestamp
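
A sketch of that ordering as a comparator. The `SharedPath` shape and its field names are hypothetical, and "newest first" for the timestamp tie-break is an assumption; the real query presumably does this in the database.

```typescript
// Hypothetical shape of one search result.
interface SharedPath {
  path: string;
  stars: number;
  lastEdited: number; // milliseconds since the epoch
}

// Most stars first; ties are broken by the most recent timestamp (assumed).
function byStarsThenTimestamp(a: SharedPath, b: SharedPath): number {
  if (a.stars !== b.stars) return b.stars - a.stars;
  return b.lastEdited - a.lastEdited;
}

// Returns a sorted copy and leaves the input array untouched.
function sortShares(paths: SharedPath[]): SharedPath[] {
  return [...paths].sort(byStarsThenTimestamp);
}
```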

also fix bug in rendering this notebook: https://cocalc.com/share/public_paths/dc5b8a2570b3ffe695b57982c397aefb26d8faf9

Wednesday

side/personal things

wapython

get wasm python build past the "no emscripten.h" step.

Ipywidgets message and buffer support

Watch button

For shared files, along with explicit-only updates for shares (i.e., you click a button when you want the shared content copied to the share server, and optionally enter a commit message).

Updates

update frontend app in prod, since it is slightly old and has a bug I hit as admin looking at project settings (that I fixed days ago in the code).

update cocalc-docker on arm and x86_64

So... how would this work? Maybe all I should do today is think about this?

A big difference from nbviewer is that there is the possibility of sharing (and soon watching). Thus we need a database entry, i.e., a public path. It would also be nice to make things fully work, e.g., if the notebook links to data in the same repo, that link should work.

To do this, we could make it so when you visit such a URL, it clones or updates the repo to files on the share server, and then just uses that. What does nbviewer do?

Idea - clone to project

  • We could create a special cocalc project whose purpose is proxying github. It has some id, and in admin settings you enter that id in a box.

  • When a request for github comes in to the share server, we use slightly different logic to handle it. That involves:

    • check if clone exists -- if so, serve it, but also fire off an update (git pull) in the project itself

    • if clone doesn't exist, start project, then exec code to clone from github.

  • This is nice since it would work the same way in dev/docker/cocalc.com

  • This is very bad, since it would be easy to hit the quota for a project, or basically make one project get really large. Also, in kucalc, the path would be (1) clone to project, then (2) rsync from project to share server. That's a major increase in time and a waste of space.

Idea - clone to share server directly

  • We still have a project dedicated to this

  • But in kucalc it's just a placeholder to get the project_id so we know where to put things in the share server

  • (In cc-in-cc-dev and cocalc-docker, files do end up in project, since things are the same.)

  • Cloning is actually done by hub-share in kucalc. In cocalc-docker, the same code is run directly by the share server; this will need logic to accomplish and it'll be a pain to develop. But doable and probably not too hard. It means hub-share needs git installed.

  • Update/clone: (1) first try git pull, (2) if that fails, switch to rm then git clone.

  • I wonder if trying to use git at all is a terrible idea? The repo could be big... and maybe we just want to work with a single file easily.
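
The update/clone step above could be sketched roughly as follows. The command runner and the directory remover are passed in as parameters (hypothetical `Run` and `Remove` types) so the logic stays independent of where it executes; in Node they could wrap `child_process.execFile` and `fs.promises.rm`. The `--ff-only` and `--depth 1` flags are assumptions, the latter aimed at the "repo could be big" worry.

```typescript
// Hypothetical injected dependencies; each rejects on failure.
type Run = (cmd: string, args: string[]) => Promise<void>;
type Remove = (dir: string) => Promise<void>;

async function updateOrClone(
  repoUrl: string,
  dir: string,
  run: Run,
  remove: Remove
): Promise<"pulled" | "cloned"> {
  try {
    // (1) First try to update the existing clone in place.
    await run("git", ["-C", dir, "pull", "--ff-only"]);
    return "pulled";
  } catch {
    // (2) If that fails for any reason (no clone yet, diverged history, ...),
    // remove whatever is there and clone from scratch.
    await remove(dir);
    await run("git", ["clone", "--depth", "1", repoUrl, dir]);
    return "cloned";
  }
}
```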

Pure memory version

  • when request comes to share server, we find the corresponding raw url (e.g., with a simple transform)

  • we fetch that raw content into memory, with some limit on size, e.g., get the first xxMB and never more.

  • render it.

  • also do the other things, e.g., directory listing, projects for org, etc., like nbviewer does.

  • the edit button in cocalc does something different in the case of github, namely it starts or creates a project, then runs git clone in that project and finally opens the relevant file.

  • contributing back to upstream as a PR would then be something we can implement later.
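
A sketch of the first two steps of the pure memory version: the "simple transform" from a GitHub blob URL to the matching raw.githubusercontent.com URL, and a size-capped read. The function names are hypothetical, only blob URLs are handled here, and the actual cap (the "xxMB" above) is left to the caller.

```typescript
// Map https://github.com/OWNER/REPO/blob/REF/PATH to the raw content URL.
// Returns undefined for anything that is not a blob URL.
function githubBlobToRawUrl(url: string): string | undefined {
  const m = /^https:\/\/github\.com\/([^\/]+)\/([^\/]+)\/blob\/(.+)$/.exec(url);
  if (m === null) return undefined;
  const [, owner, repo, rest] = m;
  return `https://raw.githubusercontent.com/${owner}/${repo}/${rest}`;
}

// Read at most maxBytes from a stream of chunks and never more.
async function readUpTo(
  chunks: AsyncIterable<Uint8Array>,
  maxBytes: number
): Promise<Uint8Array> {
  const out = new Uint8Array(maxBytes);
  let n = 0;
  for await (const chunk of chunks) {
    const take = Math.min(chunk.length, maxBytes - n);
    out.set(chunk.subarray(0, take), n);
    n += take;
    if (n >= maxBytes) break; // stop consuming once the cap is reached
  }
  return out.slice(0, n);
}
```

In recent Node versions the body of a `fetch` response should be usable as the `chunks` argument, though that is an assumption about the runtime rather than something these notes settle.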

If the notebook refers to a file it still could be possible to work via similar fetch, etc., on that file.

An advantage to all this is that it will be identical in cocalc-docker, dev, kucalc, etc. It also will be optimal in speed, wastes no space, etc. This is clearly the right solution.

I'm going to implement this today.

Rule: we will proxy github URLs if and only if there is an organization with the name github. Obviously, only an admin can create such an organization.

Definition: for a github URL, the public_path_id is the sha1 hash of the organization id and the rest of the github path. The record gets automatically created in the database whenever requested.
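
A sketch of that id computation using Node's built-in `crypto` module. How the organization id and the path are combined before hashing is not specified here, so the "/" separator and the function name `githubPublicPathId` are assumptions.

```typescript
import { createHash } from "node:crypto";

// Deterministic id for a proxied github path: sha1 over the organization id
// and the rest of the path. The "/" separator is an assumption.
function githubPublicPathId(organizationId: string, githubPath: string): string {
  return createHash("sha1")
    .update(`${organizationId}/${githubPath}`)
    .digest("hex");
}
```

Because the id depends only on its inputs, the same URL always maps to the same record, which is what makes creating the record on first request safe to repeat.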

I'm causing myself a lot of confusion by trying to design something that will solve a very general problem, e.g., proxying for CUP via a non-github approach. I could implement the most straightforward approach to just github as a special case, then once it works and I get experience, rewrite it to be general if a good design emerges.

Plan to implement GitHub proxy:

https://github.com/sagemathinc/cocalc/pull/6026

This is hopefully a "less than one day" project.

  • make it so in site settings admin can set a github proxy project_id; github_proxy_project_id and githubProxyProjectId

    • also add githubProxyProjectId to the customize stuff.

  • in https://cocalc.com/projects/10f0e544-313c-4efe-8718-2142ac97ad11/files/cocalc/src/packages/next/lib/names/public-path.ts make it so it has a special case when the owner is github to use that project_id always, and automatically create the public_path_id record.

  • #next when grabbing content to render, check if the project_id is githubProxyProjectId and if so, get via fetch.

  • when copying content to file to edit, check if the project_id is githubProxyProjectId and if so, run a git clone command in the user's project instead of copying from the proxy project.

  • add link to upstream github repo at top.
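
The "copy to edit" special case in the plan above could be sketched as a small decision function. The `EditAction` type, the function name, and the assumption that the stored path starts with `owner/repo/` are all hypothetical.

```typescript
// What the edit button should do for a given public path (hypothetical type).
type EditAction =
  | { kind: "git-clone"; repoUrl: string; filePath: string }
  | { kind: "copy-from-project"; projectId: string; path: string };

function planEditCopy(
  projectId: string,
  path: string,
  githubProxyProjectId: string | undefined
): EditAction {
  if (githubProxyProjectId !== undefined && projectId === githubProxyProjectId) {
    // Assumed layout of proxied paths: owner/repo/rest/of/file
    const [owner, repo, ...rest] = path.split("/");
    if (!owner || !repo) {
      throw new Error(`not a github proxy path: ${path}`);
    }
    return {
      kind: "git-clone",
      repoUrl: `https://github.com/${owner}/${repo}`,
      filePath: rest.join("/"),
    };
  }
  // Ordinary shares keep the existing behavior: copy from the source project.
  return { kind: "copy-from-project", projectId, path };
}
```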

With this, starring will still work, though we'll need to special case watch. Also, the target will appear in the list of shared files, which is actually pretty cool, and will provide potentially massive value regarding SEO that we just wouldn't get otherwise.