Google Summer of Code 2019 Project Report
Introduction
SWAN (Service for Web based ANalysis) is a platform developed by CERN to perform interactive data analysis in the cloud, without the need to install any software. It is built upon Jupyter notebooks and allows users to code and run their scripts via a web interface.
In SWAN, users can create projects and store their notebooks, input/output files, etc. Internally, a project is a special type of folder that can be used as a container for your work. It is also the unit of sharing in SWAN: you can share a project with your colleagues and thus provide them with everything they need to run the notebooks you created.
Software packages in SWAN are retrieved on the fly from an HTTP-based file system called CVMFS (CernVM File System), which frees users from dealing with installation, configuration, and package compatibility. Because the software is centrally managed and provided by CVMFS, a user who opens a shared notebook does not have to worry about compatibility with their own software stack, since both stacks are the same. However, some SWAN users need to install their own packages, which breaks this seamless sharing experience.
The objective of this GSoC project is to create a package manager for SWAN, in the form of a Jupyter extension, that allows users to keep track of the packages needed by their “projects”. This lets users manage their projects and dependencies.
The extension allows users to install Python modules (and their respective versions) via a user interface, making them available inside the corresponding project. People who open a shared project automatically get a prompt to install all the required packages that are missing. It thus encourages reproducible results and collaborative analysis.
Each project is internally mapped to a separate conda environment, and the project metadata is stored in a hidden file as part of the project itself. This helps abstract the configuration while mapping an independent environment to each project.
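To illustrate the idea of keeping the metadata inside the project, here is a minimal sketch of writing and reading such a hidden file. The file name `.project_env.json`, the JSON layout, and both function names are assumptions made for this example; they are not taken from the extension's code.

```python
import json
from pathlib import Path

# Hypothetical file name, chosen for this sketch only.
METADATA_FILENAME = ".project_env.json"


def save_project_metadata(project_dir, env_name, packages):
    """Write the environment name and pinned packages into a hidden file.

    `packages` maps a package name to a version string.
    """
    metadata = {"env": env_name, "packages": dict(sorted(packages.items()))}
    path = Path(project_dir) / METADATA_FILENAME
    path.write_text(json.dumps(metadata, indent=2))
    return path


def load_project_metadata(project_dir):
    """Read the metadata back; a missing file means nothing is tracked yet."""
    path = Path(project_dir) / METADATA_FILENAME
    if not path.exists():
        return {"env": None, "packages": {}}
    return json.loads(path.read_text())
```

Because the file lives in the project folder, it travels with the project when the folder is shared, which is what lets the recipient's side work out which packages are required.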
SWAN allows the creation of notebooks in four different languages: Python (2 or 3, depending on the software stack chosen during the session configuration), C++, R, and Octave. However, this extension works only for Python projects. For other kernels, it is not certain that the environments guarantee isolated versions of packages.
Features
- View packages installed for a specific project
- Update / Delete existing packages
- Search for new packages and install them
- Sync your project if any of your packages are missing or misconfigured
Setup Instructions
- This project assumes a SWAN setup. The APIs require certain actions as prerequisites, which are already fulfilled by SWAN.
- Please find the install instructions here
Usage Instructions
- From the Projects tab, you can create a folder by clicking the + button. Internally, this creates a new conda environment for all the notebooks inside it.
- To configure the project, click the cog button. This reveals a side panel listing the installed packages, along with their corresponding versions.
- If the project metadata and the underlying environment are out of sync, the sidebar also lists the packages that still need to be installed. This is essential for sharing projects and collaborating with peers. By default, when a user clones a shared project, the required packages are not installed; this extension lets users install them.
- To install a new package, search for it (an autocomplete feature is available). The selected packages are installed only when the user clicks the ‘install’ button. This allows other packages to be selected before issuing the ‘install’ command, which might take a while.
- Users can check for updates, for all or only the selected packages, by clicking the small cog button beside the list of installed packages. A pop-up modal lists the packages that need to be updated, along with their versions. Similarly, users can select one or more packages and uninstall them by clicking the bin icon.
- To create a notebook, click the + button from inside a project or a regular folder. A list of the available kernels then appears. Inside a project, users can launch notebooks only with the kernel corresponding to that project. Any external notebook (requiring a Python kernel) placed under the project also uses the same kernel.
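The sync check described in the steps above amounts to comparing the packages recorded in the project metadata with those present in the environment. The following is a minimal sketch of that comparison, not the extension's actual implementation. It assumes the installed packages are available as JSON in the shape produced by `conda list --json` (a list of objects with `name` and `version` keys); the function names are hypothetical.

```python
import json


def parse_conda_list(json_text):
    """Turn `conda list --json`-style output into a {name: version} mapping."""
    return {entry["name"]: entry["version"] for entry in json.loads(json_text)}


def plan_sync(required, installed):
    """Compare project metadata (`required`) with the environment (`installed`).

    Both arguments map a package name to a version string.
    Returns (missing, mismatched), where `mismatched` maps a name to
    an (installed_version, required_version) pair.
    """
    missing = {name: ver for name, ver in required.items() if name not in installed}
    mismatched = {
        name: (installed[name], ver)
        for name, ver in required.items()
        if name in installed and installed[name] != ver
    }
    return missing, mismatched


def to_conda_specs(missing, mismatched):
    """Build `name=version` specs for everything that must be (re)installed."""
    specs = [f"{name}={ver}" for name, ver in missing.items()]
    specs += [f"{name}={required}" for name, (_, required) in mismatched.items()]
    return sorted(specs)
```

The resulting specs use the `name=version` form that `conda install` accepts, so a prompt shown to someone opening a shared project could list exactly these entries before anything is installed.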
Documentation
- Please find the API Specification here
- The code is documented with necessary inline comments and docstrings
List of completed deliverables
- Server Extension with all the necessary API endpoints
- Frontend Extension that lets users interact with the package manager
- Dockerfile that automates deployment
- SWAN integration (Partially done)
Link to extension repository: https://github.com/techtocore/Jupyter-Package-Manager
Link to modified SWAN setup: https://github.com/techtocore/jupyter
List of tasks yet to be completed
- Integration with EOS
- Optimization of kernel list filtering
- Rigorous end to end testing
Screenshots
About
This extension was developed as part of a GSoC 2019 project at CERN-HSF (Project Summary)
Student
Akash Ravi
- Email ID: akashkravi@gmail.com
- Profile: https://akashravi.github.io/
Mentors
- Diogo Castro
- Enrico Bocchi
- Enric Tejedor
- Jakub Moscicki