Google Summer of Code 2019 Project Report

Introduction

SWAN (Service for Web based ANalysis) is a platform developed by CERN to perform interactive data analysis in the cloud, without the need to install any software. It is built upon Jupyter notebooks and allows users to code and run their scripts via a web interface.

In SWAN, users can create projects and store their notebooks, input/output files, etc. Internally, a project is a special type of folder that acts as a container for the user's work. It is also the unit of sharing in SWAN: sharing a project with colleagues gives them everything they need to run the notebooks it contains.

Software packages in SWAN are retrieved on the fly from an HTTP-based file system called CVMFS (CernVM File System), so users do not have to deal with the installation, configuration, and compatibility of packages. Because this software is centrally managed through CVMFS, a user who opens a shared notebook does not need to worry about compatibility: the author and the user run the same software stack. However, some SWAN users need to install their own packages, and this breaks the seamless sharing experience.

The objective of this GSoC project is to create a package manager for SWAN, in the form of a Jupyter extension, that allows users to keep track of the packages each of their "projects" needs, and thus to manage projects together with their dependencies.

The extension allows users to install Python modules (in specific versions) through a user interface, making them available inside the corresponding project. Users who open a shared project are automatically prompted to install any required packages that are missing. This encourages reproducible results and collaborative analysis.
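As an illustration, the check behind this prompt can be sketched as a comparison between the packages recorded in the project metadata and those installed in the user's environment. This is a minimal sketch, not the extension's actual code: it assumes the metadata pins exactly one version per package, and the function name `missing_packages` is hypothetical.

```python
from typing import Dict, List, Tuple


def missing_packages(required: Dict[str, str],
                     installed: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return the (name, version) pairs that must be installed to satisfy `required`.

    Hypothetical helper: a package counts as missing if it is absent from
    `installed` or if the installed version differs from the pinned one.
    """
    missing = []
    # Sort by package name so the prompt lists packages in a stable order.
    for name, version in sorted(required.items()):
        if installed.get(name) != version:
            missing.append((name, version))
    return missing


if __name__ == "__main__":
    required = {"numpy": "1.16.4", "pandas": "0.24.2", "requests": "2.22.0"}
    installed = {"numpy": "1.16.4", "pandas": "0.23.0"}
    # pandas has the wrong version and requests is absent.
    print(missing_packages(required, installed))
    # → [('pandas', '0.24.2'), ('requests', '2.22.0')]
```

Each returned pair corresponds to a conda `name=version` specification (for example `pandas=0.24.2`), which is one way such a prompt could be turned into an install request.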

Each project is internally mapped to a separate conda environment, and the project metadata is stored in a hidden file inside the project itself. This hides the environment configuration from the user while still giving each project its own independent environment.
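The mapping described above can be sketched as follows. This is a hypothetical illustration, not the extension's implementation: the metadata file name `.swanproject`, its JSON layout, and the `swan-<hash>` naming scheme for environments are all assumptions made for the example.

```python
import hashlib
import json
import tempfile
from pathlib import Path

# Hypothetical name of the hidden metadata file kept inside each project folder.
METADATA_FILE = ".swanproject"


def env_name_for(project_dir: Path) -> str:
    """Derive a stable conda environment name from the project's local path (assumed scheme)."""
    digest = hashlib.sha1(str(project_dir.resolve()).encode("utf-8")).hexdigest()[:12]
    return f"swan-{digest}"


def save_metadata(project_dir: Path, packages: dict) -> Path:
    """Write the pinned packages to the hidden file so they travel with the project."""
    path = project_dir / METADATA_FILE
    path.write_text(json.dumps({"packages": packages}, indent=2, sort_keys=True))
    return path


def load_metadata(project_dir: Path) -> dict:
    """Read the metadata back; a project without the file has no packages yet."""
    path = project_dir / METADATA_FILE
    if not path.exists():
        return {"packages": {}}
    return json.loads(path.read_text())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "my_analysis"
        project.mkdir()
        save_metadata(project, {"numpy": "1.16.4"})
        print(env_name_for(project))   # environment used for this project
        print(load_metadata(project))  # {'packages': {'numpy': '1.16.4'}}
```

In this sketch the environment name is derived locally instead of being stored in the shared file. As a result, a collaborator who opens the project gets an environment of their own, while the package list still travels with the project.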

SWAN allows the creation of notebooks in four different languages: Python (2 or 3, depending on the software stack chosen during the session configuration), C++, R, and Octave. However, this extension works only for Python projects: for the other kernels, there is no guarantee that the environments keep package versions isolated.

Features

Setup Instructions

Usage Instructions

Documentation

List of completed deliverables

Link to extension repository: https://github.com/techtocore/Jupyter-Package-Manager

Link to modified SWAN setup: https://github.com/techtocore/jupyter

List of tasks yet to be completed

Screenshots

About

This extension was developed as part of the GSoC 2019 project at CERN-HSF (Project Summary).

Student

Akash Ravi

Mentors