Overview of hub features#

2i2c builds and operates distributions of JupyterHubs that are tailored for particular use-cases. These services share many of the same infrastructure components, but have customizations and optimizations that are more domain- or community-specific.

https://drive.google.com/uc?export=download&id=1vL8ekAtUQ4TEik4-oWIn36VAOITdlmpR

A high-level technical overview of an Interactive Computing Service collaboratively run by 2i2c and a community of practice. Each hub is a JupyterHub Distribution with a collection of community-led open source projects that are customized for a particular use-case.#

Here is a brief overview of the major features present in each type of hub.

| Name | Description | Research | Education |
|---|---|---|---|
| Authentication 🔍 | | | |
| Access control | Hub administrators can control who has access to the hub | ✔️ | ✔️ |
| GitHub Logon | Authenticate with a list of GitHub usernames | ✔️ | ✔️ |
| Google OAuth logon | Authenticate with email addresses that use Google OAuth | ✔️ | ✔️ |
| GitHub Teams Logon | Authenticate via membership in a GitHub Team that you control | ✔️ | |
| User Environment ⚒️ | | | |
| Custom user environment | Communities may bring their own Docker images for user environments. | ✔️ | ✔️ |
| Host content in repositories | Use nbgitpuller to store content in online repositories and distribute it to users with a click | ✔️ | ✔️ |
| Jupyter Interfaces | JupyterLab and Notebook interfaces are designed for interactive data science environments | ✔️ | ✔️ |
| RStudio | RStudio is an integrated development environment (IDE) for R | ✔️ | ✔️ |
| Configurable resources 📈 | | | |
| User storage | Users have their own filesystem that persists between sessions. | up to 20GB | up to 20GB |
| Configurable RAM | Configure the RAM available to users from the hub UI | 2-64GB | 1-4GB |
| Configurable CPU | Configure the CPU available to users from the hub UI | 2+ dedicated CPUs | 1-2 shared CPUs |
| Shared storage | Administrators can place files in a shared folder that all users may access. | up to 100GB | up to 100GB |
| Cloud infrastructure ☁️ | | | |
| Use commercial cloud | Hubs can run on AWS, Google Cloud (GKE), or Azure | AWS/GKE/Azure | AWS/GKE/Azure |
| Connect with cloud data | Access cloud-hosted data from your hub | ✔️ | |
| Scalable Dask Clusters | Scale your computing with Dask Gateway clusters | ✔️ | |
| Bring your own credits | Communities can run 2i2c Hubs on their cloud accounts and projects. | ask us | ask us |
| Service Level 👷‍♀️ | | | |
| Operations Support | 2i2c provides a dedicated support channel for all hubs | ✔️ | ✔️ |
| Hub Uptime | 2i2c has a team of Hub Engineers that keep the infrastructure up-to-date, upgraded, and running smoothly | 98% | 98% |
| User Privacy | Hubs follow best practices in user privacy, and 2i2c retains no user data. | ✔️ | ✔️ |
| Connect with communities | 2i2c provides a communications channel in Slack for Community Representatives to connect with one another | ✔️ | ✔️ |
| Open Source 💗 | | | |
| Right to Replicate | Hubs are designed to be replicable by anybody on their own infrastructure. | ✔️ | ✔️ |
| Open Source Stack | Hubs are built entirely with open source and community-driven tooling | ✔️ | ✔️ |
| Open Source Support | Hub fees fund open source engineers to do development and community work across the stack. | ✔️ | ✔️ |

JupyterHub in the cloud#

At the core of a community service is one or more JupyterHubs that provide an access point for interactive computing and cloud infrastructure for your community members.

You may access your community JupyterHub at a URL with the following form (though you may choose a custom URL if you wish):

<hub-name>.<community-name>.2i2c.cloud

JupyterHub provides interactive computing sessions for each of your users, and connects to the other infrastructure in the cloud. Our JupyterHubs can run on Google Cloud, Amazon Web Services (AWS), or Microsoft Azure.

Authentication#

We use Auth0 and CILogon to authenticate users. Both can connect to a number of other authentication protocols (such as OAuth2).

User interfaces#

Each 2i2c JupyterHub has two main kinds of interactive interface: the Jupyter interfaces (Notebook and Lab) and RStudio. From your session, the Notebook interface is available at the /tree endpoint of your URL, JupyterLab at /lab, and RStudio at /rstudio.
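As a rough illustration of how the hub address and the interface endpoints fit together, the sketch below builds a session URL in Python. The hub, community, and user names are made up, and the `/user/<username>/` prefix is JupyterHub's default single-user route, which is assumed here rather than taken from this page; a hub with a custom URL would differ.

```python
from urllib.parse import quote

# Endpoints named in the text above.
INTERFACE_PATHS = {"notebook": "tree", "lab": "lab", "rstudio": "rstudio"}


def interface_url(hub_name: str, community_name: str, username: str, interface: str) -> str:
    """Build the URL of one interface inside one user's session.

    Assumes the default <hub-name>.<community-name>.2i2c.cloud address and
    JupyterHub's default /user/<username>/ route (an assumption, see above).
    """
    host = f"{hub_name}.{community_name}.2i2c.cloud"
    path = INTERFACE_PATHS[interface]
    # Percent-encode the username so that e-mail style names stay valid in a URL.
    return f"https://{host}/user/{quote(username, safe='')}/{path}"


# Hypothetical hub, community, and user names, for illustration only.
print(interface_url("staging", "example", "alice", "rstudio"))
```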

Custom user environments#

Your 2i2c JupyterHub has an environment that has been created for your particular use-case. It exists as a Docker image that your JupyterHub loads when a user starts a new session. These images can either be built with the tool repo2docker, or pulled directly from a Docker registry. The environment also comes pre-loaded with some tools that are helpful for working with JupyterHub, such as nbgitpuller. See Customize your user environment for more information.
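To make the nbgitpuller workflow more concrete, here is a minimal sketch of how a shareable link could be assembled by hand. It assumes nbgitpuller's usual `git-pull` link format with `repo`, `branch`, and `urlpath` query parameters; the hub address and repository below are hypothetical, and in practice the nbgitpuller link generator produces these links for you.

```python
from urllib.parse import urlencode


def nbgitpuller_link(hub_url: str, repo_url: str, branch: str = "main", urlpath: str = "lab") -> str:
    """Return a link that pulls `repo_url` into the user's home directory
    and then opens `urlpath` in their session."""
    query = urlencode({"repo": repo_url, "branch": branch, "urlpath": urlpath})
    # /hub/user-redirect/ sends each visitor to their own running server.
    return f"{hub_url.rstrip('/')}/hub/user-redirect/git-pull?{query}"


# Hypothetical hub and repository, for illustration only.
link = nbgitpuller_link(
    "https://staging.example.2i2c.cloud",
    "https://github.com/example-org/course-materials",
)
print(link)
```

A user who clicks such a link logs in to the hub if needed, receives a copy of the repository, and lands in the chosen interface.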

Transparent infrastructure and operations#

All of the configuration and deployment scripts for the 2i2c JupyterHub can be found in the infrastructure/ repository. This repository contains both the deployment code and the documentation that explains how it works. It should be treated as “for advanced users only”, and is provided for transparency and as a guide for communities that wish to manage their own infrastructure in a way similar to a 2i2c JupyterHub.

To learn about how the infrastructure/ repository works, we recommend checking out the infrastructure documentation.

See the next sections for more information about each hub distribution.

Secure out of the box#

The cloud infrastructure that we manage follows best practices for deploying cloud applications securely. The Zero to JupyterHub Helm Chart is the community standard for deploying JupyterHub in the cloud, and is what 2i2c uses in all of its cloud hubs. This project follows the principle of “secure by default”, and includes a number of configuration and design decisions that isolate user environments from one another and prevent users from accessing resources or data that they are not permitted to access.

As members of the JupyterHub team, we are constantly looking for ways to improve the security of Zero to JupyterHub, and use our experience running these hubs to further improve JupyterHub’s security.

Data privacy#

2i2c will not collect user data for any purpose beyond what is required in order to run a JupyterHub. Depending on the choices of your community, the hub might contain identifiable information (e.g., e-mail addresses used as usernames for authentication), but this will remain within your hub’s configuration and is not shared publicly.

Our Site Reliability Engineers will have access to all of the information that is inside a hub, which they require in order to debug problems and assist with upgrades. However, we will not move any of this data outside of the hub, and will not retain it once the hub is shut down (except in order to transfer data to you at your request).

Monitored for abuse and unexpected costs#

We deploy Grafana Dashboards along with a Prometheus Server to continuously monitor the usage across all of our hubs. This provides visual dashboards that allow us to identify abnormal behavior on a hub (such as a single user using unusual amounts of RAM, using a lot of CPU, or making unusual networking requests).
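The kind of judgement a dashboard supports can be sketched in a few lines of Python. This is only an illustration of spotting one user whose memory use is far above what is typical; it is not how Grafana or Prometheus work internally, and the function name, the multiplier, and the sample figures are all hypothetical.

```python
from statistics import median


def flag_heavy_users(memory_gb_by_user: dict[str, float], factor: float = 4.0, floor_gb: float = 1.0) -> list[str]:
    """Return users whose memory use exceeds `factor` times the median.

    `floor_gb` keeps the threshold from collapsing when most sessions are idle.
    """
    if not memory_gb_by_user:
        return []
    typical = median(memory_gb_by_user.values())
    threshold = max(typical * factor, floor_gb)
    return sorted(user for user, gb in memory_gb_by_user.items() if gb > threshold)


# Made-up usage figures, for illustration only.
usage = {"alice": 1.2, "bob": 0.9, "carol": 1.5, "dave": 14.0}
print(flag_heavy_users(usage))  # ['dave']
```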

Cryptocurrency mining#

Cryptocurrency mining abuse occurs when users take advantage of cloud CPU in order to make money by mining cryptocurrency. It is a common problem with cloud-based services and platforms.

There are many different cryptocurrencies, but by far the most common one used for abuse is Monero, due to its anonymous nature.

We deploy an open-source tool called cryptnono to each of the clusters we manage. This tool monitors any process that runs on the 2i2c hubs, and automatically kills any that are associated with Monero.
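This page does not describe how cryptnono decides that a process is associated with Monero, so the following is only a hypothetical sketch of the general idea: match running command lines against a list of known miner names. The marker list and function are assumptions made for illustration (xmrig is a widely used Monero miner), and a real tool would also need to terminate the matching processes.

```python
# Illustrative markers only; not cryptnono's actual detection rules.
BANNED_SUBSTRINGS = ("xmrig", "monero")


def find_suspect_processes(processes: list[tuple[int, str]]) -> list[int]:
    """Return the PIDs whose command line contains a banned marker.

    `processes` is a list of (pid, command_line) pairs.
    """
    suspects = []
    for pid, command_line in processes:
        lowered = command_line.lower()
        if any(marker in lowered for marker in BANNED_SUBSTRINGS):
            suspects.append(pid)
    return suspects


# Made-up process table, for illustration only.
running = [(101, "python train.py"), (202, "./xmrig --donate-level 1")]
print(find_suspect_processes(running))  # [202]
```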