Serverless Jupyter Hub with AWS Fargate and CDK
Jupyter notebooks are a useful piece of software. They serve a variety of use cases: demonstrating Python code live, visualizing data, developing machine learning algorithms, and even automating incident response. I found myself reaching for Jupyter notebooks again and again, and started to think about the wider role they could play in my work.
Going Corp
When I realized that running my Jupyter notebooks inside a VS Code instance was no longer sufficient for my team, and that a larger group of people needed to upload their notebooks and run them at a larger scale, I started to think about how to build a solution that complies with corporate requirements like:
- Integrating with corporate authentication mechanisms like SAML SSO. In a corporate environment, you can't just throw a service into the cloud and expect everyone to log in and start putting their data into it. You have to think about security mechanisms like authentication and integration with corporate tools like an IdP and Active Directory.
- Handling performance and scaling. When you spin up notebook servers, sometimes your tasks are small enough to run on a small instance, and sometimes you need heavy computation or a GPU. Elasticity is therefore essential to cope with the full range of workloads.
- Saving server-management effort for IT and DevOps. Organizations are increasingly adopting serverless, so the idea was to use cloud-native services to avoid managing servers and worrying about their availability and security, while still providing a reliable service to users.
Solution Architecture
We're going to use a fairly common architecture for availability, reliability, and scale. The environment's domain name is hosted on Route 53, where an A record points to an Application Load Balancer spanning two Availability Zones in the selected region. Traffic is terminated at the ALB and routed from there to the ECS Fargate tasks, which launch containers according to system usage.
Because Fargate task containers are ephemeral, I had to think about shared, persistent storage. I chose EFS, a serverless NFS storage service, because it provides important security features such as security groups and encryption at rest. Notebooks are stored in the users' home directories, so mounting those as a volume in the containers eliminated the need to rely on the Jupyter server's local storage.
For authentication to the application, I used JupyterHub's OAuth capabilities together with a Cognito user pool for user management and authentication. This allows integration with corporate authentication systems such as SAML providers and IdPs, and gives JupyterHub just-in-time user provisioning through the auto_login and create_system_users properties of the LocalGenericOAuthenticator, combined with JupyterHub's shutdown_on_logout.
# Configuration file for JupyterHub
import os
from oauthenticator.generic import LocalGenericOAuthenticator
...
c.JupyterHub.authenticator_class = LocalGenericOAuthenticator
c.JupyterHub.shutdown_on_logout = True
c.OAuthenticator.oauth_callback_url = os.environ.get('OAUTH_CALLBACK_URL')
c.OAuthenticator.client_id = os.environ.get('OAUTH_CLIENT_ID')
c.OAuthenticator.client_secret = os.environ.get('OAUTH_CLIENT_SECRET')
c.LocalGenericOAuthenticator.auto_login = True
c.LocalGenericOAuthenticator.create_system_users = True
c.LocalGenericOAuthenticator.add_user_cmd = ['adduser', '-q', '--gecos', '', '--disabled-password', '--force-badname']
c.LocalGenericOAuthenticator.login_service = os.environ.get('OAUTH_LOGIN_SERVICE_NAME')
c.LocalGenericOAuthenticator.username_key = os.environ.get('OAUTH_LOGIN_USERNAME_KEY')
c.LocalGenericOAuthenticator.authorize_url = os.environ.get('OAUTH_AUTHORIZE_URL')
c.LocalGenericOAuthenticator.token_url = os.environ.get('OAUTH_TOKEN_URL')
c.LocalGenericOAuthenticator.userdata_url = os.environ.get('OAUTH_USERDATA_URL')
c.LocalGenericOAuthenticator.scope = os.environ.get('OAUTH_SCOPE').split(',')
CDK To the Rescue
If you thought I was going to walk you through provisioning the infrastructure in the AWS Console, you, my friend, are mistaken. In my world, infrastructure is written as code. While Terraform is popular among DevOps engineers, I chose AWS's CDK (Cloud Development Kit).
I made this decision because, well, we're only using AWS services here, and CDK is very powerful: it lets you write your infrastructure as code, but real code, meaning you can evaluate conditions and embed custom logic in your infrastructure decisions.
CDK uses CloudFormation, which is a declarative way of describing infrastructure, similar to Terraform, but instead of you writing those templates, CDK manages them on your behalf. For example, adding an IAM role or changing the ECS containers' memory or CPU is done in code, the changes propagate to the CloudFormation template that manages the infrastructure, and CloudFormation change sets give you visibility into the exact changes.
jupyter_ecs_container = jupyter_ecs_task_definition.add_container(
    f'{BASE_NAME}Container',
    image=ecs.ContainerImage.from_registry(
        config_yaml['container_image']),
    privileged=False,
    port_mappings=[
        ecs.PortMapping(
            container_port=8000,
            host_port=8000,
            protocol=ecs.Protocol.TCP
        )
    ],
    logging=ecs.LogDriver.aws_logs(
        stream_prefix=f'{BASE_NAME}ContainerLogs-',
        log_retention=logs.RetentionDays.ONE_WEEK
    ),
    environment={
        'OAUTH_CALLBACK_URL': 'https://' + jupyter_route53_record.domain_name + '/hub/oauth_callback',
        'OAUTH_CLIENT_ID': cognito_app_client.user_pool_client_id,
        'OAUTH_CLIENT_SECRET': cognito_user_pool_client_secret,
        'OAUTH_LOGIN_SERVICE_NAME': config_yaml['oauth_login_service_name'],
        'OAUTH_LOGIN_USERNAME_KEY': config_yaml['oauth_login_username_key'],
        'OAUTH_AUTHORIZE_URL': 'https://' + cognito_user_pool_domain.domain_name + '.auth.' + self.region + '.amazoncognito.com/oauth2/authorize',
        'OAUTH_TOKEN_URL': 'https://' + cognito_user_pool_domain.domain_name + '.auth.' + self.region + '.amazoncognito.com/oauth2/token',
        'OAUTH_USERDATA_URL': 'https://' + cognito_user_pool_domain.domain_name + '.auth.' + self.region + '.amazoncognito.com/oauth2/userInfo',
        'OAUTH_SCOPE': ','.join(config_yaml['oauth_scope'])
    }
)
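To illustrate the "real code" point from above, here is a small hypothetical sketch of custom logic at synthesis time: resolving the Fargate task's CPU and memory from a human-friendly key in the config file before passing them to the task definition. The `task_size` key and the size table are my own assumptions, not part of the actual stack.

```python
# Hypothetical helper: map a friendly size name from config.yaml to the
# cpu/memory kwargs a FargateTaskDefinition expects. The key names and
# values are illustrative assumptions.
TASK_SIZES = {
    'small': {'cpu': 512, 'memory_limit_mib': 1024},
    'large': {'cpu': 4096, 'memory_limit_mib': 8192},
}

def task_sizing(config_yaml: dict) -> dict:
    """Resolve Fargate sizing kwargs from the config, defaulting to small."""
    size = config_yaml.get('task_size', 'small')
    if size not in TASK_SIZES:
        raise ValueError(f'unknown task_size: {size!r}')
    return TASK_SIZES[size]

# Usage at synthesis time, e.g.:
#   ecs.FargateTaskDefinition(self, 'TaskDef', **task_sizing(config_yaml))
```

Because this runs as ordinary Python during `cdk synth`, the resulting CloudFormation template simply contains the resolved numbers.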
Thoughts, Issues and Concerns
After building the solution, I had the following thoughts and concerns about how to improve it:
- Is there a better storage option than EFS in terms of security, speed, scale, and cost?
- Does encrypting the EFS file system with a CMK significantly impact the system's costs because of the decryption operations performed at runtime?
- I had some issues with OAuth when running more than one container, probably because of the OAuth implementation in JupyterHub, so it may be useful to add session stickiness to the load balancer configuration.
- I thought about adding an example of integrating Cognito with an IdP, such as AWS SSO or Okta, but for simplicity I decided to leave that part up to you.
Deploying the Solution
After reading all of the above, I think you're ready to try deploying the solution in your own AWS environment. Just head to the Git repository below, follow the deployment instructions, and you'll have your serverless JupyterHub up and running in about 10 minutes.
Contributions and feature suggestions are happily accepted!