
How to use Hadoop within binderhub instances (mybinder.org etc.)


thedatasociety/binderhub-hadoop


The Data Science and Engineering Society



binderhub-hadoop

A minimal repository for running Apache™ Hadoop® on binderhub instances.

The tutorial in running-hadoop.ipynb covers downloading Hadoop, starting it in pseudo-distributed mode, running a MapReduce job, and more. For a better rendering of the notebook, view running-hadoop.ipynb via nbviewer.


Launching this repository on a binderhub instance now

Binderhub uses repo2docker to build and serve the computational environments. The lists below provide the badges and hyperlinks for launching JupyterLab or Jupyter on each of the known binderhub instances.

JupyterLab

  • launch @ gke.mybinder.org

  • launch @ ovh.mybinder.org

  • launch @ gesis.org

  • launch @ pangeo.io

Jupyter

  • launch @ gke.mybinder.org

  • launch @ ovh.mybinder.org

  • launch @ gesis.org

  • launch @ pangeo.io

Launching this repository locally using repo2docker

You can also launch this repository locally using Docker and repo2docker. Please refer to this link for installing Docker and this link for installing repo2docker.

The command below launches a container and publishes it on port 8888. It also mounts the user's home directory into the container as the local-home folder.

Before running it, make sure your local user is in the docker group; please refer to this Docker documentation for more details. It is strongly advised not to run the container as root. Please also be aware that the --ip 0.0.0.0 directive starts a server that accepts connections from any IP address. For security purposes, the --NotebookApp.token='dstoken1234567' directive requires a token for accessing any interface. Use dstoken1234567 to log in, or set a stronger token of your own.
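As a quick pre-flight check for the docker group membership mentioned above, something like the following could be run first. This is only a sketch: the helper name in_group is made up for illustration, and it assumes the group is literally named docker, which is the usual default on Linux but may differ on your system.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if user $1 belongs to group $2, 1 otherwise.
in_group() {
    # `id -nG` prints the user's group names separated by spaces;
    # put one name per line and look for an exact, literal match.
    id -nG "$1" | tr ' ' '\n' | grep -qxF "$2"
}

current_user=$(id -un)

if in_group "$current_user" docker; then
    echo "OK: $current_user is in the docker group"
else
    echo "WARNING: $current_user is not in the docker group"
fi
```

If the warning is printed, follow the Docker documentation referenced above rather than falling back to running the container as root.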

repo2docker -p 8888:8888 \
            -v $(echo ~):$(echo ~)/local-home \
            https://github.com/thedatasociety/binderhub-hadoop \
            jupyter lab --ip 0.0.0.0 --NotebookApp.token='dstoken1234567'
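
The fixed token in the command above is easy to guess. One possible way to generate a stronger one is sketched below; it assumes a Unix-like system that provides /dev/urandom, and the variable name TOKEN is only an illustration.

```shell
#!/bin/sh
# Read 24 random bytes from the kernel and hex-encode them,
# which yields a 48-character lowercase hexadecimal string.
TOKEN=$(head -c 24 /dev/urandom | od -An -tx1 | tr -d ' \n')

# Show the token so it can be used to log in later.
echo "Generated token: $TOKEN"
```

The generated value can then be passed as --NotebookApp.token="$TOKEN" in place of the dstoken1234567 value in the repo2docker command.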

Each interface will be available at a specific path, as follows:

See the repo2docker documentation for more details regarding the use of multiple interfaces.

Contributing

License
