Hello there! I hope you find this an easy, self-explanatory guide that gets you ready to code in Python or R for data science.
Jupyter Notebook, formerly known as IPython Notebook, is a widely used application in the data science field for coding, testing, writing equations, and plotting graphs. It is good practice to write a single line of code and run it rather than writing a hundred lines and then running them as a whole. For incremental learning, Jupyter gives you an easy go-to interface for experimenting with your code. It is a server-client application that runs on localhost in any web browser of your choice, without needing internet access.
+++ Awesome Trick: For Mac OS, open a Jupyter Notebook (.ipynb) file by just double clicking! +++
By the end of this guide, you will be able to set up your system for deep learning or NLP projects in a root or virtual environment, depending on your need!
Contents
I. Part 1: Installing Jupyter Notebook for Python
- Download Anaconda
- Install Anaconda
- Using Jupyter Notebooks
- Customizing your Jupyter Notebook
- Installing libraries or packages
- Creating multiple virtual environments with different Python versions for different ML needs
- Installing Jupyter Notebook (kernel) in virtual environments
- Share your virtual environment.
- *** Opening IPython notebooks in Mac with a double click ***
II. Part 2: Installing Jupyter Notebook for R
Part 1: Installing Jupyter Notebook for Python
Step 1. Download Anaconda
We will use the Anaconda Distribution from Continuum to run Jupyter notebooks. Let’s download it, shall we?
- Click on this link to download: https://www.anaconda.com/distribution
- If you download the Python 3.X version, it will install your ROOT environment (Base or default environment) with Python 3.X along with a set of standard packages (like numpy, pandas, etc.) that come with Anaconda. Then you can create multiple virtual environments to work with different Python versions or for different needs.
- If you download Python 2.X version, it will install your ROOT environment with Python 2.X along with a set of standard packages. Similarly you can create multiple virtual environments.
- So remember, the version you install determines which version of Python you will use most extensively as your base. In this example, I am installing an older version of Anaconda, Anaconda3-4.2.0-Windows-x86_64.exe, found at https://repo.continuum.io/archive/
Step 2. Install Anaconda
Follow the normal installer instructions:
1. Click Next on the Welcome Screen.
2. Select a Destination Folder Path for installation.
3. If you choose a destination folder path containing spaces, a pop-up warning will appear, so it is better to choose a path that does not contain spaces.
4. After you choose your installation path, the Advanced Installation Options screen will show up. Select both options here.
5. Done.
Step 3. Using Jupyter Notebooks
Congratulations, you are now all set to start coding. Anaconda provides an easy, user-friendly graphical interface that you can use instead of the command prompt. There are two ways to open a notebook:
Method 1: Using Anaconda Navigator
Open Anaconda Navigator. On the welcome screen, click the Launch button under Jupyter Notebook. The first launch takes some time: it opens a command prompt showing an authentication token and then opens the Jupyter Dashboard in your default web browser.
Method 2: Using a terminal
Alternatively, you can open an Anaconda Prompt (or a Terminal on Mac), type jupyter notebook and hit Enter. It will open the same localhost tree as in the method above.
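If you would rather start the server from a script, the terminal method can be sketched in Python. This is only a minimal sketch, assuming the jupyter executable is on your PATH; the helper names find_jupyter, build_launch_command and launch_notebook are made up for illustration, while --port and --no-browser are regular Jupyter Notebook command-line options.

```python
import shutil
import subprocess


def find_jupyter():
    """Return the full path of the jupyter executable, or None if it is not on PATH."""
    return shutil.which("jupyter")


def build_launch_command(port=8888, open_browser=True):
    """Build the argument list equivalent to typing `jupyter notebook` in a terminal."""
    command = ["jupyter", "notebook", f"--port={port}"]
    if not open_browser:
        command.append("--no-browser")
    return command


def launch_notebook(port=8888, open_browser=True):
    """Start the notebook server as a child process (hypothetical convenience wrapper)."""
    if find_jupyter() is None:
        raise RuntimeError("jupyter was not found on PATH; activate your conda env first")
    return subprocess.Popen(build_launch_command(port, open_browser))


# Only builds and shows the command; call launch_notebook() to actually start the server.
print(build_launch_command())
```

Calling launch_notebook() returns the child process object, so a script can later stop the server with its terminate() method.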
The Jupyter dashboard opens in a Tree location by default, generally C:\Users\username\, listing the folders and files in that location. You can create a new notebook by clicking New at the top right, which gives you a drop-down listing your installed Python version.
This opens a jupyter notebook. Voila! :)
✮ Some useful shortcuts to get started
Press the Escape key and then press
D, D (press D twice): Delete a cell
M: Convert a cell into Markdown (text)
A: Add a cell above the current cell
B: Add a cell below the current cell
Shift+Enter: Execute (run) the cell
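❉ TIP ❉: You can customize how the notebook server behaves (for example, its startup directory) by editing the jupyter_notebook_config.py file. Visual tweaks such as colors and background are typically made through a custom CSS file instead (~/.jupyter/custom/custom.css in the classic notebook).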
Step 4. Customizing your Jupyter Notebook
Whenever you open the notebook, it always opens in the same parent directory tree. What if you want it to open in a project directory or some other location of your choice? Let’s do it!
1) Open “Anaconda Prompt” and type jupyter notebook --generate-config
2) You will find the generated file at C:\Users\username\.jupyter\jupyter_notebook_config.py
3) Find the line #c.NotebookApp.notebook_dir = '' and change it to the directory you want the dashboard to open in, removing the leading # so the line is no longer a comment, like c.NotebookApp.notebook_dir = r'c:\Your Dir' (the r prefix keeps Python from treating the backslash as an escape character).
4) Then go to the Jupyter Notebook shortcut, generally located in C:\Users\User_name\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Anaconda3 (64-bit), and put a copy somewhere you will often use to open Jupyter Notebooks. The remaining steps apply to this shortcut.
5) Right-click the shortcut and select Properties.
6) In the Target field, remove %USERPROFILE% at the end.
7) Then in the Start in field, type the same directory you typed above, c:\Your Dir
8) Now use this shortcut to open the Jupyter Dashboard.
9) Done!
Now it’ll open the dashboard in your desired directory.
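The edit to jupyter_notebook_config.py can also be scripted. The following is a minimal sketch rather than an official Jupyter tool: set_notebook_dir is a hypothetical helper that takes the text of the config file and returns it with the notebook_dir line uncommented and set. The sample text stands in for the real file, which you would read from and write back to the path shown in step 2.

```python
import re

# Matches the notebook_dir setting whether or not it is still commented out.
_NOTEBOOK_DIR_LINE = re.compile(
    r"^#?[ \t]*c\.NotebookApp\.notebook_dir[ \t]*=.*$", re.MULTILINE
)


def set_notebook_dir(config_text, directory):
    """Return config_text with c.NotebookApp.notebook_dir pointing at directory."""
    # repr() escapes backslashes, so Windows paths stay valid Python string literals.
    new_line = f"c.NotebookApp.notebook_dir = {directory!r}"
    if _NOTEBOOK_DIR_LINE.search(config_text):
        # A function replacement keeps re.sub from interpreting the backslashes.
        return _NOTEBOOK_DIR_LINE.sub(lambda match: new_line, config_text, count=1)
    # The setting is absent, so append it at the end of the file.
    return config_text.rstrip("\n") + "\n" + new_line + "\n"


# Stand-in for the contents of the generated config file.
sample = "# Configuration file for jupyter-notebook.\n#c.NotebookApp.notebook_dir = ''\n"
updated = set_notebook_dir(sample, r"c:\Your Dir")
print(updated)
```

Writing the path with doubled backslashes is equivalent to the raw-string form; both avoid the problem of Python reading a backslash in a Windows path as the start of an escape sequence.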
Step 5. Installing libraries or packages
Let’s say you want to install the Pandas library in your Root/Base environment. Simply open a command prompt or Anaconda Prompt and use the pip package manager:
(base) $:>pip install pandas
You can also install multiple libraries at once, mentioning their versions too!
(base) $:>pip install pandas==0.21.0 numpy==1.16.0
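After installing pinned versions, you may want to confirm what actually ended up in the environment. Here is a small sketch, assuming Python 3.8 or newer (for importlib.metadata); parse_pin and check_pins are hypothetical helper names, and the comparison is a plain string match on the version, not a full version-specifier check.

```python
from importlib import metadata  # available in the standard library from Python 3.8


def parse_pin(spec):
    """Split 'pandas==0.21.0' into ('pandas', '0.21.0'); version is None if unpinned."""
    name, separator, version = spec.partition("==")
    return name.strip(), (version.strip() if separator else None)


def check_pins(specs):
    """Compare each pinned spec with what is installed in the current environment."""
    report = {}
    for spec in specs:
        name, wanted = parse_pin(spec)
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if wanted is None or installed == wanted:
            report[name] = "ok"
        else:
            report[name] = f"installed {installed}, wanted {wanted}"
    return report


print(check_pins(["pandas==0.21.0", "numpy==1.16.0"]))
```

The result depends on the environment you run it in, which is the point: run it inside the env you just installed into.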
Step 6. Creating a virtual environment
Suppose you want to write your code in a different version of Python, or use a specific version of a package. That need brings us to what we call virtual environments.
Python applications will often use packages and modules that don’t come as part of the standard library. Applications will sometimes need a specific version of a library, or a different Python version. This means it may not be possible for one Python installation to meet the requirements of every application.
If application A needs version 1 of Python or of a particular module but application B needs version 2, then the requirements are in conflict and installing either version 1 or 2 will leave one application unable to run.
The solution for this problem is to create a virtual environment, a self-contained directory tree that contains a Python installation for a particular version of Python, plus a number of additional packages.
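Because each environment carries its own Python installation, a quick way to see which one you are in is to ask the interpreter itself. A minimal sketch using only the standard library (describe_interpreter is a hypothetical helper name):

```python
import sys


def describe_interpreter():
    """Report which Python installation and environment the current process runs in."""
    return {
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "executable": sys.executable,  # path of the python binary being run
        "prefix": sys.prefix,  # root directory of the active environment
    }


info = describe_interpreter()
for key, value in info.items():
    print(f"{key}: {value}")
```

Run in different environments, the same script reports a different version, executable and prefix each time.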
Virtual Environment Setup:
Suppose we want the following setup. We have already installed Python 3.5.6 as the Base environment. Now we will create three virtual environments (env) for different needs.
- The first env is Python_2.7.13, which will have Python version 2.7.13 with some basic libraries. For Natural Language Processing (NLP) tasks we will use the NLTK library.
- The second env is Python_3.6.3, which will have Python version 3.6.3 with some basic libraries. For Deep Learning (DL) tasks we will use PyTorch together with the fastai library.
- The third env is Python_3.7.3, which will have Python version 3.7.3 with some basic libraries. For Deep Learning we will use Keras (on top of TensorFlow) and for NLP we will use the spaCy library.
NLP: There are many open-source libraries you can use, like NLTK, Gensim, and spaCy.
DL: There are many platforms, such as Keras, TensorFlow, and PyTorch, each made unique by various factors.
Virtual Environment “Python_2.7.13”
1. Open “Anaconda Prompt” and type
conda create -n <ENV_NAME> python=<VERSION> <LIBRARY1> <LIBRARY2>
like in this case:
(base) $:>conda create -n Python_2.7.13 python=2.7.13 numpy pandas scikit-learn nltk notebook ipykernel
You can also install your env in a location of your choice (e.g. D:\Python\envs) instead of the default location, which is C:\ProgramData\Anaconda3\envs, like this:
(base) $:>conda create --prefix=D:\Python\envs\Python_2.7.13 python=2.7.13 numpy pandas scikit-learn nltk notebook ipykernel
and then add that location to the conda config file, so the env can be found by name, like this:
(base) $:>conda config --append envs_dirs D:\Python\envs
2. Activate your env: (base) $:>activate Python_2.7.13
3. Once the env is activated, you can register it as a Jupyter kernel (so that all notebook kernels are available in one place) in one shot:
python -m ipykernel install --user --name <NAME> --display-name <DISPLAY NAME ON JUPYTER KERNEL LIST>
(Python_2.7.13) $:>python -m ipykernel install --user --name Python_2.7.13_Notebook --display-name "Python_2.7.13"
4. Now you can also install other libraries using PIP
(Python_2.7.13) $:>pip install seaborn
5. Now just type (Python_2.7.13) $:>jupyter notebook to launch Jupyter Notebook.
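The conda create template from step 1 can be expressed as a small command builder, which is handy when you set up several environments that share a package list. This is a sketch under assumptions: conda_create_command and BASIC_PACKAGES are hypothetical names, and the code only builds and prints the commands; it does not run conda.

```python
def conda_create_command(python_version, packages, name=None, prefix=None):
    """Build a `conda create` argument list for a named env or a custom --prefix location."""
    if (name is None) == (prefix is None):
        raise ValueError("pass exactly one of name or prefix")
    target = ["-n", name] if name is not None else ["--prefix", prefix]
    return ["conda", "create", *target, f"python={python_version}", *packages]


# The basic libraries used for every env in this guide.
BASIC_PACKAGES = ["numpy", "pandas", "scikit-learn", "nltk", "notebook", "ipykernel"]

commands = [
    conda_create_command(version, BASIC_PACKAGES, prefix=rf"D:\Python\envs\Python_{version}")
    for version in ("2.7.13", "3.6.3", "3.7.3")
]
for command in commands:
    print(" ".join(command))
```

Each list could be handed to subprocess.run if you want the script to create the environments as well.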
Virtual Environment “Python_3.6.3”
1. Open Anaconda Prompt and type
(base) $:>conda create --prefix=D:\Python\envs\Python_3.6.3 python=3.6.3 numpy pandas scikit-learn nltk notebook ipykernel
2. Activate your env: (base) $:>activate Python_3.6.3
3. Install pytorch and fastai libraries:
(Python_3.6.3) $:>conda install fastai pytorch=1.0.0 -c fastai -c pytorch -c conda-forge
4. Install the Jupyter Notebook kernel
(Python_3.6.3) $:>python -m ipykernel install --user --name Python_3.6.3_Notebook --display-name "Python_3.6.3_fastai"
5. Enter (Python_3.6.3) $:>jupyter notebook to launch Jupyter Notebook.
Virtual Environment “Python_3.7.3”
1. Open Anaconda Prompt and type
(base) $:>conda create --prefix=D:\Python\envs\Python_3.7.3 python=3.7.3 numpy pandas scikit-learn nltk notebook ipykernel
2. Activate your env: (base) $:>activate Python_3.7.3
3. Now install Keras and Spacy libraries using PIP
(Python_3.7.3) $:>pip install keras
(Python_3.7.3) $:>pip install spacy
spaCy provides pre-trained “English” models in several sizes. To download them:
# Small Model (37 MB)
(Python_3.7.3) $:>python -m spacy download en_core_web_sm
# Medium Model (120 MB)
(Python_3.7.3) $:>python -m spacy download en_core_web_md
# Large Model (838 MB)
(Python_3.7.3) $:>python -m spacy download en_core_web_lg
# Can load any of the above models using:
> import spacy
> nlp = spacy.load("en_core_web_md") # for using the medium model
4. Install the Jupyter Notebook kernel
(Python_3.7.3) $:>python -m ipykernel install --user --name Python_3.7.3_Notebook --display-name "Python_3.7.3"
5. Enter (Python_3.7.3) $:>jupyter notebook to launch Jupyter Notebook.
You can check the installed virtual environments using:
(base) $:>conda env list
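If you want to use that listing in a script, the printed table is easy to parse. A minimal sketch, assuming the plain-text layout shown by conda env list (name, an optional * for the active env, then the path) and paths without spaces; parse_conda_env_list is a hypothetical helper name and the sample text is illustrative.

```python
def parse_conda_env_list(output):
    """Parse the text printed by `conda env list` into (name, path, is_active) tuples."""
    envs = []
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        parts = line.split()
        is_active = "*" in parts
        parts = [part for part in parts if part != "*"]
        # An env outside the configured envs_dirs is listed by path only, with no name.
        name = parts[0] if len(parts) > 1 else ""
        envs.append((name, parts[-1], is_active))
    return envs


# Illustrative output; in practice, capture the real text from the command.
sample = r"""# conda environments:
#
base                  *  C:\ProgramData\Anaconda3
Python_2.7.13            D:\Python\envs\Python_2.7.13
"""
rows = parse_conda_env_list(sample)
for name, path, is_active in rows:
    print(name, path, "(active)" if is_active else "")
```

conda can also emit machine-readable output with the --json flag, which is the sturdier choice when paths may contain spaces.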
You can also check the installed kernels in Jupyter Notebook using:
(base) $:>jupyter kernelspec list
In the output of this command, the left column lists the names of the Jupyter Notebook kernels (the display name on screen may differ, as specified while installing) and the right column lists the location of each kernel.
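Each kernel location is a folder containing a kernel.json file, and the display name you chose is stored there under the display_name key. The sketch below reads those files directly; read_kernelspecs is a hypothetical helper, and the demo builds a throwaway kernels folder in a temporary directory, so it runs without Jupyter installed.

```python
import json
import tempfile
from pathlib import Path


def read_kernelspecs(kernels_dir):
    """Map each kernel folder name to the display_name stored in its kernel.json."""
    specs = {}
    for kernel_json in sorted(Path(kernels_dir).glob("*/kernel.json")):
        with kernel_json.open(encoding="utf-8") as handle:
            spec = json.load(handle)
        folder = kernel_json.parent.name
        specs[folder] = spec.get("display_name", folder)
    return specs


# Demo on a fake kernels folder; point read_kernelspecs at a real one instead.
with tempfile.TemporaryDirectory() as tmp:
    kernel_dir = Path(tmp) / "python_3.7.3_notebook"
    kernel_dir.mkdir()
    (kernel_dir / "kernel.json").write_text(
        json.dumps({"display_name": "Python_3.7.3", "language": "python", "argv": []}),
        encoding="utf-8",
    )
    demo_specs = read_kernelspecs(tmp)

print(demo_specs)  # {'python_3.7.3_notebook': 'Python_3.7.3'}
```

This makes the difference between the kernel name (the folder) and the display name (what the notebook menu shows) easy to inspect.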
Once registered, all of these kernels are available to choose from in the New drop-down of the Jupyter dashboard.
Step 8. Sharing your virtual environment
Sharing an existing virtual environment is simple in principle, but reproducing the same package versions and a similar working environment on another machine can get tricky.
STEPS:
1. On your current machine, generate an environment.yml and a requirements.txt file.
(base) $:>conda activate my_ENV
(my_ENV) $:>conda env export > environment.yml
(my_ENV) $:>pip freeze > requirements.txt
The issue with doing it this way is that the environment.yml file sometimes contains old or initial package version numbers, which causes errors on the second machine. As a workaround, you can create your own environment.yml from requirements.txt by listing all the required packages under the pip section of the yml file.
Email or copy this file to the other machine.
2. On your second machine, create the environment from the above environment.yml file:
(base) $:>conda env create -f environment.yml
3. List your virtual envs:
(base) $:>conda env list
# conda environments:
#
base * C:\ProgramData\Anaconda3
Python_2.7.13 D:\Python\envs\Python_2.7.13
Python_3.6.3_fastai D:\Python\envs\Python_3.6.3_fastai
Python_3.7.3 D:\Python\envs\Python_3.7.3
python_3.5.6_stable D:\Python\envs\python_3.5.6_stable
4. Install a jupyter kernel for this virtual env
(base) $:>conda activate python_3.5.6_stable
(python_3.5.6_stable) $:>pip install jupyter notebook
(python_3.5.6_stable) $:>python -m ipykernel install --user --name "Python_3.5.6_Notebook" --display-name "Python_3.5.6"
5. List your jupyter kernels:
(base) $:>jupyter kernelspec list
Available kernels:
Python_2.7.13 C:\Users\...\kernels\Python_2.7.13
Python_3.6.3_fastai C:\Users\...\kernels\Python_3.6.3_fastai
Python_3.7.3 C:\Users\...\kernels\Python_3.7.3
Python_3.5.6_Notebook C:\Users\...\kernels\Python_3.5.6_Notebook
6. That’s all. Now you have an identical virtual environment, with Jupyter running in it, installed on your other machine.
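The workaround from step 1, building your own environment.yml from requirements.txt, can be sketched as a short script. This is one possible layout, not the only valid one: requirements_to_environment_yml is a hypothetical helper, and the environment name, Python version and pins in the demo are placeholders to replace with your own.

```python
def requirements_to_environment_yml(env_name, python_version, requirements_text):
    """Build environment.yml text that lists every requirement under the pip section."""
    pins = [
        line.strip()
        for line in requirements_text.splitlines()
        if line.strip() and not line.strip().startswith("#")  # drop blanks and comments
    ]
    lines = [
        f"name: {env_name}",
        "dependencies:",
        f"  - python={python_version}",
        "  - pip",
        "  - pip:",
    ]
    lines += [f"    - {pin}" for pin in pins]
    return "\n".join(lines) + "\n"


# Placeholder input; normally this is the content of your requirements.txt.
example = requirements_to_environment_yml(
    "python_3.5.6_stable", "3.5.6", "pandas==0.21.0\nnumpy==1.16.0\n"
)
print(example)
```

Saving the returned text as environment.yml gives a file that the create command in step 2 can consume, with conda installing Python and pip and pip installing the pinned packages.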
Step 9. Opening IPython notebooks in Mac with a double click
1. Open Automator
2. Select New Document > Application
3. Search for “Run Shell Script” under the Actions tab. Drag and drop it into your working window.
4. Set the Pass input drop-down menu to as arguments
5. Copy and paste the following code in the Shell Window:
variable="'$1'"
the_script='tell application "terminal" to do script "jupyter lab '
osascript -e "${the_script}${variable}\""
6. Click Save (Save As), save the file as “Open Jupyter Notebook App”, and choose the File Format Application. Save or move this file to Applications on your main drive.
7. Find a Jupyter Notebook that you want to open. Right-click the file and click Get Info. Click the Open with drop-down menu and select Other. Then navigate to your Applications folder and select Open Jupyter Notebook App. Click Change All... so that all .ipynb files are opened with Jupyter Lab by default when double-clicked.
8. Done!
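To see what the shell snippet in step 5 does, here is the same string-building written as a Python sketch. It is an illustration rather than a replacement for the Automator app: build_applescript is a hypothetical helper and the notebook path is made up. Unlike the shell version, it quotes the path with shlex.quote, so file names containing spaces or quotes are handled.

```python
import shlex


def build_applescript(notebook_path, launcher="jupyter lab"):
    """Return the AppleScript line that opens Terminal and starts Jupyter on one file."""
    shell_command = f"{launcher} {shlex.quote(notebook_path)}"
    # Escape backslashes and double quotes so the command fits in an AppleScript string.
    escaped = shell_command.replace("\\", "\\\\").replace('"', '\\"')
    return f'tell application "Terminal" to do script "{escaped}"'


script = build_applescript("/Users/me/My Notes/demo.ipynb")
print(script)
```

On macOS the resulting line can be passed to osascript with the -e option, which is what the last line of the shell snippet does.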
Part 2: Installing Jupyter Notebook for R
1. First, find where R is installed on your system.
2. On Mac OS, the R directory is typically
/Library/Frameworks/R.framework/Versions/<VERSION>/Resources/bin
3. Open a Command Prompt/Terminal and cd to the above location:
$:> cd /Library/Frameworks/R.framework/Versions/4.2/Resources/bin/
$:> ./R
Once the R console has launched:
> install.packages("devtools")
> ...
> Enter Selection: 1
>
4. Install IRkernel for Jupyter:
> devtools::install_github("IRkernel/IRkernel")
> IRkernel::installspec()
> quit()
5. Done! Now open a Jupyter Notebook and select “R” as the kernel.