Lab 1 - Installation#

Th. 03.10.2024 15:00-17:00


Instructions for installing Jupyter Notebook with Anaconda#

STEP 1: Download Anaconda#

Download the Anaconda installer based on your operating system:

  • Windows

  • Mac OS

  • Linux

installer

STEP 2: Install Anaconda#

Select the default options when prompted during the installation.

install_complete

STEP 3: Create a new conda environment#

It’s recommended to create a separate conda environment for this class to avoid potential conflicts between packages.

1. Open Anaconda Prompt#

prompt

2. Type in Command Line#

  • conda create -n gir #you can replace gir with your desired environment name

  • conda activate gir #activate the newly created environment

  • conda config --env --add channels conda-forge

  • conda config --env --set channel_priority strict

  • conda install jupyter notebook #install jupyter notebook

  • conda install pandas # the library we will use today, same for other libraries

  • conda install ipykernel # Install ipykernel to allow the environment to be used as a Jupyter kernel

  • python -m ipykernel install --name gir

  • jupyter notebook #launch jupyter notebook

cl1 conda

STEP 4: Getting started with Jupyter Notebook#

To create a new notebook, click on the “New” button in the top right corner of the Jupyter Notebook interface and select Python 3 from the drop-down menu.

jupyter notebook

Try the following code blocks:#

print('Hello World!')
Hello World!
help(print)
Help on built-in function print in module builtins:

print(...)
    print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
    
    Prints the values to a stream, or to sys.stdout by default.
    Optional keyword arguments:
    file:  a file-like object (stream); defaults to the current sys.stdout.
    sep:   string inserted between values, default a space.
    end:   string appended after the last value, default a newline.
    flush: whether to forcibly flush the stream.

By convention, we import all libraries at the very top of the notebook. There are also a set of standard aliases that are used to shorten the library names. Pandas is one of the most widely used Python libraries in data science.

Some commonly used data wrangling operations in Pandas (here for more information):

Creating DataFrames#

DataFrame: a two-dimensional data structure that holds data like a two-dimension array or a table with rows and columns.

import pandas as pd

data = {
    'fruit': ['apple', 'orange', 'banana', 'mango'],
    'amount': [1, 3, 5, 2],
    'color':['red','orange','yellow','yellow']
}

purchases = pd.DataFrame(data)
purchases
fruit amount color
0 apple 1 red
1 orange 3 orange
2 banana 5 yellow
3 mango 2 yellow

Selecting rows and columns#

  • Selection using label/index: .loc

  • Selection using integer location: .iloc

# To select a column of a DataFrame by column label,
# the safest and fastest way is to use the .loc method.
# General usage looks like df.loc[rowname,colname].

purchases.loc[:, ['fruit']] #the colon : here means "everything"

# try also:
# purchases.loc[[0],:]
# purchases.loc[0:2, ['fruit']]
fruit
0 apple
1 orange
2 banana
3 mango
# iloc[] lets you slice the dataframe by
# row position and column position

purchases.iloc[0, 2]
'red'

Filtering Data#

purchases[purchases['amount'] > 2]
fruit amount color
1 orange 3 orange
2 banana 5 yellow
purchases[purchases['color'] == 'yellow']
fruit amount color
2 banana 5 yellow
3 mango 2 yellow

Submission#

For the participation grade, create a DataFrame of your own and submit the .ipynb file