Practical Machine Learning with Python


As part of my MSc I'm taking a short course in Practical Machine Learning via QA.com.

The first three days were just about basic stats and visualisation using Python. It was great to have a refresher - but I would have expected that to be a prerequisite.

The tutor was excellent - very patient when explaining complex concepts. And the use of Jupyter Notebooks is a game-changer for taught courses like this.

Ultimately, it was a useful course - although I expected a lot more time to be spent on training machine learning models rather than on the underlying statistics.

These are mostly notes to myself to help consolidate my knowledge - and to provide some more information on the course itself if you are thinking of taking it.

Day 1

All done in Python, pretty standard.

Python

Python Virtual Environments.

  • Create: python3 -m venv SomeName
  • Activate: source path/to/SomeName/bin/activate
  • Install packages: pip install -U whatever
  • Run: python
  • Exit the venv: deactivate
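
A quick way to confirm that the activation step above worked is to ask Python itself. This is a minimal sketch rather than anything from the course; the helper name in_virtual_environment is my own.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the venv directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a venv:", in_virtual_environment())
```

If the second line prints False after activating, the shell is probably still picking up the system Python.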

Jupyter Notebooks

  • In the venv: pip install -U jupyterlab
  • Run it with: jupyter lab (or jupyter notebook, if the classic notebook package is also installed)
  • Don't forget to "Close and Halt" to stop the notebooks running in the background.
  • Don't pip install from within Jupyter

Anaconda

Possibly the easiest way to do everything (debatable!)

  • Install Anaconda for Linux

Why

The process of automatically extracting meaning from data.

Data can be raw and unstructured. Lots of modern data is structured - e.g. tabular, database. But unstructured data - mostly media - isn't easy to classify and extract information from.

Exponential growth of data. An often-quoted claim is that 90% of the world's data has been created in the last two years. Rise of sensor data, etc.

Use of modern data science tools to invite others to reproduce your results.

Data Science = Turn data into a valuable asset, gain insight, make decisions and take actions.

Data Analysts explain. Data Scientists predict and visualise.

Python basics and Jupyter basics

Different data types. Tuples, Dicts.
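
As a reminder to myself of how tuples and dicts behave, here is a small sketch. The values are made up for illustration and are not taken from the course material.

```python
# A tuple is an ordered, immutable sequence.
point = (3, 4)
x, y = point  # tuple unpacking

# Tuples cannot be changed in place; trying to do so raises TypeError.
try:
    point[0] = 10
except TypeError as error:
    tuple_error = str(error)

# A dict maps keys to values.
course = {"name": "Practical Machine Learning", "days": 5}
course["language"] = "Python"             # add a new key
days = course.get("days")                 # look up an existing key
missing = course.get("tutor", "unknown")  # fall back to a default

# Because tuples are immutable (and hashable), they can be used as dict keys.
grid = {(0, 0): "origin", point: "somewhere else"}

print(x, y, days, missing, grid[(3, 4)])
```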

Some reasons not to use notebooks.

Basic Python and Markdown syntax.

Stats

The usual intro to mean, median, mode, and basic statistical techniques. Basically enough to make sure everyone is on the same page.
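
For reference, the basic measures can be computed with nothing more than Python's built-in statistics module. The scores list below is invented for illustration.

```python
import statistics

# Made-up sample data, purely for illustration.
scores = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(scores)      # sum / count = 30 / 6 = 5
median = statistics.median(scores)  # middle of the sorted values = (3 + 5) / 2 = 4.0
mode = statistics.mode(scores)      # most common value = 3
spread = statistics.stdev(scores)   # sample standard deviation

print(mean, median, mode, round(spread, 2))
```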

Day 2

NumPy. What it is, how it works, how fast it is. N-dimensional arrays.
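
A minimal sketch of what an N-dimensional array looks like in practice, assuming NumPy is installed in the venv. The array contents are arbitrary. The speed comes from operations like these running over the whole array in compiled code rather than in a Python loop.

```python
import numpy as np

# A 2-dimensional array: 3 rows by 4 columns holding 0..11.
grid = np.arange(12).reshape(3, 4)
print(grid.shape)  # (3, 4)
print(grid.ndim)   # 2

# Vectorised arithmetic applies to every element at once.
doubled = grid * 2

# Reductions can run along a chosen axis; axis=0 collapses the rows.
column_totals = grid.sum(axis=0)  # [12 15 18 21]

# The same doubling written as a plain Python loop, for comparison.
looped = np.array([[value * 2 for value in row] for row in grid])
print(np.array_equal(doubled, looped))  # True
```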

Pandas. Again, overview of the basics. Series and DataFrames.
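
A short sketch of the two core structures, assuming pandas is installed. The weather figures and column names are made up.

```python
import pandas as pd

# A Series is a one-dimensional labelled array.
temperatures = pd.Series(
    [12.5, 14.0, 9.5], index=["Mon", "Tue", "Wed"], name="temp_c"
)

# A DataFrame is a table whose columns are Series sharing one index.
frame = pd.DataFrame(
    {
        "day": ["Mon", "Tue", "Wed"],
        "temp_c": [12.5, 14.0, 9.5],
        "rain_mm": [0.0, 2.5, 7.0],
    }
)

# Boolean indexing keeps only the rows where the condition holds.
warm_days = frame[frame["temp_c"] > 10]

# Column-wise summary statistics.
mean_temp = frame["temp_c"].mean()  # 12.0

print(temperatures["Tue"])
print(warm_days)
print(mean_temp)
```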

Day 3

Matplotlib and Seaborn. Again, basic interfaces for drawing graphs and getting data out of DataFrames.
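
A minimal sketch of plotting a DataFrame column with Matplotlib, assuming matplotlib and pandas are installed. Seaborn is left out to keep the example small, and the data is invented. The Agg backend line is only there so the script runs without a display; it isn't needed inside a notebook.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required

import matplotlib.pyplot as plt
import pandas as pd

# Made-up data, purely for illustration.
frame = pd.DataFrame(
    {"day": [1, 2, 3, 4, 5], "temp_c": [12.5, 14.0, 9.5, 11.0, 13.5]}
)

fig, ax = plt.subplots()
ax.plot(frame["day"], frame["temp_c"], marker="o")
ax.set_xlabel("Day")
ax.set_ylabel("Temperature (°C)")
ax.set_title("Made-up temperatures")

# Render to an in-memory PNG; pass a filename instead to save to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_size = buffer.getbuffer().nbytes
```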

Day 4

Linear regression and lines of best fit. K-Nearest-Neighbours - use of randomly held-out samples to test different values of K.
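
To consolidate both ideas, here is a rough sketch using only NumPy. It is not the course's code: the synthetic data, the knn_predict helper, the 75/25 split and the candidate K values are all my own assumptions. The first part fits a straight line by least squares; the second holds out a random sample of points and checks how accurate each K is on them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Part 1: line of best fit on noisy synthetic data (true slope 2, intercept 1).
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)
slope, intercept = np.polyfit(x, y, deg=1)  # least-squares straight line
print("slope:", slope, "intercept:", intercept)


# Part 2: K-Nearest-Neighbours with a random hold-out set.
def knn_predict(train_X, train_y, query_X, k):
    """Predict binary labels (0 or 1) by majority vote of the k nearest points."""
    # Euclidean distance from every query point to every training point.
    differences = query_X[:, None, :] - train_X[None, :, :]
    distances = np.linalg.norm(differences, axis=2)
    nearest = np.argsort(distances, axis=1)[:, :k]
    votes = train_y[nearest]  # shape: (number of queries, k)
    return (votes.mean(axis=1) > 0.5).astype(int)


# Two synthetic blobs of 100 points each, labelled 0 and 1.
blob_a = rng.normal(loc=(0.0, 0.0), scale=1.0, size=(100, 2))
blob_b = rng.normal(loc=(3.0, 3.0), scale=1.0, size=(100, 2))
X = np.vstack([blob_a, blob_b])
labels = np.array([0] * 100 + [1] * 100)

# Shuffle, then hold out 25% of the points as a random test sample.
order = rng.permutation(len(X))
test_idx, train_idx = order[:50], order[50:]

# Odd values of K avoid tied votes with two classes.
accuracies = {}
for k in (1, 3, 5, 7):
    predicted = knn_predict(X[train_idx], labels[train_idx], X[test_idx], k)
    accuracies[k] = float((predicted == labels[test_idx]).mean())

print(accuracies)
```

The K with the best hold-out accuracy is the one to keep, although on a single random split the differences between neighbouring values of K can easily be noise.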

Day 5

Voronoi Diagrams and basic clustering. Mixture models.
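
One way I connect these topics: assigning every point to its nearest centre splits the plane into Voronoi cells, and k-means is a basic clustering method that repeats that assignment while moving the centres. The sketch below is my own minimal NumPy version on synthetic data, not the course's implementation, and it does not cover mixture models.

```python
import numpy as np


def k_means(points, k, iterations=20, seed=0):
    """Very small k-means: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iterations):
        labels = nearest_centroid(points, centroids)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # leave an empty cluster's centre where it is
                centroids[j] = members.mean(axis=0)
    # Final assignment against the final centres.
    return centroids, nearest_centroid(points, centroids)


def nearest_centroid(points, centroids):
    """Index of the closest centroid for each point (its Voronoi cell)."""
    differences = points[:, None, :] - centroids[None, :, :]
    return np.linalg.norm(differences, axis=2).argmin(axis=1)


# Two well-separated synthetic blobs of 50 points each.
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.5, size=(50, 2))
points = np.vstack([blob_a, blob_b])

centroids, labels = k_means(points, k=2)
print(centroids)
```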

