The first three days were just about basic stats visualisation using Python. It was great to have a refresher - but I would have expected that to be a prerequisite.
The tutor was excellent - very patient in explaining complex concepts. And the use of Jupyter Notebooks is a game-changer for taught courses like this.
Ultimately, it was a useful course - although I expected a lot more time to be spent on training machine learning models rather than on the underlying statistics.
These are mostly notes to myself to help consolidate my knowledge - and to provide some more information on the course itself if you are thinking of taking it.
All done in Python, pretty standard.
- Create a virtual environment
python3 -m venv SomeName
- Activate it
source SomeName/bin/activate
- Install packages
pip install -U whatever
- To exit the venv
deactivate
- In the venv
pip install -U jupyterlab
- Run it with
jupyter lab
- Don't forget to "Close and Halt" to stop the notebooks running in the background.
- You can also pip install from within Jupyter by prefixing the command with ! in a notebook cell
!pip install -U whatever
Possibly the easiest way to do everything (debatable!)
- Install Anaconda for Linux
The process of automatically extracting meaning from data.
Data can be raw and unstructured. Much modern data is structured - e.g. tabular, or held in a database. But unstructured data - mostly media - isn't easy to classify or extract information from.
Exponential growth of data. The oft-quoted claim is that around 90% of the world's data was created in the last few years. Rise of sensor data, etc.
Use of modern data science tools to invite others to reproduce your results.
Data Science = Turn data into a valuable asset, gain insight, make decisions and take actions.
Data Analysts explain; Data Scientists predict and visualise.
Different data types. Tuples, Dicts.
Basic Python and Markdown syntax.
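The data-types refresher boiled down to things like this (the variable names and values are my own examples, not from the course materials):

```python
# A tuple is ordered and immutable; a dict maps keys to values.
point = (3.0, 4.0)                  # tuple: a fixed pair of coordinates
x, y = point                        # tuples unpack neatly

scores = {"alice": 92, "bob": 85}   # dict: key -> value
scores["carol"] = 78                # dicts are mutable
missing = scores.get("dave", 0)     # safe lookup with a default -> 0
```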
The usual intro to mean, median, mode, and basic statistical techniques. Basically enough to make sure everyone is on the same page.
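The mean/median/mode material maps directly onto Python's standard library; a minimal sketch with made-up data:

```python
from statistics import mean, median, mode

data = [2, 3, 3, 5, 7, 10]
print(mean(data))    # 5    (30 / 6)
print(median(data))  # 4.0  (midpoint of the middle pair, 3 and 5)
print(mode(data))    # 3    (the most frequent value)
```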
Numpy. What it is, how it works, how fast it is. N-Dimensional Arrays
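The NumPy material in a nutshell - N-dimensional arrays plus vectorised operations that avoid Python-level loops (the array values here are my own example):

```python
import numpy as np

# An N-dimensional array: 2 rows x 3 columns
a = np.array([[1, 2, 3],
              [4, 5, 6]])
print(a.shape)        # (2, 3)

# Vectorised arithmetic works elementwise, with no explicit loop
print(a * 10)         # [[10 20 30], [40 50 60]]
print(a.sum(axis=0))  # column sums -> [5 7 9]
```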
Pandas. Again, overview of the basics. Series and DataFrames.
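The Series and DataFrame basics look like this (the cities and figures are illustrative, not real data):

```python
import pandas as pd

# A Series is a labelled 1-D array
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# A DataFrame is a table of named columns
df = pd.DataFrame({
    "city": ["Leeds", "York", "Hull"],
    "population": [793000, 202000, 267000],  # made-up illustrative figures
})

print(s["b"])                          # label-based lookup -> 20
print(df[df["population"] > 250000])   # filter rows with a boolean mask
```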
Matplotlib and Seaborn. Again, basic interfaces for drawing graphs and getting data out of DataFrames.
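A minimal sketch of plotting straight from a DataFrame with Matplotlib (data and labels are my own; the Agg backend just lets it run without a display):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, works headless
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [1, 4, 9, 16]})

fig, ax = plt.subplots()
ax.plot(df["x"], df["y"], marker="o")  # line plot from DataFrame columns
ax.set_xlabel("x")
ax.set_ylabel("y")
fig.savefig("plot.png")                # write the figure to disk
```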
Linear regression and lines of best fit. k-Nearest Neighbours - using random train/test samples to choose a good k.
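A best-fit line in one call, using NumPy's least-squares polynomial fit (the noisy points are made up, scattered around y = 2x + 1):

```python
import numpy as np

# Made-up noisy points roughly on the line y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Degree-1 fit: returns [slope, intercept] minimising squared error
slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)  # close to 2 and 1
```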
Voronoi Diagrams and basic clustering. Mixture models.
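The clustering idea can be sketched as a bare-bones k-means loop: assigning each point to its nearest centre carves the plane into Voronoi cells, then each centre moves to the mean of its cell. This is my own toy example with two made-up blobs, not course code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two made-up 2-D blobs, around (0, 0) and (5, 5)
pts = np.vstack([rng.normal(0, 0.5, (20, 2)),
                 rng.normal(5, 0.5, (20, 2))])

centres = pts[[0, -1]].copy()      # crude initialisation: one point from each end
for _ in range(10):
    # Distance from every point to every centre
    dists = np.linalg.norm(pts[:, None] - centres[None], axis=2)
    labels = dists.argmin(axis=1)  # nearest-centre assignment (Voronoi cells)
    # Move each centre to the mean of its assigned points
    centres = np.array([pts[labels == k].mean(axis=0) for k in range(2)])

print(centres)  # one centre near (0, 0), the other near (5, 5)
```

Mixture models generalise this by making the assignments soft (probabilistic) rather than hard.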