<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/rss-style.xsl" type="text/xsl"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	    xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	     xmlns:dc="http://purl.org/dc/elements/1.1/"
	   xmlns:atom="http://www.w3.org/2005/Atom"
	     xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	  xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>
<channel>
	<title>r &#8211; Terence Eden’s Blog</title>
	<atom:link href="https://shkspr.mobi/blog/tag/r/feed/" rel="self" type="application/rss+xml" />
	<link>https://shkspr.mobi/blog</link>
	<description>Regular nonsense about tech and its effects 🙃</description>
	<lastBuildDate>Tue, 19 Aug 2025 08:20:25 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://shkspr.mobi/blog/wp-content/uploads/2023/07/cropped-avatar-32x32.jpeg</url>
	<title>r &#8211; Terence Eden’s Blog</title>
	<link>https://shkspr.mobi/blog</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title><![CDATA[MSc Assignment 2 - Data Analytics Principles]]></title>
		<link>https://shkspr.mobi/blog/2021/08/msc-assignment-2-data-analytics-principles/</link>
					<comments>https://shkspr.mobi/blog/2021/08/msc-assignment-2-data-analytics-principles/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Thu, 19 Aug 2021 11:28:30 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[MSc]]></category>
		<category><![CDATA[r]]></category>
		<category><![CDATA[treemap]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=39929</guid>

					<description><![CDATA[I&#039;m doing an apprenticeship MSc in Digital Technology. In the spirit of openness, I&#039;m blogging my research and my assignments.  This is my paper from the Data Analytics module. I enjoyed it far more than the previous module.  This was my second assignment, and I was amazed to score 72%. In the English system 50% is a pass, 60% is a commendation, 70% is distinction. Nice!  A few disclaimers:   I…]]></description>
										<content:encoded><![CDATA[<p>I'm doing an apprenticeship <a href="https://shkspr.mobi/blog/tag/msc/">MSc</a> in Digital Technology. In the spirit of openness, I'm blogging my research and my assignments.</p>

<p>This is my paper from the Data Analytics module. I enjoyed it far more than <a href="https://shkspr.mobi/blog/2021/05/msc-first-assignment-technical-and-digital-leadership/">the previous module</a>.</p>

<p>This was my second assignment, and I was amazed to score 72%. In the English system 50% is a pass, 60% is a commendation, 70% is distinction. Nice!</p>

<p>A few disclaimers:</p>

<ul>
<li>I don't claim it to be brilliant. I am not very good at academic-style writing. I was marked down for over-reliance on bullet points.</li>
<li>This isn't how I'd write a normal document for work - and the numbers have not been independently verified.</li>
<li>This isn't the policy of my employer, nor does it represent their opinions. It has only been assessed from an academic point of view.</li>
<li>It has not been peer reviewed, nor are the data guaranteed to be an accurate reflection of reality. Cite at your own peril.</li>
<li>I've quickly converted this from Google Docs + <a href="https://shkspr.mobi/blog/2021/05/zotero-citations-to-markdown-via-csl/">Zotero into MarkDown</a>. Who knows what weird formatting that'll introduce!</li>
<li>All references are clickable - going straight to the source. Reference list is at the end.</li>
</ul>

<p>And, once more, this is not official policy. It was not commissioned by anyone. It is an academic exercise. Adjust your expectations accordingly.</p>

<hr>

<p></p><nav role="doc-toc"><ul><li><h2 id="table-of-contents"><a href="https://shkspr.mobi/blog/2021/08/msc-assignment-2-data-analytics-principles/#table-of-contents">Table of Contents</a></h2><ul><li><a href="https://shkspr.mobi/blog/2021/08/msc-assignment-2-data-analytics-principles/#abstract">Abstract</a></li><li><a href="https://shkspr.mobi/blog/2021/08/msc-assignment-2-data-analytics-principles/#1-business-challenge-context">1. Business Challenge Context</a></li><li><a href="https://shkspr.mobi/blog/2021/08/msc-assignment-2-data-analytics-principles/#2-data-analytics-principles">2. Data Analytics Principles</a><ul><li><a href="https://shkspr.mobi/blog/2021/08/msc-assignment-2-data-analytics-principles/#how-key-algorithms-and-models-are-applied-in-developing-analytical-solutions-and-how-analytical-solutions-can-deliver-benefits-to-organisations">How key algorithms and models are applied in developing analytical solutions and how analytical solutions can deliver benefits to organisations.</a></li></ul></li><li><a href="https://shkspr.mobi/blog/2021/08/msc-assignment-2-data-analytics-principles/#2-the-information-governance-requirements-that-exist-in-the-uk-and-the-relevant-organisational-and-legislative-data-protection-and-data-security-standards-that-exist-the-legal-social-and-ethical-c">2. The information governance requirements that exist in the UK, and the relevant organisational and legislative data protection and data security standards that exist. The legal, social and ethical concerns involved in data management and analysis.</a></li><li><a href="https://shkspr.mobi/blog/2021/08/msc-assignment-2-data-analytics-principles/#3-the-properties-of-different-data-storage-solutions-and-the-transmission-processing-and-analytics-of-data-from-an-enterprise-system-perspective-this-should-include-the-platform-choices-available">3. 
The properties of different data storage solutions, and the transmission, processing and analytics of data from an enterprise system perspective. This should include the platform choices available for designing and implementing solutions for data storage, processing and analytics in different data scenarios.</a></li><li><a href="https://shkspr.mobi/blog/2021/08/msc-assignment-2-data-analytics-principles/#4-how-relevant-data-hierarchies-or-taxonomies-are-identified-and-properly-documented">4. How relevant data hierarchies or taxonomies are identified and properly documented.</a></li><li><a href="https://shkspr.mobi/blog/2021/08/msc-assignment-2-data-analytics-principles/#3-product-design-development-evaluation">3. Product Design, Development &amp; Evaluation</a><ul><li><a href="https://shkspr.mobi/blog/2021/08/msc-assignment-2-data-analytics-principles/#the-application-of-data-analysis-principles-include-here-the-approach-the-selected-data-the-fitted-models-and-evaluations-used-in-the-development-of-your-product">The application of data analysis principles. Include here the approach, the selected data, the fitted models and evaluations used in the development of your product.</a><ul><li><a href="https://shkspr.mobi/blog/2021/08/msc-assignment-2-data-analytics-principles/#business-implications-and-benefits">Business Implications and Benefits</a></li></ul></li></ul></li><li><a href="https://shkspr.mobi/blog/2021/08/msc-assignment-2-data-analytics-principles/#6-the-application-of-concepts-tools-and-techniques-for-data-visualisation-including-how-this-provides-a-qualitative-understanding-of-the-information-on-which-decisions-can-be-based-include-here-th">6. The application of concepts, tools and techniques for data visualisation, including how this provides a qualitative understanding of the information on which decisions can be based. 
Include here the visualisation aspects applicable to your product.</a></li><li><a href="https://shkspr.mobi/blog/2021/08/msc-assignment-2-data-analytics-principles/#4-personal-reflection">4. Personal Reflection</a><ul><li><a href="https://shkspr.mobi/blog/2021/08/msc-assignment-2-data-analytics-principles/#a-reflective-evaluation-of-the-implications-of-conducting-this-investigation-for-your-learning-development-on-this-programme">A reflective evaluation of the implications of conducting this investigation for your learning development on this programme.</a></li></ul></li><li><a href="https://shkspr.mobi/blog/2021/08/msc-assignment-2-data-analytics-principles/#appendix-code">Appendix: Code</a></li><li><a href="https://shkspr.mobi/blog/2021/08/msc-assignment-2-data-analytics-principles/#references">References</a></li><li><a href="https://shkspr.mobi/blog/2021/08/msc-assignment-2-data-analytics-principles/#copyright-and-copyleft">Copyright and Copyleft</a></li></ul></li></ul></nav><p></p>

<h2 id="abstract"><a href="https://shkspr.mobi/blog/2021/08/msc-assignment-2-data-analytics-principles/#abstract">Abstract</a></h2>

<p>We describe a method of visualising change of data over time using an animated TreeMap. This is used to display how government policies affect the type of files that government departments publish.</p>

<p>This research forms part of an MSc project to better analyse the data and metadata generated by the UK government.  It will inform future storage space requirements and what incentives are needed to help the government meet its Open Government Partnership commitments.</p>

<p>This project was brought about following the Prime Minister and Cabinet's recent commitment to ensure that data are published in formats which are as open as possible (<a href="https://www.gov.uk/government/publications/declaration-on-government-reform">Johnson, 2021</a>).</p>

<p>The use of time-series visualisations forms an essential part of our department's ability to monitor the effectiveness of our policies and guidance in an intuitive way.</p>

<p>The end result is a short video which shows the volume of files uploaded over time, and whether they meet our department's standards for openness.</p>

<iframe title="Animated TreeMap - MSc Coursework" width="620" height="465" src="https://www.youtube.com/embed/-_ecmTC2hRc?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe>

<h2 id="1-business-challenge-context"><a href="https://shkspr.mobi/blog/2021/08/msc-assignment-2-data-analytics-principles/#1-business-challenge-context">1. Business Challenge Context</a></h2>

<p>The UK Government publishes tens of thousands of documents per year. Each government department has responsibility for its own publishing. The Data Standards Authority (DSA) is responsible for ensuring that documents are published in an open format.</p>

<p>There is growing concern about the number of documents being published as PDF files (<a href="https://gds.blog.gov.uk/2018/07/16/why-gov-uk-content-should-be-published-in-html-and-not-pdf/">Williams, 2018</a>).</p>

<p>The following questions have been identified as important to the organisation:</p>

<ul>
<li>How many different document formats are regularly used?</li>
<li>How is the ratio of open to closed formats changing over time?</li>
<li>Is the number of PDF files published increasing or decreasing?</li>
<li>Which departments publish in non-standard formats?</li>
<li>How can we visualise the data?</li>
</ul>

<p>Answering these questions will involve analysing a large amount of data to produce a visualisation which can be used by the organisation to improve its business practices and quality of output.</p>

<p>The analysis is undertaken with the understanding that retaining the trust of our users and community is paramount (<a href="https://shkspr.mobi/blog/2020/11/book-review-privacy-is-power-carissa-veliz/">Véliz, 2020</a>).</p>

<h2 id="2-data-analytics-principles"><a href="https://shkspr.mobi/blog/2021/08/msc-assignment-2-data-analytics-principles/#2-data-analytics-principles">2. Data Analytics Principles</a></h2>

<h3 id="how-key-algorithms-and-models-are-applied-in-developing-analytical-solutions-and-how-analytical-solutions-can-deliver-benefits-to-organisations"><a href="https://shkspr.mobi/blog/2021/08/msc-assignment-2-data-analytics-principles/#how-key-algorithms-and-models-are-applied-in-developing-analytical-solutions-and-how-analytical-solutions-can-deliver-benefits-to-organisations">How key algorithms and models are applied in developing analytical solutions and how analytical solutions can deliver benefits to organisations.</a></h3>

<p>Machine Learning (ML) provides the following key benefits:</p>

<ul>
<li>ML allows us to dedicate cheap and reliable computational resources to problems which would otherwise use expensive human resources of inconsistent quality.</li>
<li>For example, ML can be used to extract text from documents and perform sentiment analysis to determine how they should be classified (<a href="https://dftdigital.blog.gov.uk/2018/04/09/the-write-stuff-how-we-used-ai-to-help-us-handle-correspondence/">Arundel, 2018</a>).</li>
</ul>

<p>There are two main models of ML:</p>

<ul>
<li>Supervised learning:

<ul>
<li>If data has already been gathered and classified by humans, it can be used in ML training.</li>
<li>A portion of the data is used as a training set - for the ML algorithm to process and derive rules.</li>
<li>A random sampling of the data is withheld from the training set. The ML process is run against this validation set to see if it can correctly predict their classifications.</li>
</ul></li>
<li>Unsupervised learning:

<ul>
<li>Uses unclassified data to discover inherent clustering patterns within the data.</li>
<li>There are multiple clustering algorithms, each with a different bias, and so choosing the correct one can be difficult (<a href="https://doi.org/10.1109/TNN.2005.845141">Xu and Wunsch II, 2005</a>).</li>
</ul></li>
</ul>
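
<p>The supervised workflow described above can be sketched as follows. This is a minimal illustration using only the Python standard library. The labelled records, the <code>split_dataset</code> helper, and the 80:20 ratio are all assumptions made for the example, not details of this project.</p>

```python
import random


def split_dataset(records, validation_fraction=0.2, seed=42):
    """Randomly withhold a fraction of labelled records for validation."""
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    validation_set = shuffled[:n_validation]
    training_set = shuffled[n_validation:]
    return training_set, validation_set


# Hypothetical labelled data: (filename, human-assigned classification).
labelled = (
    [(f"file{i}.pdf", "closed") for i in range(5)]
    + [(f"file{i}.odt", "open") for i in range(5)]
)

training_set, validation_set = split_dataset(labelled)
```

<p>A model would then be fitted to <code>training_set</code> only, and its predictions compared with the human-assigned labels in <code>validation_set</code>.</p>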

<p>In both cases, ML seeks correlation between multiple variables:</p>

<ul>
<li>Pearson correlation coefficient assigns a value between -1 and +1 to the relationship between two variables: their covariance divided by the product of their standard deviations (<a href="https://doi.org/10.1098/rspl.1895.0041">Pearson, 1895</a>).</li>
<li>Linear Regression attempts to find a "best fit" linear relationship, usually by minimising the sum of squared errors.</li>
<li>With any results it is important for analysts to understand that correlation does not imply causation.</li>
</ul>
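
<p>Both measures are short enough to compute by hand. The sketch below calculates the Pearson coefficient (covariance divided by the product of the standard deviations) and an ordinary least-squares line. The monthly upload figures are invented for illustration, and the function names are my own.</p>

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors in the covariance and the standard deviations cancel,
    # so plain sums of deviations are enough here.
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_deviation / (spread_x * spread_y)


def least_squares(xs, ys):
    """Slope and intercept of the line minimising the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented example data: month number and PDF uploads in that month.
months = [1, 2, 3, 4, 5]
pdf_uploads = [12, 14, 16, 18, 20]

r = pearson(months, pdf_uploads)
slope, intercept = least_squares(months, pdf_uploads)
```

<p>Because the invented data lie exactly on a straight line, the coefficient is 1 and the fitted line has a slope of 2 and an intercept of 10. Real upload data would be far noisier.</p>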

<p>Another model is Big Data:</p>

<ul>
<li>Although facetiously referred to as "larger than can fit in excel" (<a href="https://twitter.com/peteskomoroch/status/1290703113">Skomoroch, 2009</a>), Big Data usually refers to the "Four Vs" of Velocity, Volume, Variety, and Veracity (<a href="https://doi.org/10.1109/HPEC.2014.7040946">Kepner <em>et al.</em>, 2014</a>).</li>
<li>While Big Data can be helpful in gaining insights, complex approaches such as MapReduce (<a href="https://doi.org/10.1145/1327452.1327492">Dean and Ghemawat, 2008</a>) are needed to efficiently analyse datasets using parallel processing.</li>
<li>As a government department, we need to be mindful of the dangers to the public caused by Big Data (<a href="https://shkspr.mobi/blog/2020/02/book-review-the-age-of-surveillance-capitalism/">Zuboff, 2020</a>).</li>
</ul>
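
<p>The MapReduce pattern mentioned above can be illustrated in a single process. In a real deployment the map and reduce steps would run in parallel across many machines. Here they are ordinary Python functions, and the document records are hypothetical.</p>

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Emit a (media_type, 1) pair for one document record."""
    return (document["media_type"], 1)


def shuffle_phase(pairs):
    """Group the emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine each key's values into a single total."""
    return {
        key: reduce(lambda a, b: a + b, values)
        for key, values in grouped.items()
    }


# Hypothetical records standing in for file metadata.
documents = [
    {"name": "report.pdf", "media_type": "application/pdf"},
    {"name": "guide.html", "media_type": "text/html"},
    {"name": "summary.pdf", "media_type": "application/pdf"},
]

counts = reduce_phase(shuffle_phase(map(map_phase, documents)))
```

<p>Each phase only needs the output of the previous one, which is what allows the work to be distributed.</p>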

<p>All algorithms suffer from learning bias:</p>

<ul>
<li>Systemic bias is present when data are captured, when decisions are made about how to classify data, and when data are analysed (<a href="https://www.degruyter.com/isbn/9781479833641">Noble, 2018</a>).</li>
<li>Failure to correct for this bias could be unlawful (<a href="https://www.legislation.gov.uk/ukpga/2010/15/contents"><em>Equality Act</em>, 2010</a>).</li>
<li>Finally, we must recognise that a training dataset is only a sample of the full data record, and that no single algorithm performs best across every possible problem. The latter is commonly known as the "no free lunch" theorem (<a href="https://doi.org/10.1109/4235.585893">Wolpert and Macready, 1997</a>).</li>
</ul>

<h2 id="2-the-information-governance-requirements-that-exist-in-the-uk-and-the-relevant-organisational-and-legislative-data-protection-and-data-security-standards-that-exist-the-legal-social-and-ethical-c"><a href="https://shkspr.mobi/blog/2021/08/msc-assignment-2-data-analytics-principles/#2-the-information-governance-requirements-that-exist-in-the-uk-and-the-relevant-organisational-and-legislative-data-protection-and-data-security-standards-that-exist-the-legal-social-and-ethical-c">2. The information governance requirements that exist in the UK, and the relevant organisational and legislative data protection and data security standards that exist. The legal, social and ethical concerns involved in data management and analysis.</a></h2>

<p>The UK Government is bound by several pieces of legislation. The following are the key areas which impact data management and analysis:</p>

<ul>
<li>GDPR is the main legislation covering the processing of personal data.

<ul>
<li>It also covers automated decision making and the right of data subjects to view, correct, and port their data (<a href="https://www.legislation.gov.uk/ukpga/2018/12/contents/enacted"><em>Data Protection Act</em>, 2018</a>).</li>
<li>This project does not use personal data and is thus exempt.</li>
</ul></li>
<li>Data created by public authorities is subject to laws concerning transparency and openness (<a href="https://www.legislation.gov.uk/ukpga/2000/36/contents"><em>Freedom of Information Act</em>, 2000</a>) - colloquially known as FOI.

<ul>
<li>The data and reports created from this dashboard will be subject to FOI.</li>
<li>By publishing the data and code in the open on GitHub, we believe that our obligations under FOI are satisfied under a s21 exemption (<a href="https://www.nationalarchives.gov.uk/documents/information-management/freedom-of-information-exemptions.pdf">National Archives, 2019</a>).</li>
</ul></li>
<li>Accessibility Requirements.  All government websites must meet a minimum level of accessibility (<a href="https://www.gov.uk/guidance/accessibility-requirements-for-public-sector-websites-and-apps">CDDO, 2021</a>).

<ul>
<li>If this visualisation were to be published, it would need to be tested against modern standards for features like alt text, colour contrast, and keyboard accessibility (<a href="https://www.w3.org/TR/WCAG21/">W3C, 2018</a>).</li>
</ul></li>
<li>Members of the Houses of Commons and Lords are able to ask Parliamentary Questions of our department (<a href="https://doi.org/10.1080/13572339908420584">Cole, 1999</a>).

<ul>
<li>The ability to quickly and accurately answer members' questions is part of our core business activity.</li>
<li>It is a civil servant's responsibility to allow Ministers to give accurate and truthful information (<a href="https://www.gov.uk/government/publications/drafting-answers-to-parliamentary-questions-guidance">Cabinet Office, 2011</a>).</li>
</ul></li>
<li>The Centre for Data Ethics and Innovation is a department which acts as a "watchdog" for ethical use of government data.

<ul>
<li>They consider "dashboard" style reports as having an important role in communicating data effectively (<a href="https://www.gov.uk/government/publications/local-government-use-of-data-during-the-pandemic">CDEI, 2021</a>).</li>
</ul></li>
<li>The UK is a multilingual country. We have an obligation to consider whether the reports should be published in Welsh (<a href="https://www.legislation.gov.uk/ukpga/1993/38/contents"><em>Welsh Language Act</em>, 1993</a>).

<ul>
<li>A further exercise may need to be carried out to determine how many publications are <span lang="cy"><i>wedi'i ysgrifennu yn y Gymraeg</i></span>.</li>
</ul></li>
</ul>

<h2 id="3-the-properties-of-different-data-storage-solutions-and-the-transmission-processing-and-analytics-of-data-from-an-enterprise-system-perspective-this-should-include-the-platform-choices-available"><a href="https://shkspr.mobi/blog/2021/08/msc-assignment-2-data-analytics-principles/#3-the-properties-of-different-data-storage-solutions-and-the-transmission-processing-and-analytics-of-data-from-an-enterprise-system-perspective-this-should-include-the-platform-choices-available">3. The properties of different data storage solutions, and the transmission, processing and analytics of data from an enterprise system perspective. This should include the platform choices available for designing and implementing solutions for data storage, processing and analytics in different data scenarios.</a></h2>

<p>Governments around the world have adopted a Cloud First strategy (<a href="https://opus.lib.uts.edu.au/handle/10453/121604">Busch <em>et al.</em>, 2014</a>), and the UK is no exception.  Rather than traditional on-premises equipment or dedicated remote servers, we use a set of scalable resources which automatically adapt to our needs. Along with edge processing, caching, and associated features, this paradigm is commonly known as "Dew Computing" (<a href="https://doi.org/10.1109/ACCESS.2017.2775042">Ray, 2018</a>).  This allows us to tightly control our costs, reduce our carbon footprint, and scale to meet demand.  We can "shard" our data, that is, partition it across multiple locations (<a href="https://www.raphkoster.com/2009/01/08/database-sharding-came-from-uo/">Koster, 2009</a>). However, this requires careful management of transactions to ensure eventual consistency (<a href="https://doi.org/10.1016/0306-4379(92)90027-K">Tewari and Adam, 1992</a>).</p>

<p>The majority of our documents are stored as unstructured data in an Amazon S3 Bucket. Because they lack structure, they are retrieved via a NoSQL key-value store (Sadalage and Fowler, 2012). This reduces the overhead of creating and maintaining a schema, at the expense of the ability to construct deterministic queries.</p>

<p>Our enterprise storage needs generally follow the "Inmon Model" (<a href="https://www.wiley.com/en-gb/Building+the+Data+Warehouse%2C+4th+Edition-p-9780764599446">Inmon, 2005</a>) where multiple data publishers store their data in a "data warehouse" - then further processing takes place in separate datamarts.</p>

<p></p><div id="attachment_39938" style="width: 1034px" class="wp-caption aligncenter"><img aria-describedby="caption-attachment-39938" src="https://shkspr.mobi/blog/wp-content/uploads/2021/08/Data_Warehouse_Feeding_Data_Mart.jpg" alt="Diagram showing the logical connections in a datamart." width="1252" height="494" class="size-full wp-image-39938"><p id="caption-attachment-39938" class="wp-caption-text">Fig 01 Enterprise Data Warehouse diagram © (<a href="https://commons.wikimedia.org/wiki/File:Data_Warehouse_Feeding_Data_Mart.jpg">vlntn, 2009</a>)</p></div><p></p>

<p>The ability to Extract, Transform, and Load (ETL) data allows users to construct their own queries rather than relying on predefined methods (<a href="https://doi.org/10.1016/j.datak.2017.08.004">Theodorou <em>et al.</em>, 2017</a>).  This increases their utility and decreases our support costs.</p>

<p>Because of the lack of detailed metadata, a search index is generated using term frequency–inverse document frequency (TF-IDF) (<a href="https://doi.org/10.1108/eb026526">Spärck Jones, 1972</a>). TF-IDF is the most common method of building recommendation engines (<a href="https://doi.org/10.1007/s00799-015-0156-0">Beel <em>et al.</em>, 2016</a>), and allows us to sort search results by relevance.</p>
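
<p>To show how TF-IDF ranks terms, here is a minimal sketch using one common formulation: a term's share of the document's words multiplied by the logarithm of the total number of documents over the number of documents containing that term. The three short "documents" are invented, and a production search index would use a more robust tokeniser and weighting scheme.</p>

```python
from collections import Counter
from math import log


def tf_idf(documents):
    """Return one {term: score} dictionary per document."""
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)

    # Count how many documents each term appears in at least once.
    document_frequency = Counter()
    for tokens in tokenised:
        document_frequency.update(set(tokens))

    scores = []
    for tokens in tokenised:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * log(n_docs / document_frequency[term])
            for term, count in counts.items()
        })
    return scores


# Invented example documents.
documents = [
    "open data strategy",
    "open document format",
    "annual budget report",
]

scores = tf_idf(documents)
```

<p>In the first document, "data" scores higher than "open" because "open" also appears in the second document and is therefore less distinctive.</p>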

<p>Basic metadata is stored in relational databases. This gives us the ability to use Structured Query Language to interrogate the database and retrieve information.</p>
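
<p>As an illustration of that kind of interrogation, the sketch below counts uploads per month and Media Type. It uses an in-memory SQLite database. The <code>files</code> table, its columns, and its rows are hypothetical and do not reflect our real schema.</p>

```python
import sqlite3

connection = sqlite3.connect(":memory:")

# Hypothetical schema: one row of metadata per uploaded file.
connection.execute(
    "CREATE TABLE files (name TEXT, media_type TEXT, uploaded TEXT)"
)
connection.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [
        ("report.pdf", "application/pdf", "2021-01-04"),
        ("data.csv", "text/csv", "2021-01-15"),
        ("summary.pdf", "application/pdf", "2021-02-02"),
    ],
)

# Count uploads for each month (the YYYY-MM prefix of the date) and media type.
rows = connection.execute(
    """
    SELECT substr(uploaded, 1, 7) AS month, media_type, COUNT(*) AS total
    FROM files
    GROUP BY month, media_type
    ORDER BY month, media_type
    """
).fetchall()

connection.close()
```

<p>Each resulting row is a (month, media type, total) tuple, which is the shape of data a time-series visualisation needs.</p>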

<p>Manually exchanging data using outdated formats or standards is unreliable. During the COVID-19 crisis, contact-tracing data was lost due to the limitations of Microsoft's proprietary Excel format (<a href="https://papers.ssrn.com/abstract=3753893">Fetzer and Graeber, 2020</a>). This has accelerated our adoption of APIs which should not suffer from such data loss. As per the National Data Strategy (<a href="https://www.gov.uk/government/publications/uk-national-data-strategy/national-data-strategy">Dowden, 2020</a>), we now advocate an API first strategy to ensure that data can flow freely.</p>

<p>Storage and transfer of data aren't the only factors to consider. We also have a mandate to provide "Linked Data" (<a href="https://doi.org/10.1109/MIS.2012.23">Shadbolt <em>et al.</em>, 2012</a>).</p>

<h2 id="4-how-relevant-data-hierarchies-or-taxonomies-are-identified-and-properly-documented"><a href="https://shkspr.mobi/blog/2021/08/msc-assignment-2-data-analytics-principles/#4-how-relevant-data-hierarchies-or-taxonomies-are-identified-and-properly-documented">4. How relevant data hierarchies or taxonomies are identified and properly documented.</a></h2>

<p>Taxonomies are useful tools for helping both humans and machines navigate data.</p>

<p>The taxonomy currently used in our data warehouse was identified through a process of ongoing user research involving stakeholders inside and outside the organisation.</p>

<p>Content stored on GOV.UK is subject to a sophisticated taxonomy which was intentionally designed to capture published information within a specific domain (<a href="https://www.gov.uk/government/publications/govuk-topic-taxonomy-principles/govuk-taxonomy-principles">GDS, 2019</a>).</p>

<p>In order to meet the criteria for a well-defined taxonomy, the taxonomy is documented in a format which complies with ANSI/NISO Z39.19-2005 (<a href="https://www.niso.org/publications/ansiniso-z3919-2005-r2010">National Information Standards Organization, 2010</a>).  The taxonomy documentation must also retain compatibility with other vocabularies used worldwide (<a href="https://www.iso.org/cms/render/live/en/sites/isoorg/contents/data/standard/05/36/53657.html">ISO, 2011</a>).</p>

<p>Taxonomies within the content area generally fall into multiple categories:</p>

<ul>
<li>Department - who published the content (e.g. Department for Education),</li>
<li>Topic - what the content relates to (e.g. Starting a Business),</li>
<li>Metadata - information about the content (e.g. filetype, date of publication, etc).</li>
</ul>

<p>Documentation is stored in GitHub (<a href="https://github.com/alphagov/govuk-developer-docs/blob/faabf3ecceed0443db1d5243feecfd6d8ca4b0f8/source/manual/taxonomy.html.md">GDS, 2021</a>) in order to be easily discoverable and editable.</p>

<p>A supervised Machine Learning process was used to classify content which had not been tagged (<a href="https://dataingovernment.blog.gov.uk/2018/10/19/how-we-used-deep-learning-to-structure-gov-uks-content/">Zachariou <em>et al.</em>, 2018</a>). This has led to an increase in correctly identified content.</p>

<p>The use of Convolutional Neural Networks to assist in Natural-Language Processing increases the accuracy of the identified data while reducing the human resources needed to keep the taxonomy up to date.</p>

<p>Because individual users and departments are free to create their own tagging structure, the overall effect is that of an unstructured "Folksonomy" (<a href="https://vanderwal.net/folksonomy.html">Vander Wal, 2004</a>).</p>

<p>In the future, we may allow users of our service to create their own tags in a collaborative environment. We recognise that this may introduce challenges around diversity and inclusiveness (<a href="https://doi.org/10.1007/11758532_152">Lambiotte and Ausloos, 2006</a>).</p>

<h2 id="3-product-design-development-evaluation"><a href="https://shkspr.mobi/blog/2021/08/msc-assignment-2-data-analytics-principles/#3-product-design-development-evaluation">3. Product Design, Development &amp; Evaluation</a></h2>

<h3 id="the-application-of-data-analysis-principles-include-here-the-approach-the-selected-data-the-fitted-models-and-evaluations-used-in-the-development-of-your-product"><a href="https://shkspr.mobi/blog/2021/08/msc-assignment-2-data-analytics-principles/#the-application-of-data-analysis-principles-include-here-the-approach-the-selected-data-the-fitted-models-and-evaluations-used-in-the-development-of-your-product">The application of data analysis principles. Include here the approach, the selected data, the fitted models and evaluations used in the development of your product.</a></h3>

<p>Approach:</p>

<ul>
<li>In order to create a beneficial analytic visualisation, it is important to understand how the graphic will enable interpretation and comprehension of the underlying data (<a href="https://uk.sagepub.com/en-gb/eur/data-visualisation/book266150">Kirk, 2021</a>).

<ul>
<li>I carried out a brief research exercise to understand the organisational need.</li>
<li>Stakeholders wanted a way to dynamically visualise the change in proportion of open:closed document formats uploaded to the GOV.UK publishing platform.</li>
</ul></li>
</ul>

<p>Selected Data:</p>

<ul>
<li>Both Structured and Unstructured data were available.

<ul>
<li>The Structured data was in the metadata of the files - date of upload, name of uploading party, and Media Type (<a href="https://www.iana.org/assignments/media-types/media-types.xhtml">IANA, 2021</a>).</li>
<li>The Unstructured data was the contents of documents. Further analysis may be undertaken on this to determine the conformance of the documents to their purported standard.</li>
</ul></li>
<li>I decided to create a time-series visualisation based on the following structured data:

<ul>
<li>Nominal data - the Media Type of the document,</li>
<li>Ratio-scale numeric data - the number of files uploaded,</li>
<li>Interval-scale numeric data - the date the file was uploaded.</li>
</ul></li>
</ul>

<p>Fitted Models:</p>

<ul>
<li>Representing data in an area, such as a Pie Chart, was <em>conceptually</em> understood by stakeholders. However, people often struggle to correctly <em>evaluate</em> the areas represented in charts, especially small slices (<a href="https://dl.acm.org/doi/10.5555/4084">Cleveland, 1985</a>).</li>
<li>An attempt was made at "Data Sonification" (<a href="https://doi.org/10.1109/5992.774840">Kaper, Wiebel and Tipei, 1999</a>). This converts the data into an audio wave so that changes and patterns can be discerned by ear, rather than by eye. The stakeholders considered the resultant "music" to be too experimental to be useful.</li>
</ul>

<p>Evaluation:</p>

<p>An evaluation was undertaken using the Seven Hats of Visualisation Design (<a href="https://uk.sagepub.com/en-gb/eur/data-visualisation/book266150">Kirk, 2021</a>).</p>

<ul>
<li>Constraints

<ul>
<li>Time

<ul>
<li>Due to the recent changes in departmental priorities, it was uncertain whether newer, more complete, data could be obtained in time.</li>
<li>In order to quickly produce experimental results, I reused an existing dataset. Once the complete data were available, I was able to run my developed code against it.</li>
</ul></li>
<li>Budget

<ul>
<li>The visualisation tools available (R and Python) were both cost-free.</li>
<li>The cost of obtaining and storing the data was met out of the existing analytics budget.</li>
</ul></li>
<li>Technology

<ul>
<li>We retain enough cloud computing resources for both the storage and processing of this data.</li>
<li>Reports can be periodically run via a scheduled task manager like <code>cron</code>, or on demand.</li>
<li>Hosting and transcoding of video content is best suited to a dedicated resource like YouTube.</li>
</ul></li>
<li>Politics

<ul>
<li>Any visualisation of data which identified individual departments could be interpreted as a rebuke from our department.</li>
<li>Our department's role sometimes involves having difficult conversations with other departments; however we strive to do this privately.</li>
<li>It was agreed that any public visualisation should anonymise departments as far as possible, and that publication would not occur without consultation.</li>
</ul></li>
</ul></li>
<li>Deliverables

<ul>
<li>Presentation

<ul>
<li>While a static visualisation is common for changes over time, stakeholders suggested a dynamic presentation would be more engaging.</li>
<li>Animation requires specific consideration of accessibility needs.</li>
</ul></li>
<li>Actionable items

<ul>
<li>Stakeholders wanted to know whether past policies had any effect on the publication rate of documents by media type.</li>
<li>Stakeholders wanted to see the scale of the problem so they could know how much resource to dedicate to it.</li>
</ul></li>
<li>Proactive alerting

<ul>
<li>Any report would have to be run periodically.</li>
<li>Large changes will be immediately visible and can be used as the basis for further investigation.</li>
</ul></li>
</ul></li>
</ul>

<p>Our department's mandate is to directly influence the future behaviour of people contributing to the data set.  After assessing the various predictive models, stakeholders determined that the ability to predict future data using ML on current trends was not useful. We expect those behavioural patterns to change following our interventions.</p>

<p>Similarly, the ability to predict media type or openness based on filename or publisher was considered irrelevant due to the metadata being already available.</p>

<p>Pie Charts were felt to be an old-fashioned way of representing data - with their use in the UK Government first being popularised by Florence Nightingale during the mid-1800s (<a href="https://doi.org/10.1124/mi.11.2.1">Anderson, 2011</a>).</p>

<p>For these reasons, I decided to implement the TreeMap algorithm (<a href="https://doi.org/10.1145/102377.115768">Shneiderman, 1992</a>).</p>

<p></p><div id="attachment_39942" style="width: 1034px" class="wp-caption aligncenter"><img aria-describedby="caption-attachment-39942" src="https://shkspr.mobi/blog/wp-content/uploads/2021/08/treemap.png" alt="Greyscale drawing of several squares and rectangles bunched together to show proportionate space." width="637" height="410" class="size-full wp-image-39942"><p id="caption-attachment-39942" class="wp-caption-text">Fig 02 A-Z Treemap with Common Child Offsets © (Johnson, 1993)</p></div><p></p>

<p>TreeMap provides several advantages over visualisations like Pie Charts:</p>

<ul>
<li>TreeMap allows users to more quickly understand large and/or complex data structures (<a href="https://doi.org/10.1016/j.procs.2017.12.136">Long <em>et al.</em>, 2017</a>).</li>
<li>TreeMap supports sub-groups. That is, a section of the graph can contain sub-sections. This allows for more detail to be displayed.</li>
</ul>
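
<p>As a minimal sketch of how area and sub-groups work together, the base R code below lays out a two-level hierarchy using the original "slice-and-dice" treemap approach: the unit square is cut into vertical strips, one per category, and each strip is then cut horizontally into its file types. The <code>slice_rect()</code> helper and the totals are hypothetical and for illustration only; this is not the code used to produce the figures.</p>

<pre><code class="language-R">#   Split a rectangle into strips whose areas are proportional to `values`.
#   vertical = TRUE cuts along the x axis, otherwise along the y axis.
slice_rect &lt;- function(values, xmin = 0, xmax = 1, ymin = 0, ymax = 1, vertical = TRUE) {
    share  &lt;- values / sum(values)
    ends   &lt;- cumsum(share)
    starts &lt;- c(0, head(ends, -1))
    if (vertical) {
        data.frame(xmin = xmin + starts * (xmax - xmin),
                   xmax = xmin + ends   * (xmax - xmin),
                   ymin = ymin, ymax = ymax)
    } else {
        data.frame(xmin = xmin, xmax = xmax,
                   ymin = ymin + starts * (ymax - ymin),
                   ymax = ymin + ends   * (ymax - ymin))
    }
}

#   Hypothetical totals, grouped into two categories
files &lt;- data.frame(
    Category = c("Closed", "Closed", "Open", "Open"),
    Filetype = c("DOC", "XLS", "CSV", "ODT"),
    total    = c(40, 20, 30, 10),
    stringsAsFactors = FALSE)

#   Level 1: one vertical strip per category
by_category    &lt;- aggregate(total ~ Category, data = files, FUN = sum)
category_rects &lt;- cbind(by_category, slice_rect(by_category$total))

#   Level 2: split each category strip horizontally into its file types
rects &lt;- do.call(rbind, lapply(seq_len(nrow(category_rects)), function(i) {
    parent   &lt;- category_rects[i, ]
    children &lt;- files[files$Category == parent$Category, ]
    cbind(children, slice_rect(children$total,
                               parent$xmin, parent$xmax,
                               parent$ymin, parent$ymax,
                               vertical = FALSE))
}))

#   Each rectangle's area is its share of the overall total
rects$area &lt;- (rects$xmax - rects$xmin) * (rects$ymax - rects$ymin)
print(rects)
</code></pre>

<p>Each rectangle's area equals its share of the overall total, and file types in the same category always sit inside the same strip.</p>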

<p>Initial views of the TreeMap produced non-deterministic results which made comparing a time-series challenging.</p>

<table>
  <tbody><tr>
   <td><img src="https://shkspr.mobi/blog/wp-content/uploads/2021/08/map1.png" alt="Greyscale treemap." width="1024" height="1024" class="aligncenter size-full wp-image-39943"></td>
   <td><img src="https://shkspr.mobi/blog/wp-content/uploads/2021/08/map2.png" alt="Another greyscale map." width="1024" height="1024" class="aligncenter size-full wp-image-39944"></td>
  </tr>
  <tr>
   <td colspan="2">Fig 03 Demonstration TreeMaps showing random ordering of results across time periods.
   </td>
  </tr>
</tbody></table>

<p>I investigated alternative approaches and discovered that using a squarifying algorithm (<a href="https://doi.org/10.1007/978-3-7091-6783-0_4">Bruls, Huizing and van Wijk, 2000</a>) produced results which were more easily comparable.</p>

<table>
  <tbody><tr>
   <td><img src="https://shkspr.mobi/blog/wp-content/uploads/2021/08/map3.png" alt="Greyscale map with specific ordering." width="1024" height="1024" class="aligncenter size-full wp-image-39946"></td>
   <td><img src="https://shkspr.mobi/blog/wp-content/uploads/2021/08/map4.png" alt="Map from a later time period with the same logical ordering." width="1024" height="1024" class="aligncenter size-full wp-image-39947"></td>
  </tr>
  <tr>
   <td colspan="2">Fig 04 Demonstration TreeMaps showing results in the same order across time periods.
   </td>
  </tr>
</tbody></table>
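
<p>The squarified layout builds each row of rectangles greedily: a rectangle is added to the current row only if doing so does not make the row's worst aspect ratio any worse. The sketch below implements just that test in base R, following the formula in Bruls, Huizing and van Wijk (2000). The <code>worst_ratio()</code> name is illustrative, and the areas are the worked example from that paper rather than data from this project.</p>

<pre><code class="language-R">#   Worst aspect ratio among a row of rectangles laid along a side of length w.
#   areas: the areas of the rectangles in the row.
worst_ratio &lt;- function(areas, w) {
    s &lt;- sum(areas)
    max(w^2 * max(areas) / s^2, s^2 / (w^2 * min(areas)))
}

areas &lt;- c(6, 6, 4, 3, 2, 2, 1)    #   Sorted, largest first
w     &lt;- 4                         #   Shorter side of a 6 x 4 rectangle

#   Grow the first row one rectangle at a time
current_row &lt;- numeric(0)
for (a in areas) {
    candidate &lt;- c(current_row, a)
    if (length(current_row) &gt; 0 &amp;&amp;
        worst_ratio(candidate, w) &gt; worst_ratio(current_row, w)) {
        break   #   Adding this rectangle would make the row worse
    }
    current_row &lt;- candidate
}
print(current_row)
</code></pre>

<p>Here the first row stops after the two rectangles of area 6, because adding the rectangle of area 4 would raise the worst aspect ratio from 1.5 to 4. The full algorithm then repeats the process in the remaining space.</p>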

<h4 id="business-implications-and-benefits"><a href="https://shkspr.mobi/blog/2021/08/msc-assignment-2-data-analytics-principles/#business-implications-and-benefits">Business Implications and Benefits</a></h4>

<ul>
<li>The results of this analysis have been shared with the organisation. We now have a coherent understanding of the size of the problem, and its current trajectory.</li>
<li>Individual departments will be notified about their results.</li>
<li>Where departments are continuing to publish data in an unsuitable format, we are able to provide them with support.</li>
<li>The report can now be run on a regular basis, and used to monitor ongoing compliance.</li>
</ul>

<h2 id="6-the-application-of-concepts-tools-and-techniques-for-data-visualisation-including-how-this-provides-a-qualitative-understanding-of-the-information-on-which-decisions-can-be-based-include-here-th"><a href="https://shkspr.mobi/blog/2021/08/msc-assignment-2-data-analytics-principles/#6-the-application-of-concepts-tools-and-techniques-for-data-visualisation-including-how-this-provides-a-qualitative-understanding-of-the-information-on-which-decisions-can-be-based-include-here-th">6. The application of concepts, tools and techniques for data visualisation, including how this provides a qualitative understanding of the information on which decisions can be based. Include here the visualisation aspects applicable to your product.</a></h2>

<p>Our goal when producing visualisations is to be "Trustworthy, Accessible, Elegant" (<a href="https://uk.sagepub.com/en-gb/eur/data-visualisation/book266150">Kirk, 2021</a>).</p>

<p>Trustworthy:</p>

<ul>
<li>Accurate and definitive data was retrieved from the warehouse.</li>
<li>Raw data was stored for others to verify our results.</li>
<li>Open source scripts show how the data were transformed into usable dataframes.</li>
<li>The output was made deterministic so that others can reproduce the results.</li>
</ul>

<p>Accessible:</p>

<ul>
<li>In this context, accessible means both meeting legal accessibility requirements (<a href="https://www.w3.org/TR/WCAG21/">W3C, 2018</a>) and being available to those who need to see the data.

<ul>
<li>Accessibility:

<ul>
<li>Care was taken to meet minimum contrast guidelines between text and background.</li>
<li>Due to the high prevalence of colour blindness in the population (<a href="https://www.nei.nih.gov/learn-about-eye-health/eye-conditions-and-diseases/color-blindness">National Eye Institute, 2019</a>), it was necessary to add labels to ensure that the visualisation was accessible.</li>
<li>The minimum font size was challenging because, by its nature, some segments of the diagram are small.</li>
<li>The frame-rate of the animation was set at a suitable frequency to ensure photosensitivity needs were met.</li>
<li>The final animation was provided as a series of still frames for those that required it.</li>
</ul></li>
<li>Access

<ul>
<li>The animations were made available internally via our Wiki, email, Slack, and during presentations.</li>
<li>The data and graphs were published to GitHub under a permissive licence to encourage access and reuse.</li>
</ul></li>
</ul></li>
</ul>
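
<p>A contrast check of this kind can be automated. The sketch below implements the relative luminance and contrast ratio formulas from WCAG 2.1 in base R, and applies them to white label text on each of the fill colours listed in the appendix code. The function names are hypothetical. WCAG 2.1 level AA asks for a ratio of at least 4.5:1 for normal text and 3:1 for large text.</p>

<pre><code class="language-R">#   Relative luminance of an sRGB colour, as defined by WCAG 2.1
relative_luminance &lt;- function(hex) {
    channels &lt;- as.vector(col2rgb(hex)) / 255   #   Red, green, blue in [0, 1]
    linear   &lt;- ifelse(channels &lt;= 0.03928,
                       channels / 12.92,
                       ((channels + 0.055) / 1.055) ^ 2.4)
    sum(c(0.2126, 0.7152, 0.0722) * linear)
}

#   Contrast ratio between two colours: ranges from 1 (identical) to 21
contrast_ratio &lt;- function(foreground, background) {
    lum &lt;- c(relative_luminance(foreground), relative_luminance(background))
    (max(lum) + 0.05) / (min(lum) + 0.05)
}

#   White label text against each fill colour from the appendix
fill_colours &lt;- c("#f8766d", "#00b0f6", "#00bf7d", "#a3a500", "#ff6bf3")
ratios &lt;- sapply(fill_colours, contrast_ratio, foreground = "#ffffff")
print(round(ratios, 2))
</code></pre>

<p>Any colour whose ratio falls below the relevant threshold is a candidate for a darker fill or a different label colour.</p>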

<p>Elegant:</p>

<ul>
<li>As well as accessibility concerns (see above) there were æsthetic considerations.  The default colouring provided by the TreeMap library was used.</li>
<li>User research showed that viewers intuitively understood that area size was proportional to volume of published documents.</li>
</ul>

<p>When tested with users, the TreeMap performed well against the common visualisation problems noted in the literature (<a href="https://dl.acm.org/doi/10.5555/4084">Cleveland, 1985</a>):</p>

<ul>
<li>Over-emphasising small results was not a problem. Due to the visually smaller sizes of the area, and the proportionately smaller font size, insignificant results were ignored.</li>
<li>Many charts use multiple similar colours. By restricting the TreeMap to four colours there was less distracting visual noise.</li>
<li>While users were able to intuitively understand that size is proportionate to volume of documents, some shapes produced by the squarified algorithm were not always easily comparable.</li>
</ul>

<p></p><div id="attachment_39949" style="width: 1034px" class="wp-caption aligncenter"><img aria-describedby="caption-attachment-39949" src="https://shkspr.mobi/blog/wp-content/uploads/2021/08/colourmap.png" alt="A colourful map where every item is a different shape." width="1750" height="1750" class="size-full wp-image-39949"><p id="caption-attachment-39949" class="wp-caption-text">Fig 05 In this example, the tall and thin shape (doc) has a similar area to the more square shape on its left (xls), but users felt it looked significantly smaller.</p></div><p></p>

<p>For the animation, I followed the example of the animated bubble charts demonstrated by Hans Rosling (<a href="https://www.gapminder.org/fw/world-health-chart/">Rosling, 2019</a>).</p>

<p>The use of animation was crucial to showing both the scale of the data, and the momentum of change.</p>

<p>Clustering and grouping:</p>

<ul>
<li>Another advantage of TreeMap is the ability to subdivide data into groups.</li>
<li>In this test image, there is no grouping, so it is not possible to see which Media Types are open and which are closed.</li>
</ul>

<p></p><div id="attachment_39952" style="width: 1034px" class="wp-caption aligncenter"><img aria-describedby="caption-attachment-39952" src="https://shkspr.mobi/blog/wp-content/uploads/2021/08/greymap.png" alt="All the segments of the map are the same colour." width="626" height="572" class="size-full wp-image-39952"><p id="caption-attachment-39952" class="wp-caption-text">Fig 06 Uncoloured TreeMap.</p></div><p></p>

<ul>
<li>Once groupings were added, similar Media Types were contiguous, which made assessing their relative volume easier:</li>
</ul>

<p></p><div id="attachment_39953" style="width: 1034px" class="wp-caption aligncenter"><img aria-describedby="caption-attachment-39953" src="https://shkspr.mobi/blog/wp-content/uploads/2021/08/colourfulmap.png" alt="A map where the squares relating to a specific group all have the same colour." width="876" height="805" class="size-full wp-image-39953"><p id="caption-attachment-39953" class="wp-caption-text">Fig 07 Coloured TreeMap demonstrating grouping.</p></div><p></p>

<p>Alternate view on data:</p>

<ul>
<li>The data can be arranged in a multidimensional array and be considered as a "Data Cube" (<a href="https://doi.org/10.1109/ICDE.1996.492099">Gray <em>et al.</em>, 1996</a>).</li>
<li>This allows us to manipulate and slice the data in a variety of ways to more deeply examine relationships between data facets.</li>
<li>The most requested view by stakeholders was the ability to cluster by department.</li>
<li>Being able to "slice and dice" this data (<a href="https://webdocs.cs.ualberta.ca/~zaiane/courses/cmput690/glossary.html#D">Zaïane, 1999</a>) gives us a more detailed view of the data.</li>
</ul>
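
<p>As a rough illustration of the "Data Cube" idea, base R can build a three-dimensional array of counts with <code>xtabs()</code> and then slice, dice, and roll it up by indexing. The departments, file types, and weeks below are hypothetical placeholder data.</p>

<pre><code class="language-R">#   Hypothetical upload records: one row per published file
uploads &lt;- data.frame(
    Organisation = c("Dept A", "Dept A", "Dept B", "Dept B", "Dept B", "Dept A"),
    Filetype     = c("PDF", "CSV", "PDF", "PDF", "ODS", "PDF"),
    Week         = c("2021-01", "2021-01", "2021-01", "2021-02", "2021-02", "2021-02"),
    stringsAsFactors = FALSE)

#   The cube: counts indexed by Organisation x Filetype x Week
cube &lt;- xtabs(~ Organisation + Filetype + Week, data = uploads)

#   Slice: fix one dimension (a single week)
week_one &lt;- cube[, , "2021-01"]

#   Dice: pick a sub-cube (one department, open formats only)
dept_a_open &lt;- cube["Dept A", c("CSV", "ODS"), , drop = FALSE]

#   Roll-up: collapse the Week dimension to get totals per department and type
by_department &lt;- apply(cube, c(1, 2), sum)

print(week_one)
print(by_department)
</code></pre>

<p>Clustering by department, the view stakeholders asked for most, corresponds to taking one slice of the cube per organisation.</p>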

<p></p><div id="attachment_39955" style="width: 1034px" class="wp-caption aligncenter"><img aria-describedby="caption-attachment-39955" src="https://shkspr.mobi/blog/wp-content/uploads/2021/08/labelmap.png" alt="Large map, each square is subdivided into smaller squares." width="2048" height="1365" class="size-full wp-image-39955"><p id="caption-attachment-39955" class="wp-caption-text">Fig 08 Each black-bordered cell shows a single department, and the number and type of documents they published this year. Department names have been redacted for publication.</p></div><p></p>

<p>The final visualisation was a 30 second animation which demonstrated the rate of change of volume of uploads of different Media Types and their categories.</p>

<iframe title="Animated TreeMap - MSc Coursework" width="620" height="465" src="https://www.youtube.com/embed/-_ecmTC2hRc?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe>

<h2 id="4-personal-reflection"><a href="https://shkspr.mobi/blog/2021/08/msc-assignment-2-data-analytics-principles/#4-personal-reflection">4. Personal Reflection</a></h2>

<h3 id="a-reflective-evaluation-of-the-implications-of-conducting-this-investigation-for-your-learning-development-on-this-programme"><a href="https://shkspr.mobi/blog/2021/08/msc-assignment-2-data-analytics-principles/#a-reflective-evaluation-of-the-implications-of-conducting-this-investigation-for-your-learning-development-on-this-programme">A reflective evaluation of the implications of conducting this investigation for your learning development on this programme.</a></h3>

<p>I will evaluate my experience using Gibbs' Model of reflection (Gibbs, 1998).</p>

<p>Description of the experience:</p>

<ul>
<li>This was an excellent module which helped me learn useful concepts, tools and techniques.</li>
<li>Data Science is seen by Number 10 as a hugely important civil service competency (<a href="https://dominiccummings.com/2020/01/02/two-hands-are-a-lot-were-hiring-data-scientists-project-managers-policy-experts-assorted-weirdos/">Cummings, 2020</a>). The ability to quickly gather, analyse, and interpret data is a key skill in the modern civil service.</li>
</ul>

<p>Feelings and thoughts about the experience:</p>

<ul>
<li>This module introduced me to new algorithmic ideas, and provided me with valuable insights into how they can be effectively applied.</li>
<li>I enjoyed sharing my expertise with classmates during our workshops, and explaining some of the vagaries and limitations of the various languages we learned.</li>
<li>I would have preferred to have gone into more depth on a single tool, rather than having a variety of tools and techniques to learn.</li>
</ul>

<p>Evaluation of the experience, both good and bad:</p>

<ul>
<li>I had previous experience with both R and Python, but I had never used Azure or PowerBI.</li>
<li>I discovered I have the ability to quickly apply learnings from other domains when I am confronted with a new piece of software or programming paradigm.</li>
<li>I found some of the lessons a little too focussed on the syntax of tools, rather than understanding the underlying principles.</li>
</ul>

<p>Analysis to make sense of the situation:</p>

<ul>
<li>This module has strengthened my belief that data science alone isn't the answer to government's problems.</li>
<li>To tackle new and existing problems, we need expertise which goes beyond data science and which encompasses ethics, psychology, and social sciences (<a href="https://doi.org/10.1038/d41586-020-00064-x">Shah, 2020</a>).</li>
</ul>

<p>Conclusion about what you learned and what you could have done differently:</p>

<ul>
<li>I underestimated the amount of time needed to get the precise data that I required, so I initially relied on an older data set. I should have been more explicit around timescales in my initial request.</li>
<li>Visualisations are not well understood in our department, so I should have spent more time explaining their value to my team.</li>
</ul>

<p>Action plan:</p>

<ul>
<li>Work with the existing cross-government R community to better understand how I can integrate R into our department's workflow.</li>
<li>Introduce TreeMaps into more presentations.</li>
<li>Improve my existing knowledge of Python and commit to blogging about my experience of this module.</li>
</ul>

<h2 id="appendix-code"><a href="https://shkspr.mobi/blog/2021/08/msc-assignment-2-data-analytics-principles/#appendix-code">Appendix: Code</a></h2>

<p>This R code reads in a generated CSV and then produces the TreeMap images used in the animation:</p>

<pre><code class="language-R">library(gganimate)
library(treemapify)
library(plotly)
library(dplyr)      #   Provides mutate() and count(), which are used below

#   Read in the data
uploaded_files &lt;- read.csv("report.csv", header=TRUE)

#   Get a vector of file extensions
mime_types &lt;- unique(uploaded_files[c("Filetype")])
rownames(mime_types) = NULL

#   Get a vector of Organisations
organisations &lt;- unique(uploaded_files[c("Organisation")])
rownames(organisations) = NULL

#   Get a vector of each week
file_dates &lt;- unique(uploaded_files[c("Published.Date")])
rownames(file_dates) = NULL

#   Find broken dates
broken_dates &lt;- subset(uploaded_files, Published.Date == "")

#   Remove rows with null dates
if (count(broken_dates)[,1] &gt; 0 ) {
    uploaded_files &lt;- uploaded_files[uploaded_files$Published.Date != "", ] 
}

#   Convert timestamps to Date objects
uploaded_files &lt;- mutate(uploaded_files, Published.Date = as.Date(Published.Date))

#   Remove rows with too old dates
uploaded_files &lt;- uploaded_files[as.Date(uploaded_files$Published.Date) &gt;= as.Date("2013-01-01"), ] 

#   Add Week Column
uploaded_files["Week"] &lt;- format(uploaded_files$Published.Date, format = "%Y-%W")

#   Get the file extensions - more accurate than MIME type
file_ext &lt;- gsub("^.*\\.", "", uploaded_files$Filename)
file_ext &lt;- sapply(file_ext, toupper)
uploaded_files$Filetype &lt;- file_ext

#   Add category
uploaded_files["Category"] &lt;- ""

uploaded_files$Category[uploaded_files$Filetype == "PDF"] &lt;- "PDFs"

uploaded_files$Category[uploaded_files$Filetype == "DXF"] &lt;- "Other"
uploaded_files$Category[uploaded_files$Filetype == "PS"]  &lt;- "Other"
uploaded_files$Category[uploaded_files$Filetype == "RDF"] &lt;- "Other"
uploaded_files$Category[uploaded_files$Filetype == "RTF"] &lt;- "Other"
uploaded_files$Category[uploaded_files$Filetype == "XSD"] &lt;- "Other"
uploaded_files$Category[uploaded_files$Filetype == "XML"] &lt;- "Other"
uploaded_files$Category[uploaded_files$Filetype == "ZIP"] &lt;- "Other"

uploaded_files$Category[uploaded_files$Filetype == "JPG"] &lt;- "Image"
uploaded_files$Category[uploaded_files$Filetype == "EPS"] &lt;- "Image"
uploaded_files$Category[uploaded_files$Filetype == "PNG"] &lt;- "Image"
uploaded_files$Category[uploaded_files$Filetype == "GIF"] &lt;- "Image"

uploaded_files$Category[uploaded_files$Filetype == "CSV"] &lt;- "Open"
uploaded_files$Category[uploaded_files$Filetype == "ODP"] &lt;- "Open"
uploaded_files$Category[uploaded_files$Filetype == "ODS"] &lt;- "Open"
uploaded_files$Category[uploaded_files$Filetype == "ODT"] &lt;- "Open"
uploaded_files$Category[uploaded_files$Filetype == "TXT"] &lt;- "Open"

uploaded_files$Category[uploaded_files$Filetype == "DOC"]  &lt;- "Closed"
uploaded_files$Category[uploaded_files$Filetype == "DOCX"] &lt;- "Closed"
uploaded_files$Category[uploaded_files$Filetype == "DOT"]  &lt;- "Closed"
uploaded_files$Category[uploaded_files$Filetype == "PPT"]  &lt;- "Closed"
uploaded_files$Category[uploaded_files$Filetype == "PPTX"] &lt;- "Closed"
uploaded_files$Category[uploaded_files$Filetype == "XLS"]  &lt;- "Closed"
uploaded_files$Category[uploaded_files$Filetype == "XLSB"] &lt;- "Closed"
uploaded_files$Category[uploaded_files$Filetype == "XLSM"] &lt;- "Closed"
uploaded_files$Category[uploaded_files$Filetype == "XLSX"] &lt;- "Closed"
uploaded_files$Category[uploaded_files$Filetype == "XLT"]  &lt;- "Closed"

#   Vector of all filetypes
file_extensions &lt;- unique(uploaded_files[c("Filetype")])[,1]

#   Vector of the filetype's category
file_categories &lt;- data.frame(Category=character())
for(e in file_extensions) { 
    temp_row &lt;-  (uploaded_files[uploaded_files$Filetype == e,]$Category[1])
    file_categories &lt;- rbind(file_categories, temp_row)
}
colnames(file_categories)[1] = "Category"

#   Sort by date, then file type
uploaded_files &lt;- uploaded_files[order(uploaded_files$Week, uploaded_files$Filetype),]

#   Weekly graph
weeks &lt;- unique(uploaded_files$Week)

#   Create weekly summary
weekly_data &lt;- data.frame(
    Week     =character(), 
    Filetype =character(), 
    Count    =integer(),
    Category =character(),
    stringsAsFactors=FALSE)

#   Loop through the weeks
for(week in weeks) {
    temp_data &lt;- subset(uploaded_files, Week == week)


    # Loop through each filetype and count this week's uploads
    for (ex in file_extensions) {
        #   How many of this file are there?
        file_count &lt;- count( temp_data[temp_data$Filetype == ex,])


        #   What category is it in?
        cat &lt;- subset(temp_data, Filetype == ex)$Category[1]


        #   Populate the row
        temp_row &lt;- data.frame(Week = week, Filetype = ex, Count = file_count, Category = cat )


        #   Add the row to the existing data
        weekly_data &lt;- rbind(weekly_data, temp_row)
    }
}

# Ensure the column names are right
colnames(weekly_data)[1] = "Week"
colnames(weekly_data)[2] = "Filetype"
colnames(weekly_data)[3] = "Count"
colnames(weekly_data)[4] = "Category"

# Ensure Count is an integer
weekly_data$Count &lt;- as.integer(weekly_data$Count)

#   Generate the images

#  Keep track of the running total
running_total &lt;- data.frame(
    file_extensions,
    file_categories,
    total = integer(length(file_extensions)), 
    stringsAsFactors=FALSE)

#   Loop through the weeks
for(week in weeks) {
    #   Data used for the output image
    image_data &lt;- subset(weekly_data, Week == week)
    print(week)     #   Keep track of where we are


    # Loop through and add up all the previous uploads of this filetype
    for (ex in file_extensions) {
        current_count &lt;- running_total[ running_total$file_extensions == ex, ]$total
        new_data &lt;- image_data[ image_data$Filetype == ex, ]$Count
        new_total &lt;- current_count + new_data
        running_total[ running_total$file_extensions == ex, ]$total &lt;- new_total
    }


    #   Small images don't render - force the smallest ones to a valid size
    size &lt;- sqrt(sum(running_total$total)) / 2
    if (size &lt; 40) {
        size &lt;- 40
    }

    #   Remove PDF (optional)
    # running_total &lt;- running_total[running_total$file_extensions != "PDF", ]


                                 #  Optional layouts
    layout_style &lt;- "squarified" #"fixed" "squarified" "scol" "srow"


    #   Colour scheme - unnamed values are matched to Category in alphabetical order:
    #                   Closed    Image     Open        Other      PDFs
    fill_colours &lt;- c("#f8766d","#00b0f6", "#00bf7d", "#a3a500", "#ff6bf3")


    #   Generate the TreeMap
    map &lt;- ggplot(running_total, 
                  aes(area = total, 
                     label = paste( file_extensions, formatC(total, big.mark = ",") ,sep = "\n" ), 
                     subgroup = Category, fill=Category)) +
        geom_treemap(layout = layout_style, size = 2, color = "white") + #  Border of internal rectangles
        scale_fill_manual(values = fill_colours)+
        geom_treemap_text(colour = "white", place = "centre", grow = TRUE, min.size = 0.5, layout = layout_style) +
               geom_treemap_subgroup_border(colour = "white", size = 2, layout = layout_style) +
        ggtitle( paste("Total number of files uploaded to GOV.UK:", week, sep = "\n") )


    file_name &lt;- paste("media/", week, ".png", sep = "")
    ggsave(file_name, 
           map, 
           width  = size, 
           height = size, 
           units  = "mm")
}
</code></pre>
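
<p>As an alternative sketch (not the code used for the coursework), the categorisation and running totals can be written more compactly with a named lookup vector and <code>cumsum()</code>. The file names are hypothetical, only a few extensions are shown, and treating unknown extensions as "Other" is an assumption; the code above leaves them blank.</p>

<pre><code class="language-R">#   Lookup from file extension to category (abbreviated)
category_lookup &lt;- c(
    PDF = "PDFs",
    JPG = "Image",  PNG  = "Image",
    CSV = "Open",   ODS  = "Open",
    DOC = "Closed", XLSX = "Closed")

#   Hypothetical uploads
uploads &lt;- data.frame(
    Week     = c("2021-01", "2021-01", "2021-02", "2021-02", "2021-02"),
    Filename = c("a.pdf", "b.csv", "c.pdf", "d.xlsx", "e.dat"),
    stringsAsFactors = FALSE)

#   Extension, then category via the lookup
uploads$Filetype &lt;- toupper(sub("^.*\\.", "", uploads$Filename))
uploads$Category &lt;- unname(category_lookup[uploads$Filetype])
uploads$Category[is.na(uploads$Category)] &lt;- "Other"   #   Assumption: unknown types are "Other"

#   Count uploads for every Week x Filetype pair, including zero counts
weekly &lt;- as.data.frame(table(Week = uploads$Week, Filetype = uploads$Filetype),
                        responseName = "Count", stringsAsFactors = FALSE)

#   Running total per filetype, in week order
weekly &lt;- weekly[order(weekly$Filetype, weekly$Week), ]
weekly$total &lt;- ave(weekly$Count, weekly$Filetype, FUN = cumsum)
print(weekly)
</code></pre>

<p>Because <code>table()</code> includes the zero counts, every file type has a row for every week, so its running total carries forward through weeks with no uploads.</p>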

<h2 id="references"><a href="https://shkspr.mobi/blog/2021/08/msc-assignment-2-data-analytics-principles/#references">References</a></h2>

<p><span itemscope="" itemtype="http://schema.org/ScholarlyArticle"><span itemprop="citation"><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Anderson</span><span>, </span><span itemprop="givenName">R. J.</span></span></span> <q><cite itemprop="headline">Florence Nightingale: The Biostatistician</cite></q> <span>(</span><time itemprop="datePublished" datetime="2011">2011</time><span>)</span> <span itemprop="publisher" itemscope="" itemtype="http://schema.org/Organization"><span itemprop="name">CLOCKSS Archive</span></span><span>.</span> <span itemprop="publication" itemscope="" itemtype="http://schema.org/Journal"><span itemprop="name">Molecular Interventions</span></span><span>.</span> <span> Page: </span><span itemprop="pageStart">63</span><span>. </span>DOI: <a itemprop="url" href="https://doi.org/10.1124/mi.11.2.1">https://doi.org/10.1124/mi.11.2.1</a></span></span></p>

<p>Arundel, Z. (2018) <em>The Write Stuff: how we used AI to help us handle correspondence - Department for Transport digital</em>. Available at: <a href="https://dftdigital.blog.gov.uk/2018/04/09/the-write-stuff-how-we-used-ai-to-help-us-handle-correspondence/">https://dftdigital.blog.gov.uk/2018/04/09/the-write-stuff-how-we-used-ai-to-help-us-handle-correspondence/</a> (Accessed: 16 June 2021).</p>

<p><span itemscope="" itemtype="http://schema.org/ScholarlyArticle"><span itemprop="citation"><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Beel</span><span>, </span><span itemprop="givenName">Joeran</span></span></span><span> &amp; </span><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Gipp</span><span>, </span><span itemprop="givenName">Bela</span></span></span><span> &amp; </span><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Langer</span><span>, </span><span itemprop="givenName">Stefan</span></span></span><span> &amp; </span><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Breitinger</span><span>, </span><span itemprop="givenName">Corinna</span></span></span> <q><cite itemprop="headline">Research-paper recommender systems: a literature survey</cite></q> <span>(</span><time itemprop="datePublished" datetime="2015">2015</time><span>)</span> <span itemprop="publisher" itemscope="" itemtype="http://schema.org/Organization"><span itemprop="name">Springer Science and Business Media LLC</span></span><span>.</span> <span itemprop="publication" itemscope="" itemtype="http://schema.org/Journal"><span itemprop="name">International Journal on Digital Libraries</span></span><span>.</span> <span> Page: </span><span itemprop="pageStart">305</span><span>. </span>DOI: <a itemprop="url" href="https://doi.org/10.1007/s00799-015-0156-0">https://doi.org/10.1007/s00799-015-0156-0</a></span></span></p>

<p><span itemscope="" itemtype="http://schema.org/ScholarlyArticle"><span itemprop="citation"><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Bruls</span><span>, </span><span itemprop="givenName">Mark</span></span></span><span> &amp; </span><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Huizing</span><span>, </span><span itemprop="givenName">Kees</span></span></span><span> &amp; </span><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">van Wijk</span><span>, </span><span itemprop="givenName">Jarke J.</span></span></span> <q><cite itemprop="headline">Squarified Treemaps</cite></q> <span>(</span><time itemprop="datePublished" datetime="2000">2000</time><span>)</span> <span itemprop="publisher" itemscope="" itemtype="http://schema.org/Organization"><span itemprop="name">Springer Science and Business Media LLC</span></span><span>.</span> <span itemprop="publication" itemscope="" itemtype="http://schema.org/BookSeries"><span itemprop="name">Eurographics</span></span><span>.</span> DOI: <a itemprop="url" href="https://doi.org/10.1007/978-3-7091-6783-0_4">https://doi.org/10.1007/978-3-7091-6783-0_4</a></span></span></p>

<p>Busch, P. <em>et al.</em> (2014) ‘A study of government cloud adoption: The Australian context’. Available at: <a href="https://opus.lib.uts.edu.au/handle/10453/121604">https://opus.lib.uts.edu.au/handle/10453/121604</a> (Accessed: 6 June 2021).</p>

<p>Cabinet Office (2011) <em>Drafting answers to parliamentary questions: guidance</em>, <em>GOV.UK</em>. Available at: <a href="https://www.gov.uk/government/publications/drafting-answers-to-parliamentary-questions-guidance">https://www.gov.uk/government/publications/drafting-answers-to-parliamentary-questions-guidance</a> (Accessed: 6 June 2021).</p>

<p>CDDO (2021) <em>Understanding accessibility requirements for public sector bodies</em>, <em>GOV.UK</em>. Available at: <a href="https://www.gov.uk/guidance/accessibility-requirements-for-public-sector-websites-and-apps">https://www.gov.uk/guidance/accessibility-requirements-for-public-sector-websites-and-apps</a> (Accessed: 6 June 2021).</p>

<p>CDEI (2021) <em>Local government use of data during the pandemic</em>, <em>GOV.UK</em>. Available at: <a href="https://www.gov.uk/government/publications/local-government-use-of-data-during-the-pandemic">https://www.gov.uk/government/publications/local-government-use-of-data-during-the-pandemic</a> (Accessed: 6 June 2021).</p>

<p>Cleveland, W. S. (1985) <em>The elements of graphing data</em>. Monterey, Calif: Wadsworth Advanced Books and Software. doi: <a href="https://dl.acm.org/doi/10.5555/4084">10.5555/4084</a></p>

<p><span itemscope="" itemtype="http://schema.org/ScholarlyArticle"><span itemprop="citation"><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Cole</span><span>, </span><span itemprop="givenName">Michael</span></span></span> <q><cite itemprop="headline">Accountability and quasi‐government: The role of parliamentary questions</cite></q> <span>(</span><time itemprop="datePublished" datetime="1999">1999</time><span>)</span> <span itemprop="publisher" itemscope="" itemtype="http://schema.org/Organization"><span itemprop="name">Informa UK Limited</span></span><span>.</span> <span itemprop="publication" itemscope="" itemtype="http://schema.org/Journal"><span itemprop="name">The Journal of Legislative Studies</span></span><span>.</span> <span> Page: </span><span itemprop="pageStart">77</span><span>. </span>DOI: <a itemprop="url" href="https://doi.org/10.1080/13572339908420584">https://doi.org/10.1080/13572339908420584</a></span></span></p>

<p>Cummings, D. (2020) ‘“Two hands are a lot”’, <em>Dominic Cummings’s Blog</em>, 2 January. Available at: <a href="https://dominiccummings.com/2020/01/02/two-hands-are-a-lot-were-hiring-data-scientists-project-managers-policy-experts-assorted-weirdos/">https://dominiccummings.com/2020/01/02/two-hands-are-a-lot-were-hiring-data-scientists-project-managers-policy-experts-assorted-weirdos/</a> (Accessed: 18 April 2021).</p>

<p><em>Data Protection Act</em> (2018). Queen’s Printer of Acts of Parliament. Available at: <a href="https://www.legislation.gov.uk/ukpga/2018/12/contents/enacted">https://www.legislation.gov.uk/ukpga/2018/12/contents/enacted</a> (Accessed: 6 June 2021).</p>

<p><span itemscope="" itemtype="http://schema.org/ScholarlyArticle"><span itemprop="citation"><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Dean</span><span>, </span><span itemprop="givenName">Jeffrey</span></span></span><span> &amp; </span><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Ghemawat</span><span>, </span><span itemprop="givenName">Sanjay</span></span></span> <q><cite itemprop="headline">MapReduce</cite></q> <span>(</span><time itemprop="datePublished" datetime="2008">2008</time><span>)</span> <span itemprop="publisher" itemscope="" itemtype="http://schema.org/Organization"><span itemprop="name">Association for Computing Machinery (ACM)</span></span><span>.</span> <span itemprop="publication" itemscope="" itemtype="http://schema.org/Journal"><span itemprop="name">Communications of the ACM</span></span><span>.</span> <span> Page: </span><span itemprop="pageStart">107</span><span>. </span>DOI: <a itemprop="url" href="https://doi.org/10.1145/1327452.1327492">https://doi.org/10.1145/1327452.1327492</a></span></span></p>

<p>Dowden, O. (2020) <em>National Data Strategy</em>, <em>GOV.UK</em>. Available at: <a href="https://www.gov.uk/government/publications/uk-national-data-strategy/national-data-strategy">https://www.gov.uk/government/publications/uk-national-data-strategy/national-data-strategy</a> (Accessed: 12 June 2021).</p>

<p><em>Equality Act</em> (2010). Statute Law Database. Available at: <a href="https://www.legislation.gov.uk/ukpga/2010/15/contents">https://www.legislation.gov.uk/ukpga/2010/15/contents</a> (Accessed: 16 June 2021).</p>

<p>Fetzer, T. and Graeber, T. (2020) <em>Does Contact Tracing Work? Quasi-Experimental Evidence from an Excel Error in England</em>. SSRN Scholarly Paper ID 3753893. Rochester, NY: Social Science Research Network. Available at: <a href="https://papers.ssrn.com/abstract=3753893">https://papers.ssrn.com/abstract=3753893</a> (Accessed: 12 June 2021).</p>

<p><em>Freedom of Information Act</em> (2000). Statute Law Database. Available at: <a href="https://www.legislation.gov.uk/ukpga/2000/36/contents">https://www.legislation.gov.uk/ukpga/2000/36/contents</a> (Accessed: 6 June 2021).</p>

<p>GDS (2019) <em>GOV.UK Taxonomy principles</em>, <em>GOV.UK</em>. Available at: <a href="https://www.gov.uk/government/publications/govuk-topic-taxonomy-principles/govuk-taxonomy-principles">https://www.gov.uk/government/publications/govuk-topic-taxonomy-principles/govuk-taxonomy-principles</a> (Accessed: 6 June 2021).</p>

<p>GDS (2021) <em>alphagov/govuk-developer-docs</em>. Available at: <a href="https://github.com/alphagov/govuk-developer-docs/blob/faabf3ecceed0443db1d5243feecfd6d8ca4b0f8/source/manual/taxonomy.html.md">https://github.com/alphagov/govuk-developer-docs/blob/faabf3ecceed0443db1d5243feecfd6d8ca4b0f8/source/manual/taxonomy.html.md</a> (Accessed: 6 June 2021).</p>

<p>Gibbs, G. (1998) ‘Learning by Doing’, p. 134.</p>

<p><span itemscope="" itemtype="http://schema.org/ScholarlyArticle"><span itemprop="citation"><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Gray</span><span>, </span><span itemprop="givenName">J.</span></span></span><span> &amp; </span><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Bosworth</span><span>, </span><span itemprop="givenName">A.</span></span></span><span> &amp; </span><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Layman</span><span>, </span><span itemprop="givenName">A.</span></span></span><span> &amp; </span><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Pirahesh</span><span>, </span><span itemprop="givenName">H.</span></span></span> <q><cite itemprop="headline">Data cube: a relational aggregation operator generalizing GROUP-BY, CROSS-TAB, and SUB-TOTALS</cite></q> <span>(</span><time itemprop="datePublished" datetime="2002">2002</time><span>)</span> <span itemprop="publisher" itemscope="" itemtype="http://schema.org/Organization"><span itemprop="name">Institute of Electrical and Electronics Engineers (IEEE)</span></span><span>.</span> DOI: <a itemprop="url" href="https://doi.org/10.1109/icde.1996.492099">https://doi.org/10.1109/icde.1996.492099</a></span></span></p>

<p>IANA (2021) <em>Media Types</em>. Available at: <a href="https://www.iana.org/assignments/media-types/media-types.xhtml">https://www.iana.org/assignments/media-types/media-types.xhtml</a> (Accessed: 12 June 2021).</p>

<p>Inmon, W. (2005) <em>Building the Data Warehouse</em>. 4th edn. Wiley. Available at: <a href="https://www.wiley.com/en-gb/Building+the+Data+Warehouse%2C+4th+Edition-p-9780764599446">https://www.wiley.com/en-gb/Building+the+Data+Warehouse%2C+4th+Edition-p-9780764599446</a> (Accessed: 12 June 2021).</p>

<p>ISO (2011) <em>ISO 25964-1:2011</em>, <em>ISO</em>. Available at: <a href="https://www.iso.org/cms/render/live/en/sites/isoorg/contents/data/standard/05/36/53657.html">https://www.iso.org/cms/render/live/en/sites/isoorg/contents/data/standard/05/36/53657.html</a> (Accessed: 6 June 2021).</p>

<p>Johnson, B. (2021) <em>Declaration on Government Reform</em>, <em>GOV.UK</em>. Available at: <a href="https://www.gov.uk/government/publications/declaration-on-government-reform">https://www.gov.uk/government/publications/declaration-on-government-reform</a> (Accessed: 15 June 2021).</p>

<p>Johnson, B. S. (1993) ‘Treemaps: Visualizing Hierarchical and Categorical Data’, p. 272.</p>

<p><span itemscope="" itemtype="http://schema.org/ScholarlyArticle"><span itemprop="citation"><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Kaper</span><span>, </span><span itemprop="givenName">H.G.</span></span></span><span> &amp; </span><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Wiebel</span><span>, </span><span itemprop="givenName">E.</span></span></span><span> &amp; </span><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Tipei</span><span>, </span><span itemprop="givenName">S.</span></span></span> <q><cite itemprop="headline">Data sonification and sound visualization</cite></q> <span>(</span><time itemprop="datePublished" datetime="1999">1999</time><span>)</span> <span itemprop="publisher" itemscope="" itemtype="http://schema.org/Organization"><span itemprop="name">Institute of Electrical and Electronics Engineers (IEEE)</span></span><span>.</span> <span itemprop="publication" itemscope="" itemtype="http://schema.org/Journal"><span itemprop="name">Computing in Science &amp; Engineering</span></span><span>.</span> <span> Page: </span><span itemprop="pageStart">48</span><span>. </span>DOI: <a itemprop="url" href="https://doi.org/10.1109/5992.774840">https://doi.org/10.1109/5992.774840</a></span></span></p>

<p><span itemscope="" itemtype="http://schema.org/ScholarlyArticle"><span itemprop="citation"><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Kepner</span><span>, </span><span itemprop="givenName">Jeremy</span></span></span><span> &amp; </span><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Gadepally</span><span>, </span><span itemprop="givenName">Vijay</span></span></span><span> &amp; </span><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Michaleas</span><span>, </span><span itemprop="givenName">Pete</span></span></span><span> &amp; </span><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Schear</span><span>, </span><span itemprop="givenName">Nabil</span></span></span><span> &amp; </span><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Varia</span><span>, </span><span itemprop="givenName">Mayank</span></span></span><span> &amp; </span><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Yerukhimovich</span><span>, </span><span itemprop="givenName">Arkady</span></span></span><span> &amp; </span><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Cunningham</span><span>, </span><span itemprop="givenName">Robert K.</span></span></span> <q><cite itemprop="headline">Computing on masked data: a high performance method for improving big data veracity</cite></q> <span>(</span><time itemprop="datePublished" datetime="2014">2014</time><span>)</span> <span itemprop="publisher" itemscope="" itemtype="http://schema.org/Organization"><span itemprop="name">Institute of Electrical and Electronics Engineers (IEEE)</span></span><span>.</span> DOI: <a itemprop="url" href="https://doi.org/10.1109/hpec.2014.7040946">https://doi.org/10.1109/hpec.2014.7040946</a></span></span></p>

<p>Kirk, A. (2021) <em>Data Visualisation</em>, <em>SAGE Publications Ltd</em>. Available at: <a href="https://uk.sagepub.com/en-gb/eur/data-visualisation/book266150">https://uk.sagepub.com/en-gb/eur/data-visualisation/book266150</a> (Accessed: 6 June 2021).</p>

<p>Koster, R. (2009) <em>Database “sharding” came from UO?</em>, <em>Raph’s Website</em>. Available at: <a href="https://www.raphkoster.com/2009/01/08/database-sharding-came-from-uo/">https://www.raphkoster.com/2009/01/08/database-sharding-came-from-uo/</a> (Accessed: 20 June 2021).</p>

<p><span itemscope="" itemtype="http://schema.org/ScholarlyArticle"><span itemprop="citation"><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Lambiotte</span><span>, </span><span itemprop="givenName">Renaud</span></span></span><span> &amp; </span><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Ausloos</span><span>, </span><span itemprop="givenName">Marcel</span></span></span> <q><cite itemprop="headline">Collaborative Tagging as a Tripartite Network</cite></q> <span>(</span><time itemprop="datePublished" datetime="2006">2006</time><span>)</span> <span itemprop="publisher" itemscope="" itemtype="http://schema.org/Organization"><span itemprop="name">Springer Science and Business Media LLC</span></span><span>.</span> <span itemprop="" itemscope="" itemtype="http://schema.org/BookSeries"><span itemprop="name">Lecture Notes in Computer Science</span></span><span>.</span> DOI: <a itemprop="url" href="https://doi.org/10.1007/11758532_152">https://doi.org/10.1007/11758532_152</a></span></span></p>

<p><span itemscope="" itemtype="http://schema.org/ScholarlyArticle"><span itemprop="citation"><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Long</span><span>, </span><span itemprop="givenName">Lim Kian</span></span></span><span> &amp; </span><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Hui</span><span>, </span><span itemprop="givenName">Lim Chien</span></span></span><span> &amp; </span><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Fook</span><span>, </span><span itemprop="givenName">Gim Yeong</span></span></span><span> &amp; </span><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Wan Zainon</span><span>, </span><span itemprop="givenName">Wan Mohd Nazmee</span></span></span> <q><cite itemprop="headline">A Study on the Effectiveness of Tree-Maps as Tree Visualization Techniques</cite></q> <span>(</span><time itemprop="datePublished" datetime="2017">2017</time><span>)</span> <span itemprop="publisher" itemscope="" itemtype="http://schema.org/Organization"><span itemprop="name">Elsevier BV</span></span><span>.</span> <span itemprop="publication" itemscope="" itemtype="http://schema.org/Journal"><span itemprop="name">Procedia Computer Science</span></span><span>.</span> <span> Page: </span><span itemprop="pageStart">108</span><span>. </span>DOI: <a itemprop="url" href="https://doi.org/10.1016/j.procs.2017.12.136">https://doi.org/10.1016/j.procs.2017.12.136</a></span></span></p>

<p>National Archives (2019) ‘Freedom of Information exemptions’. Available at: <a href="https://www.nationalarchives.gov.uk/documents/information-management/freedom-of-information-exemptions.pdf">https://www.nationalarchives.gov.uk/documents/information-management/freedom-of-information-exemptions.pdf</a>.</p>

<p>National Eye Institute (2019) <em>Color Blindness</em>. Available at: <a href="https://www.nei.nih.gov/learn-about-eye-health/eye-conditions-and-diseases/color-blindness">https://www.nei.nih.gov/learn-about-eye-health/eye-conditions-and-diseases/color-blindness</a> (Accessed: 13 June 2021).</p>

<p>National Information Standards Organization (2010) <em>ANSI/NISO Z39.19-2005 (R2010) Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies</em>. Available at: <a href="https://www.niso.org/publications/ansiniso-z3919-2005-r2010">https://www.niso.org/publications/ansiniso-z3919-2005-r2010</a> (Accessed: 6 June 2021).</p>

<p>Noble, S. U. (2018) <em>Algorithms of oppression: how search engines reinforce racism</em>. Available at: <a href="https://www.degruyter.com/isbn/9781479833641">https://www.degruyter.com/isbn/9781479833641</a> (Accessed: 16 June 2021).</p>

<p><span itemscope="" itemtype="http://schema.org/ScholarlyArticle"><span itemprop="citation"><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Pearson</span><span>, </span><span itemprop="givenName">K</span></span></span> <q><cite itemprop="headline">VII. Note on regression and inheritance in the case of two parents</cite></q> <span>(</span><time itemprop="datePublished" datetime="1895">1895</time><span>)</span> <span itemprop="publisher" itemscope="" itemtype="http://schema.org/Organization"><span itemprop="name">The Royal Society</span></span><span>.</span> <span itemprop="publication" itemscope="" itemtype="http://schema.org/Journal"><span itemprop="name">Proceedings of the Royal Society of London</span></span><span>.</span> <span> Page: </span><span itemprop="pageStart">240</span><span>. </span>DOI: <a itemprop="url" href="https://doi.org/10.1098/rspl.1895.0041">https://doi.org/10.1098/rspl.1895.0041</a></span></span></p>

<p><span itemscope="" itemtype="http://schema.org/ScholarlyArticle"><span itemprop="citation"><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><link itemprop="url" href="https://orcid.org/0000-0003-2306-2792"><span itemprop="name"><span itemprop="familyName">Ray</span><span>, </span><span itemprop="givenName">Partha Pratim</span></span></span> <q><cite itemprop="headline">An Introduction to Dew Computing: Definition, Concept and Implications</cite></q> <span>(</span><time itemprop="datePublished" datetime="2021">2021</time><span>)</span> <span itemprop="publisher" itemscope="" itemtype="http://schema.org/Organization"><span itemprop="name">Institute of Electrical and Electronics Engineers (IEEE)</span></span><span>.</span> <span itemprop="publication" itemscope="" itemtype="http://schema.org/Journal"><span itemprop="name">IEEE Access</span></span><span>.</span> <span> Page: </span><span itemprop="pageStart">723</span><span>. </span>DOI: <a itemprop="url" href="https://doi.org/10.1109/access.2017.2775042">https://doi.org/10.1109/access.2017.2775042</a></span></span></p>

<p>Rosling, H. (2019) ‘World Health Chart’, <em>Gapminder</em>. Available at: <a href="https://www.gapminder.org/fw/world-health-chart/">https://www.gapminder.org/fw/world-health-chart/</a> (Accessed: 12 June 2021).</p>

<p>Sadalage, P. J. and Fowler, M. (2012) <em>NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence</em>. 1st edn. Addison-Wesley Professional.</p>

<p><span itemscope="" itemtype="http://schema.org/ScholarlyArticle"><span itemprop="citation"><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Shadbolt</span><span>, </span><span itemprop="givenName">Nigel</span></span></span><span> &amp; </span><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">O'Hara</span><span>, </span><span itemprop="givenName">Kieron</span></span></span><span> &amp; </span><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Berners-Lee</span><span>, </span><span itemprop="givenName">Tim</span></span></span><span> &amp; </span><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Gibbins</span><span>, </span><span itemprop="givenName">Nicholas</span></span></span><span> &amp; </span><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Glaser</span><span>, </span><span itemprop="givenName">Hugh</span></span></span><span> &amp; </span><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Hall</span><span>, </span><span itemprop="givenName">Wendy</span></span></span><span> &amp; </span><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">schraefel</span><span>, </span><span itemprop="givenName">m.c.</span></span></span> <q><cite itemprop="headline">Linked Open Government Data: Lessons from Data.gov.uk</cite></q> <span>(</span><time itemprop="datePublished" datetime="2012">2012</time><span>)</span> <span itemprop="publisher" itemscope="" itemtype="http://schema.org/Organization"><span itemprop="name">Institute of Electrical and Electronics Engineers (IEEE)</span></span><span>.</span> <span itemprop="publication" itemscope="" itemtype="http://schema.org/Journal"><span itemprop="name">IEEE Intelligent Systems</span></span><span>.</span> <span> Page: </span><span itemprop="pageStart">16</span><span>. </span>DOI: <a itemprop="url" href="https://doi.org/10.1109/mis.2012.23">https://doi.org/10.1109/mis.2012.23</a></span></span></p>

<p><span itemscope="" itemtype="http://schema.org/ScholarlyArticle"><span itemprop="citation"><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Shah</span><span>, </span><span itemprop="givenName">Hetan</span></span></span> <q><cite itemprop="headline">Global problems need social science</cite></q> <span>(</span><time itemprop="datePublished" datetime="2020">2020</time><span>)</span> <span itemprop="publisher" itemscope="" itemtype="http://schema.org/Organization"><span itemprop="name">Springer Science and Business Media LLC</span></span><span>.</span> <span itemprop="publication" itemscope="" itemtype="http://schema.org/Journal"><span itemprop="name">Nature</span></span><span>.</span> <span> Page: </span><span itemprop="pageStart">295</span><span>. </span>DOI: <a itemprop="url" href="https://doi.org/10.1038/d41586-020-00064-x">https://doi.org/10.1038/d41586-020-00064-x</a></span></span></p>

<p><span itemscope="" itemtype="http://schema.org/ScholarlyArticle"><span itemprop="citation"><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Shneiderman</span><span>, </span><span itemprop="givenName">Ben</span></span></span> <q><cite itemprop="headline">Tree visualization with tree-maps</cite></q> <span>(</span><time itemprop="datePublished" datetime="1992">1992</time><span>)</span> <span itemprop="publisher" itemscope="" itemtype="http://schema.org/Organization"><span itemprop="name">Association for Computing Machinery (ACM)</span></span><span>.</span> <span itemprop="publication" itemscope="" itemtype="http://schema.org/Journal"><span itemprop="name">ACM Transactions on Graphics</span></span><span>.</span> <span> Page: </span><span itemprop="pageStart">92</span><span>. </span>DOI: <a itemprop="url" href="https://doi.org/10.1145/102377.115768">https://doi.org/10.1145/102377.115768</a></span></span></p>

<p>Skomoroch, P. (2009) ‘@jakehofman was pondering a blog post on that, often people contact me about “big data” where big = slightly larger than can fit in excel :)’, <em>@peteskomoroch</em>, 6 March. Available at: <a href="https://twitter.com/peteskomoroch/status/1290703113">https://twitter.com/peteskomoroch/status/1290703113</a> (Accessed: 23 June 2021).</p>

<p><span itemscope="" itemtype="http://schema.org/ScholarlyArticle"><span itemprop="citation"><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Sparck Jones</span><span>, </span><span itemprop="givenName">Karen</span></span></span> <q><cite itemprop="headline">A statistical interpretation of term specificity and its application in retrieval</cite></q> <span>(</span><time itemprop="datePublished" datetime="1972">1972</time><span>)</span> <span itemprop="publisher" itemscope="" itemtype="http://schema.org/Organization"><span itemprop="name">Emerald</span></span><span>.</span> <span itemprop="publication" itemscope="" itemtype="http://schema.org/Journal"><span itemprop="name">Journal of Documentation</span></span><span>.</span> <span> Page: </span><span itemprop="pageStart">11</span><span>. </span>DOI: <a itemprop="url" href="https://doi.org/10.1108/eb026526">https://doi.org/10.1108/eb026526</a></span></span></p>

<p><span itemscope="" itemtype="http://schema.org/ScholarlyArticle"><span itemprop="citation"><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Tewari</span><span>, </span><span itemprop="givenName">Rajiv</span></span></span><span> &amp; </span><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Adam</span><span>, </span><span itemprop="givenName">Nabil R</span></span></span> <q><cite itemprop="headline">Using semantic knowledge of transactions to improve recovery and availability of replicated data</cite></q> <span>(</span><time itemprop="datePublished" datetime="1992">1992</time><span>)</span> <span itemprop="publisher" itemscope="" itemtype="http://schema.org/Organization"><span itemprop="name">Elsevier BV</span></span><span>.</span> <span itemprop="publication" itemscope="" itemtype="http://schema.org/Journal"><span itemprop="name">Information Systems</span></span><span>.</span> <span> Page: </span><span itemprop="pageStart">477</span><span>. </span>DOI: <a itemprop="url" href="https://doi.org/10.1016/0306-4379(92)90027-k">https://doi.org/10.1016/0306-4379(92)90027-k</a></span></span></p>

<p><span itemscope="" itemtype="http://schema.org/ScholarlyArticle"><span itemprop="citation"><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Theodorou</span><span>, </span><span itemprop="givenName">Vasileios</span></span></span><span> &amp; </span><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Abelló</span><span>, </span><span itemprop="givenName">Alberto</span></span></span><span> &amp; </span><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Thiele</span><span>, </span><span itemprop="givenName">Maik</span></span></span><span> &amp; </span><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Lehner</span><span>, </span><span itemprop="givenName">Wolfgang</span></span></span> <q><cite itemprop="headline">Frequent patterns in ETL workflows: An empirical approach</cite></q> <span>(</span><time itemprop="datePublished" datetime="2017">2017</time><span>)</span> <span itemprop="publisher" itemscope="" itemtype="http://schema.org/Organization"><span itemprop="name">Elsevier BV</span></span><span>.</span> <span itemprop="publication" itemscope="" itemtype="http://schema.org/Journal"><span itemprop="name">Data &amp; Knowledge Engineering</span></span><span>.</span> <span> Page: </span><span itemprop="pageStart">1</span><span>. </span>DOI: <a itemprop="url" href="https://doi.org/10.1016/j.datak.2017.08.004">https://doi.org/10.1016/j.datak.2017.08.004</a></span></span></p>

<p>Vander Wal, T. (2004) <em>Folksonomy</em>, <em>vanderwal.net</em>. Available at: <a href="https://vanderwal.net/folksonomy.html">https://vanderwal.net/folksonomy.html</a> (Accessed: 12 June 2021).</p>

<p>Véliz, C. (2020) <em>Privacy is power: why and how you should take back control of your data</em>.</p>

<p>vlntn, J. (2009) <em>Data Warehouse Feeding Data Marts</em>. Available at: <a href="https://commons.wikimedia.org/wiki/File:Data_Warehouse_Feeding_Data_Mart.jpg">https://commons.wikimedia.org/wiki/File:Data_Warehouse_Feeding_Data_Mart.jpg</a> (Accessed: 16 June 2021).</p>

<p>W3C (2018) <em>Web Content Accessibility Guidelines (WCAG) 2.1</em>. Available at: <a href="https://www.w3.org/TR/WCAG21/">https://www.w3.org/TR/WCAG21/</a> (Accessed: 6 June 2021).</p>

<p><em>Welsh Language Act</em> (1993). Statute Law Database. Available at: <a href="https://www.legislation.gov.uk/ukpga/1993/38/contents">https://www.legislation.gov.uk/ukpga/1993/38/contents</a> (Accessed: 6 June 2021).</p>

<p>Williams, N. (2018) <em>Why GOV.UK content should be published in HTML and not PDF</em>, <em>Government Digital Service</em>. Available at: <a href="https://gds.blog.gov.uk/2018/07/16/why-gov-uk-content-should-be-published-in-html-and-not-pdf/">https://gds.blog.gov.uk/2018/07/16/why-gov-uk-content-should-be-published-in-html-and-not-pdf/</a> (Accessed: 14 June 2021).</p>

<p><span itemscope="" itemtype="http://schema.org/ScholarlyArticle"><span itemprop="citation"><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Wolpert</span><span>, </span><span itemprop="givenName">D.H.</span></span></span><span> &amp; </span><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Macready</span><span>, </span><span itemprop="givenName">W.G.</span></span></span> <q><cite itemprop="headline">No free lunch theorems for optimization</cite></q> <span>(</span><time itemprop="datePublished" datetime="1997">1997</time><span>)</span> <span itemprop="publisher" itemscope="" itemtype="http://schema.org/Organization"><span itemprop="name">Institute of Electrical and Electronics Engineers (IEEE)</span></span><span>.</span> <span itemprop="publication" itemscope="" itemtype="http://schema.org/Journal"><span itemprop="name">IEEE Transactions on Evolutionary Computation</span></span><span>.</span> <span> Page: </span><span itemprop="pageStart">67</span><span>. </span>DOI: <a itemprop="url" href="https://doi.org/10.1109/4235.585893">https://doi.org/10.1109/4235.585893</a></span></span></p>

<p><span itemscope="" itemtype="http://schema.org/ScholarlyArticle"><span itemprop="citation"><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Xu</span><span>, </span><span itemprop="givenName">R.</span></span></span><span> &amp; </span><span itemprop="author" itemscope="" itemtype="http://schema.org/Person"><span itemprop="name"><span itemprop="familyName">Wunsch II</span><span>, </span><span itemprop="givenName">D.</span></span></span> <q><cite itemprop="headline">Survey of Clustering Algorithms</cite></q> <span>(</span><time itemprop="datePublished" datetime="2005">2005</time><span>)</span> <span itemprop="publisher" itemscope="" itemtype="http://schema.org/Organization"><span itemprop="name">Institute of Electrical and Electronics Engineers (IEEE)</span></span><span>.</span> <span itemprop="publication" itemscope="" itemtype="http://schema.org/Journal"><span itemprop="name">IEEE Transactions on Neural Networks</span></span><span>.</span> <span> Page: </span><span itemprop="pageStart">645</span><span>. </span>DOI: <a itemprop="url" href="https://doi.org/10.1109/tnn.2005.845141">https://doi.org/10.1109/tnn.2005.845141</a></span></span></p>

<p>Zachariou et al. (2018) <em>How we used deep learning to structure GOV.UK’s content</em>, <em>Data in government</em>. Available at: <a href="https://dataingovernment.blog.gov.uk/2018/10/19/how-we-used-deep-learning-to-structure-gov-uks-content/">https://dataingovernment.blog.gov.uk/2018/10/19/how-we-used-deep-learning-to-structure-gov-uks-content/</a> (Accessed: 6 June 2021).</p>

<p>Zaïane, O. (1999) <em>Glossary of Data Mining Terms</em>. Available at: <a href="https://webdocs.cs.ualberta.ca/~zaiane/courses/cmput690/glossary.html#D">https://webdocs.cs.ualberta.ca/~zaiane/courses/cmput690/glossary.html#D</a> (Accessed: 13 June 2021).</p>

<p>Zuboff, S. (2020) <em>The age of surveillance capitalism: the fight for a human future at the new frontier of power</em>. First Trade Paperback Edition. New York: PublicAffairs.</p>

<h2 id="copyright-and-copyleft"><a href="https://shkspr.mobi/blog/2021/08/msc-assignment-2-data-analytics-principles/#copyright-and-copyleft">Copyright and Copyleft</a></h2>

<p>This document is 🄯 Terence Eden <a href="https://creativecommons.org/licenses/by-nc/4.0/">CC-BY-NC</a>.</p>

<p>It may not be used or retained in electronic systems for the detection of plagiarism. No part of it may be used for commercial purposes without prior permission.</p>

<p>R code is under the <a href="https://opensource.org/licenses/MIT">MIT Licence</a>.</p>

<p>This document contains public sector information licensed under the <a href="https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/">Open Government Licence v3.0</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2021/08/msc-assignment-2-data-analytics-principles/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[A Decade of Drinking Beer on Untappd]]></title>
		<link>https://shkspr.mobi/blog/2021/07/a-decade-of-drinking-beer-on-untappd/</link>
					<comments>https://shkspr.mobi/blog/2021/07/a-decade-of-drinking-beer-on-untappd/#respond</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Fri, 30 Jul 2021 11:34:32 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[beer]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[r]]></category>
		<category><![CDATA[untappd]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=39658</guid>

					<description><![CDATA[10 years ago, I asked an innocent question on Twitter.  Terence Eden is on Mastodon@edentIs there any service which will let me &#34;check in&#34; to a beer? Because this Chocolate Tom I&#039;m drinking is amazing.❤️ 0💬 2🔁 018:55 - Thu 21 July 2011  The answers came in swiftly - Untappd was the app to use.  So, a few minutes later:  Terence Eden is on Mastodon@edentI just earned the &#039;Newbie&#039; badge on @untappd!…]]></description>
										<content:encoded><![CDATA[<p>10 years ago, I asked an innocent question on Twitter.</p>

<blockquote class="social-embed" id="social-embed-94118367950155776" lang="en" itemscope="" itemtype="https://schema.org/SocialMediaPosting"><header class="social-embed-header" itemprop="author" itemscope="" itemtype="https://schema.org/Person"><a href="https://twitter.com/edent" class="social-embed-user" itemprop="url"><img class="social-embed-avatar social-embed-avatar-circle" src="data:image/webp;base64,UklGRkgBAABXRUJQVlA4IDwBAACQCACdASowADAAPrVQn0ynJCKiJyto4BaJaQAIIsx4Au9dhDqVA1i1RoRTO7nbdyy03nM5FhvV62goUj37tuxqpfpPeTBZvrJ78w0qAAD+/hVyFHvYXIrMCjny0z7wqsB9/QE08xls/AQdXJFX0adG9lISsm6kV96J5FINBFXzHwfzMCr4N6r3z5/Aa/wfEoVGX3H976she3jyS8RqJv7Jw7bOxoTSPlu4gNbfXYZ9TnbdQ0MNnMObyaRQLIu556jIj03zfJrVgqRM8GPwRoWb1M9AfzFe6Mtg13uEIqrTHmiuBpH+bTVB5EEQ3uby0C//XOAPJOFv4QV8RZDPQd517Khyba8Jlr97j2kIBJD9K3mbOHSHiQDasj6Y3forATbIg4QZHxWnCeqqMkVYfUAivuL0L/68mMnagAAA" alt="" itemprop="image"><div class="social-embed-user-names"><p class="social-embed-user-names-name" itemprop="name">Terence Eden is on Mastodon</p>@edent</div></a><img class="social-embed-logo" alt="Twitter" src="data:image/svg+xml,%3Csvg%20xmlns%3D%22http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%22%0Aaria-label%3D%22Twitter%22%20role%3D%22img%22%0AviewBox%3D%220%200%20512%20512%22%3E%3Cpath%0Ad%3D%22m0%200H512V512H0%22%0Afill%3D%22%23fff%22%2F%3E%3Cpath%20fill%3D%22%231d9bf0%22%20d%3D%22m458%20140q-23%2010-45%2012%2025-15%2034-43-24%2014-50%2019a79%2079%200%2000-135%2072q-101-7-163-83a80%2080%200%200024%20106q-17%200-36-10s-3%2062%2064%2079q-19%205-36%201s15%2053%2074%2055q-50%2040-117%2033a224%20224%200%2000346-200q23-16%2040-41%22%2F%3E%3C%2Fsvg%3E"></header><section class="social-embed-text" itemprop="articleBody">Is there any service which will let me "check in" to a beer? 
Because this Chocolate Tom I'm drinking is amazing.</section><hr class="social-embed-hr"><footer class="social-embed-footer"><a href="https://twitter.com/edent/status/94118367950155776"><span aria-label="0 likes" class="social-embed-meta">❤️ 0</span><span aria-label="2 replies" class="social-embed-meta">💬 2</span><span aria-label="0 reposts" class="social-embed-meta">🔁 0</span><time datetime="2011-07-21T18:55:42.000Z" itemprop="datePublished">18:55 - Thu 21 July 2011</time></a></footer></blockquote>

<p>The answers came in swiftly - <a href="https://untappd.com">Untappd</a> was the app to use.  So, a few minutes later:</p>

<blockquote class="social-embed" id="social-embed-94126841610244096" lang="en" itemscope="" itemtype="https://schema.org/SocialMediaPosting"><header class="social-embed-header" itemprop="author" itemscope="" itemtype="https://schema.org/Person"><a href="https://twitter.com/edent" class="social-embed-user" itemprop="url"><img class="social-embed-avatar social-embed-avatar-circle" src="data:image/webp;base64,UklGRkgBAABXRUJQVlA4IDwBAACQCACdASowADAAPrVQn0ynJCKiJyto4BaJaQAIIsx4Au9dhDqVA1i1RoRTO7nbdyy03nM5FhvV62goUj37tuxqpfpPeTBZvrJ78w0qAAD+/hVyFHvYXIrMCjny0z7wqsB9/QE08xls/AQdXJFX0adG9lISsm6kV96J5FINBFXzHwfzMCr4N6r3z5/Aa/wfEoVGX3H976she3jyS8RqJv7Jw7bOxoTSPlu4gNbfXYZ9TnbdQ0MNnMObyaRQLIu556jIj03zfJrVgqRM8GPwRoWb1M9AfzFe6Mtg13uEIqrTHmiuBpH+bTVB5EEQ3uby0C//XOAPJOFv4QV8RZDPQd517Khyba8Jlr97j2kIBJD9K3mbOHSHiQDasj6Y3forATbIg4QZHxWnCeqqMkVYfUAivuL0L/68mMnagAAA" alt="" itemprop="image"><div class="social-embed-user-names"><p class="social-embed-user-names-name" itemprop="name">Terence Eden is on Mastodon</p>@edent</div></a><img class="social-embed-logo" alt="Twitter" src="data:image/svg+xml,%3Csvg%20xmlns%3D%22http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%22%0Aaria-label%3D%22Twitter%22%20role%3D%22img%22%0AviewBox%3D%220%200%20512%20512%22%3E%3Cpath%0Ad%3D%22m0%200H512V512H0%22%0Afill%3D%22%23fff%22%2F%3E%3Cpath%20fill%3D%22%231d9bf0%22%20d%3D%22m458%20140q-23%2010-45%2012%2025-15%2034-43-24%2014-50%2019a79%2079%200%2000-135%2072q-101-7-163-83a80%2080%200%200024%20106q-17%200-36-10s-3%2062%2064%2079q-19%205-36%201s15%2053%2074%2055q-50%2040-117%2033a224%20224%200%2000346-200q23-16%2040-41%22%2F%3E%3C%2Fsvg%3E"></header><section class="social-embed-text" itemprop="articleBody">I just earned the 'Newbie' badge on <a href="https://twitter.com/untappd">@untappd</a>! 
http://untp.it/p3POA0</section><hr class="social-embed-hr"><footer class="social-embed-footer"><a href="https://twitter.com/edent/status/94126841610244096"><span aria-label="0 likes" class="social-embed-meta">❤️ 0</span><span aria-label="0 replies" class="social-embed-meta">💬 0</span><span aria-label="0 reposts" class="social-embed-meta">🔁 0</span><time datetime="2011-07-21T19:29:22.000Z" itemprop="datePublished">19:29 - Thu 21 July 2011</time></a></footer></blockquote>

<p>In the last decade, how much beer and cider have I drunk?</p>

<p>I've written before about <a href="https://shkspr.mobi/blog/2018/11/extracting-your-data-from-untappd/">how to extract your data from Untappd using their API</a>.</p>

<p>(A few notes. I don't check in to every drink - I only tend to do so if it's a new beer. Some of these are only tasters of a beer - not a full pint. This is mostly an exercise in playing with R. Visit <a href="https://www.drinkaware.co.uk/">DrinkAware</a> if you'd like to help manage your alcohol consumption.)</p>

<h2 id="quick-stats"><a href="https://shkspr.mobi/blog/2021/07/a-decade-of-drinking-beer-on-untappd/#quick-stats">Quick Stats</a></h2>

<ul>
<li>985 check-ins</li>
<li>801 unique drinks</li>
<li>3.68 average rating</li>
<li>4.93% average ABV</li>
</ul>

<h2 id="graphs"><a href="https://shkspr.mobi/blog/2021/07/a-decade-of-drinking-beer-on-untappd/#graphs">Graphs</a></h2>

<p>Is there a correlation between how strong a drink is, and how much I like it?</p>

<pre><code class="language-R">library(jsonlite)
beer_data &lt;- read_json("untappd_data.json", simplifyVector = TRUE)

abv &lt;- beer_data$beer$beer_abv
scr &lt;- beer_data$rating_score

plot(abv, scr, main="ABV vs Score", xlab="ABV", ylab="Score")
abline(lm(scr~abv), col="red") # regression line (y~x)
</code></pre>

<img src="https://shkspr.mobi/blog/wp-content/uploads/2021/07/scatter.png" alt="A very busy scatter graph." width="627" height="614" class="aligncenter size-full wp-image-39661">

<p>Hmmm... There's some weak positive correlation there. But it's a bit muddled.  Let's turn that into a hexmap:</p>

<pre><code class="language-R">library(hexbin)
bin&lt;-hexbin(abv, scr, xbins=10, xlab="ABV", ylab="Score")
plot(bin, main="Hexagonal Binning")
</code></pre>

<img src="https://shkspr.mobi/blog/wp-content/uploads/2021/07/hex.png" alt="A hex graph with a strong centre." width="637" height="549" class="aligncenter size-full wp-image-39660">

<p>Aha! A bit easier to see. Most of the beers I drink are in the 4-5% ABV range. And there is some correlation. But, mostly, I just like beer and cider.  Hmmm... Which do I prefer?</p>

<p>Let's take a look at Cider first:</p>

<pre><code class="language-R">library(jsonlite) # provides read_json()
beer_data &lt;- read_json("untappd_data.json", simplifyVector = TRUE)
# Keep only the check-ins whose name mentions "Cider"
cider &lt;- beer_data[grepl("Cider", beer_data$beer$beer_name), ]
cabv &lt;- cider$beer$beer_abv   # cider strengths
cscr &lt;- cider$rating_score    # cider scores
</code></pre>

<img src="https://shkspr.mobi/blog/wp-content/uploads/2021/07/Cider-Scatter.png" alt="Scatter plot with weak positive correlation." width="650" height="637" class="aligncenter size-full wp-image-39662">

<img src="https://shkspr.mobi/blog/wp-content/uploads/2021/07/Cider-Hex.png" alt="Hex plot." width="637" height="549" class="aligncenter size-full wp-image-39663">

<p>How much do I like Cider vs Beer?
Here, "beer" means everything that isn't a Cider (OK, so it also includes Mead and a few other drinks).</p>

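<pre><code class="language-R">justbeer &lt;- beer_data[!grepl("Cider", beer_data$beer$beer_name), ]
bscr &lt;- justbeer$rating_score
# Compare the two sets of scores side by side
boxplot(list(Cider = cscr, Beer = bscr), ylab="Score", main="Cider vs Beer Scores", ylim=c(0,5))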
</code></pre>

<img src="https://shkspr.mobi/blog/wp-content/uploads/2021/07/cider-vs-beer.png" alt="Box and whisker diagrams." width="568" height="585" class="aligncenter size-full wp-image-39669">

<p>I like Cider a bit more than beer. Yup!</p>
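
<p>To put a number on that comparison, something like the sketch below might do. The <code>checkins</code> data frame is made up for illustration - the real Untappd export is nested and has far more columns - and the split is the same crude "does the name contain Cider" test used above.</p>

<pre><code class="language-R"># Made-up check-ins standing in for the real (nested) Untappd data
checkins &lt;- data.frame(
  beer_name    = c("Dry Cider", "Pale Ale", "Apple Cider", "Stout", "Mead"),
  rating_score = c(4.0, 3.5, 4.5, 3.0, 3.75)
)

# Label each drink using the same name-based test as above
checkins$drink_type &lt;- ifelse(grepl("Cider", checkins$beer_name), "Cider", "Beer")

# Median score for each type of drink
median_scores &lt;- tapply(checkins$rating_score, checkins$drink_type, median)
print(median_scores)
</code></pre>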

<p>Let's plot that data on a map!  It's a bit more complicated because the JSON is nested.</p>

<pre><code class="language-R">library(jsonlite)
beer_data &lt;- read_json("untappd_data.json", simplifyVector = TRUE, flatten = TRUE)

venues_list &lt;- beer_data$venue
venues &lt;- as.data.frame(do.call(rbind, venues_list))
locations_list &lt;- venues$location
locations &lt;- as.data.frame(do.call(rbind, locations_list))
locations &lt;-subset(locations, venue_state!="Everywhere")
</code></pre>

<p>Display them on an interactive map:</p>

<pre><code class="language-R">library(sf)
library(mapview)
locations_sf &lt;- st_as_sf(locations, coords = c("lng", "lat"), crs = 4326)
mapview(locations_sf)
</code></pre>

<img src="https://shkspr.mobi/blog/wp-content/uploads/2021/07/WorldMap.png" alt="Map of the world with dots all over it." width="1161" height="618" class="aligncenter size-full wp-image-39678">

<p>Let's zoom in on London:
<img src="https://shkspr.mobi/blog/wp-content/uploads/2021/07/London.png" alt="Points dotted all over Central London." width="643" height="528" class="aligncenter size-full wp-image-39677"></p>

<p>Yup! Looks about right.</p>

<p>Well, that was a fun afternoon of noodling with R. If you'd like to play with the data, you can <a href="https://shkspr.mobi/blog/wp-content/uploads/2021/07/untappd.zip">download a decade of my Untappd data in JSON format</a>.</p>
<img src="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/info/okgo.php?ID=39658&HTTP_REFERER=RSS" alt="" width="1" height="1" loading="eager">]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2021/07/a-decade-of-drinking-beer-on-untappd/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA["Why do we use R rather than Excel?"]]></title>
		<link>https://shkspr.mobi/blog/2021/07/why-do-we-use-r-rather-than-excel/</link>
					<comments>https://shkspr.mobi/blog/2021/07/why-do-we-use-r-rather-than-excel/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Sun, 11 Jul 2021 11:08:15 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[excel]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[r]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=39519</guid>

					<description><![CDATA[I recently had cause to take a beginners course in R - a language I&#039;m fairly familiar with.  One of the other students had never used it before, so we were buddied up in order for me to show them the ropes.  The first lesson of R is always the same.  Read a CSV, manipulate it a bit, draw a graph.  We did it all without much fuss - and a graph appeared on screen. Nifty!  &#34;I don&#039;t get it,&#34; the…]]></description>
										<content:encoded><![CDATA[<p>I recently had cause to take a beginners' course in R - a language I'm fairly familiar with.</p>

<p>One of the other students had never used it before, so we were buddied up in order for me to show them the ropes.</p>

<p>The first lesson of R is always the same.  Read a CSV, manipulate it a bit, draw a graph.  We did it all without much fuss - and a graph appeared on screen. Nifty!</p>

<p>"I don't get it," the student said, "Why wouldn't you just use Excel for this?"</p>

<p>To a programmer, it seems obvious - but it's a fair question! If you have a static set of data, you can drag your mouse over it, hit a few buttons, and a graph appears. Much easier than wrestling with esoteric syntax in a text-based interface.  This isn't a Koan - where the student becomes enlightened at the end. It's a tricky question to answer.  Here are the reasons I gave - feel free to add your own.</p>

<h2 id="visibility"><a href="https://shkspr.mobi/blog/2021/07/why-do-we-use-r-rather-than-excel/#visibility">Visibility</a></h2>

<p>How do you see the code inside an Excel document? How do you tell exactly what is going on? You have to go clicking through cells, or reverse engineer what settings a graph has.</p>

<p>With something like R, you automatically have all the code visible in front of you. Reading through the code in a linear fashion is possible. You can trace exactly what the code is doing without having to worry about whether there's some code hidden in cell Z44.</p>

<h2 id="track-changes"><a href="https://shkspr.mobi/blog/2021/07/why-do-we-use-r-rather-than-excel/#track-changes">Track Changes</a></h2>

<p>Related to the above, it's hard to visualise what changes have been made to an Excel document.  I don't know of any way to easily see how a formula has changed.  With R and Git (or any other version control system) you can see exactly what has changed from one version to the next.</p>

<h2 id="repeatability"><a href="https://shkspr.mobi/blog/2021/07/why-do-we-use-r-rather-than-excel/#repeatability">Repeatability</a></h2>

<p>Typically, a user draws a graph on a single Excel document. If you want the same graph of a different data set, you're out of luck. You can copy the data from one Excel sheet into another - but there's no way to easily copy a bunch of manipulations and graph configurations to another document.</p>

<p>With R, you just change <code>read.csv("1.csv")</code> to <code>read.csv("2.csv")</code> and the <em>exact</em> same calculations are run on two different data sets.</p>
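
<p>As a rough sketch of what that looks like in practice: wrap the calculations in a function and hand it a different file. The <code>summarise_csv</code> helper and the <code>value</code> column are invented for this example, and the two temporary files simply stand in for <code>1.csv</code> and <code>2.csv</code>.</p>

<pre><code class="language-R"># Hypothetical helper: the same calculations for any CSV with a numeric "value" column
summarise_csv &lt;- function(path) {
  data &lt;- read.csv(path)
  c(mean = mean(data$value), max = max(data$value))
}

# Two throwaway files standing in for 1.csv and 2.csv
first_csv  &lt;- tempfile(fileext = ".csv")
second_csv &lt;- tempfile(fileext = ".csv")
write.csv(data.frame(value = c(1, 2, 3)), first_csv, row.names = FALSE)
write.csv(data.frame(value = c(10, 20, 60)), second_csv, row.names = FALSE)

# Only the file name changes - the calculations are identical
first_summary  &lt;- summarise_csv(first_csv)
second_summary &lt;- summarise_csv(second_csv)
</code></pre>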

<h2 id="batch-processing"><a href="https://shkspr.mobi/blog/2021/07/why-do-we-use-r-rather-than-excel/#batch-processing">Batch processing</a></h2>

<p>Related to the above, you can read every CSV in a directory and produce a graph for each of them.  You can read data from an API and run the same process on it that you did yesterday.</p>
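
<p>A minimal sketch of the directory case, assuming every CSV has a numeric <code>value</code> column. The <code>plot_all_csvs</code> function and the file names are made up, and it writes PDFs only because <code>pdf()</code> is available on any R install.</p>

<pre><code class="language-R"># Hypothetical helper: draw one histogram per CSV found in input_dir
plot_all_csvs &lt;- function(input_dir, output_dir) {
  csv_files &lt;- list.files(input_dir, pattern = "\\.csv$", full.names = TRUE)
  pdf_files &lt;- file.path(output_dir, sub("\\.csv$", ".pdf", basename(csv_files)))
  for (i in seq_along(csv_files)) {
    data &lt;- read.csv(csv_files[i])
    pdf(pdf_files[i])   # open a graphics file
    hist(data$value, main = basename(csv_files[i]), xlab = "value")
    dev.off()           # close it so the file is written
  }
  pdf_files
}

# Try it on a temporary directory holding two small, made-up CSVs
example_dir &lt;- tempfile("csv_batch_")
dir.create(example_dir)
write.csv(data.frame(value = c(1, 2, 2, 3)), file.path(example_dir, "monday.csv"), row.names = FALSE)
write.csv(data.frame(value = c(5, 8, 13)), file.path(example_dir, "tuesday.csv"), row.names = FALSE)
graph_files &lt;- plot_all_csvs(example_dir, example_dir)
</code></pre>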

<h2 id="extensibility"><a href="https://shkspr.mobi/blog/2021/07/why-do-we-use-r-rather-than-excel/#extensibility">Extensibility</a></h2>

<p>Excel has a wide range of graphs available - but R has more.  Excel can do basic analysis - but R can do extensive, complex analysis. Excel has some decent tooling - but R has thousands of libraries which can do a bewildering array of clever stuff.</p>

<h2 id="what-else"><a href="https://shkspr.mobi/blog/2021/07/why-do-we-use-r-rather-than-excel/#what-else">What else?</a></h2>

<p>Those were the advantages that I could think of on the spur of the moment.  Perhaps you can think of more.</p>

<p>But what I learned was that it is decidedly <em>non-obvious</em> why a user would want to use something like R or Python when Excel (seemingly) covers all the basics.</p>
<img src="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/info/okgo.php?ID=39519&HTTP_REFERER=RSS" alt="" width="1" height="1" loading="eager">]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2021/07/why-do-we-use-r-rather-than-excel/feed/</wfw:commentRss>
			<slash:comments>25</slash:comments>
		
		
			</item>
		<item>
		<title><![CDATA[Animated TreeMaps in R - the hard way]]></title>
		<link>https://shkspr.mobi/blog/2021/06/animated-treemaps-in-r-the-hard-way/</link>
					<comments>https://shkspr.mobi/blog/2021/06/animated-treemaps-in-r-the-hard-way/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Mon, 14 Jun 2021 11:23:24 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[HowTo]]></category>
		<category><![CDATA[MSc]]></category>
		<category><![CDATA[r]]></category>
		<category><![CDATA[tutorial]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=39262</guid>

					<description><![CDATA[As I am a bear of very little brain, these are notes to myself on my slightly shonky process for creating animated TreeMaps in R. The aim is to end up with something like this:  https://shkspr.mobi/blog/wp-content/uploads/2021/06/animated-tree-map.mp4  Generate the images  Getting the data is left as an exercise for the reader (sorry!). This loops through the data and generates a separate image…]]></description>
										<content:encoded><![CDATA[<p>As I am a bear of very little brain, these are notes to myself on my slightly shonky process for creating animated TreeMaps in R. The aim is to end up with something like this:</p>

<p></p><div style="width: 540px;" class="wp-video"><video class="wp-video-shortcode" id="video-39262-2" width="540" height="540" preload="metadata" controls="controls"><source type="video/mp4" src="https://shkspr.mobi/blog/wp-content/uploads/2021/06/animated-tree-map.mp4?_=2"><a href="https://shkspr.mobi/blog/wp-content/uploads/2021/06/animated-tree-map.mp4">https://shkspr.mobi/blog/wp-content/uploads/2021/06/animated-tree-map.mp4</a></video></div><p></p>

<h2 id="generate-the-images"><a href="https://shkspr.mobi/blog/2021/06/animated-treemaps-in-r-the-hard-way/#generate-the-images">Generate the images</a></h2>

<p>Getting the data is left as an exercise for the reader (sorry!). This loops through the data and generates a separate image for each TreeMap:</p>

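<pre><code class="language-R">library(ggplot2)
library(treemapify) # provides geom_treemap() and geom_treemap_text()

# Assumes file_data (with Week, Count, Filetype and Category columns) and weeks already exist
for(week in weeks) {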
  weekly_data &lt;- subset(file_data, Week == week)

  size &lt;- sqrt(sum(weekly_data$Count)) / 2
  if (size &lt; 40) {
    size &lt;- 40
  }

  map &lt;- ggplot(weekly_data, aes(area = Count, label = paste(Filetype,formatC(Count, big.mark=",") ,sep="\n"), subgroup = Category, fill=Category)) +
    geom_treemap(layout="fixed") +
    geom_treemap_text(colour = "white", place = "centre", grow = TRUE, layout="fixed")

  file_name &lt;- paste("media/", weekly_data$Week[1], ".png", sep="")
  ggsave(file_name, map, width = size, height = size, units = "mm")
}
</code></pre>

<p>The width and height are proportional to the square root of the size of the data. Annoyingly, ggplot works in millimetres rather than pixels!</p>

<p>If images are too small, R throws an error of "Viewport has zero dimension(s)". So this sets a minimum size. This value was found using trial and error.</p>
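
<p>As a rough worked example of that sizing rule, pulled out into a function. The name <code>treemap_size_mm</code> and the counts are made up for illustration; the arithmetic is the same as in the loop above.</p>

<pre><code class="language-R"># Hypothetical helper: half the square root of the total count, but never below the minimum
treemap_size_mm &lt;- function(counts, minimum_mm = 40) {
  max(sqrt(sum(counts)) / 2, minimum_mm)
}

small_week &lt;- treemap_size_mm(c(100, 300))    # sqrt(400) / 2 = 10, so the 40mm minimum applies
large_week &lt;- treemap_size_mm(c(4000, 6000))  # sqrt(10000) / 2 = 50
</code></pre>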

<p>The layout is fixed, <a href="https://cran.r-project.org/web/packages/treemapify/vignettes/introduction-to-treemapify.html">as per the documentation</a> which keeps the order of the elements and their labels.</p>

<h2 id="resize-and-reorientate-the-images"><a href="https://shkspr.mobi/blog/2021/06/animated-treemaps-in-r-the-hard-way/#resize-and-reorientate-the-images">Resize and reorientate the images</a></h2>

<p>Now I have a directory of images, each a different size. I want all of them to have the same size canvas and to be placed against the right-hand edge.</p>

<pre><code class="language-_">mogrify -gravity east -background white -extent 1750x1750 *.png 
</code></pre>

<p>That sets the "gravity" to the right - so the original image is centred vertically but is up against the right edge. The <code>extent</code> is the dimension of the new image.</p>

<p>Mogrify <em>overwrites</em> the original images.</p>

<h2 id="make-a-video"><a href="https://shkspr.mobi/blog/2021/06/animated-treemaps-in-r-the-hard-way/#make-a-video">Make a video</a></h2>

<p>This is a lazy way to shove all the images into a video.</p>

<pre><code class="language-_">cat *.png | ffmpeg -f image2pipe -r 10 -vcodec png -i - -vcodec libx264 out.mp4
</code></pre>

<p>or</p>

<pre><code class="language-_">ffmpeg -framerate 8 -pattern_type glob -i '*.png' -c:v libx264 -r 30 -pix_fmt yuv420p out.mp4
</code></pre>

<p>Some websites will need further conversion as they have specific codec requirements.</p>

<h2 id="thats-it"><a href="https://shkspr.mobi/blog/2021/06/animated-treemaps-in-r-the-hard-way/#thats-it">That's it</a></h2>

<p>It isn't the prettiest way to do things, but it seemed pretty effective. If you know of a more efficient - or more R-ish way to accomplish the same animation - please let me know.</p>
<img src="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/info/okgo.php?ID=39262&HTTP_REFERER=RSS" alt="" width="1" height="1" loading="eager">]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2021/06/animated-treemaps-in-r-the-hard-way/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		<enclosure url="https://shkspr.mobi/blog/wp-content/uploads/2021/06/animated-tree-map.mp4" length="201481" type="video/mp4" />

			</item>
		<item>
		<title><![CDATA[How not to do coding examples]]></title>
		<link>https://shkspr.mobi/blog/2021/06/how-not-to-do-coding-examples/</link>
					<comments>https://shkspr.mobi/blog/2021/06/how-not-to-do-coding-examples/#comments</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Thu, 03 Jun 2021 11:21:54 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[MSc]]></category>
		<category><![CDATA[r]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=39136</guid>

					<description><![CDATA[As part of my MSc, I&#039;m getting a few lessons in technologies I&#039;m not familiar with.  I&#039;ve found some of these lessons extremely confusing - even when I&#039;m proficient in the language.  Here&#039;s an example of a coding fragment from one of the tutorials in the R language.  Let me explain everything that I think is wrong with it.  barplot(H, names.arg =M, col=“blue” xlab =&#039;Country&#039;, ylab=&#34;Population&#34;) so…]]></description>
										<content:encoded><![CDATA[<p>As part of my MSc, I'm getting a few lessons in technologies I'm not familiar with.  I've found some of these lessons extremely confusing - even when I'm proficient in the language.</p>

<p>Here's an example of a coding fragment from one of the tutorials in the R language.  Let me explain everything that I think is wrong with it.</p>

<pre><code class="language-R">barplot(H, names.arg =M, col=“blue” xlab ='Country', ylab="Population")
something &lt;- lm( mydata$Col1~mydata$Col2)
</code></pre>

<p>What are <code>H</code> and <code>M</code>?  They are defined earlier in the document, but giving single character variable names  is disrespectful to readers of the code. We aren't in the mainframe era where we have limited memory and have to use single characters. We're not being charged per word here!</p>

<p>There is no justification for single character variables. Even in toy examples like the above. Take the time to respect your readers' limited time.</p>

<p>Next - what's with the inconsistent spacing on the <code>=</code> symbol?  Some languages are whitespace significant, others aren't. If you're a newbie to this language, do you automatically <em>know</em> whether R behaves weirdly if spaces aren't consistent?</p>

<p>Curly quotes! A sure sign that something has been written in a word processor which has "helpfully" turned a humble <code>"</code> into something more extravagant.  Again, if you're new to coding, will you instinctively understand why the copy-and-pasted example has failed?</p>

<p>Why do some strings use single quotes and some use double quotes? Can <code>"</code> be replaced by two apostrophes - <code>''</code>?</p>

<p><code>something</code> is not a suitable variable name for <em>anything</em>. The <code>lm()</code> function creates a linear model. If this is the student's first time using linear models, are they going to remember what <code>something</code> is when halfway through the exercise? For tutorial code, it almost always makes sense to give variable names a hint of what they contain. For example, <code>age_int</code> lets the reader know what the variable represents <em>and</em> what it contains.</p>

<p>Again <code>mydata</code> is pretty meaningless. Is this the data I created or the data I loaded from the CSV? Sure, <code>world_population_from_csv</code> is a bit wordy - but it is unambiguous in the context of a lesson.</p>
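
<p>For comparison, here is a sketch of how that fragment might look with those complaints addressed. The data frame, its column names, and the numbers are all invented for illustration - they aren't from the original tutorial.</p>

<pre><code class="language-R"># Invented data standing in for whatever the tutorial loaded from its CSV
world_population_example &lt;- data.frame(
  country             = c("Country A", "Country B", "Country C"),
  population_millions = c(12, 25, 31),
  area_thousand_km2   = c(110, 190, 320)
)

# Consistent spacing, straight double quotes, and names that say what they hold
barplot(world_population_example$population_millions,
        names.arg = world_population_example$country,
        col = "blue", xlab = "Country", ylab = "Population (millions)")

# A linear model of population against area, named for what it is
population_by_area_model &lt;- lm(population_millions ~ area_thousand_km2,
                               data = world_population_example)
</code></pre>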

<p>And on and on it went.</p>

<p>As it happens, I was the only person in our tutorial with experience of R. Lest you think I'm exaggerating, I had to deal with all of the above when trying to help my fellow students understand what was going on.</p>

<p>If you understand a programming language, you are almost <em>guaranteed</em> to write an unsuitable tutorial the first time around. You first need to put the examples in front of people unfamiliar with your language, paradigm, or even the basics of programming. Let them explain to you what they find confusing so that you can write a better tutorial.</p>

<p>Yes, that's harder work for you. But your job is to make learning easier for others.</p>
<img src="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/info/okgo.php?ID=39136&HTTP_REFERER=RSS" alt="" width="1" height="1" loading="eager">]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2021/06/how-not-to-do-coding-examples/feed/</wfw:commentRss>
			<slash:comments>6</slash:comments>
		
		
			</item>
	</channel>
</rss>
