Terence Eden. He has a beard and is smiling.

We've got to stop sending files to each other



Another day, another data breach.

the spreadsheet, initially shared in 2022, and thought to contain data related to a small number of applicants, had contained hidden data related to more than 18,000 people. 

ICO statement in response to 2022 MoD data breach

Why are people still sending files to each other? I remember having a stand-up argument a decade ago with a project manager who wanted us to email a completed Word template to him every day. He'd then spend hours merging the various documents together. He couldn't get his head around the collaborative document suite the company had purchased a licence for. I tried showing him that we could give specific people write-access to the document and they could edit it live. No more emailing back-and-forth.

It just didn't stick. It wasn't that he was ignorant about what computers could do, but his entire mental model was built around files. Discrete packets of data with a fixed metaphor from the real world.

Collaborative online documents don't have an easy analogue analogue. It is rare to see a dozen people scribbling on the same whiteboard or using the same typewriter keyboard.

Permissions are another thing that isn't intuitive. The idea that only specific people can see something doesn't match our expectations of paper. Sure, anyone could grab a pen and deface it; that's why we have one person in charge of the "master copy".

Copy. What a hateful word.

The modern workforce shouldn't be flinging copies to each other. A copy is outdated the moment it is downloaded. A copy has no protection against illicit reading. A copy can never be revoked.

Data shouldn't live in a file on a laptop. It shouldn't be a single file on a network share. Data is a living beast. Data needs to live in a database - not an Excel file. Access should be granted for each according to their needs.
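
A minimal sketch of the difference between granting access and sending a copy. The `SharedDocument` class and the user name are hypothetical, invented for illustration; this is not any real product's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SharedDocument:
    """Hypothetical living document: it stays in one place and checks access on every read."""

    content: str
    readers: set[str] = field(default_factory=set)

    def grant(self, user: str) -> None:
        # Give one named person read access.
        self.readers.add(user)

    def revoke(self, user: str) -> None:
        # Withdraw access; takes effect on the next read.
        self.readers.discard(user)

    def read(self, user: str) -> str:
        if user not in self.readers:
            raise PermissionError(f"{user} may not read this document")
        return self.content


doc = SharedDocument(content="v1")
doc.grant("alice")

# Sending a file amounts to handing over the bytes: a snapshot nobody controls any more.
emailed_copy = doc.read("alice")

doc.content = "v2"   # the living document moves on
doc.revoke("alice")  # and access is withdrawn

try:
    doc.read("alice")
    alice_can_still_read = True
except PermissionError:
    alice_can_still_read = False

print(emailed_copy)          # the stale copy is still out there
print(alice_can_still_read)  # the living document is locked again
```

The copy is frozen at the moment it was taken and can't be pulled back, while the shared document can be both updated and locked.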

I see the same issue in the WeTransfer kerfuffle. Very Serious People saying it was intolerable that the untrusted 3rd party they were using to share Very Sensitive Information was going to read that information.

At which point you have to throw up your hands and ask why people are sending files to each other in the year of Our Lord 2025?!?!? If you have a sensitive file, use proper access controls. Or at least use a password so the FTP-as-a-service provider can't steal your IP.

And git! Don't get me started on git! The best minds of a generation stuck in a paradigm of downloading files to their local machine, making changes, then ~~emailing~~ git pushing them up to be approved? Madness!

Look, there are some times when you need a local copy. I want my own copy of my insurance documents - but that's not a living doc; it is an agreed artefact. Sure, it's handy to have access when there's no network connection - but that's what background sync is for. OK, you're on Office 365 and I'm on Google - so we'll have to work a little harder to set up access.

But all of this is possible!

We rant and rave about the 💾 icon being a skeuomorph. But the very concept of an individual file is also a skeuomorph! Data are not stored on paper files. There is no such thing as a filesystem directory - it's just a convention to make computing palatable for people born in the 20th century who lived in a world of A4 paper and manilla folders.

Modern computing is still stuck in the past. Our computers are like cars which have been designed to carry a bale of hay to mop up the horse-piss.


19 thoughts on “We've got to stop sending files to each other”

  1. @Edent

    > Why are people still sending files to each other?

    Because CAP Theorem. A resource can have at most two properties out of three: Consistency, Availability, Partition tolerance.

    - A local digital file is AP. It is not Consistent. See merging issues, multiple versions, etc...
    - A URL pointing to a resource is CP. It is not Available when you cannot connect to its server.
    - A physical filing cabinet is CA. It is not Partition tolerant; there is one physical object.


  2. I can see what you are hinting at, but how would you share photos of the children with the grandparents, to take one example? You can try Photos on iOS, if everyone has an iPhone and you know their Apple accounts. Maybe Google Photos if they are on Android - but how do I get my iPhone photos into Google Photos? What if grandpa doesn't have a Gmail account? Sharing files might not be optimal, but it works. Isn't that why it sticks?


    1. What if grandpa starts sharing photos on social media, despite being told not to? What if grandma forwards the email to a scammer she's talking with? What if they get hit with malware and lose all the photos on their phone? What if…

      You can make up a million scenarios where things might go wrong with something.

      Luckily, I don't have kids. So I don't know what solutions there are available for that particular problem.


  3. @Edent I'd love to be able to do this. In my industry I have often joked that we should never have left FTP servers behind. I have since written my own upload portal, but it's TCP-based and so only works at "good" speeds for UK/London-based clients.

    If anyone finds a good self-hosted, QUIC-based upload portal capable of multiple GB without installing anything (i.e. all in the browser), I'm all ears.


  4. @blog I'm with you as far as the transfer service intermediaries go. There's no need for them: SFTP etc. directly from me, for instance, to you, or me sending you the URL to a file I've exposed on my machine, is better.

    However, the central service thing for collaboration is an assertion that the centre is in charge, and if I wanted to deal with two centres, I'd need two computers, or two sets of software, unless they happened to have hit on the same one.


  5. > OK, you're on Office 365 and I'm on Google - so we'll have to work a little harder to set up access.

    I think this is the core technical issue, especially when working across organisational boundaries, where permissions get messy even if using the same technology. And this lack of reliability then causes people to fall back to copies.

    So it falls back to usability issues with permissioning systems, which I've not seen done well across multiple identity providers.


  6. Looking at the reactions to this post, a lot of it would benefit from defining the kind of data that should be shared this way.

    I can think of three kinds of data for which different approaches for sharing are suitable:

    Artefacts

    This is data that has been published at a specific point in time and serves as a reference for others to build off. For this the traditional file model works well, whether documents in PDF, datasets in CSV or SQLite files, or software release tarballs. These files can be combined with attestation to prove provenance, but the important thing is that they are under the complete control of the recipient, which copying achieves. Audit trails are not enough here, as you would not be able to maintain an audit trail showing changes for an artefact that has been deleted without making the delete pointless.

    Living resources with an authoritative source

    This is the category where having a URL, or other content-independent reference, works best. This includes limited-access sensitive resources, like Afghan databases, in which case authorization and dynamic access controls become important, but also read-only public resources like bus timetables, which might be updated by the operator, or even things like a wiki, which, while it might be updated by anyone, is always available from a single location.

    They can also be transient, such as temporary live sharing of code.

    Living resources with no authoritative source

    Perhaps the messiest, these are resources that are changing, but don't have a single source. It could be something like distributed software development (e.g. how most Linux kernel developers send their changes to subsystem maintainers rather than Linus Torvalds, but can still pull from his tree), but can more generally be any federated service.

    I'm not sure how best this final category could be served, aside from serialising things into many small artefacts and copying those, which is what git and other SCM systems do.
