23 December 2018
Infrastructural problems and instabilities caused by cloud services
Introduction
More and more cloud-based services are in use, and both private users and companies increasingly depend on them. While the cloud can provide a great deal of value and save you precious time, it also poses risks to your infrastructure that should not be underestimated. I would like to share some concerns and advice about these services, especially when it comes to storing private data. For data you care about, I recommend that you avoid cloud storage services like OneDrive and Dropbox, as well as other services that mainly store user data, like Facebook, Skype or Gmail. At the very least, back up all your data locally on drives you own and download all the content you want to keep. The transition away from these services may be slow and tedious, but their continued use comes at a cost that is too large to be ignored.
Definitions
I consider services like Facebook, Skype and Gmail to be cloud services too, although this may be debatable depending on how a cloud service is defined. Data inside these systems is in the cloud for all intents and purposes: it is stored on remote servers and at most partially cached on your local computer. Large parts of your data are out of your direct reach, and that is where I'd like to draw the line for the remainder of this post.
Lack of ownership
Data that is not on a storage medium you own is not your data. It might look like you own it when you browse your cloud storage, but all of it can be gone in the blink of an eye, and you have no control over that whatsoever. If a cloud provider runs into financial trouble, it will likely have to pull the plug before everyone has downloaded their data. Conversely, files that you "permanently" deleted from the cloud may continue to exist invisibly, sometimes forever, so you can never be sure they are truly erased. The interface is often opaque and hides what is actually being done with your data. The control is not with you, the user. With local hard disks and a basic backup strategy, you avoid these problems and keep full control over your data.
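As an illustration of what a basic backup strategy can look like, here is a minimal Python sketch, assuming a source folder and a second drive mounted locally. It copies new or changed files and verifies each copy with a SHA-256 checksum. The function names are hypothetical, and a real setup would typically use an established backup tool and more than one backup drive.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks (1 MiB by default)."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_tree(source: Path, target: Path) -> list[Path]:
    """Copy new or changed files from source to target and verify each copy.

    Files deleted from source are deliberately kept in the backup, so an
    accidental deletion does not propagate. Returns the copied relative paths.
    """
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = target / rel
        src_hash = sha256_of(src)
        if dst.is_file() and sha256_of(dst) == src_hash:
            continue  # already backed up and unchanged
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copies content plus metadata such as mtime
        if sha256_of(dst) != src_hash:
            raise OSError(f"verification failed for {rel}")
        copied.append(rel)
    return copied
```

A call might look like `backup_tree(Path("/home/me/documents"), Path("/mnt/backup/documents"))`, where both paths are hypothetical. Not mirroring deletions is a design choice: the backup only ever grows, so a mistake on the main disk cannot wipe the copy.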
The cloud provider can also delete or replace your files for whatever reasons they see fit. For example, a music service may replace one recording of a song with another, and you cannot restore the original version anymore, even if you preferred it. Maybe they replace a version that contains swear words with a censored one, a change that could be driven by a cultural shift. Imagine the books on your bookshelves rewriting themselves or disappearing depending on the prevailing views and values of a society, a government or the company that hosts the data. You might not even notice, because who knows exactly how many books they own, let alone what is in them line for line? Sneakily changing or removing what is written would be unthinkable for printed books, but in the digital world it is commonplace.
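With local copies, silent changes like these can at least be detected. A minimal Python sketch, assuming you keep your downloaded files in one folder: record a SHA-256 manifest of every file, store it, and compare it with a fresh one later. The function names are hypothetical, and reading whole files into memory is a simplification.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its SHA-256 digest."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def compare_manifests(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Report files that were removed, added or silently modified."""
    return {
        "removed": sorted(old.keys() - new.keys()),
        "added": sorted(new.keys() - old.keys()),
        "modified": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
```

Saving the manifest with `json.dump`, ideally on a different drive, lets you run the comparison months or years later.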
Another risk is the rise of internet justice mobs, often on Twitter and similarly toxic platforms, who try to ruin the lives of people who say unpopular or politically incorrect things. Under the sheer pressure of online discourse and media, campaigns by such mobs have led providers like Cloudflare and Google to terminate a variety of private accounts. Imagine losing all your emails, calendar entries, YouTube videos and your Android login due to a sudden Google account closure. This has actually happened more than once, and there is no guarantee that a closed account will be restored. A cloud provider may also decide to lock you out for arbitrary reasons. The right to do so is almost always explicitly reserved in the terms and conditions, and since you agreed to them up front, there is practically no way to challenge the decision. The provider will certainly lock your account if ordered to do so by the authorities.
Even if nobody orders or pressures the provider into closing your account, modern platforms are governed by algorithms that screen your content and take it down, or ban you outright, if they detect a violation. Unfortunately, content-curating AI is still in its infancy, and it is not known when, or whether, it will ever work properly. These algorithms suffer from a myriad of problems like false positives, and they are abused by malicious people who flag content that does not violate the guidelines in order to get accounts shut down. Companies like Google have largely replaced human tech support with such algorithms, which makes it almost impossible to get someone to help you once the algorithms fail. All these factors increase the risk of data loss and denial of service, and replacing infrastructure you have suddenly lost access to can be very difficult and tedious. You do not own your data if you do not own the hardware; it is as simple as that.
Data in the cloud may also not belong to you in another sense: some online platforms make you give up some or all of the rights to your intellectual property, or require you to grant them broad permissions to use, modify and redistribute your content. These legal terms give the platform power over you through control of your content. If your income depends on an online platform, you are forced to create content that abides by its guidelines (whose enforcement is often quite arbitrary), which means the platform now dictates what you can say or do.
Lack of reliability
It may sound strange, but hosting your own solution is often more reliable than using cloud services. At first this seems counterintuitive: "Can't the professionals do better than I can?" They most likely can, but the crux is that the professionals who look after cloud services solve a different problem. While you manage a Raspberry Pi, a NAS box or a mini desktop, they regularly shuffle around petabytes of data. They are mainly concerned with hard disk replacements, energy consumption, cooling and load balancing, and they serve millions if not billions of users every day. If you don't get your data but the rest of the world does, they call that a successful day, because statistically speaking that is an amazing quality of service. They may schedule downtime at times that are inconvenient for you. You may encounter performance or connectivity problems without any clue whose machine is at fault; official communication is often insufficient or non-existent even during severe outages, and tech support, if reachable at all, is left in the dark too. Generally, if something breaks, you are helpless.
Since servers and data lines are shared between many users, performance drops during periods of high activity are the rule rather than the exception. This mode of operation is called statistical multiplexing. It allows a service to be very cheap, but it only runs stably as long as just a small fraction of customers use the service at the same time. During periods of high activity, nobody is guaranteed to receive the service, and those who do often get abysmal performance. Almost all private internet connections rely on statistical multiplexing too, but if you self-host, at least your CPU and the rest of your hardware are not shared with strangers, which leads to a more predictable quality of service.
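The arithmetic behind statistical multiplexing can be sketched in a few lines of Python, assuming each customer is active independently with the same probability, so that the number of simultaneous users follows a binomial distribution. The customer count, activity rates and capacity below are made-up numbers for illustration.

```python
from math import comb


def overload_probability(subscribers: int, p_active: float, capacity: int) -> float:
    """Probability that more than `capacity` of `subscribers` are active at once.

    Assumes each subscriber is active independently with probability p_active,
    so the number of simultaneously active users is binomially distributed.
    """
    return sum(
        comb(subscribers, k) * p_active**k * (1 - p_active) ** (subscribers - k)
        for k in range(capacity + 1, subscribers + 1)
    )


# Hypothetical numbers: 1000 customers sharing capacity for 70 simultaneous users.
for p in (0.05, 0.07, 0.10):
    print(f"p_active={p:.2f}: P(overload) = {overload_probability(1000, p, 70):.4f}")
```

With these hypothetical numbers, overload is rare when 5% of customers are active but almost certain at 10%: a modest shift in demand turns a well-behaved shared service into a congested one.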
Lack of privacy
If you upload unencrypted data, your cloud provider can see everything. They can search through, analyse and index your data, and they could even sell information about it to advertisers. Privacy policies deserve special care, especially since many companies simply change them with reluctant, barely existent efforts to obtain your consent, which is illegal in many jurisdictions but almost always goes unpunished. This is a convenient way to smuggle dangerous clauses into privacy policies, all the more so because hardly anyone reads them. And even when a privacy policy is violated, enforcement is often lacklustre at best, provided the violation is discovered within a reasonable time frame at all. I won't go into mass surveillance by governments; the implications of the Snowden revelations should be clear to everyone. To put it bluntly, with unencrypted cloud storage you cannot expect any privacy whatsoever, and it would be foolish to do so.
To protect your privacy, you should always fully encrypt your data with a state-of-the-art encryption scheme. I recommend something like AES-256, which is widely believed to be secure as of 2018 and is well supported across the board. Beware, however: cloud providers often advertise full encryption of your cloud storage, but what they do not like to tell you is that they themselves hold the key. This is almost like uploading plain unencrypted data. No privacy has been gained whatsoever, because your cloud storage provider can still peek into all your data at any time. You need to encrypt your data with a key that you and only you hold. A term often used in this context is end-to-end encryption (E2EE): the data stays encrypted along the entire path it takes, and only the endpoints can decrypt it. In this case, only your own computers are endpoints, not the cloud provider's servers. End-to-end encryption is the gold standard for privacy and highly recommended for digital communication and data storage.
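A minimal sketch of encrypting data before it leaves your computer, assuming the third-party Python `cryptography` package is installed. It uses AES-256 in GCM mode, which also detects tampering. The helper names are hypothetical, and key storage is out of scope here: if you lose the key, the data is gone too.

```python
import os

# Third-party dependency (assumption): pip install cryptography
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_SIZE = 12  # 96-bit nonce, the standard size for AES-GCM


def encrypt_before_upload(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt locally with AES-256-GCM; only the returned blob leaves your machine."""
    nonce = os.urandom(NONCE_SIZE)  # must never repeat for the same key
    return nonce + AESGCM(key).encrypt(nonce, plaintext, None)


def decrypt_after_download(key: bytes, blob: bytes) -> bytes:
    """Split off the nonce and decrypt; raises InvalidTag if the blob was altered."""
    nonce, ciphertext = blob[:NONCE_SIZE], blob[NONCE_SIZE:]
    return AESGCM(key).decrypt(nonce, ciphertext, None)


# The key is generated and kept on your own computer; the provider never sees it.
key = AESGCM.generate_key(bit_length=256)
blob = encrypt_before_upload(key, b"my private notes")
assert decrypt_after_download(key, blob) == b"my private notes"
```

Only `blob` would be uploaded. The provider can store it, but without the key it can neither read it nor modify it undetected.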
Cloud providers do not like fully encrypted data to which they hold no key, for various reasons. When they can inspect and index your data, they may be able to improve cloud performance and implement handy features like previews and text search. Mail providers can filter spam much more effectively if they can inspect your communications. Legal requirements may also compel cloud providers to reject end-to-end encryption. In the worst case, they dislike it because they sell information about you to advertisers or other third parties. Intelligence agencies also hate strong end-to-end encryption because it is a major obstacle to mass surveillance. In general, encrypted data to which only you hold the key disempowers the cloud provider and other third parties and empowers you, the user and owner of the data, though it may come at the cost of some convenience.
Insufficient time horizon
With an average life expectancy between 70 and 84 years in large parts of the world, a young person needs to store their personal data for half a century or more. Half a century is an eternity in the very young cloud industry, and the unstable nature of companies, services, software, data formats and digital law doesn't lend itself to such time horizons. A local solution copes with this much better: a properly set up Linux box needs very little maintenance and few updates to stay in shape. Nobody deletes your data or forces breaking updates on you, and there are no random outages caused by problems that only occur in enormous data centres. It also frees you from vendor lock-in and offers good compatibility with computers of most brands.
Needless internet dependency
Many services don't really need the internet; some only need a connection infrequently, and offline work is mostly possible. Moving such services entirely to the cloud creates an internet dependency where none is needed and introduces new points of failure. These services should be deployed on premises, internally on the LAN. As a side effect, your privacy, your security and your data transfer rates all improve considerably. Locally deployed services can still be hooked up to the internet later if need be.
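As a small example of a LAN-only service, Python's standard library can share a folder over HTTP on a local address. This is a sketch, assuming a trusted home network: the helper name and the address in the usage note are hypothetical, and the Python documentation itself warns that `http.server` is not recommended for production, so a permanent setup would use dedicated file-sharing software.

```python
import functools
import http.server
import threading


def serve_directory_on_lan(directory: str, host: str, port: int) -> http.server.ThreadingHTTPServer:
    """Serve `directory` read-only over HTTP on the given local address.

    Binding to a specific LAN address (or 127.0.0.1) rather than 0.0.0.0
    keeps the service off other interfaces; whether it is reachable from
    the internet still depends on your router and firewall.
    """
    # SimpleHTTPRequestHandler only answers GET and HEAD, so clients cannot upload.
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    server = http.server.ThreadingHTTPServer((host, port), handler)
    # Run the server in a background thread so the caller can stop it later.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

For example, `server = serve_directory_on_lan("/srv/share", "192.168.1.10", 8000)` would make the folder available to other machines on that network, and `server.shutdown()` stops it again. No internet connection is involved at any point.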
Conclusion
Cloud services can be powerful tools. However, for privacy-aware users who are above all interested in long-term, robust and minimalist solutions and who value freedom and ownership of their data, local solutions are superior. The most important thing is data redundancy and avoiding (over-)reliance on single services. I don't expect everyone to run their own mail server now, because that's a difficult thing to do, and there are applications where the benefits of the cloud outweigh the risks. I just want to warn you about the implicit and often hidden costs of cloud services and the instabilities they can introduce to your life, so that nobody is caught off guard when their cloud service lets them down and there is nothing they can do about it. Keep your data as local as possible and as remote as necessary. Don't move to the cloud without good reasons.