As a libertarian I am automatically suspicious that government agencies may be spying on us all the time, though the story about PRISM this week has aroused my technical curiosity.
The article in the Guardian appears to rely on a single Powerpoint file of dubious provenance. It was provided by a chap who hides his head and keyboard under a blanket when logging in so his password cannot be seen, despite him alleging that he’s aware of a program that can read most emails and messages on the planet. He is now ‘in hiding’ in a Hong Kong hotel room.
Whilst the Director of National Intelligence, James Clapper, has admitted that there is collection of data on non-US citizens under Section 702 of FISA, he said that the reports in the Guardian and Washington Post “contain numerous inaccuracies”. Also, several of the companies whose data is allegedly being accessed have already denied this.
I am not disputing that a collection programme exists, only that the scope of it portrayed in those newspapers is highly unlikely. I am still uncertain whether Clapper or Obama were referring to the Verizon court order, uncovered the previous day, in their limited admissions.
[Please note that many of the included calculations are rough and ready – I’ve a day job and am blogging this in my lunchtime! If someone wants to pay me to undertake a more accurate assessment then I’m happy to elaborate.]
Information On Slides Released So Far
One slide lists the following providers: Microsoft, Google, Yahoo!, Facebook, PalTalk, YouTube, Skype, AOL and Apple.
The same slide then asserts “What Will You Receive in Collection (Surveillance and Stored Comms)?” and then lists the following: E-mail, Chat (video, voice), Videos, Photos, Stored Data, VoIP, File transfers, Video Conferencing, Notifications of target activity – logins, etc., Online Social Networking details and Special Requests (sic).
Now firstly, let’s analyse what is meant by both Surveillance and Stored Comms. Surveillance means to watch someone, taking note of their activities, such as when they send a message and to whom. Many online are correctly referring to this as metadata, that is, data that refers to other data. Interestingly, my reading of the situation is that collection and analysis of this metadata may not need a warrant under US (and other countries’) legislation, whereas looking at what’s in the message definitely does. If the PRISM system were just collecting metadata then the collection requirements would be considerably less, though still enormous based on the providers and services listed above.
Secondly, let’s consider Stored Comms. This means storing the actual communication. With an email this can be as small as a few thousand bytes, instant messaging would be even less. Picture data would be huge, and videos would be even larger. Videoconferencing would be enormous – around 1MB of data per minute at least, increasing up to 11MB per minute for HD quality videoconferencing.
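To put the videoconferencing rates above into per-call terms, here is a minimal sketch. The two per-minute rates are the ones quoted in this post; the one-hour call length and the function name are purely illustrative assumptions.

```python
# Data rates quoted in the post, in megabytes per minute of video call.
MB_PER_MIN_STANDARD = 1
MB_PER_MIN_HD = 11


def call_size_mb(minutes, mb_per_minute):
    """Return the approximate size in MB of a call of the given length."""
    return minutes * mb_per_minute


# Hypothetical one-hour call, used only as an illustration.
one_hour_standard = call_size_mb(60, MB_PER_MIN_STANDARD)
one_hour_hd = call_size_mb(60, MB_PER_MIN_HD)

print(f"One hour, standard quality: {one_hour_standard} MB")
print(f"One hour, HD quality: {one_hour_hd} MB")
```

Even at the lower rate, every hour of every call would need tens of megabytes of storage, before any indexing or redundancy.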
Let’s take a couple of examples. Firstly Facebook.
Facebook users are adding 350 million new picture files a day and Facebook already holds over 240 billion pictures (2012 figures). To support this growth Facebook engineers have to deploy (install) 7 Petabytes of new storage per month. That’s 7,000 Terabytes; or 7,000,000 Gigabytes; or 7,000,000,000 Megabytes. A conservative estimate of cost is around $3m per month for purchasing additional storage alone. If hosted on Amazon’s Glacier storage (the lowest price, slowest recovery storage) it would cost $20m a year just for Facebook photos alone!
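The unit conversions above can be checked with a short sketch. It uses decimal units (1 Petabyte = 1,000 Terabytes) and the two Facebook figures quoted in this post. The 30-day month, and the simplification that all 7 Petabytes goes on photos, are my assumptions for a rough sanity check rather than anything Facebook has published.

```python
# Decimal (SI) storage units, as used in the figures above.
TB_PER_PB = 1_000
GB_PER_PB = 1_000_000
MB_PER_PB = 1_000_000_000

# Figures quoted in the post.
NEW_STORAGE_PB_PER_MONTH = 7
NEW_PHOTOS_PER_DAY = 350_000_000

# Assumption: a 30-day month, for a rough estimate only.
DAYS_PER_MONTH = 30


def convert_petabytes(petabytes):
    """Express a petabyte figure in terabytes, gigabytes and megabytes."""
    return {
        "TB": petabytes * TB_PER_PB,
        "GB": petabytes * GB_PER_PB,
        "MB": petabytes * MB_PER_PB,
    }


storage = convert_petabytes(NEW_STORAGE_PB_PER_MONTH)
photos_per_month = NEW_PHOTOS_PER_DAY * DAYS_PER_MONTH

# Implied average storage per new photo, if all new storage went on photos.
implied_mb_per_photo = storage["MB"] / photos_per_month

for unit, amount in storage.items():
    print(f"{amount:,} {unit} per month")
print(f"{photos_per_month:,} new photos per month")
print(f"Implied average: {implied_mb_per_photo:.2f} MB per photo")
```

The implied figure of well under a megabyte per photo is plausible for compressed images, which suggests the two quoted numbers are at least consistent with each other.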
A second example is Skype.
Skype does not record calls, so for VoIP calls to be captured the NSA would need to record every single conversation in real-time. Architecturally this is impossible as many calls do not route through Skype equipment, but are still connected peer-to-peer, despite changes to the Skype architecture listed in the blog above. A quick conservative estimate of Skype voice data is at least 147 Terabytes a month (based on June 2012 data from the Skype blog above), without any other transferred files or chat.
Other services to consider briefly:
- Apple: iCloud storage of all iPhones, iPods and iPads with automatic backups enabled.
- Microsoft: Hotmail (360 million users in July 2011) plus, one assumes, all cloud hosted email via Office 365 (over a million users).
- Google: All Google search data (100 Billion per month), plus Gmail (425 million active users a month).
- Yahoo!: Email and searches.
- YouTube: Videos – one estimate here is 22.7 Terabytes per day.
The cumulative data requirements could easily be estimated from their press releases and blogs; sadly I don’t have time to do this.
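As a rough illustration of what such a tally might look like, the sketch below adds up only the three volumes actually quoted in this post (Facebook, Skype and YouTube). It is very much a partial figure: email, search, iCloud and the other services are left out because no volumes are given for them here, and the 30-day month is my assumption.

```python
# Assumption: a 30-day month, for a rough estimate only.
DAYS_PER_MONTH = 30

# Monthly volumes in terabytes, taken from the figures quoted in the post.
monthly_tb = {
    "Facebook (new storage)": 7 * 1_000,        # 7 Petabytes per month
    "Skype (voice only)": 147,                   # 147 Terabytes per month
    "YouTube (videos)": 22.7 * DAYS_PER_MONTH,   # 22.7 Terabytes per day
}

partial_total_tb = sum(monthly_tb.values())

for service, terabytes in monthly_tb.items():
    print(f"{service}: {terabytes:,.0f} TB per month")
print(f"Partial total: {partial_total_tb:,.0f} TB per month")
```

Even this incomplete list comes to several Petabytes of new data every month.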
Put The Captured Data In The Cloud?
Using commercial cloud providers has been suggested as to how the NSA can store all this data – because “it’s cheap”. No it isn’t. The storage costs would be phenomenal, plus the processing costs. Any data I/O to remote datacentres is even more costly. If the NSA were to do this they would have to host it in the data source companies’ datacentres or its own military-grade secure facility, connected by dark fibre to each datacentre of each data source.
Technical Flaw – Telecoms
One of the slide graphics released suggests that there is a system called ‘Upstream’ that allows access to telecoms data “from fiber cables and infrastructure as the data flows past”.
Whilst gathering data from optical fibre cables is technically possible, it would be very difficult physically and would be noticeable to the telecoms provider due to drops in signal strength. A fibre contains many channels (or wavelengths) of light, each of which will also have many channels of data multiplexed into it. Therefore this single statement alone damages the technical credibility of the presentation. Also there are too many cables coming in and out of the US for them to tap into all the actual fibres. (I was involved in the project to build just one of the existing transatlantic cables a decade ago, so have experience in this area.)
If they were just connecting to telecoms ‘infrastructure’ (switches, multiplexers, etc.) then it would be more believable, although this is still pretty much impossible as the data volumes are enormous and much data is encrypted.
To tap into an interactive conversation, as per the Bourne films, whether over chat, voice or video would not be possible in real-time without knowing a considerable number of parameters, many unknown even to the network provider. Recording the data for later reference (with or without court order) would still require phenomenal amounts of storage. Cisco’s latest Visual Networking Index estimates global IP Internet traffic last year was 43,570 Petabytes per month (43,570,000 Terabytes, 43,570,000,000 Gigabytes): this equates to roughly 16,700 Gigabytes of IP traffic a second. Storing this would require around 20 x 900 Gigabyte hard drives a second, costing approximately $6,000 a second!
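The per-second arithmetic can be reproduced with a short sketch. The monthly traffic figure is the Cisco number quoted above. The average month length, the 900 Gigabyte drive capacity and the $300 price per drive (the price implied by $6,000 for 20 drives) are my working assumptions, so treat the output as a ballpark only.

```python
import math

# Figure quoted in the post: global IP traffic in petabytes per month.
TRAFFIC_PB_PER_MONTH = 43_570
GB_PER_PB = 1_000_000

# Assumption: an average month, i.e. one twelfth of a 365-day year.
SECONDS_PER_MONTH = 365 * 24 * 3600 / 12

# Assumptions: drive capacity and a hypothetical price per drive.
DRIVE_CAPACITY_GB = 900
DRIVE_PRICE_USD = 300

traffic_gb_per_month = TRAFFIC_PB_PER_MONTH * GB_PER_PB
traffic_gb_per_second = traffic_gb_per_month / SECONDS_PER_MONTH

# Whole drives needed to hold one second of traffic.
drives_per_second = math.ceil(traffic_gb_per_second / DRIVE_CAPACITY_GB)
cost_per_second_usd = drives_per_second * DRIVE_PRICE_USD

print(f"Traffic: {traffic_gb_per_second:,.0f} GB per second")
print(f"Drives needed: {drives_per_second} per second")
print(f"Cost: ${cost_per_second_usd:,} per second")
```

Depending on how many days you count in a month, the result moves a little either side of the rounded figures in the text, but the order of magnitude does not change.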
While I don’t think the scope of data collection from servers as envisaged by the Guardian is impossible, I do think it’s highly improbable and would cost many $Billions per annum (just look up the IT storage costs of the companies above for an indication). If PRISM does exist it is likely to only capture metadata, that is data about conversations. This would still be a considerable amount of data and would require costs orders of magnitude above the $20m cited in the other slide below.
My personal instinct is that this Powerpoint is a fraud, for whatever reason. Whether the PRISM data collection programme exists is another question. If it does I don’t think the above released information would accurately reflect its capabilities nor its effectiveness. At best, it enables the NSA to search many metadata databases, though the legality of this, as either participant could be a US Citizen, is also dubious.
I still don’t trust any government with access to my data, but I’m not convinced any government, especially the US Government, has this capability. Yet.
The copyright of this article remains with the author. It can be used only if attributed to The New Liberty blog.
P.S. To the NSA, if you ever want to build something like PRISM properly then give me a call, I’m sure you know my number and my billing rates!