Thursday, 16 October 2008

1984 project delivered late? Big brother database.

You have probably seen the splashes on the news pages. The British government are considering a database that logs some details of internet traffic. There is a report here if you missed it.

What are they considering logging? Well, let us look at what is currently logged. Details of the times, dates, duration and locations of mobile phone calls, numbers called, websites visited and addresses e-mailed are already stored by telecoms companies for 12 months. Any of these details are surrendered to an appropriate agency on request. The proposal is that these records should now be held for 2 years and be held directly by the government.

Jacqui Smith went on to say: "There are no plans for an enormous database which will contain the content of your emails, the texts that you send or the chats you have on the phone or online."

Hmmm… let us consider what is being said here. Not the content, then. What reasonable use would there be in storing the email header information only? Well, you would have the IP address it was sent from, the email account that it was sent from and the time that it was sent. That is no great trick for SMTP since it is sent in plain text by default. SMTP (mail) protocols are really just special purpose TCP/IP chatter on port 25. This stuff is defined in RFC 821 and 822. It is easy enough to log that stuff if you can record any packet on a network. You can do similar things for IMAP and POP3. So, to do this effectively, you would need to be sitting on the email servers to record it. Ok. The UK government can enforce this on UK servers if they want to – you can’t fight city hall… but what if the email is not on a UK server? Hotmail is not based in the UK and I am willing to bet that it doesn’t internally use SMTP or IMAP – when sending a message from one Hotmail user to another, you are effectively doing a database operation, and that is how I would implement it if I were them. I bet that most web based email services such as Yahoo, Gmail and so on work that way. The UK government could ask Google to send it this data but would Google comply? It seems unlikely. How about a free Russian webmail service, or one which is in Jordan? Now, Jordan and the UK get on pretty well but would they reasonably hand over that sort of data to the UK government? I don't think so. The Russians? Even less chance. There are hundreds of web email providers.
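To make the "header only" idea concrete, here is a minimal sketch of what such a log entry might contain, using Python's standard email module. The message, the addresses and the IP address are all made up for illustration, and the record layout is my assumption rather than anything in the proposal.

```python
import re
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# Hypothetical message: every name, address and IP here is invented.
RAW_MESSAGE = "\n".join([
    "Received: from mail.example.org ([192.0.2.10]) by mx.example.net; "
    "Thu, 16 Oct 2008 09:15:00 +0100",
    "From: Alice <alice@example.org>",
    "To: Bob <bob@example.net>",
    "Date: Thu, 16 Oct 2008 09:14:58 +0100",
    "Subject: Lunch",
    "",
    "This body would not be logged.",
])

def header_record(raw):
    """Pull out the who/when/where metadata and ignore the body."""
    msg = message_from_string(raw)
    received = msg.get("Received", "")  # topmost Received line only
    ip_match = re.search(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]", received)
    return {
        "sender": parseaddr(msg.get("From", ""))[1],
        "recipient": parseaddr(msg.get("To", ""))[1],
        "sent_at": parsedate_to_datetime(msg["Date"]).isoformat(),
        "source_ip": ip_match.group(1) if ip_match else None,
    }

record = header_record(RAW_MESSAGE)
print(record)
```

Note that the From and Date fields are simply whatever the sending client wrote, so a record like this is only as trustworthy as the sender.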

Oh, and here is something else that makes me wonder. You know why the industry doesn’t chase down the people who send the SPAM? Well, how would you tell who they were? It is trivial to fake an SMTP header and that is what the spammers do. There is nothing to stop the terrorists doing the same.

How about SMS messages? Well, they are a bit different because the whole message is sent as a packet. Longer messages are sent as multiple messages and stitched back together later, it seems. The message and the header are all in the same packet. I suppose that a scheme could overwrite the message content before recording the packet to a log but I would be surprised if that were done. The Multimedia Messaging Service protocols are more complex and more problematic.
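The stitching can be sketched roughly as follows. This assumes the basic 7-bit alphabet where every character takes one slot, so a single message holds 160 characters and each part of a longer one holds 153 because the concatenation header takes up room. The function names and the tuple layout are mine, purely for illustration.

```python
SINGLE_LIMIT = 160   # characters in one unsegmented 7-bit SMS
SEGMENT_LIMIT = 153  # characters per part once a concatenation header is added

def split_sms(text, ref=0):
    """Split text into parts; each part carries (ref, total, seq) with its content."""
    if len(text) <= SINGLE_LIMIT:
        return [((ref, 1, 1), text)]
    chunks = [text[i:i + SEGMENT_LIMIT]
              for i in range(0, len(text), SEGMENT_LIMIT)]
    return [((ref, len(chunks), seq), chunk)
            for seq, chunk in enumerate(chunks, start=1)]

def join_sms(parts):
    """Reassemble parts in sequence order, whatever order they arrived in."""
    ordered = sorted(parts, key=lambda part: part[0][2])
    return "".join(chunk for _, chunk in ordered)

long_text = "x" * 400
parts = split_sms(long_text, ref=7)
print([(header, len(chunk)) for header, chunk in parts])
```

Each part carries its header and its slice of the message together, which is why logging the header alone would mean deliberately throwing away or overwriting the rest of the part.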

Logging all phone numbers and times of calls and location of the caller? Well, that is pretty powerful if you know who the number represents. More than 75% of the UK population have a mobile phone. What other government can claim to be able to track 75% of their population at any time? Of course, pay as you go phones can be a problem. Pop into Tesco with some cash and you can buy a phone and some air time. Name? You are not required to give it. You want a free SIM card? You can have a dozen. Companies want to give them away. Why would a terrorist use the same one twice? This measure strikes me as an excellent way of monitoring the honest and the stupid but a rotten way of monitoring the intelligent and devious. There is also the question of the sheer volume of data, as there is with emails. There are roughly 60 million people in the UK. About 75% have a mobile. That is 45 million mobiles to track. Some of those belong to teenagers who send dozens of texts a day. At an average of ten texts per phone, that could easily be 450 million texts per day. That is more than 160 billion texts per year. Good luck analysing that many. As for emails, that boggles the mind. There are more than 100 billion SPAM emails per day. Britain punches above her weight here because computer ownership is common. Let us say that 5% of these are in the UK. So, 5 billion SPAM emails per day. That is 1.8 trillion emails per year. Good luck in storing and scanning all those.
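For anyone who wants to check the back-of-the-envelope sums, here they are as a few lines of Python. The inputs are the rough figures quoted above; the ten texts per phone per day is the average implied by 450 million texts a day, not a measured number.

```python
UK_POPULATION = 60_000_000
MOBILE_PERCENT = 75
TEXTS_PER_MOBILE_PER_DAY = 10        # assumed average, see the note above
WORLD_SPAM_PER_DAY = 100_000_000_000
UK_SPAM_PERCENT = 5                  # the guess made in the text
DAYS_PER_YEAR = 365

mobiles = UK_POPULATION * MOBILE_PERCENT // 100
texts_per_day = mobiles * TEXTS_PER_MOBILE_PER_DAY
texts_per_year = texts_per_day * DAYS_PER_YEAR
uk_spam_per_day = WORLD_SPAM_PER_DAY * UK_SPAM_PERCENT // 100
uk_spam_per_year = uk_spam_per_day * DAYS_PER_YEAR

print(f"mobiles:          {mobiles:,}")
print(f"texts per day:    {texts_per_day:,}")
print(f"texts per year:   {texts_per_year:,}")
print(f"UK spam per day:  {uk_spam_per_day:,}")
print(f"UK spam per year: {uk_spam_per_year:,}")
```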

Hmmm… what websites were visited? That could be a useful one. In the course of writing this post, I have been to over 100 sites and I made no attempt at all to hide where I went. I don’t mind anyone knowing that I was looking at news sources and RFCs. Had I minded, I would have used a proxy. There are over 2000 free web proxies, hardly any of which are in the UK. You could investigate everyone who uses a proxy, of course. He who would keep a secret must keep it secret that he has a secret to keep, if I may quote Carlyle. You would be looking at trillions of web addresses each year though. It would be difficult data to mine. Where would you capture the data? The DNS servers would seem to be an obvious choice but I don’t need to go via a DNS server at all – indeed, the local cache serves most of my needs and I can keep a hosts file as large as I need. I don’t have to use a UK based DNS service at all and unless data is harvested at every router along the way, I don’t see how the traffic could be recorded as it doesn’t go through a central point. Again, you can monitor those who let you but those that want to slip through the net will find it easy enough to do so.
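The hosts file point is easy to illustrate. The sketch below parses hosts-style text into a lookup table and answers name queries from it without any DNS request leaving the machine. The entries are invented, and a real resolver does rather more than this.

```python
# Hypothetical hosts-style entries; the names and addresses are made up.
HOSTS_TEXT = """\
# address     names
127.0.0.1     localhost
192.0.2.44    news.example.com www.news.example.com
198.51.100.7  rfc.example.org   # inline comment
"""

def parse_hosts(text):
    """Build a name -> address table from hosts-file formatted text."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), address)  # first entry wins
    return table

def resolve_locally(name, table):
    """Return the address from the local table, or None if it is not listed."""
    return table.get(name.lower())

table = parse_hosts(HOSTS_TEXT)
print(resolve_locally("www.news.example.com", table))
```

Any name answered this way never appears in a DNS server's logs at all.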

What about other forms of communication? Instant messaging would be hard to monitor – text messages for most types go via the server but voice and data go from peer to peer via UDP. That would be hard to monitor without something very like the Bundestrojaner, a bit of software created by the Austrian government to monitor individual computers using malware type techniques. That would be politically difficult to implement widely. Audio and video data is hardest yet to capture and when you look at structures like the Skype cloud architecture where there is little centralised control, it is tempting to throw up your hands in horror.

Of course, the more data you collect, the less effective your screening is. You really want to monitor the smart and criminal ones – and you have data on the dumb and the honest. You have so much data that it could only be analysed by machine, even if you have an army of spooks. The more data you have, the lower the signal to noise ratio and the less intelligent scrutiny you can give to the signal.
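A rough way to see the signal to noise problem is to work out what fraction of flagged records would be genuine. Every number below is hypothetical and chosen only to show the shape of the effect; nothing here is a real detection rate.

```python
def flagged_precision(population, targets, hit_rate, false_alarm_rate):
    """Fraction of flagged records that really are targets."""
    true_hits = targets * hit_rate
    false_alarms = (population - targets) * false_alarm_rate
    return true_hits / (true_hits + false_alarms)

# Hypothetical scenario: 45 million phones, 1,000 genuine targets,
# a screen that catches 99% of targets and wrongly flags 0.1% of everyone else.
small = flagged_precision(45_000_000, 1_000, 0.99, 0.001)

# Same screen, same targets, but ten times as many records swept in.
large = flagged_precision(450_000_000, 1_000, 0.99, 0.001)

print(f"precision with 45 million records:  {small:.4f}")
print(f"precision with 450 million records: {large:.4f}")
```

Under these made-up numbers, roughly 98 of every 100 flags are false alarms even with a very good screen, and sweeping in more innocent records only makes that worse.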

The problem is actually still worse. Let us consider what data related to terrorism might look like. Would it be a message saying “On Tuesday, we will meet at the town hall at 7:30. You bring the semtex and I will bring the guns. If wet, meet in the King’s Head”? Why would it be in English? Why would it be in plain text? I could send that information as an MP3 of speech, as a JPG, as a video, as an encrypted file or hidden in a dozen ways, many of which are well known and have been used in dozens of films. We can safely assume that any terrorist worth his salt can do 20 minutes of research. Code books are old hat but they still work. No scanning program can work out whether a discussion of the health of an aged relative really means something different when decoded the old fashioned way with a look up reference such as the old book ciphers. There are also some cool things that you can do with steganography.
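For anyone who has not met a book cipher, a toy version fits in a few lines. This is a sketch of the classic idea only: both sides hold the same text and a message becomes a list of word positions in it. The shared text here is invented and far too short to be of any real use.

```python
# Hypothetical shared text standing in for the agreed book.
SHARED_TEXT = (
    "the quick brown fox jumps over the lazy dog while "
    "we meet at seven near town hall on tuesday"
)

def encode(message, book):
    """Replace each word with the position of its first occurrence in the book."""
    positions = {}
    for index, word in enumerate(book.split()):
        positions.setdefault(word, index)
    return [positions[word] for word in message.lower().split()]

def decode(numbers, book):
    """Look each position back up in the same book."""
    words = book.split()
    return " ".join(words[n] for n in numbers)

numbers = encode("meet at town hall", SHARED_TEXT)
print(numbers)
print(decode(numbers, SHARED_TEXT))
```

To a scanner, the output is just a short list of numbers that could be anything from a shopping tally to lottery picks.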

So, what does this cost us if it is implemented? Well, maybe not much. If the data is mostly ignored then there is little loss of liberty and the intelligence services will not be wasting much of their time. It might be useful in a case where our friends in the Office for Security and Counter Terrorism were trying to work out who a suicide bomber had been talking to.

However, if it is misused, it will have a massive effect on civil liberties and will blind the intelligence services because there will be too much data to ever process.

There is also a problem that you always have to consider. Even if you trust this government (and I am making no statement at all on that), do you trust every government that will come after? Will none of them use this to oppress their opponents or police the ranks of their own party? Will no future government use this to control its population? Forever is a very long time. There will be a bad leader some day. I leave it to you to decide how happy you are with that thought.

Signing off

Mark Long, Digital Looking Glass


Anonymous said...

Some additional considerations:

Even if the contents of the messages are not saved, the origin and recipient is enough to build sociograms, in other words, a map of who you associate with.
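As a rough illustration of how little is needed, the sketch below builds such a map from nothing but sender and recipient pairs. The names and records are invented and the structure is only one of many possible ways to do it.

```python
from collections import defaultdict

# Hypothetical (sender, recipient) pairs; no message content is involved.
RECORDS = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "carol"),
    ("dave", "alice"),
    ("alice", "bob"),
]

def build_sociogram(records):
    """Count contacts between each pair, treating links as two-way."""
    graph = defaultdict(lambda: defaultdict(int))
    for sender, recipient in records:
        graph[sender][recipient] += 1
        graph[recipient][sender] += 1
    return {person: dict(links) for person, links in graph.items()}

def contacts(graph, person):
    """Everyone this person has exchanged messages with."""
    return sorted(graph.get(person, {}))

graph = build_sociogram(RECORDS)
print(contacts(graph, "alice"))
print(graph["alice"])
```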

Add to this the possibility to cross reference different kinds of data.

Say for instance that I do some searches on various epidemics for a game I'm working on. Combine that with me having been to the Middle East 8-10 times. I email and talk on the phone with a friend who was in prison for refusing military service. I have borrowed political books from the library. I buy suspicious amounts of electronics on my credit card. Suspiciously often, I fill up the tank of my car at a petrol station in Stockholm, 250 km away from home.

Could I be a terrorist planning something in Stockholm?

Another example, from a Swedish politician: Every Friday, I buy a bottle of wine (on my credit card), drive to Malmskillnadsgatan (a street that has become the symbol of prostitution in Sweden), which is noticed by logging the position of my mobile phone, and spend an hour there before going home. What does it look like to someone analysing that data? How much could it hurt my reputation? Does the data show that I'm visiting my old mother who lives there?

It is this cross referencing of seemingly innocent data that is so dangerous.

We already have a law that will force the ISPs to hand over ALL traffic (yeah, I know, it's a lot) to military intelligence, without any public oversight at all. Another law will make logging of mobile phone positions and calls mandatory. Medical records and DNA collected for research have already been misused by the police. Credit card purchases have likewise been misused. A new proposal suggests that speed control cameras should register which vehicles pass them, regardless of speed. The DNA register of every Swedish citizen born since the beginning of the 1970s will be accessible to the police, despite being collected for research. Police are now expected to routinely collect DNA samples of suspects, because their labs DON'T HAVE ENOUGH TO DO! The police are allowed to search the house of anyone with a weapon licence without a warrant or even suspicion of crime. They keep track of when you leave the country and where you go. They keep track of your travels on public transportation. Face recognition surveillance cameras are coming. All this information can be put together, cross referenced, analysed. It can, and will be, traded with other countries, including countries where things that are legal in your country, for example homosexuality, may be illegal.

This is not science fiction, most of it is already here, and the rest is technologically possible and there is a political will to go through with it. To make matters worse, the legal framework is being reworked so that the security mechanisms are weakened.

As an example of how we are monitored, I listened to a speech by Per Hellqvist, security expert at Symantec, at our annual user conference. He talked about how phone positions could be logged and used both for surveillance and for such things as directed marketing. Less than an hour later, I got an SMS with an offer to buy a new phone at a good price at my local phone store (which was named). It was not the one closest to my home, but the one I am most often in the vicinity of. They had checked the position of my phone and sent me that directed SMS. After hearing him, that gave me chills.

We are not that far from a society where, when we have a meal at a restaurant, the waiter asks us to provide identification, and please, to speak clearly and into the flower pot on the table. People have not yet realized this, as they are still used to the "the government means us well" thing. Well, we all know what the road to hell is paved with. One by one, these measures can be defended and may look reasonable, but when one looks at the entire complex, it's scary.

Anonymous said...

One small addition: Most of my examples are from the speech by Per Hellqvist. Credit where credit is due.

Mark Long said...

Hmmm... It seems to me that we are in agreement about some of the things here. However, I was deliberately not addressing the civil liberties aspects of this since this is a technical blog.

Civil liberties are important and there always needs to be a balance but that is too large a subject for me to tackle.