The Joy of Stats

Transcript

0:00:03 > 0:00:10The world we live in is awash with data that comes pouring in from everywhere around us.

0:00:10 > 0:00:14On its own this data is just noise and confusion.

0:00:14 > 0:00:22To make sense of data, to find the meaning in it, we need the powerful branch of science - statistics.

0:00:22 > 0:00:26Believe me there's nothing boring about statistics.

0:00:26 > 0:00:29Especially not today when we can make the data sing.

0:00:29 > 0:00:33With statistics we can really make sense of the world.

0:00:33 > 0:00:35And there's more.

0:00:35 > 0:00:40With statistics, the data deluge, as it's being called, is leading us

0:00:40 > 0:00:46to an ever greater understanding of life on Earth and the universe beyond.

0:00:46 > 0:00:50And thanks to the incredible power of today's computers,

0:00:50 > 0:00:57it may fundamentally transform the process of scientific discovery.

0:00:57 > 0:01:02I kid you not, statistics is now the sexiest subject around.

0:01:23 > 0:01:25Did you know that there is one million boats in Sweden?

0:01:25 > 0:01:27That's one boat per nine people!

0:01:27 > 0:01:31It's the highest number of boats per person in Europe!

0:01:41 > 0:01:45Being a statistician, you don't like telling your profession at dinner parties.

0:01:45 > 0:01:48But really, statisticians shouldn't be shy

0:01:48 > 0:01:51because everyone wants to understand what's going on.

0:01:51 > 0:01:56And statistics gives us a perspective on the world we live in

0:01:56 > 0:01:59that we can't get in any other way.

0:02:03 > 0:02:09Statistics tells us whether the things we think and believe are actually true.

0:02:19 > 0:02:25And statistics are far more useful than we usually like to admit.

0:02:25 > 0:02:29In the last recession there was this famous call-in to a talk radio station.

0:02:29 > 0:02:37The man complained, "In times like this when unemployment rates are up to 13%, income has fallen by 5%,

0:02:37 > 0:02:41"and suicide rates are climbing, and I get so angry that the government

0:02:41 > 0:02:45"is wasting money on things like collection of statistics."

0:02:48 > 0:02:50I'm not officially a statistician.

0:02:50 > 0:02:55Strictly speaking, my field is global health.

0:02:58 > 0:03:03But I got really obsessed with stats when I realised how much people

0:03:03 > 0:03:06in Sweden just don't know about the rest of the world.

0:03:06 > 0:03:10I started in our medical university, Karolinska Institutet,

0:03:10 > 0:03:13an undergraduate course called Global Health.

0:03:13 > 0:03:17These students coming to us actually have the highest grade you can get

0:03:17 > 0:03:18in the Swedish college system,

0:03:18 > 0:03:22so I thought, "Maybe they know everything I'm going to teach them."

0:03:22 > 0:03:25So I did a pre-test when they came, and one of the questions

0:03:25 > 0:03:28from which I learned a lot was this one -

0:03:28 > 0:03:32which country has the highest child mortality of these five pairs?

0:03:32 > 0:03:34I won't put you at test here, but it is Turkey

0:03:34 > 0:03:37which is highest there, Poland,

0:03:37 > 0:03:40Russia, Pakistan, and South Africa.

0:03:40 > 0:03:43And these were the result of the Swedish students.

0:03:43 > 0:03:44A 1.8 right answer out of five possible.

0:03:44 > 0:03:49And that means there was a place for a professor of International Health and for my course.

0:03:49 > 0:03:56But one late night when I was compiling the report, I really realised my discovery.

0:03:56 > 0:04:01I had shown that Swedish top students know statistically

0:04:01 > 0:04:04significantly less about the world than the chimpanzees.

0:04:06 > 0:04:09Because the chimpanzees would score half right.

0:04:09 > 0:04:12If I gave them two bananas with Sri Lanka and Turkey,

0:04:12 > 0:04:15they would be right half of the cases, but the students are not there.

0:04:15 > 0:04:20I did also an unethical study of the professors of the Karolinska Institutet,

0:04:20 > 0:04:25that hands out the Nobel Prize for medicine, and they are on par with the chimpanzees there.
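
The chimpanzee benchmark above can be checked with a little arithmetic. The following is a minimal sketch in Python, assuming each of the five questions is an independent coin flip for a random guesser. The names `QUESTIONS`, `P_CORRECT` and `distribution` are illustrative and do not come from the programme.

```python
from math import comb

QUESTIONS = 5      # five country pairs in the pre-test
P_CORRECT = 0.5    # assumption: a random guesser picks either country with equal chance

# Binomial probability of getting exactly k of the five pairs right by guessing.
distribution = {
    k: comb(QUESTIONS, k) * P_CORRECT**k * (1 - P_CORRECT) ** (QUESTIONS - k)
    for k in range(QUESTIONS + 1)
}

# Average score a random guesser would reach over many attempts.
expected_score = sum(k * p for k, p in distribution.items())

for k, p in distribution.items():
    print(f"{k} right: {p:.3f}")
print(f"expected score when guessing: {expected_score}")
```

Under that assumption the random guesser averages 2.5 out of five, which is the "half right" the students' 1.8 is compared against. Whether the gap is statistically significant depends on how many students took the test, which the transcript does not state.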

0:04:28 > 0:04:32Today there's more information accessible than ever before.

0:04:32 > 0:04:35'And I work with my team at the Gapminder Foundation

0:04:35 > 0:04:41'using new tools that help everyone make sense of the changing world.

0:04:41 > 0:04:45'We draw on the masses of data that's now freely available

0:04:45 > 0:04:49'from international institutions like the UN and the World Bank.

0:04:49 > 0:04:53'And it's become my mission to share the insights

0:04:53 > 0:05:00'from this data with anyone who'll listen, and to reveal how statistics is nothing to be frightened of.'

0:05:02 > 0:05:05I'm going to provide you a view of

0:05:05 > 0:05:09the global health situation across mankind.

0:05:09 > 0:05:14And I'm going to do that in hopefully an enjoyable way, so relax.

0:05:14 > 0:05:17So we did this software which displays it like this.

0:05:17 > 0:05:19Every bubble here is a country -

0:05:19 > 0:05:21this is China, this is India.

0:05:21 > 0:05:23The size of the bubble is the population.

0:05:23 > 0:05:27I'm going to stage a race between this sort of yellowish Ford here

0:05:27 > 0:05:32and the red Toyota down there and the brownish Volvo.

0:05:32 > 0:05:36The Toyota has a very bad start down here, and United States,

0:05:36 > 0:05:38Ford is going off-road there,

0:05:38 > 0:05:40and the Volvo is doing quite fine, this is the war.

0:05:40 > 0:05:43The Toyota got off track, now Toyota is on the healthier side of Sweden.

0:05:43 > 0:05:46That's about where I sold the Volvo and bought the Toyota.

0:05:46 > 0:05:47AUDIENCE LAUGH

0:05:47 > 0:05:50This is the Great Leap Forward, when China fell down.

0:05:50 > 0:05:53It was the central planning by Mao Zedong.

0:05:53 > 0:05:56China recovered and said, "Never more stupid central planning,"

0:05:56 > 0:05:57but they went up here.

0:05:57 > 0:06:02No, there is one more inequity, look there - United States.

0:06:02 > 0:06:07They broke my frame. Washington DC is so rich over there,

0:06:07 > 0:06:13but they are not as healthy as Kerala in India. It's quite interesting, isn't it?

0:06:13 > 0:06:14LAUGHTER AND APPLAUSE

0:06:20 > 0:06:25Welcome to the USA, world leaders in big cars

0:06:25 > 0:06:28and free data.

0:06:28 > 0:06:35There are many here who share my vision of making public data accessible and useful for everyone.

0:06:35 > 0:06:43The city of San Francisco is in the lead, opening up its data on everything.

0:06:43 > 0:06:47Even the police department is releasing all its crime reports.

0:06:47 > 0:06:50This official crime data has been turned

0:06:50 > 0:06:55into a wonderful interactive map by two of the city's computer whizzes.

0:06:55 > 0:06:58It's community statistics in action.

0:07:09 > 0:07:13Crimespotting is a map of crime reports from the San Francisco Police Department

0:07:13 > 0:07:16showing dots on maps for citizens to be able to see

0:07:16 > 0:07:19patterns of crime around their neighbourhoods in San Francisco.

0:07:19 > 0:07:25The map is not just about individual crimes but about broader patterns that show you where crime is

0:07:25 > 0:07:27clustered around the city, which areas have high crime,

0:07:27 > 0:07:30and which areas have relatively low crime.

0:07:36 > 0:07:41We're here at the top of Jones Street on Nob Hill...

0:07:42 > 0:07:45..quite a nice neighbourhood.

0:07:45 > 0:07:49What the crime maps show us is the relationship between

0:07:49 > 0:07:51topography and crime.

0:07:51 > 0:07:54Basically the higher up the hill, the less crime there is.

0:07:56 > 0:07:58You cross over the border

0:07:58 > 0:08:00into the flats...

0:08:02 > 0:08:09Essentially as soon as you get into the lower lying areas of Jones Street the crime just skyrockets.

0:08:20 > 0:08:24We're here in the uptown Tenderloin district.

0:08:26 > 0:08:30It's one of the oldest and densest neighbourhoods in San Francisco.

0:08:30 > 0:08:32This is where you go to buy drugs.

0:08:32 > 0:08:33Right around here.

0:08:37 > 0:08:41We see lots of aggravated assaults, lots of auto thefts.

0:08:41 > 0:08:48Basically a huge part of the crime that happens in the city happens in this five or six block radius.

0:08:55 > 0:08:58If you've been hearing police sirens in your neighbourhood,

0:08:58 > 0:09:02you can use the map to find out why.

0:09:02 > 0:09:05If you're out at night in an unfamiliar part of town,

0:09:05 > 0:09:09you can check the map for streets to avoid.

0:09:09 > 0:09:12If a neighbour gets burgled, you can see -

0:09:12 > 0:09:16is it a one-off or has there been a spike in local crime?

0:09:16 > 0:09:19If you commute through a neighbourhood and you're worried

0:09:19 > 0:09:23about its safety, the fact that we have the ability to turn off all

0:09:23 > 0:09:25the night-time and middle-of-the-day crimes

0:09:25 > 0:09:28and show you just the things that are happening during the commute,

0:09:28 > 0:09:32it is a statistical operation. But I think to people that are interacting with the thing

0:09:32 > 0:09:38it feels very much more like they're just sort of browsing a website or shopping on Amazon.

0:09:38 > 0:09:43They're looking at data and they don't realise they're doing statistics.

0:09:43 > 0:09:47What's most exciting for me is that public statistics

0:09:47 > 0:09:52is making citizens more powerful and the authorities more accountable.

0:10:02 > 0:10:04We have community meetings that the police attend

0:10:04 > 0:10:08and what citizens are now doing are bringing printouts

0:10:08 > 0:10:12of the maps that show where crimes are taking place,

0:10:12 > 0:10:16and they're demanding services from the police department

0:10:16 > 0:10:20and the police department is now having to change how they police,

0:10:20 > 0:10:22how they provide policing services,

0:10:22 > 0:10:27because the data is showing what is working and what is not.

0:10:28 > 0:10:31People in San Francisco are also using public data

0:10:31 > 0:10:35to map social inequalities and see how to improve society.

0:10:35 > 0:10:39And the possibilities are endless.

0:10:39 > 0:10:43I think our dream government data analysis project

0:10:43 > 0:10:46would really be focused on live information,

0:10:46 > 0:10:51on stuff that was being reported and pushed out to the world over the internet as it was happening.

0:10:51 > 0:10:55You know, trash pickups, traffic accidents, buses,

0:10:55 > 0:10:57and I think through the kind of stats-gathering power

0:10:57 > 0:11:02of the internet it's possible to really begin to see the workings of the city

0:11:02 > 0:11:04displayed as a unified interface.

0:11:07 > 0:11:09So that's where we are heading.

0:11:09 > 0:11:14Towards a world of free data with all the statistical insights that come from it,

0:11:14 > 0:11:21accessible to everyone, empowering us as citizens and letting us hold our rulers to account.

0:11:21 > 0:11:26It's a long way from where statistics began.

0:11:26 > 0:11:32Statistics are essential to us to monitor our governments and our societies.

0:11:32 > 0:11:36But it was our rulers up there who started

0:11:36 > 0:11:40the collection of statistics in the first place in order to monitor us!

0:11:46 > 0:11:51In fact the word 'statistics' comes from 'the state'.

0:11:51 > 0:11:55Modern statistics began two centuries ago.

0:11:55 > 0:11:59Once it got going, it spread and never stopped.

0:11:59 > 0:12:01And guess who was first!

0:12:03 > 0:12:07The Chinese have Confucius, the Italians have da Vinci,

0:12:07 > 0:12:10and the British have Shakespeare.

0:12:10 > 0:12:12And we have the Tabellverket -

0:12:12 > 0:12:16the first ever systematic collection of statistics!

0:12:16 > 0:12:21Since the year 1749 we have collected data

0:12:21 > 0:12:26on every birth, marriage and death, and we are proud of it!

0:12:29 > 0:12:32The Tabellverket recorded information

0:12:32 > 0:12:34from every parish in Sweden.

0:12:34 > 0:12:39It was a huge quantity of data and it was the first time any government

0:12:39 > 0:12:41could get an accurate picture of its people.

0:12:49 > 0:12:53Sweden had been the greatest military power in Northern Europe,

0:12:53 > 0:12:58but by 1749 our star was really fading

0:12:58 > 0:13:00and other countries were growing stronger.

0:13:00 > 0:13:03At least we were a large power,

0:13:03 > 0:13:09thought to have 20 million people, enough to rival Britain and France.

0:13:13 > 0:13:18But we were in for a nasty surprise.

0:13:18 > 0:13:20The first analysis of the Tabellverket

0:13:20 > 0:13:24revealed that Sweden only had two million inhabitants.

0:13:24 > 0:13:30Sweden was not just a power in decline, it also had a very small population.

0:13:30 > 0:13:36The government was horrified by this finding - what if the enemy found out?

0:13:37 > 0:13:44But the Tabellverket also showed that many women died in childbirth and many children died young.

0:13:44 > 0:13:48So government took action to improve the health of the people.

0:13:48 > 0:13:52This was the beginning of modern Sweden.

0:13:53 > 0:13:59It took more than 50 years before the Austrians, Belgians, Danes,

0:13:59 > 0:14:02Dutch, French, Germans, Italians

0:14:02 > 0:14:08and, finally, the British, caught up with Sweden in collecting and using statistics.

0:14:24 > 0:14:29It was called political arithmetic. It was a lovely phrase that was used for statistics.

0:14:29 > 0:14:33Governments could have much more control and understanding of

0:14:33 > 0:14:36the society - how it was working, how it was developing

0:14:36 > 0:14:40and essentially so they could control it better.

0:14:43 > 0:14:47It wasn't just governments who woke up to the power of statistics.

0:14:47 > 0:14:54Right across Europe, 19th century society went mad for facts.

0:14:54 > 0:14:57And, despite its late start, Britain,

0:14:57 > 0:15:01with its Royal Statistical Society in London,

0:15:01 > 0:15:04was soon a statisticians' nirvana.

0:15:05 > 0:15:09I love looking at old copies of the Royal Statistical Society journal

0:15:09 > 0:15:11because it's full of such odd stuff.

0:15:11 > 0:15:14There's a wonderful paper from the 1840s

0:15:14 > 0:15:19which shows a map of England and the rates of bastardy in each county.

0:15:19 > 0:15:23So you can identify very quickly the areas with high rates of bastardy.

0:15:23 > 0:15:27Being in East Anglia it always makes me slightly laugh that Norfolk

0:15:27 > 0:15:30seems to top the "bastardy league" in the 1840s.

0:15:30 > 0:15:36One of the founders of the Royal Statistical Society

0:15:36 > 0:15:42was the great Victorian mathematician and inventor Charles Babbage.

0:15:42 > 0:15:50In 1842 he read the latest poem by an equally great Victorian, Alfred Tennyson.

0:15:50 > 0:15:53Vision of Sin contained the lines:

0:15:53 > 0:15:55"Fill the cup, and fill the can

0:15:55 > 0:15:58"Have a rouse before the morn

0:15:58 > 0:16:03"Every moment dies a man, Every moment one is born."

0:16:03 > 0:16:07So keen a statistician was Babbage that he could not contain himself.

0:16:07 > 0:16:09He dashed off a letter to Tennyson

0:16:09 > 0:16:12explaining that because of population growth,

0:16:12 > 0:16:13the line should read,

0:16:13 > 0:16:18"Every moment dies a man and one and a 16th is born."

0:16:18 > 0:16:22I may add that the exact figure is 1.067,

0:16:22 > 0:16:27but something must be conceded to the laws of metre.

0:16:31 > 0:16:36In the 19th century, scholars all over Europe did amazing work

0:16:36 > 0:16:39in measuring their societies.

0:16:39 > 0:16:42They were hoovering up data on almost everything.

0:16:42 > 0:16:46But numbers alone don't tell you anything.

0:16:46 > 0:16:51You have to analyse them, and that's what makes statistics.

0:16:55 > 0:16:59When the first statisticians began to get to grips with

0:16:59 > 0:17:00analysing their data

0:17:00 > 0:17:05they seized upon the average, and they took the average of everything.

0:17:09 > 0:17:13What's so great about an average is that

0:17:13 > 0:17:18you can take a whole mass of data and reduce it to a single number.

0:17:21 > 0:17:26And though each of us is unique, our collective lives produce

0:17:26 > 0:17:29averages that can characterise whole populations.

0:17:41 > 0:17:45I looked in my local newspaper one week and saw a pensioner

0:17:45 > 0:17:49had accidentally put her foot on the accelerator

0:17:49 > 0:17:52and crushed her friend against a wall.

0:17:52 > 0:17:56Devastating, hideous, horrible thing to happen.

0:17:56 > 0:18:01And then there was a second one about a young man who didn't have

0:18:01 > 0:18:07a driving licence, was driving a car under the influence of drugs and alcohol

0:18:07 > 0:18:10and he bashed into a pedestrian and killed him.

0:18:10 > 0:18:15What's remarkable, absolutely remarkable, if you look at the number

0:18:15 > 0:18:22of people who die each year in traffic crashes, it's nearly a constant.

0:18:22 > 0:18:24What?

0:18:24 > 0:18:31All these individual events, somehow when you sum them all up there's the same number every year.

0:18:31 > 0:18:35And every year, two and a half times as many men

0:18:35 > 0:18:38die in traffic crashes as women, and it's a constant.

0:18:38 > 0:18:44And every year the rate in Belgium is double the rate in England.

0:18:44 > 0:18:47There are these remarkable regularities.

0:18:47 > 0:18:54So that these individual particular events sum up into a social phenomenon.

0:18:56 > 0:18:58Let's see what Sweden have done.

0:18:58 > 0:19:01We used to boast about fast social progress, that's where we were....

0:19:01 > 0:19:05'In my lectures, to tell stories about the changing world,

0:19:05 > 0:19:08'I use the averages from entire countries,

0:19:08 > 0:19:12'whether the average of income, child mortality, family size

0:19:12 > 0:19:13'or carbon output.'

0:19:13 > 0:19:16OK, I give you Singapore. The year I was born,

0:19:16 > 0:19:20Singapore had twice the child mortality of Sweden, the most tropical country in the world,

0:19:20 > 0:19:22a marshland on the Equator, and here we go.

0:19:22 > 0:19:25It took a little time for them to get independent,

0:19:25 > 0:19:27but then they started to grow their economy,

0:19:27 > 0:19:29and they made the social investment, they got away malaria,

0:19:29 > 0:19:33they got a magnificent health system that beat both US and Sweden.

0:19:33 > 0:19:37We never thought it would happen that they would win over Sweden!

0:19:37 > 0:19:40LAUGHTER AND APPLAUSE

0:19:40 > 0:19:46But useful as averages are, they don't tell you the whole story.

0:19:48 > 0:19:53On average, Swedish people have slightly less than two legs.

0:19:53 > 0:19:57This is because few people only have one leg or no legs,

0:19:57 > 0:19:59and no-one has three legs.

0:19:59 > 0:20:06So almost everybody in Sweden has more than the average number of legs.

0:20:06 > 0:20:10The variation in data is just as important as the average.
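
The point about legs can be reproduced with a toy population. This is a minimal sketch in Python; the counts in `legs` are made-up assumptions chosen for illustration, not Swedish data.

```python
from statistics import mean, median

# Hypothetical population of 10,000 people (illustrative counts, not real data):
# most have two legs, a few have one, a very few have none, nobody has three.
legs = [2] * 9990 + [1] * 8 + [0] * 2

average = mean(legs)
share_above_average = sum(1 for n in legs if n > average) / len(legs)

print(f"mean number of legs:   {average:.4f}")
print(f"median number of legs: {median(legs)}")
print(f"share above the mean:  {share_above_average:.1%}")
```

Because nobody sits above two to balance the few below it, the mean is pulled just under two while the median stays at two, so nearly everyone ends up "above average".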

0:20:16 > 0:20:19But how do you get a handle on variation?

0:20:19 > 0:20:23For this, you transform numbers into shapes.

0:20:23 > 0:20:26Let's look again at the number of adult women in Sweden

0:20:26 > 0:20:27for different heights.

0:20:27 > 0:20:31Plotting the data as a shape shows how much their heights

0:20:31 > 0:20:36vary from the average and how wide that variation is.

0:20:36 > 0:20:41The shape a set of data makes is called its distribution.

0:20:41 > 0:20:46This is the income distribution of China, 1970.

0:20:46 > 0:20:51This is the income distribution of the United States, 1970.

0:20:51 > 0:20:54Almost no overlap, and what has happened?

0:20:54 > 0:20:56China is growing, it's not so equal any longer,

0:20:56 > 0:21:01and it's appearing here overlooking the United States.

0:21:01 > 0:21:03Almost like a ghost, isn't it?

0:21:03 > 0:21:05It's pretty scary.

0:21:05 > 0:21:06Rrrr!

0:21:06 > 0:21:08LAUGHTER

0:21:17 > 0:21:21The statisticians who first explored distribution

0:21:21 > 0:21:25discovered one shape that turned up again and again.

0:21:25 > 0:21:28The Victorian scholar Francis Galton

0:21:28 > 0:21:32was so fascinated he built a machine that could reproduce it,

0:21:32 > 0:21:36and he found it fitted so many different sets of measurements

0:21:36 > 0:21:38that he named it the normal distribution.

0:21:38 > 0:21:45Whether it was people's arm spans, lung capacities,

0:21:45 > 0:21:47or even their exam results,

0:21:47 > 0:21:51the normal distribution shape recurred time and time again.
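
Galton's machine is usually described as a board of pegs through which balls drop, bouncing left or right at each row. The sketch below, in Python, simulates that idea under the assumption that every bounce is an independent 50/50 choice. The function `galton_board` and its parameters are illustrative names, not a reconstruction of the actual device.

```python
import random
from collections import Counter


def galton_board(balls: int, rows: int, seed: int = 0) -> Counter:
    """Drop `balls` through `rows` of pegs and count how many land in each slot."""
    rng = random.Random(seed)
    bins = Counter()
    for _ in range(balls):
        # Each peg sends the ball left (0) or right (1) with equal chance;
        # the final slot is simply the number of rightward bounces.
        slot = sum(rng.random() < 0.5 for _ in range(rows))
        bins[slot] += 1
    return bins


bins = galton_board(balls=2000, rows=12)
for slot in range(13):
    print(f"{slot:2d} {'#' * (bins[slot] // 10)}")
```

The printed bars pile up in the middle slots and thin out towards the edges, which is the bell shape described above.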

0:21:51 > 0:21:56Other statisticians soon found many other regular shapes,

0:21:56 > 0:22:01each produced by particular kinds of natural or social processes.

0:22:01 > 0:22:05And every statistician has their favourite.

0:22:05 > 0:22:09The Poisson distribution, the Poisson shape is my favourite distribution.

0:22:09 > 0:22:11I think it's an absolute cracker.

0:22:15 > 0:22:18The Poisson shape describes how likely it is

0:22:18 > 0:22:21that out-of-the-ordinary things will happen.

0:22:21 > 0:22:24Imagine a London bus stop where we know that on average

0:22:24 > 0:22:26we'll get three buses in an hour.

0:22:26 > 0:22:29We won't always get three buses, of course.

0:22:29 > 0:22:33Amazingly, the Poisson shape will show us the probability

0:22:33 > 0:22:37that in any given hour we will get four, five, or six buses,

0:22:37 > 0:22:39or no buses at all.

0:22:40 > 0:22:43The exact shape changes with the average.

0:22:43 > 0:22:46But whether it's how many people will win the lottery jackpot

0:22:46 > 0:22:48each week,

0:22:48 > 0:22:51or how many people will phone a call centre each minute,

0:22:51 > 0:22:54the Poisson shape will give the probabilities.
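
The bus-stop example can be worked through directly. This is a minimal sketch in Python, assuming buses arrive independently at a steady average of three an hour (real buses tend to bunch, so treat it as an idealisation). The helper `poisson_pmf` is an illustrative name.

```python
from math import exp, factorial


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k events when the average count is `mean`."""
    return mean**k * exp(-mean) / factorial(k)


buses_per_hour = 3.0  # the average quoted for the bus stop

for k in range(7):
    print(f"{k} buses in an hour: {poisson_pmf(k, buses_per_hour):.3f}")
```

With an average of three, the chance of an hour with no buses at all is e to the power of minus three, or roughly 5%. Changing `buses_per_hour` changes the whole shape, as the narration says.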

0:22:57 > 0:23:01The wonderful example where this was applied to in the late 19th century

0:23:01 > 0:23:04was to count each year the number of Prussian officers,

0:23:04 > 0:23:07cavalry officers, who were kicked to death by their horses.

0:23:07 > 0:23:10Now, some years there were none, some years there were one,

0:23:10 > 0:23:13some years there were two, up to seven, I think, one particularly bad year.

0:23:13 > 0:23:16But with this distribution, however many years there were

0:23:16 > 0:23:19with nought, one, two, three, four Prussian cavalry officers

0:23:19 > 0:23:23kicked to death by their horses, beautifully obeyed the Poisson distribution.

0:23:42 > 0:23:48So statisticians use shapes to reveal the patterns in the data.

0:23:48 > 0:23:51But we also use images of all kinds

0:23:51 > 0:23:54to communicate statistics to a wider public.

0:23:54 > 0:23:57Because if the story in the numbers

0:23:57 > 0:24:02is told by a beautiful and clever image, then everyone understands.

0:24:02 > 0:24:09Of the pioneers of statistical graphics, my favourite is Florence Nightingale.

0:24:24 > 0:24:27There are not many people who realise that she was known

0:24:27 > 0:24:30as a passionate statistician and not just the Lady of the Lamp.

0:24:30 > 0:24:34She said that "to understand God's thoughts, we must study statistics,

0:24:34 > 0:24:37"for these are the measure of His purpose."

0:24:37 > 0:24:40Statistics was for her a religious duty and moral imperative.

0:24:42 > 0:24:45When Florence was nine years old she started collecting data.

0:24:45 > 0:24:48Her data was different fruits and vegetables she found.

0:24:48 > 0:24:50Put them into different tables.

0:24:50 > 0:24:52Trying to organise them in some standard form.

0:24:52 > 0:24:55And so we have one of Nightingale's first statistical tables

0:24:55 > 0:24:57at the age of nine.

0:25:04 > 0:25:11In the mid 1850s Florence Nightingale went to the Crimea to care for British casualties of war.

0:25:11 > 0:25:14She was horrified by what she discovered.

0:25:14 > 0:25:19For all the soldiers being blown to bits on the battlefield, there were many, many more soldiers

0:25:19 > 0:25:25dying from diseases they caught in the army's filthy hospitals.

0:25:25 > 0:25:29So Florence Nightingale began counting the dead.

0:25:29 > 0:25:34For two years she recorded mortality data in meticulous detail.

0:25:34 > 0:25:39When the war was over she persuaded the government to set up

0:25:39 > 0:25:41a Royal Commission of Inquiry,

0:25:41 > 0:25:44and gathered her data in a devastating report.

0:25:44 > 0:25:48What has cemented her place in the statistical history books

0:25:48 > 0:25:50are the graphics she used.

0:25:50 > 0:25:53And one in particular, the polar area graph.

0:25:53 > 0:25:58For each month of the war, a huge blue wedge represented

0:25:58 > 0:26:02the soldiers who had died from preventable diseases.

0:26:02 > 0:26:05The much smaller red wedges were deaths from wounds,

0:26:05 > 0:26:10and the black wedges were deaths from accidents and other causes.

0:26:10 > 0:26:17Nightingale's graphics were so clear they were impossible to ignore.

0:26:17 > 0:26:19The usual thing around Florence Nightingale's time

0:26:19 > 0:26:23was just to produce tables and tables of figures - absolutely really tedious stuff that,

0:26:23 > 0:26:26unless you're an absolutely dedicated statistician,

0:26:26 > 0:26:29it's really quite difficult to spot the patterns quite naturally.

0:26:29 > 0:26:33But visualisations, they tell a story, they tell a story immediately.

0:26:33 > 0:26:38And the use of colour and the use of shape can really tell a powerful story.

0:26:38 > 0:26:41And nowadays of course we can make things move as well.

0:26:41 > 0:26:44Florence Nightingale would have loved to have played with...

0:26:44 > 0:26:48She would have produced wonderful animations, I'm absolutely certain of it.

0:26:50 > 0:26:54Today, 150 years on, Nightingale's graphics

0:26:54 > 0:26:57are rightly regarded as a classic.

0:26:57 > 0:27:00They led to a revolution in nursing, health care

0:27:00 > 0:27:05and hygiene in hospitals worldwide, which saved innumerable lives.

0:27:07 > 0:27:11And statistical graphics has become an art form of its very own,

0:27:11 > 0:27:16led by designers who are passionate about visualising data.

0:27:24 > 0:27:27This is the Billion Pound-O-Gram.

0:27:27 > 0:27:29This image arose out of frustration

0:27:29 > 0:27:32with the reporting of billion pound amounts in the media.

0:27:32 > 0:27:34£500 billion for this war.

0:27:34 > 0:27:36£50 billion for this oil spill.

0:27:36 > 0:27:39It doesn't make sense - the numbers are too enormous to get your mind round.

0:27:39 > 0:27:43So I scraped all this data from various news sources and created this diagram.

0:27:43 > 0:27:48So the squares here are scaled according to the billion pound amounts.

0:27:48 > 0:27:51When you see numbers visualised like this

0:27:51 > 0:27:54you start to have a different relationship with them.

0:27:54 > 0:27:56You can start to see the patterns, and the scale of them.

0:27:56 > 0:27:59Here in the corner, this little square - £37 billion.

0:27:59 > 0:28:02This was the predicted cost of the Iraq war in 2003.

0:28:02 > 0:28:06As you can see it's grown exponentially over the last few years

0:28:06 > 0:28:10and the total cost now is around about £2,500 billion.

0:28:10 > 0:28:13It's funny because when you visualise statistics

0:28:13 > 0:28:15you understand them, and when you understand them

0:28:15 > 0:28:18you can really start to put things in perspective.

0:28:23 > 0:28:27Visualisation is right at the heart of my own work too.

0:28:27 > 0:28:30I teach global health.

0:28:30 > 0:28:33And I know having the data is not enough -

0:28:33 > 0:28:39I have to show it in ways people both enjoy and understand.

0:28:39 > 0:28:42Now I'm going to try something I've never done before.

0:28:42 > 0:28:45Animating the data in real space,

0:28:45 > 0:28:50with a bit of technical assistance from the crew.

0:28:50 > 0:28:52So here we go.

0:28:52 > 0:28:54First, an axis for health.

0:28:54 > 0:28:58Life expectancy from 25 years to 75 years.

0:28:58 > 0:29:01And down here an axis for wealth.

0:29:01 > 0:29:06Income per person - 400, 4,000, 40,000.

0:29:06 > 0:29:10So down here is poor and sick.

0:29:10 > 0:29:14And up here is rich and healthy.

0:29:14 > 0:29:18Now I'm going to show you the world

0:29:18 > 0:29:21200 years ago, in 1810.

0:29:21 > 0:29:22Here come all the countries.

0:29:22 > 0:29:26Europe, brown; Asia, red; Middle East, green;

0:29:26 > 0:29:29Africa south of the Sahara, blue; and the Americas, yellow.

0:29:29 > 0:29:33And the size of the country bubble shows the size of the population.

0:29:33 > 0:29:37In 1810, it was pretty crowded down there, wasn't it?

0:29:37 > 0:29:39All countries were sick and poor.

0:29:39 > 0:29:43Life expectancy was below 40 in all countries.

0:29:43 > 0:29:48And only UK and the Netherlands were slightly better off. But not much.

0:29:48 > 0:29:52And now I start the world.

0:29:52 > 0:29:56The industrial revolution makes countries in Europe and elsewhere

0:29:56 > 0:29:59move away from the rest.

0:29:59 > 0:30:02But the colonised countries in Asia and Africa,

0:30:02 > 0:30:04they are stuck down there.

0:30:04 > 0:30:08And eventually the Western countries get healthier and healthier.

0:30:08 > 0:30:13And now we slow down to show the impact of the First World War

0:30:13 > 0:30:15and the Spanish flu epidemic.

0:30:15 > 0:30:18What a catastrophe!

0:30:18 > 0:30:22And now I speed up through the 1920s and the 1930s and,

0:30:22 > 0:30:24in spite of the Great Depression,

0:30:24 > 0:30:27Western countries forge on towards greater wealth and health.

0:30:27 > 0:30:29Japan and some others try to follow.

0:30:29 > 0:30:32But most countries stay down here.

0:30:32 > 0:30:35And after the tragedies of the Second World War,

0:30:35 > 0:30:39we stop a bit to look at the world in 1948.

0:30:39 > 0:30:421948 was a great year.

0:30:42 > 0:30:43The war was over,

0:30:43 > 0:30:48Sweden topped the medal table at the Winter Olympics and I was born.

0:30:48 > 0:30:51But the differences between the countries of the world

0:30:51 > 0:30:52was wider than ever.

0:30:52 > 0:30:54United States was in the front.

0:30:54 > 0:30:56Japan was catching up.

0:30:56 > 0:30:58Brazil was way behind,

0:30:58 > 0:31:03Iran was getting a little richer from oil but still had short lives.

0:31:03 > 0:31:05And the Asian giants...

0:31:05 > 0:31:08China, India, Pakistan, Bangladesh, and Indonesia,

0:31:08 > 0:31:11they were still poor and sick down here.

0:31:11 > 0:31:14But look what was about to happen! Here we go again.

0:31:14 > 0:31:18In my lifetime, former colonies gained independence and then finally

0:31:18 > 0:31:22they started to get healthier and healthier and healthier.

0:31:22 > 0:31:26And in the 1970s, then countries in Asia and Latin America

0:31:26 > 0:31:28started to catch up with the Western countries.

0:31:28 > 0:31:31They became the emerging economies.

0:31:31 > 0:31:32Some in Africa follows,

0:31:32 > 0:31:36some Africans were stuck in civil war, and others were hit by HIV.

0:31:36 > 0:31:41And now we can see the world in the most up-to-date statistics.

0:31:42 > 0:31:45Most people today live in the middle.

0:31:45 > 0:31:48But there is huge difference at the same time

0:31:48 > 0:31:51between the best-off countries and the worst-off countries.

0:31:51 > 0:31:54And there are also huge inequalities within countries.

0:31:54 > 0:31:59These bubbles show country averages but I can split them.

0:31:59 > 0:32:02Take China. I can split it into provinces.

0:32:02 > 0:32:05There goes Shanghai...

0:32:05 > 0:32:08It has the same health and wealth as Italy today.

0:32:08 > 0:32:11And there is the poor inland province Guizhou,

0:32:11 > 0:32:12it is like Pakistan.

0:32:12 > 0:32:18And if I split it further, the rural parts are like Ghana in Africa.

0:32:19 > 0:32:23And yet, despite the enormous disparities today,

0:32:23 > 0:32:27we have seen 200 years of remarkable progress!

0:32:27 > 0:32:31That huge historical gap between the west and the rest is now closing.

0:32:31 > 0:32:35We have become an entirely new, converging world.

0:32:35 > 0:32:37And I see a clear trend into the future.

0:32:37 > 0:32:40With aid, trade, green technology and peace,

0:32:40 > 0:32:43it's fully possible that everyone can make it

0:32:43 > 0:32:45to the healthy, wealthy corner.

0:32:48 > 0:32:51Well, what you've just seen in the last few minutes

0:32:51 > 0:32:56is a story of 200 countries shown over 200 years and beyond.

0:32:56 > 0:33:00It involved plotting 120,000 numbers.

0:33:00 > 0:33:02Pretty neat, huh?

0:33:07 > 0:33:13So, with statistics, we can begin to see things as they really are.

0:33:13 > 0:33:18From tables of data to averages, distributions and visualisations,

0:33:18 > 0:33:22statistics gives us a clear description of the world.

0:33:22 > 0:33:28But, with statistics, we can not only discover WHAT is happening

0:33:28 > 0:33:30but also explore WHY,

0:33:30 > 0:33:34by using the powerful analytical method - correlation.

0:33:35 > 0:33:38Just looking at one thing at a time doesn't tell you very much.

0:33:38 > 0:33:41You've got to look at the relationships between things,

0:33:41 > 0:33:43how they change, how they vary together.

0:33:43 > 0:33:45That's what correlation is about.

0:33:45 > 0:33:48That's how you start trying to understand the processes

0:33:48 > 0:33:50that are really going on in the world and society.
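
A minimal sketch of what "varying together" means in numbers: Pearson's correlation coefficient, written out by hand in Python. The function name `pearson_r` and both data series are invented for illustration and do not come from the programme.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long number lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How far each pair of values moves away from its mean, together.
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(spread_x * spread_y)


# Invented example data: ys rises almost in step with xs.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.0, 9.9]
r = pearson_r(xs, ys)
print(round(r, 3))
```

A value near +1 means the two quantities rise together, a value near -1 means one falls as the other rises, and a value near 0 means there is no straight-line relationship. The number on its own says nothing about which, if either, causes the other.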

0:33:52 > 0:33:57Most of us today would recognise that crime correlates to poverty,

0:33:57 > 0:34:00that infection correlates to poor sanitation,

0:34:00 > 0:34:02and that knowledge of statistics correlates

0:34:02 > 0:34:05to being great at dancing!

0:34:06 > 0:34:10Correlations can be very tricky.

0:34:10 > 0:34:12I got a joke about silly correlations.

0:34:12 > 0:34:15There was this American who was afraid of heart attack.

0:34:15 > 0:34:19He found out that the Japanese ate very little fat

0:34:19 > 0:34:22and almost didn't drink wine,

0:34:22 > 0:34:25but they had much less heart attacks than the Americans.

0:34:25 > 0:34:28But, on the other hand, he also found out that the French

0:34:28 > 0:34:35eat as much fat as the Americans and they drink much more wine but they also have less heart attacks.

0:34:35 > 0:34:40So he concluded that what kills you is speaking English.

0:34:40 > 0:34:43# Smoke, smoke, smoke that cigarette

0:34:43 > 0:34:48# Puff, puff, puff and if you smoke yourself to death... #

0:34:48 > 0:34:51The time, the pace, the cigarette. Weights Tilt.

0:34:51 > 0:34:56The best example of a really ground-breaking correlation

0:34:56 > 0:35:01is the link that was established in the 1950s between smoking and lung cancer.

0:35:01 > 0:35:07Not long after the Second World War, a British doctor, Richard Doll,

0:35:07 > 0:35:11investigated lung cancer patients in 20 London hospitals.

0:35:11 > 0:35:15And he became certain that the only thing they had in common was smoking.

0:35:15 > 0:35:18So certain, that he stopped smoking himself.

0:35:18 > 0:35:22But other people weren't so sure.

0:35:22 > 0:35:25A lot of the discussion of the early data,

0:35:25 > 0:35:29linking smoking to lung cancer, said, "It's not the smoking, surely,

0:35:29 > 0:35:32"that thing we've done all our lives, that can't be bad for you.

0:35:32 > 0:35:35"Maybe it's genes.

0:35:35 > 0:35:39"Maybe people who are genetically predisposed to get lung cancer

0:35:39 > 0:35:43"are also genetically predisposed to smoke."

0:35:43 > 0:35:47"Maybe it's not the smoking, maybe it's air pollution -

0:35:47 > 0:35:52"that smokers are somehow more exposed to air pollution than non-smokers.

0:35:52 > 0:35:56"Maybe it's not smoking, maybe it's poverty."

0:35:56 > 0:36:00So now we've got three alternative explanations, apart from chance.

0:36:02 > 0:36:06To verify that his correlation did imply cause and effect,

0:36:06 > 0:36:10Richard Doll created the biggest statistical study of smoking yet.

0:36:10 > 0:36:14He began tracking the lives of 40,000 British doctors,

0:36:14 > 0:36:17some of whom smoked and some of whom didn't,

0:36:17 > 0:36:19and gathered enough data

0:36:19 > 0:36:22to correlate the amount the doctors smoked

0:36:22 > 0:36:24with their likelihood of getting cancer.

0:36:24 > 0:36:30Eventually, he not only showed a correlation between smoking and lung cancer,

0:36:30 > 0:36:35but also a correlation between stopping smoking and reducing the risk.

0:36:35 > 0:36:37This was science at its best.

0:36:39 > 0:36:44What correlations do not replace is human thought.

0:36:44 > 0:36:46You've got to think about what it means.

0:36:46 > 0:36:50What a good scientist does, if he comes with a correlation,

0:36:50 > 0:36:55is try as hard as she or he possibly can to disprove it,

0:36:55 > 0:37:00to break it down, to get rid of it, to try and refute it.

0:37:00 > 0:37:05And if it withstands all those efforts at demolishing it

0:37:05 > 0:37:10and it is still standing up then, cautiously, you say, "We really might have something here."

0:37:26 > 0:37:32However brilliant the scientist, data is still the oxygen of science.

0:37:32 > 0:37:39The good news is that the more we have, the more correlations we'll find, the more theories we'll test,

0:37:39 > 0:37:42and the more discoveries we're likely to make.

0:37:46 > 0:37:53And history shows how our total sum of information grows in huge leaps as we develop new technologies.

0:37:53 > 0:38:00The invention of the printing press kicked off the first data and information explosion.

0:38:00 > 0:38:06If you piled up all the books that had been printed by the year 1700,

0:38:06 > 0:38:11they would make 60 stacks each as high as Mount Everest.

0:38:12 > 0:38:15Then, starting in the 19th century,

0:38:15 > 0:38:19there came a second information revolution with the telegraph,

0:38:19 > 0:38:23gramophone and camera. And later radio and TV.

0:38:23 > 0:38:28The total amount of information exploded.

0:38:28 > 0:38:35And by the 1950s the information available to us all had multiplied 6,000 times.

0:38:35 > 0:38:41Then, thanks to the computer and later the internet, we went digital.

0:38:41 > 0:38:47And the amount of data we have now is unimaginably vast.

0:38:49 > 0:38:55A single letter printed in a book is equivalent to a byte of data.

0:38:55 > 0:38:58A printed page equals a kilobyte or two.

0:39:01 > 0:39:06Five megabytes is enough for the complete works of Shakespeare.

0:39:08 > 0:39:1110 gigabytes - that's a DVD movie.

0:39:16 > 0:39:23Two terabytes is the tens of millions of photos added to Facebook every day.

0:39:24 > 0:39:32Ten petabytes is the data recorded every second by the world's largest particle accelerator.

0:39:32 > 0:39:35So much only a tiny fraction is kept.

0:39:35 > 0:39:43Six exabytes is what you'd have if you sequenced the genomes of every single person on Earth.

0:39:48 > 0:39:50But really, that's nothing.

0:39:50 > 0:39:55In 2009, the internet added up to 500 exabytes.

0:39:55 > 0:40:02In 2010, in just one year, that will double to more than one zettabyte!
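
The ladder of units above can be checked with a few lines of Python. This is only a sketch, assuming the decimal prefixes (each step is a factor of 1,000); the names `PREFIX_STEPS` and `to_bytes` are made up for the example.

```python
# Number of factor-1000 steps above a single byte (decimal prefixes assumed).
PREFIX_STEPS = {"B": 0, "kB": 1, "MB": 2, "GB": 3, "TB": 4, "PB": 5, "EB": 6, "ZB": 7}


def to_bytes(amount, unit):
    """Convert an amount in the given unit to a plain number of bytes."""
    return amount * 1000 ** PREFIX_STEPS[unit]


# 500 exabytes in 2009, doubling within a year.
internet_2009 = to_bytes(500, "EB")
internet_2010 = 2 * internet_2009
zettabytes_2010 = internet_2010 / to_bytes(1, "ZB")

# Five megabytes of Shakespeare at two kilobytes per printed page.
shakespeare_pages = to_bytes(5, "MB") // to_bytes(2, "kB")

print(zettabytes_2010, shakespeare_pages)
```

With these prefixes, doubling 500 exabytes lands on one zettabyte, and five megabytes at two kilobytes a page comes to 2,500 pages.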

0:40:06 > 0:40:14Back in the real world, if we turned all this data into print it would make 90 stacks of books,

0:40:14 > 0:40:18each reaching from here all the way to the sun!

0:40:18 > 0:40:23The data deluge is staggering, but, with today's computers

0:40:23 > 0:40:28and statistics, I'm confident we can handle it.

0:40:28 > 0:40:31When it comes to all the data on the internet,

0:40:31 > 0:40:33the powerhouse of statistical analysis

0:40:33 > 0:40:37is the Silicon Valley giant Google.

0:40:44 > 0:40:50The average person over their lifetime is exposed to about 100 million words of conversation.

0:40:50 > 0:40:54And so if you multiply that by the six billion people on the planet,

0:40:54 > 0:40:58that amount of words is about equal to the number of words

0:40:58 > 0:41:01that Google has available at any one instant in time.

0:41:03 > 0:41:08Google's computers hoover up and file away every document, web page, and image they can find.

0:41:08 > 0:41:14They then hunt for patterns and correlations in all this data,

0:41:14 > 0:41:17doing statistics on a massive scale.

0:41:17 > 0:41:25And, for me, Google has one project that's particularly exciting - statistical language translation.

0:41:25 > 0:41:30We wanted to provide access to all the web's information, no matter what language you spoke.

0:41:30 > 0:41:33There's just so much information on the internet,

0:41:33 > 0:41:37you couldn't hope to translate it all by hand into every possible language.

0:41:37 > 0:41:41We figured we'd have to be able to do machine translation.

0:41:44 > 0:41:47In the past, programmers tried to teach their computers

0:41:47 > 0:41:53to see each language as a set of grammatical rules - much like the way languages are taught at school.

0:41:53 > 0:41:58But this didn't work because no set of rules could capture a language

0:41:58 > 0:42:01in all its subtlety and ambiguity.

0:42:01 > 0:42:05"Having eaten our lunch the coach departed."

0:42:05 > 0:42:07Well, that's obviously incorrect.

0:42:07 > 0:42:12Written like that it would imply that the coach has eaten the lunch.

0:42:12 > 0:42:15It would be far better to say...

0:42:15 > 0:42:19"having eaten our lunch we departed in the coach."

0:42:19 > 0:42:26Those rules are helpful and they are useful most of the time, but they don't turn out to be true all the time.

0:42:26 > 0:42:30And the insight of using statistical machine translation is saying,

0:42:30 > 0:42:35"If you've got to have all these exceptions anyways, maybe you can get by without having any of the rules.

0:42:35 > 0:42:39"Maybe you can treat everything as an exception." And that's essentially what we've done.

0:42:48 > 0:42:52What the computer is doing when he's learning how to translate

0:42:52 > 0:42:55is to learn correlations between words

0:42:55 > 0:42:57and correlations between phrases.

0:42:57 > 0:43:00So we feed the system very large amounts of data

0:43:00 > 0:43:04and then the system is seeing that a certain word or a certain phrase

0:43:04 > 0:43:07correlates very often to the other language.
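
The idea of learning which words "correlate" across languages can be illustrated with a toy in Python. This is not Google's system: it is a sketch that assumes a tiny, invented set of sentence pairs and uses the Dice coefficient as a simple stand-in for the association measure. The names `best_translations` and `pairs` are hypothetical.

```python
from collections import Counter
from itertools import product


def best_translations(pairs):
    """For each source word, pick the target word it co-occurs with most strongly."""
    source_count, target_count, joint_count = Counter(), Counter(), Counter()
    for source, target in pairs:
        s_words, t_words = set(source.split()), set(target.split())
        source_count.update(s_words)
        target_count.update(t_words)
        # Every source word in a sentence pair co-occurs with every target word.
        joint_count.update(product(s_words, t_words))

    best = {}
    for (s, t), together in joint_count.items():
        # Dice coefficient: 1.0 when the two words always appear together.
        score = 2 * together / (source_count[s] + target_count[t])
        if s not in best or score > best[s][1]:
            best[s] = (t, score)
    return best


# Invented English-Swedish sentence pairs, far too few for real use.
pairs = [
    ("the cat sleeps", "katten sover"),
    ("the dog sleeps", "hunden sover"),
    ("the cat eats", "katten äter"),
    ("the dog eats", "hunden äter"),
]
table = best_translations(pairs)
print(table["cat"][0], table["sleeps"][0])
```

No grammar rule is written anywhere in the sketch; the pairing of "cat" with "katten" falls out of the counts alone. A word like "the", which has no single Swedish counterpart here, gets no clear winner, which hints at why real systems need vastly more text.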

0:43:09 > 0:43:15Google's website currently offers translation between any of 57 different languages.

0:43:15 > 0:43:22It does this purely statistically, having correlated a huge collection of multilingual texts.

0:43:22 > 0:43:25The people that built the system don't need to know Chinese

0:43:25 > 0:43:29in order to build the Chinese-to-English system, or they don't need to know Arabic.

0:43:29 > 0:43:33But the expertise that's needed is basically knowledge of statistics,

0:43:33 > 0:43:35knowledge of computer science, knowledge of infrastructure

0:43:35 > 0:43:40to build those very large computational systems that we are building for doing that.

0:43:42 > 0:43:48I hooked up with Google from my office in Stockholm to try the translator for myself.

0:43:48 > 0:43:51'I will type... some Swedish sentences.'

0:43:51 > 0:43:53OK.

0:43:53 > 0:43:55Sveriges...

0:43:55 > 0:43:59..guldring i örat.

0:44:00 > 0:44:07OK. So it says, "Sweden's finance minister has a ponytail and a gold ring in your ear."

0:44:07 > 0:44:11- I guess it probably means in his ear.- 'That's exactly correct, it's amazing!

0:44:11 > 0:44:15'He comes from the Conservative party, that's the kind of Sweden we have today.

0:44:15 > 0:44:18'I will type one more sentence.'

0:44:18 > 0:44:22'I sitt samkönade...'

0:44:22 > 0:44:25partnerskap...

0:44:25 > 0:44:28nya biskop.

0:44:28 > 0:44:35"In his same-sex partnership has Stockholm's new bishop and his partners a three-year son."

0:44:35 > 0:44:38It's almost perfect, there's one important thing -

0:44:38 > 0:44:41it's HER, it's a lesbian partnership.

0:44:41 > 0:44:46OK, so those kinds of words his and her are one of the challenges

0:44:46 > 0:44:49in translation to get really those right.

0:44:49 > 0:44:51Especially when it comes to bishops one can excuse it!

0:44:51 > 0:44:53'Right, right.'

0:44:53 > 0:44:58- I guess more often than not it would probably be a "his". - 'I will write one more sentence.'

0:44:58 > 0:45:01När Sverige deltar i olympiader är målet

0:45:01 > 0:45:03'inte att vinna utan att slå Norge.'

0:45:06 > 0:45:11OK. "When Sweden is taking part in Olympic goal is not to win but to beat Norway."

0:45:11 > 0:45:13'Yes! This is what it is!

0:45:13 > 0:45:17'But they are very good in Winter Olympics, so we can't make it, but we are trying.'

0:45:17 > 0:45:19Ah, very good, very good.

0:45:19 > 0:45:24'This is absolutely amazing, you know, and I was especially impressed

0:45:24 > 0:45:30'that it picks up words like "same-sex partnership" which are very new to the language.'

0:45:30 > 0:45:36'The translator is good, but if they succeed with what's next, that'll be remarkable.'

0:45:36 > 0:45:38One of the exciting possibilities

0:45:38 > 0:45:42is combining the machine translation technology with the speech recognition technology.

0:45:42 > 0:45:45Now, both of these are statistical in nature.

0:45:45 > 0:45:51The machine translation relies on the statistics of mapping from one language to another,

0:45:51 > 0:45:57and similarly speech recognition relies on the statistics of mapping from a sound form to the words.

0:45:57 > 0:45:59When we put them together,

0:45:59 > 0:46:03now we have the capability of having instant conversation

0:46:03 > 0:46:06between two people that don't speak a common language.

0:46:06 > 0:46:08I can talk to you in my language,

0:46:08 > 0:46:11you hear me in your language and you can answer back.

0:46:11 > 0:46:15And in real time we can make that translation,

0:46:15 > 0:46:18we can bring two people together and allow them to speak.

0:46:31 > 0:46:39The internet is just one of many technologies created to gather massive amounts of data.

0:46:39 > 0:46:43Scientists studying our earth and our environment

0:46:43 > 0:46:47now use an incredible range of instruments

0:46:47 > 0:46:50to measure the processes of our planet.

0:46:52 > 0:47:00All around us are sensors continuously measuring temperature, water flow, and ocean currents.

0:47:00 > 0:47:06And high in orbit are satellites busy imaging cloud formations, forest growth and snow cover.

0:47:06 > 0:47:11Scientists speak of "instrumenting the earth".

0:47:13 > 0:47:20And pointing up to the skies above are powerful new telescopes mapping the universe.

0:47:30 > 0:47:34What's happening in astronomy is typical of how profoundly

0:47:34 > 0:47:39this new torrent of data is transforming science.

0:47:39 > 0:47:45Astronomers are now addressing many enduring mysteries of the cosmos

0:47:45 > 0:47:49by applying statistical methods to all this new data.

0:47:59 > 0:48:03The galaxy is a very big place and it's got billions of stars in it,

0:48:03 > 0:48:09and so to put together a coherent picture of the whole galaxy requires having an enormous amount of data.

0:48:09 > 0:48:13And before you could do a large sky survey with sensitive, digital detectors

0:48:13 > 0:48:16that meant that you could map many, many stars all at once,

0:48:16 > 0:48:20it was very difficult to build up enough data on enough of the galaxy.

0:48:24 > 0:48:28In the past, large surveys of the night sky had to be done

0:48:28 > 0:48:32by exposing thousands of large photographic plates.

0:48:32 > 0:48:37But these surveys could take 25 years or more to complete.

0:48:39 > 0:48:44Then, in the 1990s, came digital astronomy and a huge increase

0:48:44 > 0:48:49in both the amount and the accessibility of data.

0:48:49 > 0:48:55The Sloan Sky Survey is the world's biggest yet, using a massive digital sensor

0:48:55 > 0:49:00mounted on the back of a custom-built telescope in New Mexico.

0:49:00 > 0:49:05It's scanned the sky night after night for eight years,

0:49:05 > 0:49:09building up a composite picture in unprecedented resolution.

0:49:09 > 0:49:14The Sloan is some of the best, deepest survey data that we have in astronomy.

0:49:14 > 0:49:18Both on our own galaxy and on galaxies further away from ours.

0:49:24 > 0:49:27All the Sloan data is on the internet,

0:49:27 > 0:49:34and with it astronomers have identified millions of hitherto unknown stars and galaxies.

0:49:34 > 0:49:37They also comb the database for statistical patterns

0:49:37 > 0:49:42which will prove, disprove, or even suggest new theories.

0:49:42 > 0:49:49So we have this idea that galaxies grow, they become large galaxies like the one we live in, the Milky Way,

0:49:49 > 0:49:55not all at once, or not smoothly, but by continuously incorporating,

0:49:55 > 0:49:59basically cannibalising, smaller galaxies.

0:49:59 > 0:50:04They dissolve them and they become part of the bigger galaxy as it grows.

0:50:06 > 0:50:12It's a startling idea, and, in the Sloan data, is the evidence to support it.

0:50:12 > 0:50:16Groups of stars that came from cannibalised galaxies

0:50:16 > 0:50:21stand out in the Sloan data as statistically different from other stars

0:50:21 > 0:50:24because they move at a different velocity.

0:50:24 > 0:50:28Each big spike on one of these distribution graphs

0:50:28 > 0:50:35means Professor Rockosi has found a group of stars all travelling in a different way to the rest.
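
A rough sketch in Python of how a spike in a velocity distribution might be picked out. Everything here is an assumption made for illustration: the bin width, the "three times the typical bin" threshold, the function name `find_spikes`, and the star velocities themselves, which are constructed rather than taken from the Sloan data.

```python
from collections import Counter
from statistics import median


def find_spikes(velocities, bin_width=20.0, factor=3.0):
    """Return the left edges of velocity bins holding far more stars than a typical bin."""
    # Sort every star into a bin of the given width.
    counts = Counter(int(v // bin_width) for v in velocities)
    typical = median(counts.values())
    return sorted(b * bin_width for b, n in counts.items() if n > factor * typical)


# Constructed data: an even background of stars, one every 5 km/s ...
background = [float(v) for v in range(-200, 200, 5)]
# ... plus a tight group of 30 stars all moving at roughly 120-136 km/s.
stream = [121.0 + 0.5 * i for i in range(30)]

spikes = find_spikes(background + stream)
print(spikes)
```

The background fills every bin about equally, so the one bin that also contains the tight group stands well clear of the rest and is the only one reported.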

0:50:35 > 0:50:38They are the telltale patterns she's looking for.

0:50:40 > 0:50:44The evidence is accumulating that, in fact, this really is how galaxies grow,

0:50:44 > 0:50:47or an important way in which galaxies grow.

0:50:47 > 0:50:53And so this is an important part of understanding how galaxies form, not only ours but every galaxy.

0:50:56 > 0:51:00The more data there is, the more discoveries can be made.

0:51:00 > 0:51:03And the technology is getting better all the time.

0:51:03 > 0:51:07The next big survey telescope starts its work in 2015.

0:51:07 > 0:51:10It will leave Sloan in the dust!

0:51:10 > 0:51:16Sloan has taken eight years to cover one quarter of the night sky.

0:51:17 > 0:51:25The new telescope will scan the entire sky, in even greater resolution, every three days!

0:51:34 > 0:51:41The vast amounts of data we have today allow researchers in all sorts of fields

0:51:41 > 0:51:46to test their theories on a previously unimaginable scale.

0:51:46 > 0:51:53But more than this, it may even change the fundamental way science is done.

0:51:53 > 0:51:58With the power of today's computers applied to all this data,

0:51:58 > 0:52:03the machines might even be able to guide the researchers.

0:52:14 > 0:52:17We're at a potentially profoundly important

0:52:17 > 0:52:22and potentially one of the most significant points in science,

0:52:22 > 0:52:24and certainly one of the most exciting,

0:52:24 > 0:52:32where the potential to transform not just how scientists do science but even what science is possible.

0:52:32 > 0:52:34And what will power that transformation

0:52:34 > 0:52:38of both how science is done and even what science is possible

0:52:38 > 0:52:40is going to be computation.

0:52:41 > 0:52:49Many of the dynamics of the natural world, like the interplay between the rainforests and the atmosphere,

0:52:49 > 0:52:53are so complex that we don't as yet really understand them.

0:52:53 > 0:52:59But now computers are generating literally tens of thousands of different simulations

0:52:59 > 0:53:03of how these biological systems might work.

0:53:03 > 0:53:07It's like creating thousands of hypothetical parallel worlds.

0:53:07 > 0:53:10Each and every one of these simulations

0:53:10 > 0:53:18is analysed with statistics to see if any are a good match for what is observed in nature.

0:53:18 > 0:53:21The computers can now automatically generate,

0:53:21 > 0:53:26test and discard hypotheses with scarcely a human in sight.
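
The generate, test and discard loop can be shown in miniature with Python. This is a sketch under heavy assumptions: the "world" is a single quantity growing at a fixed rate, each hypothesis is one candidate growth rate, and the "observations" are themselves produced by the toy model. The names `simulate` and `mismatch` are invented for the example.

```python
def simulate(growth_rate, steps=10, start=1.0):
    """Run one hypothetical world: a quantity growing by a fixed rate each step."""
    values = [start]
    for _ in range(steps):
        values.append(values[-1] * (1 + growth_rate))
    return values


def mismatch(simulated, observed):
    """Sum of squared differences between a simulation and the observations."""
    return sum((s - o) ** 2 for s, o in zip(simulated, observed))


# Stand-in for measurements from nature (here generated with a rate of 0.07).
observed = simulate(0.07)

# Generate one hypothesis per candidate rate: 0.00, 0.01, ... 0.20.
candidates = [i / 100 for i in range(21)]

# Test every hypothesis against the observations and keep the closest match.
best = min(candidates, key=lambda rate: mismatch(simulate(rate), observed))
print(best)
```

Twenty of the twenty-one candidate worlds are discarded without anyone inspecting them; only the one that best matches the observations survives.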

0:53:28 > 0:53:35This new application of statistics will become absolutely vital for the future of science.

0:53:35 > 0:53:39It's creating a new paradigm, if you like,

0:53:39 > 0:53:42in science, in the way in which we can do science,

0:53:42 > 0:53:45which is increasingly...

0:53:45 > 0:53:51Which one might characterise as... data-centric or data driven

0:53:51 > 0:53:55rather than being hypothesis-driven or experimentally-driven.

0:53:55 > 0:53:58So, it's exciting times in terms of the science,

0:53:58 > 0:54:02in terms of the computation and in terms of the statistics.

0:54:08 > 0:54:15Now, if all that sounds a bit abstract and theoretical to you, how about one final frontier?

0:54:15 > 0:54:19Could statistics even make sense of your feelings?

0:54:21 > 0:54:25In California - where else? - one computer scientist

0:54:25 > 0:54:32is harvesting the internet to try to divine the patterns of our innermost thoughts and emotions.

0:54:44 > 0:54:46This is the madness movement.

0:54:46 > 0:54:50The madness movement represents a skyscraper view of the world.

0:54:50 > 0:54:54Each of these brightly coloured dots is an individual feeling

0:54:54 > 0:54:58expressed by someone out there in a blog or a tweet.

0:54:58 > 0:55:04And when you click on the dot it explodes to reveal the underlying feeling of that person.

0:55:04 > 0:55:07This is what people say they're feeling today.

0:55:07 > 0:55:10Better...safe...

0:55:10 > 0:55:12crappy...

0:55:12 > 0:55:14well...

0:55:14 > 0:55:18pretty...special...

0:55:18 > 0:55:20sorry...alone...

0:55:25 > 0:55:29So, every minute, We Feel Fine crawls the world's blogs,

0:55:29 > 0:55:34takes all the sentences that start with the words "I feel" or "I am feeling",

0:55:34 > 0:55:35and puts them in a database.

0:55:35 > 0:55:40We collect all the feelings and we count the most common.

0:55:40 > 0:55:43They are better...bad...

0:55:43 > 0:55:45good...right...

0:55:45 > 0:55:48guilty...sick...

0:55:48 > 0:55:51the same...like shit...

0:55:51 > 0:55:54sorry...well...

0:55:54 > 0:55:56and so on.
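
The collect-and-count step described above can be sketched in a few lines of Python. This is not the We Feel Fine code: the pattern, the function name `count_feelings` and the sample sentences are all made up, and the sketch keeps only the first word after "I feel" or "I am feeling".

```python
import re
from collections import Counter

# Matches sentences that start with "I feel" or "I am feeling" and captures the next word.
FEELING = re.compile(r"I (?:feel|am feeling) (\w+)", re.IGNORECASE)


def count_feelings(sentences):
    """Count the feeling word in every sentence that starts with 'I feel' / 'I am feeling'."""
    counts = Counter()
    for sentence in sentences:
        found = FEELING.match(sentence.strip())
        if found:
            counts[found.group(1).lower()] += 1
    return counts


# Invented sample sentences standing in for crawled blog posts.
posts = [
    "I feel better today.",
    "I am feeling better now",
    "I feel alone",
    "Today was fine.",
]
counts = count_feelings(posts)
print(counts.most_common(1))
```

Sentences that do not open with one of the two phrases are simply ignored, so the last sample post contributes nothing to the tally.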

0:55:58 > 0:56:01And we can take a look at any one feeling and analyse it.

0:56:01 > 0:56:04Right now a lot of people are feeling happy.

0:56:04 > 0:56:11We can take a look at all the people who are happy and break it down by age, gender or location.

0:56:11 > 0:56:16Since bloggers have public profiles we have that information and so we can ask questions like,

0:56:16 > 0:56:21"Are women happier than men?" or, "Is England happier than the United States?"

0:56:30 > 0:56:33We find that, as people get older, they get happier.

0:56:33 > 0:56:40And, moreover, we find that for younger people they associate happiness more with excitement,

0:56:40 > 0:56:47and, as people get older, they associate happiness more with peacefulness.

0:56:51 > 0:56:57And we also find that women feel loved more often than men, but also more guilty.

0:56:57 > 0:57:02While men feel good more often than women, but also more alone.

0:57:06 > 0:57:12As people lead more and more of their lives online, they leave behind digital traces,

0:57:12 > 0:57:19and with these digital traces we can begin to statistically analyse what it means to be human.

0:57:51 > 0:57:54So where does all of this leave us?

0:57:54 > 0:58:00We generate unimaginable quantities of data about everything you can think of.

0:58:00 > 0:58:02We analyse it to reveal the patterns.

0:58:02 > 0:58:10And now not only experts but all of us can understand the stories in the numbers.

0:58:18 > 0:58:21Instead of being led astray by prejudice,

0:58:21 > 0:58:28with statistics at our fingertips, our eyes can be open for a fact-based view of the world.

0:58:28 > 0:58:33So, more than ever before, we can become authors of our own destiny.

0:58:33 > 0:58:36And that's pretty exciting isn't it?!

0:58:37 > 0:58:44# 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20

0:58:44 > 0:58:50# 1, 22, 3, 24, 25, 26, 27, 28, 9, 30, 31, 32, 3, 34, 35, 36, 7

0:58:50 > 0:58:54# 38, 39, 40, 41, 42, 3, 44, 45, 46, 47

0:58:54 > 0:58:58LYRICS DEGENERATE INTO GIBBERISH

0:59:08 > 0:59:13GIBBERISH DEGENERATES INTO NOISE

0:59:13 > 0:59:14# 100. #