
The Joy of Stats
| Line | From | To | |
|---|---|---|---|
The world we live in is awash with data that comes pouring in from everywhere around us. | 0:00:03 | 0:00:10 | |
On its own this data is just noise and confusion. | 0:00:10 | 0:00:14 | |
To make sense of data, to find the meaning in it, we need the powerful branch of science - statistics. | 0:00:14 | 0:00:22 | |
Believe me there's nothing boring about statistics. | 0:00:22 | 0:00:26 | |
Especially not today when we can make the data sing. | 0:00:26 | 0:00:29 | |
With statistics we can really make sense of the world. | 0:00:29 | 0:00:33 | |
And there's more. | 0:00:33 | 0:00:35 | |
With statistics, the data deluge, as it's being called, is leading us | 0:00:35 | 0:00:40 | |
to an ever greater understanding of life on Earth and the universe beyond. | 0:00:40 | 0:00:46 | |
And thanks to the incredible power of today's computers, | 0:00:46 | 0:00:50 | |
it may fundamentally transform the process of scientific discovery. | 0:00:50 | 0:00:57 | |
I kid you not, statistics is now the sexiest subject around. | 0:00:57 | 0:01:02 | |
Did you know that there is one million boats in Sweden? | 0:01:23 | 0:01:25 | |
That's one boat per nine people! | 0:01:25 | 0:01:27 | |
It's the highest number of boats per person in Europe! | 0:01:27 | 0:01:31 | |
Being a statistician, you don't like telling your profession at dinner parties. | 0:01:41 | 0:01:45 | |
But really, statisticians shouldn't be shy | 0:01:45 | 0:01:48 | |
because everyone wants to understand what's going on. | 0:01:48 | 0:01:51 | |
And statistics gives us a perspective on the world we live in | 0:01:51 | 0:01:56 | |
that we can't get in any other way. | 0:01:56 | 0:01:59 | |
Statistics tells us whether the things we think and believe are actually true. | 0:02:03 | 0:02:09 | |
And statistics are far more useful than we usually like to admit. | 0:02:19 | 0:02:25 | |
In the last recession there was this famous call-in to a talk radio station. | 0:02:25 | 0:02:29 | |
The man complained, "In times like this when unemployment rates are up to 13%, income has fallen by 5%, | 0:02:29 | 0:02:37 | |
"and suicide rates are climbing, and I get so angry that the government | 0:02:37 | 0:02:41 | |
"is wasting money on things like collection of statistics." | 0:02:41 | 0:02:45 | |
I'm not officially a statistician. | 0:02:48 | 0:02:50 | |
Strictly speaking, my field is global health. | 0:02:50 | 0:02:55 | |
But I got really obsessed with stats when I realised how much people | 0:02:58 | 0:03:03 | |
in Sweden just don't know about the rest of the world. | 0:03:03 | 0:03:06 | |
I started in our medical university, Karolinska Institutet, | 0:03:06 | 0:03:10 | |
an undergraduate course called Global Health. | 0:03:10 | 0:03:13 | |
These students coming to us actually have the highest grade you can get | 0:03:13 | 0:03:17 | |
in the Swedish college system, | 0:03:17 | 0:03:18 | |
so I thought, "Maybe they know everything I'm going to teach them." | 0:03:18 | 0:03:22 | |
So I did a pre-test when they came, and one of the questions | 0:03:22 | 0:03:25 | |
from which I learned a lot was this one - | 0:03:25 | 0:03:28 | |
which country has the highest child mortality of these five pairs? | 0:03:28 | 0:03:32 | |
I won't put you at test here, but it is Turkey | 0:03:32 | 0:03:34 | |
which is highest there, Poland, | 0:03:34 | 0:03:37 | |
Russia, Pakistan, and South Africa. | 0:03:37 | 0:03:40 | |
And these were the result of the Swedish students. | 0:03:40 | 0:03:43 | |
A 1.8 right answer out of five possible. | 0:03:43 | 0:03:44 | |
And that means there was a place for a professor of International Health and for my course. | 0:03:44 | 0:03:49 | |
But one late night when I was compiling the report, I really realised my discovery. | 0:03:49 | 0:03:56 | |
I had shown that Swedish top students know statistically | 0:03:56 | 0:04:01 | |
significantly less about the world than the chimpanzees. | 0:04:01 | 0:04:04 | |
Because the chimpanzees would score half right. | 0:04:06 | 0:04:09 | |
If I gave them two bananas with Sri Lanka and Turkey, | 0:04:09 | 0:04:12 | |
they would be right half of the cases, but the students are not there. | 0:04:12 | 0:04:15 | |
I did also an unethical study of the professors of the Karolinska Institutet, | 0:04:15 | 0:04:20 | |
that hands out the Nobel Prize for medicine, and they are on par with the chimpanzees there. | 0:04:20 | 0:04:25 | |
Today there's more information accessible than ever before. | 0:04:28 | 0:04:32 | |
'And I work with my team at the Gapminder Foundation | 0:04:32 | 0:04:35 | |
'using new tools that help everyone make sense of the changing world. | 0:04:35 | 0:04:41 | |
'We draw on the masses of data that's now freely available | 0:04:41 | 0:04:45 | |
'from international institutions like the UN and the World Bank. | 0:04:45 | 0:04:49 | |
'And it's become my mission to share the insights | 0:04:49 | 0:04:53 | |
'from this data with anyone who'll listen, and to reveal how statistics is nothing to be frightened of.' | 0:04:53 | 0:05:00 | |
I'm going to provide you a view of | 0:05:02 | 0:05:05 | |
the global health situation across mankind. | 0:05:05 | 0:05:09 | |
And I'm going to do that in hopefully an enjoyable way, so relax. | 0:05:09 | 0:05:14 | |
So we did this software which displays it like this. | 0:05:14 | 0:05:17 | |
Every bubble here is a country - | 0:05:17 | 0:05:19 | |
this is China, this is India. | 0:05:19 | 0:05:21 | |
The size of the bubble is the population. | 0:05:21 | 0:05:23 | |
I'm going to stage a race between this sort of yellowish Ford here | 0:05:23 | 0:05:27 | |
and the red Toyota down there and the brownish Volvo. | 0:05:27 | 0:05:32 | |
The Toyota has a very bad start down here, and United States, | 0:05:32 | 0:05:36 | |
Ford is going off-road there, | 0:05:36 | 0:05:38 | |
and the Volvo is doing quite fine, this is the war. | 0:05:38 | 0:05:40 | |
The Toyota got off track, now Toyota is on the healthier side of Sweden. | 0:05:40 | 0:05:43 | |
That's about where I sold the Volvo and bought the Toyota. | 0:05:43 | 0:05:46 | |
AUDIENCE LAUGH | 0:05:46 | 0:05:47 | |
This is the great leap forward, when China fell down. | 0:05:47 | 0:05:50 | |
It was the central planning by Mao Zedong. | 0:05:50 | 0:05:53 | |
China recovered and said, "Never more stupid central planning," | 0:05:53 | 0:05:56 | |
but they went up here. | 0:05:56 | 0:05:57 | |
No, there is one more inequity, look there - United States. | 0:05:57 | 0:06:02 | |
They broke my frame. Washington DC is so rich over there, | 0:06:02 | 0:06:07 | |
but they are not as healthy as Kerala in India. It's quite interesting, isn't it? | 0:06:07 | 0:06:13 | |
LAUGHTER AND APPLAUSE | 0:06:13 | 0:06:14 | |
Welcome to the USA, world leaders in big cars | 0:06:20 | 0:06:25 | |
and free data. | 0:06:25 | 0:06:28 | |
There are many here who share my vision of making public data accessible and useful for everyone. | 0:06:28 | 0:06:35 | |
The city of San Francisco is in the lead, opening up its data on everything. | 0:06:35 | 0:06:43 | |
Even the police department is releasing all its crime reports. | 0:06:43 | 0:06:47 | |
This official crime data has been turned | 0:06:47 | 0:06:50 | |
into a wonderful interactive map by two of the city's computer whizzes. | 0:06:50 | 0:06:55 | |
It's community statistics in action. | 0:06:55 | 0:06:58 | |
Crimespotting is a map of crime reports from the San Francisco Police Department | 0:07:09 | 0:07:13 | |
showing dots on maps for citizens to be able to see | 0:07:13 | 0:07:16 | |
patterns of crime around their neighbourhoods in San Francisco. | 0:07:16 | 0:07:19 | |
The map is not just about individual crimes but about broader patterns that show you where crime is | 0:07:19 | 0:07:25 | |
clustered around the city, which areas have high crime, | 0:07:25 | 0:07:27 | |
and which areas have relatively low crime. | 0:07:27 | 0:07:30 | |
We're here at the top of Jones Street on Nob Hill... | 0:07:36 | 0:07:41 | |
..quite a nice neighbourhood. | 0:07:42 | 0:07:45 | |
What the crime maps show us is the relationship between | 0:07:45 | 0:07:49 | |
topography and crime. | 0:07:49 | 0:07:51 | |
Basically the higher up the hill, the less crime there is. | 0:07:51 | 0:07:54 | |
You cross over the border | 0:07:56 | 0:07:58 | |
into the flats... | 0:07:58 | 0:08:00 | |
Essentially as soon as you get into the lower lying areas of Jones Street the crime just skyrockets. | 0:08:02 | 0:08:09 | |
We're here in the uptown Tenderloin district. | 0:08:20 | 0:08:24 | |
It's one of the oldest and densest neighbourhoods in San Francisco. | 0:08:26 | 0:08:30 | |
This is where you go to buy drugs. | 0:08:30 | 0:08:32 | |
Right around here. | 0:08:32 | 0:08:33 | |
We see lots of aggravated assaults, lots of auto thefts. | 0:08:37 | 0:08:41 | |
Basically a huge part of the crime that happens in the city happens in this five or six block radius. | 0:08:41 | 0:08:48 | |
If you've been hearing police sirens in your neighbourhood, | 0:08:55 | 0:08:58 | |
you can use the map to find out why. | 0:08:58 | 0:09:02 | |
If you're out at night in an unfamiliar part of town, | 0:09:02 | 0:09:05 | |
you can check the map for streets to avoid. | 0:09:05 | 0:09:09 | |
If a neighbour gets burgled, you can see - | 0:09:09 | 0:09:12 | |
is it a one-off or has there been a spike in local crime? | 0:09:12 | 0:09:16 | |
If you commute through a neighbourhood and you're worried | 0:09:16 | 0:09:19 | |
about its safety, the fact that we have the ability to turn off all | 0:09:19 | 0:09:23 | |
the night-time and middle-of-the-day crimes | 0:09:23 | 0:09:25 | |
and show you just the things that are happening during the commute, | 0:09:25 | 0:09:28 | |
it is a statistical operation. But I think to people that are interacting with the thing | 0:09:28 | 0:09:32 | |
it feels very much more like they're just sort of browsing a website or shopping on Amazon. | 0:09:32 | 0:09:38 | |
They're looking at data and they don't realise they're doing statistics. | 0:09:38 | 0:09:43 | |
What's most exciting for me is that public statistics | 0:09:43 | 0:09:47 | |
is making citizens more powerful and the authorities more accountable. | 0:09:47 | 0:09:52 | |
We have community meetings that the police attend | 0:10:02 | 0:10:04 | |
and what citizens are now doing are bringing printouts | 0:10:04 | 0:10:08 | |
of the maps that show where crimes are taking place, | 0:10:08 | 0:10:12 | |
and they're demanding services from the police department | 0:10:12 | 0:10:16 | |
and the police department is now having to change how they police, | 0:10:16 | 0:10:20 | |
how they provide policing services, | 0:10:20 | 0:10:22 | |
because the data is showing what is working and what is not. | 0:10:22 | 0:10:27 | |
People in San Francisco are also using public data | 0:10:28 | 0:10:31 | |
to map social inequalities and see how to improve society. | 0:10:31 | 0:10:35 | |
And the possibilities are endless. | 0:10:35 | 0:10:39 | |
I think our dream government data analysis project | 0:10:39 | 0:10:43 | |
would really be focused on live information, | 0:10:43 | 0:10:46 | |
on stuff that was being reported and pushed out to the world over the internet as it was happening. | 0:10:46 | 0:10:51 | |
You know, trash pickups, traffic accidents, buses, | 0:10:51 | 0:10:55 | |
and I think through the kind of stats-gathering power | 0:10:55 | 0:10:57 | |
of the internet it's possible to really begin to see the workings of the city | 0:10:57 | 0:11:02 | |
displayed as a unified interface. | 0:11:02 | 0:11:04 | |
So that's where we are heading. | 0:11:07 | 0:11:09 | |
Towards a world of free data with all the statistical insights that come from it, | 0:11:09 | 0:11:14 | |
accessible to everyone, empowering us as citizens and letting us hold our rulers to account. | 0:11:14 | 0:11:21 | |
It's a long way from where statistics began. | 0:11:21 | 0:11:26 | |
Statistics are essential to us to monitor our governments and our societies. | 0:11:26 | 0:11:32 | |
But it was our rulers up there who started | 0:11:32 | 0:11:36 | |
the collection of statistics in the first place in order to monitor us! | 0:11:36 | 0:11:40 | |
In fact the word 'statistics' comes from 'the state'. | 0:11:46 | 0:11:51 | |
Modern statistics began two centuries ago. | 0:11:51 | 0:11:55 | |
Once it got going, it spread and never stopped. | 0:11:55 | 0:11:59 | |
And guess who was first! | 0:11:59 | 0:12:01 | |
The Chinese have Confucius, the Italians have da Vinci, | 0:12:03 | 0:12:07 | |
and the British have Shakespeare. | 0:12:07 | 0:12:10 | |
And we have the Tabellverket - | 0:12:10 | 0:12:12 | |
the first ever systematic collection of statistics! | 0:12:12 | 0:12:16 | |
Since the year 1749 we have collected data | 0:12:16 | 0:12:21 | |
on every birth, marriage and death, and we are proud of it! | 0:12:21 | 0:12:26 | |
The Tabellverket recorded information | 0:12:29 | 0:12:32 | |
from every parish in Sweden. | 0:12:32 | 0:12:34 | |
It was a huge quantity of data and it was the first time any government | 0:12:34 | 0:12:39 | |
could get an accurate picture of its people. | 0:12:39 | 0:12:41 | |
Sweden had been the greatest military power in Northern Europe, | 0:12:49 | 0:12:53 | |
but by 1749 our star was really fading | 0:12:53 | 0:12:58 | |
and other countries were growing stronger. | 0:12:58 | 0:13:00 | |
At least we were a large power, | 0:13:00 | 0:13:03 | |
thought to have 20 million people, enough to rival Britain and France. | 0:13:03 | 0:13:09 | |
But we were in for a nasty surprise. | 0:13:13 | 0:13:18 | |
The first analysis of the Tabellverket | 0:13:18 | 0:13:20 | |
revealed that Sweden only had two million inhabitants. | 0:13:20 | 0:13:24 | |
Sweden was not just a power in decline, it also had a very small population. | 0:13:24 | 0:13:30 | |
The government was horrified by this finding - what if the enemy found out? | 0:13:30 | 0:13:36 | |
But the Tabellverket also showed that many women died in childbirth and many children died young. | 0:13:37 | 0:13:44 | |
So government took action to improve the health of the people. | 0:13:44 | 0:13:48 | |
This was the beginning of modern Sweden. | 0:13:48 | 0:13:52 | |
It took more than 50 years before the Austrians, Belgians, Danes, | 0:13:53 | 0:13:59 | |
Dutch, French, Germans, Italians | 0:13:59 | 0:14:02 | |
and, finally, the British, caught up with Sweden in collecting and using statistics. | 0:14:02 | 0:14:08 | |
It was called political arithmetic. It was a lovely phrase that was used for statistics. | 0:14:24 | 0:14:29 | |
Governments could have much more control and understanding of | 0:14:29 | 0:14:33 | |
the society - how it was working, how it was developing | 0:14:33 | 0:14:36 | |
and essentially so they could control it better. | 0:14:36 | 0:14:40 | |
It wasn't just governments who woke up to the power of statistics. | 0:14:43 | 0:14:47 | |
Right across Europe, 19th century society went mad for facts. | 0:14:47 | 0:14:54 | |
And, despite its late start, Britain, | 0:14:54 | 0:14:57 | |
with its Royal Statistical Society in London, | 0:14:57 | 0:15:01 | |
was soon a statisticians' nirvana. | 0:15:01 | 0:15:04 | |
I love looking at old copies of the Royal Statistical Society journal | 0:15:05 | 0:15:09 | |
because it's full of such odd stuff. | 0:15:09 | 0:15:11 | |
There's a wonderful paper from the 1840s | 0:15:11 | 0:15:14 | |
which shows a map of England and the rates of bastardy in each county. | 0:15:14 | 0:15:19 | |
So you can identify very quickly the areas with high rates of bastardy. | 0:15:19 | 0:15:23 | |
Being in East Anglia it always makes me slightly laugh that Norfolk | 0:15:23 | 0:15:27 | |
seems to top the "bastardy league" in the 1840s. | 0:15:27 | 0:15:30 | |
One of the founders of the Royal Statistical Society | 0:15:30 | 0:15:36 | |
was the great Victorian mathematician and inventor Charles Babbage. | 0:15:36 | 0:15:42 | |
In 1842 he read the latest poem by an equally great Victorian, Alfred Tennyson. | 0:15:42 | 0:15:50 | |
The Vision of Sin contained the lines: | 0:15:50 | 0:15:53 | |
"Fill the cup, and fill the can | 0:15:53 | 0:15:55 | |
"Have a rouse before the morn | 0:15:55 | 0:15:58 | |
"Every moment dies a man, / Every moment one is born." | 0:15:58 | 0:16:03 | |
So keen a statistician was Babbage that he could not contain himself. | 0:16:03 | 0:16:07 | |
He dashed off a letter to Tennyson | 0:16:07 | 0:16:09 | |
explaining that because of population growth, | 0:16:09 | 0:16:12 | |
the line should read, | 0:16:12 | 0:16:13 | |
"Every moment dies a man and one and a 16th is born." | 0:16:13 | 0:16:18 | |
I may add that the exact figure is 1.067, | 0:16:18 | 0:16:22 | |
but something must be conceded to the laws of metre. | 0:16:22 | 0:16:27 | |
In the 19th century, scholars all over Europe did amazing work | 0:16:31 | 0:16:36 | |
in measuring their societies. | 0:16:36 | 0:16:39 | |
They were hoovering up data on almost everything. | 0:16:39 | 0:16:42 | |
But numbers alone don't tell you anything. | 0:16:42 | 0:16:46 | |
You have to analyse them, and that's what makes statistics. | 0:16:46 | 0:16:51 | |
When the first statisticians began to get to grips with | 0:16:55 | 0:16:59 | |
analysing their data | 0:16:59 | 0:17:00 | |
they seized upon the average, and they took the average of everything. | 0:17:00 | 0:17:05 | |
What's so great about an average is that | 0:17:09 | 0:17:13 | |
you can take a whole mass of data and reduce it to a single number. | 0:17:13 | 0:17:18 | |
And though each of us is unique, our collective lives produce | 0:17:21 | 0:17:26 | |
averages that can characterise whole populations. | 0:17:26 | 0:17:29 | |
I looked in my local newspaper one week and saw a pensioner | 0:17:41 | 0:17:45 | |
had accidentally put her foot on the accelerator | 0:17:45 | 0:17:49 | |
and crushed her friend against a wall. | 0:17:49 | 0:17:52 | |
Devastating, hideous, horrible thing to happen. | 0:17:52 | 0:17:56 | |
And then there was a second one about a young man who didn't have | 0:17:56 | 0:18:01 | |
a driving licence, was driving a car under the influence of drugs and alcohol | 0:18:01 | 0:18:07 | |
and he bashed into a pedestrian and killed him. | 0:18:07 | 0:18:10 | |
What's remarkable, absolutely remarkable, if you look at the number | 0:18:10 | 0:18:15 | |
of people who die each year in traffic crashes, it's nearly a constant. | 0:18:15 | 0:18:22 | |
What? | 0:18:22 | 0:18:24 | |
All these individual events, somehow when you sum them all up there's the same number every year. | 0:18:24 | 0:18:31 | |
And every year, two and a half times as many men | 0:18:31 | 0:18:35 | |
die in traffic crashes as women, and it's a constant. | 0:18:35 | 0:18:38 | |
And every year the rate in Belgium is double the rate in England. | 0:18:38 | 0:18:44 | |
There are these remarkable regularities. | 0:18:44 | 0:18:47 | |
So that these individual particular events sum up into a social phenomenon. | 0:18:47 | 0:18:54 | |
Let's see what Sweden have done. | 0:18:56 | 0:18:58 | |
We used to boast about fast social progress, that's where we were... | 0:18:58 | 0:19:01 | |
'In my lectures, to tell stories about the changing world, | 0:19:01 | 0:19:05 | |
'I use the averages from entire countries, | 0:19:05 | 0:19:08 | |
'whether the average of income, child mortality, family size | 0:19:08 | 0:19:12 | |
'or carbon output.' | 0:19:12 | 0:19:13 | |
OK, I give you Singapore. The year I was born, | 0:19:13 | 0:19:16 | |
Singapore had twice the child mortality of Sweden, the most tropical country in the world, | 0:19:16 | 0:19:20 | |
a marshland on the Equator, and here we go. | 0:19:20 | 0:19:22 | |
It took a little time for them to get independent, | 0:19:22 | 0:19:25 | |
but then they started to grow their economy, | 0:19:25 | 0:19:27 | |
and they made the social investment, they got away malaria, | 0:19:27 | 0:19:29 | |
they got a magnificent health system that beat both US and Sweden. | 0:19:29 | 0:19:33 | |
We never thought it would happen that they would win over Sweden! | 0:19:33 | 0:19:37 | |
LAUGHTER AND APPLAUSE | 0:19:37 | 0:19:40 | |
But useful as averages are, they don't tell you the whole story. | 0:19:40 | 0:19:46 | |
On average, Swedish people have slightly less than two legs. | 0:19:48 | 0:19:53 | |
This is because few people only have one leg or no legs, | 0:19:53 | 0:19:57 | |
and no-one has three legs. | 0:19:57 | 0:19:59 | |
So almost everybody in Sweden has more than the average number of legs. | 0:19:59 | 0:20:06 | |
The variation in data is just as important as the average. | 0:20:06 | 0:20:10 | |
But how do you get a handle on variation? | 0:20:16 | 0:20:19 | |
For this, you transform numbers into shapes. | 0:20:19 | 0:20:23 | |
Let's look again at the number of adult women in Sweden | 0:20:23 | 0:20:26 | |
for different heights. | 0:20:26 | 0:20:27 | |
Plotting the data as a shape shows how much their heights | 0:20:27 | 0:20:31 | |
vary from the average and how wide that variation is. | 0:20:31 | 0:20:36 | |
The shape a set of data makes is called its distribution. | 0:20:36 | 0:20:41 | |
This is the income distribution of China, 1970. | 0:20:41 | 0:20:46 | |
This is the income distribution of the United States, 1970. | 0:20:46 | 0:20:51 | |
Almost no overlap, and what has happened? | 0:20:51 | 0:20:54 | |
China is growing, it's not so equal any longer, | 0:20:54 | 0:20:56 | |
and it's appearing here overlooking the United States. | 0:20:56 | 0:21:01 | |
Almost like a ghost, isn't it? | 0:21:01 | 0:21:03 | |
It's pretty scary. | 0:21:03 | 0:21:05 | |
Rrrr! | 0:21:05 | 0:21:06 | |
LAUGHTER | 0:21:06 | 0:21:08 | |
The statisticians who first explored distribution | 0:21:17 | 0:21:21 | |
discovered one shape that turned up again and again. | 0:21:21 | 0:21:25 | |
The Victorian scholar Francis Galton | 0:21:25 | 0:21:28 | |
was so fascinated he built a machine that could reproduce it, | 0:21:28 | 0:21:32 | |
and he found it fitted so many different sets of measurements | 0:21:32 | 0:21:36 | |
that he named it the normal distribution. | 0:21:36 | 0:21:38 | |
Whether it was people's arm spans, lung capacities, | 0:21:38 | 0:21:45 | |
or even their exam results, | 0:21:45 | 0:21:47 | |
the normal distribution shape recurred time and time again. | 0:21:47 | 0:21:51 | |
Other statisticians soon found many other regular shapes, | 0:21:51 | 0:21:56 | |
each produced by particular kinds of natural or social processes. | 0:21:56 | 0:22:01 | |
And every statistician has their favourite. | 0:22:01 | 0:22:05 | |
The Poisson distribution, the Poisson shape is my favourite distribution. | 0:22:05 | 0:22:09 | |
I think it's an absolute cracker. | 0:22:09 | 0:22:11 | |
The Poisson shape describes how likely it is | 0:22:15 | 0:22:18 | |
that out-of-the-ordinary things will happen. | 0:22:18 | 0:22:21 | |
Imagine a London bus stop where we know that on average | 0:22:21 | 0:22:24 | |
we'll get three buses in an hour. | 0:22:24 | 0:22:26 | |
We won't always get three buses, of course. | 0:22:26 | 0:22:29 | |
Amazingly, the Poisson shape will show us the probability | 0:22:29 | 0:22:33 | |
that in any given hour we will get four, five, or six buses, | 0:22:33 | 0:22:37 | |
or no buses at all. | 0:22:37 | 0:22:39 | |
The exact shape changes with the average. | 0:22:40 | 0:22:43 | |
But whether it's how many people will win the lottery jackpot | 0:22:43 | 0:22:46 | |
each week, | 0:22:46 | 0:22:48 | |
or how many people will phone a call centre each minute, | 0:22:48 | 0:22:51 | |
the Poisson shape will give the probabilities. | 0:22:51 | 0:22:54 | |
The wonderful example where this was applied to in the late 19th century | 0:22:57 | 0:23:01 | |
was to count each year the number of Prussian officers, | 0:23:01 | 0:23:04 | |
cavalry officers, who were kicked to death by their horses. | 0:23:04 | 0:23:07 | |
Now, some years there were none, some years there were one, | 0:23:07 | 0:23:10 | |
some years there were two, up to seven, I think, one particularly bad year. | 0:23:10 | 0:23:13 | |
But with this distribution, however many years there were | 0:23:13 | 0:23:16 | |
with nought, one, two, three, four Prussian cavalry officers | 0:23:16 | 0:23:19 | |
kicked to death by their horses, beautifully obeyed the Poisson distribution. | 0:23:19 | 0:23:23 | |
So statisticians use shapes to reveal the patterns in the data. | 0:23:42 | 0:23:48 | |
But we also use images of all kinds | 0:23:48 | 0:23:51 | |
to communicate statistics to a wider public. | 0:23:51 | 0:23:54 | |
Because if the story in the numbers | 0:23:54 | 0:23:57 | |
is told by a beautiful and clever image, then everyone understands. | 0:23:57 | 0:24:02 | |
Of the pioneers of statistical graphics, my favourite is Florence Nightingale. | 0:24:02 | 0:24:09 | |
There are not many people who realise that she was known | 0:24:24 | 0:24:27 | |
as a passionate statistician and not just the Lady of the Lamp. | 0:24:27 | 0:24:30 | |
She said that "to understand God's thoughts, we must study statistics, | 0:24:30 | 0:24:34 | |
"for these are the measure of His purpose." | 0:24:34 | 0:24:37 | |
Statistics was for her a religious duty and moral imperative. | 0:24:37 | 0:24:40 | |
When Florence was nine years old she started collecting data. | 0:24:42 | 0:24:45 | |
Her data was different fruits and vegetables she found. | 0:24:45 | 0:24:48 | |
Put them into different tables. | 0:24:48 | 0:24:50 | |
Trying to organise them in some standard form. | 0:24:50 | 0:24:52 | |
And so we have one of Nightingale's first statistical tables | 0:24:52 | 0:24:55 | |
at the age of nine. | 0:24:55 | 0:24:57 | |
In the mid 1850s Florence Nightingale went to the Crimea to care for British casualties of war. | 0:25:04 | 0:25:11 | |
She was horrified by what she discovered. | 0:25:11 | 0:25:14 | |
For all the soldiers being blown to bits on the battlefield, there were many, many more soldiers | 0:25:14 | 0:25:19 | |
dying from diseases they caught in the army's filthy hospitals. | 0:25:19 | 0:25:25 | |
So Florence Nightingale began counting the dead. | 0:25:25 | 0:25:29 | |
For two years she recorded mortality data in meticulous detail. | 0:25:29 | 0:25:34 | |
When the war was over she persuaded the government to set up | 0:25:34 | 0:25:39 | |
a Royal Commission of Inquiry, | 0:25:39 | 0:25:41 | |
and gathered her data in a devastating report. | 0:25:41 | 0:25:44 | |
What has cemented her place in the statistical history books | 0:25:44 | 0:25:48 | |
are the graphics she used. | 0:25:48 | 0:25:50 | |
And one in particular, the polar area graph. | 0:25:50 | 0:25:53 | |
For each month of the war, a huge blue wedge represented | 0:25:53 | 0:25:58 | |
the soldiers who had died from preventable diseases. | 0:25:58 | 0:26:02 | |
The much smaller red wedges were deaths from wounds, | 0:26:02 | 0:26:05 | |
and the black wedges were deaths from accidents and other causes. | 0:26:05 | 0:26:10 | |
Nightingale's graphics were so clear they were impossible to ignore. | 0:26:10 | 0:26:17 | |
The usual thing around Florence Nightingale's time | 0:26:17 | 0:26:19 | |
was just to produce tables and tables of figures - absolutely really tedious stuff that, | 0:26:19 | 0:26:23 | |
unless you're an absolutely dedicated statistician, | 0:26:23 | 0:26:26 | |
it's really quite difficult to spot the patterns quite naturally. | 0:26:26 | 0:26:29 | |
But visualisations, they tell a story, they tell a story immediately. | 0:26:29 | 0:26:33 | |
And the use of colour and the use of shape can really tell a powerful story. | 0:26:33 | 0:26:38 | |
And nowadays of course we can make things move as well. | 0:26:38 | 0:26:41 | |
Florence Nightingale would have loved to have played with... | 0:26:41 | 0:26:44 | |
She would have produced wonderful animations, I'm absolutely certain of it. | 0:26:44 | 0:26:48 | |
Today, 150 years on, Nightingale's graphics | 0:26:50 | 0:26:54 | |
are rightly regarded as a classic. | 0:26:54 | 0:26:57 | |
They led to a revolution in nursing, health care | 0:26:57 | 0:27:00 | |
and hygiene in hospitals worldwide, which saved innumerable lives. | 0:27:00 | 0:27:05 | |
And statistical graphics has become an art form of its very own, | 0:27:07 | 0:27:11 | |
led by designers who are passionate about visualising data. | 0:27:11 | 0:27:16 | |
This is the Billion Pound-O-Gram. | 0:27:24 | 0:27:27 | |
This image arose out of frustration | 0:27:27 | 0:27:29 | |
with the reporting of billion pound amounts in the media. | 0:27:29 | 0:27:32 | |
£500 billion for this war. | 0:27:32 | 0:27:34 | |
£50 billion for this oil spill. | 0:27:34 | 0:27:36 | |
It doesn't make sense - the numbers are too enormous to get your mind round. | 0:27:36 | 0:27:39 | |
So I scraped all this data from various news sources and created this diagram. | 0:27:39 | 0:27:43 | |
So the squares here are scaled according to the billion pound amounts. | 0:27:43 | 0:27:48 | |
When you see numbers visualised like this | 0:27:48 | 0:27:51 | |
you start to have a different relationship with them. | 0:27:51 | 0:27:54 | |
You can start to see the patterns, and the scale of them. | 0:27:54 | 0:27:56 | |
Here in the corner, this little square - £37 billion. | 0:27:56 | 0:27:59 | |
This was the predicted cost of the Iraq war in 2003. | 0:27:59 | 0:28:02 | |
As you can see it's grown exponentially over the last few years | 0:28:02 | 0:28:06 | |
and the total cost now is around about £2,500 billion. | 0:28:06 | 0:28:10 | |
It's funny because when you visualise statistics | 0:28:10 | 0:28:13 | |
you understand them, and when you understand them | 0:28:13 | 0:28:15 | |
you can really start to put things in perspective. | 0:28:15 | 0:28:18 | |
Visualisation is right at the heart of my own work too. | 0:28:23 | 0:28:27 | |
I teach global health. | 0:28:27 | 0:28:30 | |
And I know having the data is not enough - | 0:28:30 | 0:28:33 | |
I have to show it in ways people both enjoy and understand. | 0:28:33 | 0:28:39 | |
Now I'm going to try something I've never done before. | 0:28:39 | 0:28:42 | |
Animating the data in real space, | 0:28:42 | 0:28:45 | |
with a bit of technical assistance from the crew. | 0:28:45 | 0:28:50 | |
So here we go. | 0:28:50 | 0:28:52 | |
First, an axis for health. | 0:28:52 | 0:28:54 | |
Life expectancy from 25 years to 75 years. | 0:28:54 | 0:28:58 | |
And down here an axis for wealth. | 0:28:58 | 0:29:01 | |
Income per person - 400, 4,000, 40,000. | 0:29:01 | 0:29:06 | |
So down here is poor and sick. | 0:29:06 | 0:29:10 | |
And up here is rich and healthy. | 0:29:10 | 0:29:14 | |
Now I'm going to show you the world | 0:29:14 | 0:29:18 | |
200 years ago, in 1810. | 0:29:18 | 0:29:21 | |
Here come all the countries. | 0:29:21 | 0:29:22 | |
Europe, brown; Asia, red; Middle East, green; | 0:29:22 | 0:29:26 | |
Africa south of the Sahara, blue; and the Americas, yellow. | 0:29:26 | 0:29:29 | |
And the size of the country bubble shows the size of the population. | 0:29:29 | 0:29:33 | |
In 1810, it was pretty crowded down there, wasn't it? | 0:29:33 | 0:29:37 | |
All countries were sick and poor. | 0:29:37 | 0:29:39 | |
Life expectancy was below 40 in all countries. | 0:29:39 | 0:29:43 | |
And only UK and the Netherlands were slightly better off. But not much. | 0:29:43 | 0:29:48 | |
And now I start the world. | 0:29:48 | 0:29:52 | |
The industrial revolution makes countries in Europe and elsewhere | 0:29:52 | 0:29:56 | |
move away from the rest. | 0:29:56 | 0:29:59 | |
But the colonised countries in Asia and Africa, | 0:29:59 | 0:30:02 | |
they are stuck down there. | 0:30:02 | 0:30:04 | |
And eventually the Western countries get healthier and healthier. | 0:30:04 | 0:30:08 | |
And now we slow down to show the impact of the First World War | 0:30:08 | 0:30:13 | |
and the Spanish flu epidemic. | 0:30:13 | 0:30:15 | |
What a catastrophe! | 0:30:15 | 0:30:18 | |
And now I speed up through the 1920s and the 1930s and, | 0:30:18 | 0:30:22 | |
in spite of the Great Depression, | 0:30:22 | 0:30:24 | |
Western countries forge on towards greater wealth and health. | 0:30:24 | 0:30:27 | |
Japan and some others try to follow. | 0:30:27 | 0:30:29 | |
But most countries stay down here. | 0:30:29 | 0:30:32 | |
And after the tragedies of the Second World War, | 0:30:32 | 0:30:35 | |
we stop a bit to look at the world in 1948. | 0:30:35 | 0:30:39 | |
1948 was a great year. | 0:30:39 | 0:30:42 | |
The war was over, | 0:30:42 | 0:30:43 | |
Sweden topped the medal table at the Winter Olympics and I was born. | 0:30:43 | 0:30:48 | |
But the differences between the countries of the world | 0:30:48 | 0:30:51 | |
were wider than ever. | 0:30:51 | 0:30:52 | |
The United States was in front. | 0:30:52 | 0:30:54 | |
Japan was catching up. | 0:30:54 | 0:30:56 | |
Brazil was way behind, | 0:30:56 | 0:30:58 | |
Iran was getting a little richer from oil but still had short lives. | 0:30:58 | 0:31:03 | |
And the Asian giants... | 0:31:03 | 0:31:05 | |
China, India, Pakistan, Bangladesh, and Indonesia, | 0:31:05 | 0:31:08 | |
they were still poor and sick down here. | 0:31:08 | 0:31:11 | |
But look what was about to happen! Here we go again. | 0:31:11 | 0:31:14 | |
In my lifetime, former colonies gained independence and then finally | 0:31:14 | 0:31:18 | |
they started to get healthier and healthier and healthier. | 0:31:18 | 0:31:22 | |
And in the 1970s, countries in Asia and Latin America | 0:31:22 | 0:31:26 | |
started to catch up with the Western countries. | 0:31:26 | 0:31:28 | |
They became the emerging economies. | 0:31:28 | 0:31:31 | |
Some in Africa followed, | 0:31:31 | 0:31:32 | |
some Africans were stuck in civil war, and others were hit by HIV. | 0:31:32 | 0:31:36 | |
And now we can see the world in the most up-to-date statistics. | 0:31:36 | 0:31:41 | |
Most people today live in the middle. | 0:31:42 | 0:31:45 | |
But there is a huge difference at the same time | 0:31:45 | 0:31:48 | |
between the best-off countries and the worst-off countries. | 0:31:48 | 0:31:51 | |
And there are also huge inequalities within countries. | 0:31:51 | 0:31:54 | |
These bubbles show country averages but I can split them. | 0:31:54 | 0:31:59 | |
Take China. I can split it into provinces. | 0:31:59 | 0:32:02 | |
There goes Shanghai... | 0:32:02 | 0:32:05 | |
It has the same health and wealth as Italy today. | 0:32:05 | 0:32:08 | |
And there is the poor inland province Guizhou, | 0:32:08 | 0:32:11 | |
it is like Pakistan. | 0:32:11 | 0:32:12 | |
And if I split it further, the rural parts are like Ghana in Africa. | 0:32:12 | 0:32:18 | |
And yet, despite the enormous disparities today, | 0:32:19 | 0:32:23 | |
we have seen 200 years of remarkable progress! | 0:32:23 | 0:32:27 | |
That huge historical gap between the west and the rest is now closing. | 0:32:27 | 0:32:31 | |
We have become an entirely new, converging world. | 0:32:31 | 0:32:35 | |
And I see a clear trend into the future. | 0:32:35 | 0:32:37 | |
With aid, trade, green technology and peace, | 0:32:37 | 0:32:40 | |
it's fully possible that everyone can make it | 0:32:40 | 0:32:43 | |
to the healthy, wealthy corner. | 0:32:43 | 0:32:45 | |
Well, what you've just seen in the last few minutes | 0:32:48 | 0:32:51 | |
is a story of 200 countries shown over 200 years and beyond. | 0:32:51 | 0:32:56 | |
It involved plotting 120,000 numbers. | 0:32:56 | 0:33:00 | |
Pretty neat, huh? | 0:33:00 | 0:33:02 | |
So, with statistics, we can begin to see things as they really are. | 0:33:07 | 0:33:13 | |
From tables of data to averages, distributions and visualisations, | 0:33:13 | 0:33:18 | |
statistics gives us a clear description of the world. | 0:33:18 | 0:33:22 | |
But, with statistics, we can not only discover WHAT is happening | 0:33:22 | 0:33:28 | |
but also explore WHY, | 0:33:28 | 0:33:30 | |
by using the powerful analytical method - correlation. | 0:33:30 | 0:33:34 | |
Just looking at one thing at a time doesn't tell you very much. | 0:33:35 | 0:33:38 | |
You've got to look at the relationships between things, | 0:33:38 | 0:33:41 | |
how they change, how they vary together. | 0:33:41 | 0:33:43 | |
That's what correlation is about. | 0:33:43 | 0:33:45 | |
That's how you start trying to understand the processes | 0:33:45 | 0:33:48 | |
that are really going on in the world and society. | 0:33:48 | 0:33:50 | |
Most of us today would recognise that crime correlates to poverty, | 0:33:52 | 0:33:57 | |
that infection correlates to poor sanitation, | 0:33:57 | 0:34:00 | |
and that knowledge of statistics correlates | 0:34:00 | 0:34:02 | |
to being great at dancing! | 0:34:02 | 0:34:05 | |
Correlations can be very tricky. | 0:34:06 | 0:34:10 | |
I've got a joke about silly correlations. | 0:34:10 | 0:34:12 | |
There was this American who was afraid of a heart attack. | 0:34:12 | 0:34:15 | |
He found out that the Japanese ate very little fat | 0:34:15 | 0:34:19 | |
and drank almost no wine, | 0:34:19 | 0:34:22 | |
but they had far fewer heart attacks than the Americans. | 0:34:22 | 0:34:25 | |
But, on the other hand, he also found out that the French | 0:34:25 | 0:34:28 | |
eat as much fat as the Americans and they drink much more wine but they also have fewer heart attacks. | 0:34:28 | 0:34:35 | |
So he concluded that what kills you is speaking English. | 0:34:35 | 0:34:40 | |
# Smoke, smoke, smoke that cigarette | 0:34:40 | 0:34:43 | |
# Puff, puff, puff and if you smoke yourself to death... # | 0:34:43 | 0:34:48 | |
The time, the pace, the cigarette. Weights Tilt. | 0:34:48 | 0:34:51 | |
The best example of a really ground-breaking correlation | 0:34:51 | 0:34:56 | |
is the link that was established in the 1950s between smoking and lung cancer. | 0:34:56 | 0:35:01 | |
Not long after the Second World War, a British doctor, Richard Doll, | 0:35:01 | 0:35:07 | |
investigated lung cancer patients in 20 London hospitals. | 0:35:07 | 0:35:11 | |
And he became certain that the only thing they had in common was smoking. | 0:35:11 | 0:35:15 | |
So certain, that he stopped smoking himself. | 0:35:15 | 0:35:18 | |
But other people weren't so sure. | 0:35:18 | 0:35:22 | |
A lot of the discussion of the early data, | 0:35:22 | 0:35:25 | |
linking smoking to lung cancer, said, "It's not the smoking, surely, | 0:35:25 | 0:35:29 | |
"that thing we've done all our lives, that can't be bad for you. | 0:35:29 | 0:35:32 | |
"Maybe it's genes. | 0:35:32 | 0:35:35 | |
"Maybe people who are genetically predisposed to get lung cancer | 0:35:35 | 0:35:39 | |
"are also genetically predisposed to smoke." | 0:35:39 | 0:35:43 | |
"Maybe it's not the smoking, maybe it's air pollution - | 0:35:43 | 0:35:47 | |
"that smokers are somehow more exposed to air pollution than non-smokers. | 0:35:47 | 0:35:52 | |
"Maybe it's not smoking, maybe it's poverty." | 0:35:52 | 0:35:56 | |
So now we've got three alternative explanations, apart from chance. | 0:35:56 | 0:36:00 | |
To verify that his correlation did imply cause and effect, | 0:36:02 | 0:36:06 | |
Richard Doll created the biggest statistical study of smoking yet. | 0:36:06 | 0:36:10 | |
He began tracking the lives of 40,000 British doctors, | 0:36:10 | 0:36:14 | |
some of whom smoked and some of whom didn't, | 0:36:14 | 0:36:17 | |
and gathered enough data | 0:36:17 | 0:36:19 | |
to correlate the amount the doctors smoked | 0:36:19 | 0:36:22 | |
with their likelihood of getting cancer. | 0:36:22 | 0:36:24 | |
Eventually, he not only showed a correlation between smoking and lung cancer, | 0:36:24 | 0:36:30 | |
but also a correlation between stopping smoking and reducing the risk. | 0:36:30 | 0:36:35 | |
This was science at its best. | 0:36:35 | 0:36:37 | |
What correlations do not replace is human thought. | 0:36:39 | 0:36:44 | |
You've got to think about what it means. | 0:36:44 | 0:36:46 | |
What a good scientist does, if she or he comes up with a correlation, | 0:36:46 | 0:36:50 | |
is try as hard as she or he possibly can to disprove it, | 0:36:50 | 0:36:55 | |
to break it down, to get rid of it, to try and refute it. | 0:36:55 | 0:37:00 | |
And if it withstands all those efforts at demolishing it | 0:37:00 | 0:37:05 | |
and it is still standing up then, cautiously, you say, "We really might have something here." | 0:37:05 | 0:37:10 | |
However brilliant the scientist, data is still the oxygen of science. | 0:37:26 | 0:37:32 | |
The good news is that the more we have, the more correlations we'll find, the more theories we'll test, | 0:37:32 | 0:37:39 | |
and the more discoveries we're likely to make. | 0:37:39 | 0:37:42 | |
And history shows how our total sum of information grows in huge leaps as we develop new technologies. | 0:37:46 | 0:37:53 | |
The invention of the printing press kicked off the first data and information explosion. | 0:37:53 | 0:38:00 | |
If you piled up all the books that had been printed by the year 1700, | 0:38:00 | 0:38:06 | |
they would make 60 stacks each as high as Mount Everest. | 0:38:06 | 0:38:11 | |
Then, starting in the 19th century, | 0:38:12 | 0:38:15 | |
there came a second information revolution with the telegraph, | 0:38:15 | 0:38:19 | |
gramophone and camera. And later radio and TV. | 0:38:19 | 0:38:23 | |
The total amount of information exploded. | 0:38:23 | 0:38:28 | |
And by the 1950s the information available to us all had multiplied 6,000 times. | 0:38:28 | 0:38:35 | |
Then, thanks to the computer and later the internet, we went digital. | 0:38:35 | 0:38:41 | |
And the amount of data we have now is unimaginably vast. | 0:38:41 | 0:38:47 | |
A single letter printed in a book is equivalent to a byte of data. | 0:38:49 | 0:38:55 | |
A printed page equals a kilobyte or two. | 0:38:55 | 0:38:58 | |
Five megabytes is enough for the complete works of Shakespeare. | 0:39:01 | 0:39:06 | |
10 gigabytes - that's a DVD movie. | 0:39:08 | 0:39:11 | |
Two terabytes is the tens of millions of photos added to Facebook every day. | 0:39:16 | 0:39:23 | |
Ten petabytes is the data recorded every second by the world's largest particle accelerator. | 0:39:24 | 0:39:32 | |
So much only a tiny fraction is kept. | 0:39:32 | 0:39:35 | |
Six exabytes is what you'd have if you sequenced the genomes of every single person on Earth. | 0:39:35 | 0:39:43 | |
But really, that's nothing. | 0:39:48 | 0:39:50 | |
In 2009, the internet added up to 500 exabytes. | 0:39:50 | 0:39:55 | |
In 2010, in just one year, that will double to more than one zettabyte! | 0:39:55 | 0:40:02 | |
Back in the real world, if we turned all this data into print it would make 90 stacks of books, | 0:40:06 | 0:40:14 | |
each reaching from here all the way to the sun! | 0:40:14 | 0:40:18 | |
The data deluge is staggering, but, with today's computers | 0:40:18 | 0:40:23 | |
and statistics, I'm confident we can handle it. | 0:40:23 | 0:40:28 | |
When it comes to all the data on the internet, | 0:40:28 | 0:40:31 | |
the powerhouse of statistical analysis | 0:40:31 | 0:40:33 | |
is the Silicon Valley giant Google. | 0:40:33 | 0:40:37 | |
The average person over their lifetime is exposed to about 100 million words of conversation. | 0:40:44 | 0:40:50 | |
And so if you multiply that by the six billion people on the planet, | 0:40:50 | 0:40:54 | |
that amount of words is about equal to the number of words | 0:40:54 | 0:40:58 | |
that Google has available at any one instant in time. | 0:40:58 | 0:41:01 | |
Google's computers hoover up and file away every document, web page, and image they can find. | 0:41:03 | 0:41:08 | |
They then hunt for patterns and correlations in all this data, | 0:41:08 | 0:41:14 | |
doing statistics on a massive scale. | 0:41:14 | 0:41:17 | |
And, for me, Google has one project that's particularly exciting - statistical language translation. | 0:41:17 | 0:41:25 | |
We wanted to provide access to all the web's information, no matter what language you spoke. | 0:41:25 | 0:41:30 | |
There's just so much information on the internet, | 0:41:30 | 0:41:33 | |
you couldn't hope to translate it all by hand into every possible language. | 0:41:33 | 0:41:37 | |
We figured we'd have to be able to do machine translation. | 0:41:37 | 0:41:41 | |
In the past, programmers tried to teach their computers | 0:41:44 | 0:41:47 | |
to see each language as a set of grammatical rules - much like the way languages are taught at school. | 0:41:47 | 0:41:53 | |
But this didn't work because no set of rules could capture a language | 0:41:53 | 0:41:58 | |
in all its subtlety and ambiguity. | 0:41:58 | 0:42:01 | |
"Having eaten our lunch the coach departed." | 0:42:01 | 0:42:05 | |
Well, that's obviously incorrect. | 0:42:05 | 0:42:07 | |
Written like that it would imply that the coach has eaten the lunch. | 0:42:07 | 0:42:12 | |
It would be far better to say... | 0:42:12 | 0:42:15 | |
"having eaten our lunch we departed in the coach." | 0:42:15 | 0:42:19 | |
Those rules are helpful and they are useful most of the time, but they don't turn out to be true all the time. | 0:42:19 | 0:42:26 | |
And the insight of using statistical machine translation is saying, | 0:42:26 | 0:42:30 | |
"If you've got to have all these exceptions anyways, maybe you can get by without having any of the rules. | 0:42:30 | 0:42:35 | |
"Maybe you can treat everything as an exception." And that's essentially what we've done. | 0:42:35 | 0:42:39 | |
What the computer is doing when it's learning how to translate | 0:42:48 | 0:42:52 | |
is learning correlations between words | 0:42:52 | 0:42:55 | |
and correlations between phrases. | 0:42:55 | 0:42:57 | |
So we feed the system very large amounts of data | 0:42:57 | 0:43:00 | |
and then the system is seeing that a certain word or a certain phrase | 0:43:00 | 0:43:04 | |
correlates very often to the other language. | 0:43:04 | 0:43:07 | |
Google's website currently offers translation between any of 57 different languages. | 0:43:09 | 0:43:15 | |
It does this purely statistically, having correlated a huge collection of multilingual texts. | 0:43:15 | 0:43:22 | |
The people that built the system don't need to know Chinese | 0:43:22 | 0:43:25 | |
in order to build the Chinese-to-English system, or they don't need to know Arabic. | 0:43:25 | 0:43:29 | |
But the expertise that's needed is basically knowledge of statistics, | 0:43:29 | 0:43:33 | |
knowledge of computer science, knowledge of infrastructure | 0:43:33 | 0:43:35 | |
to build those very large computational systems that we are building for doing that. | 0:43:35 | 0:43:40 | |
I hooked up with Google from my office in Stockholm to try the translator for myself. | 0:43:42 | 0:43:48 | |
'I will type... some Swedish sentences.' | 0:43:48 | 0:43:51 | |
OK. | 0:43:51 | 0:43:53 | |
Sveriges... | 0:43:53 | 0:43:55 | |
..guldring i örat. | 0:43:55 | 0:43:59 | |
OK. So it says, "Sweden's finance minister has a ponytail and a gold ring in your ear." | 0:44:00 | 0:44:07 | |
-I guess it probably means in his ear. -'That's exactly correct, it's amazing! | 0:44:07 | 0:44:11 | |
'He comes from the Conservative party, that's the kind of Sweden we have today. | 0:44:11 | 0:44:15 | |
'I will type one more sentence.' | 0:44:15 | 0:44:18 | |
'I sitt samkönade...' | 0:44:18 | 0:44:22 | |
partnerskap... | 0:44:22 | 0:44:25 | |
nya biskop. | 0:44:25 | 0:44:28 | |
"In his same-sex partnership has Stockholm's new bishop and his partners a three-year son." | 0:44:28 | 0:44:35 | |
It's almost perfect, there's one important thing - | 0:44:35 | 0:44:38 | |
it's HER, it's a lesbian partnership. | 0:44:38 | 0:44:41 | |
OK, so words like "his" and "her" are one of the challenges | 0:44:41 | 0:44:46 | |
in translation, to really get those right. | 0:44:46 | 0:44:49 | |
Especially when it comes to bishops one can excuse it! | 0:44:49 | 0:44:51 | |
'Right, right.' | 0:44:51 | 0:44:53 | |
-I guess more often than not it would probably be a "his". -'I will write one more sentence.' | 0:44:53 | 0:44:58 | |
När Sverige deltar i olympiader är målet | 0:44:58 | 0:45:01 | |
'inte att vinna utan att slå Norge.' | 0:45:01 | 0:45:03 | |
OK. "When Sweden is taking part in Olympic goal is not to win but to beat Norway." | 0:45:06 | 0:45:11 | |
'Yes! This is what it is! | 0:45:11 | 0:45:13 | |
'But they are very good in Winter Olympics, so we can't make it, but we are trying.' | 0:45:13 | 0:45:17 | |
Ah, very good, very good. | 0:45:17 | 0:45:19 | |
'This is absolutely amazing, you know, and I was especially impressed | 0:45:19 | 0:45:24 | |
'that it picks up words like "same-sex partnership" which are very new to the language.' | 0:45:24 | 0:45:30 | |
'The translator is good, but if they succeed with what's next, that'll be remarkable.' | 0:45:30 | 0:45:36 | |
One of the exciting possibilities | 0:45:36 | 0:45:38 | |
is combining the machine translation technology with the speech recognition technology. | 0:45:38 | 0:45:42 | |
Now, both of these are statistical in nature. | 0:45:42 | 0:45:45 | |
The machine translation relies on the statistics of mapping from one language to another, | 0:45:45 | 0:45:51 | |
and similarly speech recognition relies on the statistics of mapping from a sound form to the words. | 0:45:51 | 0:45:57 | |
When we put them together, | 0:45:57 | 0:45:59 | |
now we have the capability of having instant conversation | 0:45:59 | 0:46:03 | |
between two people that don't speak a common language. | 0:46:03 | 0:46:06 | |
I can talk to you in my language, | 0:46:06 | 0:46:08 | |
you hear me in your language and you can answer back. | 0:46:08 | 0:46:11 | |
And in real time we can make that translation, | 0:46:11 | 0:46:15 | |
we can bring two people together and allow them to speak. | 0:46:15 | 0:46:18 | |
The internet is just one of many technologies created to gather massive amounts of data. | 0:46:31 | 0:46:39 | |
Scientists studying our earth and our environment | 0:46:39 | 0:46:43 | |
now use an incredible range of instruments | 0:46:43 | 0:46:47 | |
to measure the processes of our planet. | 0:46:47 | 0:46:50 | |
All around us are sensors continuously measuring temperature, water flow, and ocean currents. | 0:46:52 | 0:47:00 | |
And high in orbit are satellites busy imaging cloud formations, forest growth and snow cover. | 0:47:00 | 0:47:06 | |
Scientists speak of "instrumenting the earth". | 0:47:06 | 0:47:11 | |
And pointing up to the skies above are powerful new telescopes mapping the universe. | 0:47:13 | 0:47:20 | |
What's happening in astronomy is typical of how profoundly | 0:47:30 | 0:47:34 | |
this new torrent of data is transforming science. | 0:47:34 | 0:47:39 | |
Astronomers are now addressing many enduring mysteries of the cosmos | 0:47:39 | 0:47:45 | |
by applying statistical methods to all this new data. | 0:47:45 | 0:47:49 | |
The galaxy is a very big place and it's got billions of stars in it, | 0:47:59 | 0:48:03 | |
and so to put together a coherent picture of the whole galaxy requires having an enormous amount of data. | 0:48:03 | 0:48:09 | |
And before you could do a large sky survey with sensitive, digital detectors | 0:48:09 | 0:48:13 | |
that meant that you could map many, many stars all at once, | 0:48:13 | 0:48:16 | |
it was very difficult to build up enough data on enough of the galaxy. | 0:48:16 | 0:48:20 | |
In the past, large surveys of the night sky had to be done | 0:48:24 | 0:48:28 | |
by exposing thousands of large photographic plates. | 0:48:28 | 0:48:32 | |
But these surveys could take 25 years or more to complete. | 0:48:32 | 0:48:37 | |
Then, in the 1990s, came digital astronomy and a huge increase | 0:48:39 | 0:48:44 | |
in both the amount and the accessibility of data. | 0:48:44 | 0:48:49 | |
The Sloan Sky Survey is the world's biggest yet, using a massive digital sensor | 0:48:49 | 0:48:55 | |
mounted on the back of a custom-built telescope in New Mexico. | 0:48:55 | 0:49:00 | |
It's scanned the sky night after night for eight years, | 0:49:00 | 0:49:05 | |
building up a composite picture in unprecedented resolution. | 0:49:05 | 0:49:09 | |
The Sloan is some of the best, deepest survey data that we have in astronomy. | 0:49:09 | 0:49:14 | |
Both on our own galaxy and on galaxies further away from ours. | 0:49:14 | 0:49:18 | |
All the Sloan data is on the internet, | 0:49:24 | 0:49:27 | |
and with it astronomers have identified millions of hitherto unknown stars and galaxies. | 0:49:27 | 0:49:34 | |
They also comb the database for statistical patterns | 0:49:34 | 0:49:37 | |
which will prove, disprove, or even suggest new theories. | 0:49:37 | 0:49:42 | |
So we have this idea that galaxies grow, they become large galaxies like the one we live in, the Milky Way, | 0:49:42 | 0:49:49 | |
not all at once, or not smoothly, but by continuously incorporating, | 0:49:49 | 0:49:55 | |
basically cannibalising, smaller galaxies. | 0:49:55 | 0:49:59 | |
They dissolve them and they become part of the bigger galaxy as it grows. | 0:49:59 | 0:50:04 | |
It's a startling idea, and, in the Sloan data, is the evidence to support it. | 0:50:06 | 0:50:12 | |
Groups of stars that came from cannibalised galaxies | 0:50:12 | 0:50:16 | |
stand out in the Sloan data as statistically different from other stars | 0:50:16 | 0:50:21 | |
because they move at a different velocity. | 0:50:21 | 0:50:24 | |
Each big spike on one of these distribution graphs | 0:50:24 | 0:50:28 | |
means Professor Rockosi has found a group of stars all travelling in a different way to the rest. | 0:50:28 | 0:50:35 | |
They are the telltale patterns she's looking for. | 0:50:35 | 0:50:38 | |
The evidence is accumulating that, in fact, this really is how galaxies grow, | 0:50:40 | 0:50:44 | |
or an important way in which galaxies grow. | 0:50:44 | 0:50:47 | |
And so this is an important part of understanding how galaxies form, not only ours but every galaxy. | 0:50:47 | 0:50:53 | |
The more data there is, the more discoveries can be made. | 0:50:56 | 0:51:00 | |
And the technology is getting better all the time. | 0:51:00 | 0:51:03 | |
The next big survey telescope starts its work in 2015. | 0:51:03 | 0:51:07 | |
It will leave Sloan in the dust! | 0:51:07 | 0:51:10 | |
Sloan has taken eight years to cover one quarter of the night sky. | 0:51:10 | 0:51:16 | |
The new telescope will scan the entire sky, in even greater resolution, every three days! | 0:51:17 | 0:51:25 | |
The vast amounts of data we have today allow researchers in all sorts of fields | 0:51:34 | 0:51:41 | |
to test their theories on a previously unimaginable scale. | 0:51:41 | 0:51:46 | |
But more than this, it may even change the fundamental way science is done. | 0:51:46 | 0:51:53 | |
With the power of today's computers applied to all this data, | 0:51:53 | 0:51:58 | |
the machines might even be able to guide the researchers. | 0:51:58 | 0:52:03 | |
We're at a potentially profoundly important | 0:52:14 | 0:52:17 | |
and potentially one of the most significant points in science, | 0:52:17 | 0:52:22 | |
and certainly one of the most exciting, | 0:52:22 | 0:52:24 | |
where there is the potential to transform not just how scientists do science but even what science is possible. | 0:52:24 | 0:52:32 | |
And what will power that transformation | 0:52:32 | 0:52:34 | |
of both how science is done and even what science is possible | 0:52:34 | 0:52:38 | |
is going to be computation. | 0:52:38 | 0:52:40 | |
Many of the dynamics of the natural world, like the interplay between the rainforests and the atmosphere, | 0:52:41 | 0:52:49 | |
are so complex that we don't as yet really understand them. | 0:52:49 | 0:52:53 | |
But now computers are generating literally tens of thousands of different simulations | 0:52:53 | 0:52:59 | |
of how these biological systems might work. | 0:52:59 | 0:53:03 | |
It's like creating thousands of hypothetical parallel worlds. | 0:53:03 | 0:53:07 | |
Each and every one of these simulations | 0:53:07 | 0:53:10 | |
is analysed with statistics to see if any are a good match for what is observed in nature. | 0:53:10 | 0:53:18 | |
The computers can now automatically generate, | 0:53:18 | 0:53:21 | |
test and discard hypotheses with scarcely a human in sight. | 0:53:21 | 0:53:26 | |
This new application of statistics will become absolutely vital for the future of science. | 0:53:28 | 0:53:35 | |
It's creating a new paradigm, if you like, | 0:53:35 | 0:53:39 | |
in science, in the way in which we can do science, | 0:53:39 | 0:53:42 | |
which is increasingly... | 0:53:42 | 0:53:45 | |
Which one might characterise as... data-centric or data-driven | 0:53:45 | 0:53:51 | |
rather than being hypothesis-driven or experimentally-driven. | 0:53:51 | 0:53:55 | |
So, it's exciting times in terms of the science, | 0:53:55 | 0:53:58 | |
in terms of the computation and in terms of the statistics. | 0:53:58 | 0:54:02 | |
Now, if all that sounds a bit abstract and theoretical to you, how about one final frontier? | 0:54:08 | 0:54:15 | |
Could statistics even make sense of your feelings? | 0:54:15 | 0:54:19 | |
In California - where else? - one computer scientist | 0:54:21 | 0:54:25 | |
is harvesting the internet to try to divine the patterns of our innermost thoughts and emotions. | 0:54:25 | 0:54:32 | |
This is the madness movement. | 0:54:44 | 0:54:46 | |
The madness movement represents a skyscraper view of the world. | 0:54:46 | 0:54:50 | |
Each of these brightly coloured dots is an individual feeling | 0:54:50 | 0:54:54 | |
expressed by someone out there in a blog or a tweet. | 0:54:54 | 0:54:58 | |
And when you click on the dot it explodes to reveal the underlying feeling of that person. | 0:54:58 | 0:55:04 | |
This is what people say they're feeling today. | 0:55:04 | 0:55:07 | |
Better...safe... | 0:55:07 | 0:55:10 | |
crappy... | 0:55:10 | 0:55:12 | |
well... | 0:55:12 | 0:55:14 | |
pretty...special... | 0:55:14 | 0:55:18 | |
sorry...alone... | 0:55:18 | 0:55:20 | |
So, every minute, We Feel Fine crawls the world's blogs, | 0:55:25 | 0:55:29 | |
takes all the sentences that start with the words "I feel" or "I am feeling", | 0:55:29 | 0:55:34 | |
and puts them in a database. | 0:55:34 | 0:55:35 | |
We collect all the feelings and we count the most common. | 0:55:35 | 0:55:40 | |
They are better...bad... | 0:55:40 | 0:55:43 | |
good...right... | 0:55:43 | 0:55:45 | |
guilty...sick... | 0:55:45 | 0:55:48 | |
the same...like shit... | 0:55:48 | 0:55:51 | |
sorry...well... | 0:55:51 | 0:55:54 | |
and so on. | 0:55:54 | 0:55:56 | |
And we can take a look at any one feeling and analyse it. | 0:55:58 | 0:56:01 | |
Right now a lot of people are feeling happy. | 0:56:01 | 0:56:04 | |
We can take a look at all the people who are happy and break it down by age, gender or location. | 0:56:04 | 0:56:11 | |
Since bloggers have public profiles we have that information and so we can ask questions like, | 0:56:11 | 0:56:16 | |
"Are women happier than men?" or, "Is England happier than the United States?" | 0:56:16 | 0:56:21 | |
We find that, as people get older, they get happier. | 0:56:30 | 0:56:33 | |
And, moreover, we find that for younger people they associate happiness more with excitement, | 0:56:33 | 0:56:40 | |
and, as people get older, they associate happiness more with peacefulness. | 0:56:40 | 0:56:47 | |
And we also find that women feel loved more often than men, but also more guilty. | 0:56:51 | 0:56:57 | |
While men feel good more often than women, but also more alone. | 0:56:57 | 0:57:02 | |
As people lead more and more of their lives online, they leave behind digital traces, | 0:57:06 | 0:57:12 | |
and with these digital traces we can begin to statistically analyse what it means to be human. | 0:57:12 | 0:57:19 | |
So where does all of this leave us? | 0:57:51 | 0:57:54 | |
We generate unimaginable quantities of data about everything you can think of. | 0:57:54 | 0:58:00 | |
We analyse it to reveal the patterns. | 0:58:00 | 0:58:02 | |
And now not only experts but all of us can understand the stories in the numbers. | 0:58:02 | 0:58:10 | |
Instead of being led astray by prejudice, | 0:58:18 | 0:58:21 | |
with statistics at our fingertips, our eyes can be open for a fact-based view of the world. | 0:58:21 | 0:58:28 | |
So, more than ever before, we can become authors of our own destiny. | 0:58:28 | 0:58:33 | |
And that's pretty exciting, isn't it?! | 0:58:33 | 0:58:36 | |
# 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 | 0:58:37 | 0:58:44 | |
# 1, 22, 3, 24, 25, 26, 27, 28, 9, 30, 31, 32, 3, 34, 35, 36, 7 | 0:58:44 | 0:58:50 | |
# 38, 39, 40, 41, 42, 3, 44, 45, 46, 47 | 0:58:50 | 0:58:54 | |
LYRICS DEGENERATE INTO GIBBERISH | 0:58:54 | 0:58:58 | |
GIBBERISH DEGENERATES INTO NOISE | 0:59:08 | 0:59:13 | |
# 100. # | 0:59:13 | 0:59:14 |
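
The We Feel Fine step described in the transcript (taking sentences that start with "I feel" or "I am feeling", storing them, and counting the most common feelings) can be sketched roughly as below. This is only an illustration under stated assumptions, not the project's actual code: the pattern, the function names `extract_feeling` and `count_feelings`, and the sample sentences are all hypothetical, and treating the single word after the phrase as "the feeling" is a deliberate simplification (a phrase such as "like shit" would be cut to its first word).

```python
import re
from collections import Counter

# Hypothetical pattern: a sentence opening with "I feel" or "I am feeling",
# capturing the single word that follows as the "feeling" (a simplification).
FEELING_PATTERN = re.compile(r"^\s*I (?:feel|am feeling)\s+([a-z]+)", re.IGNORECASE)


def extract_feeling(sentence):
    """Return the lower-cased feeling word, or None if the sentence does not match."""
    match = FEELING_PATTERN.match(sentence)
    return match.group(1).lower() if match else None


def count_feelings(sentences):
    """Count how often each feeling word appears across the given sentences."""
    feelings = (extract_feeling(s) for s in sentences)
    return Counter(f for f in feelings if f is not None)


# Made-up sample sentences standing in for crawled blog posts.
sample_posts = [
    "I feel better today.",
    "I am feeling guilty about it.",
    "I feel better than last week.",
    "The weather is awful.",
    "I am feeling sorry for him.",
]

counts = count_feelings(sample_posts)
print(counts.most_common(3))
```

A `Counter` is used here because "count the most common" maps directly onto `most_common`. Breaking the counts down by age, gender or location, as the transcript describes, would need profile data attached to each sentence, which this sketch leaves out.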