WTFcsv: Titanic Passengers.csv

This is a manifest of passengers on the doomed Titanic cruise. It was downloaded from the Kaggle Machine Learning Challenge in 2014 by Catherine D'Ignazio.

891 rows of data grouped into 12 columns.
Here's some metadata about each column.

PassengerId

This column is full of numbers
The smallest number is 1.0
The biggest number is 891.0
The total is 397386.0
The average is 446.0
The median is 446.0
The standard deviation is 257.21
There are 891 unique values

value	frequency
1 - 90	89
90 - 179	89
179 - 268	89
268 - 357	89
357 - 446	89
446 - 535	89
535 - 624	89
624 - 713	89
713 - 802	89
802 - 891	89
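
The figures above are consistent with PassengerId being the consecutive integers 1 through 891, and with the standard deviation being the population form (dividing by n). Here is a minimal sketch under exactly that assumption. The helper name summarize_numbers is hypothetical and is not part of WTFcsv.

```python
import statistics


def summarize_numbers(values):
    """Return the summary fields the report lists for a numeric column."""
    return {
        "smallest": min(values),
        "biggest": max(values),
        "total": sum(values),
        "average": statistics.mean(values),
        "median": statistics.median(values),
        # Population standard deviation: divides by n, not n - 1.
        "standard deviation": round(statistics.pstdev(values), 2),
        "unique values": len(set(values)),
    }


# Assumption: PassengerId is simply 1, 2, ..., 891 (one id per row).
passenger_ids = list(range(1, 892))
summary = summarize_numbers(passenger_ids)
print(summary)
```

Under that assumption the sketch gives a total of 397386, an average and median of 446, and a standard deviation of 257.21, matching the report. The sample form (dividing by n - 1) would give about 257.35 instead.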

Survived

This column is full of numbers
The most frequent values in this column are:
- 0.0 (549)
- 1.0 (342)

value	frequency
0.0	549
1.0	342

Pclass

This column is full of numbers
The most frequent values in this column are:
- 1.0 (216)
- 2.0 (184)
- 3.0 (491)

value	frequency
1.0	216
2.0	184
3.0	491

Name

This column is full of text
The longest string has 82 characters
There are 891 unique values

value	frequency
mr	521
miss	182
mrs	129
william	64
john	44
master	40
henry	34
george	24
james	24
charles	24
thomas	21
mary	20
edward	18
anna	17
joseph	16
johan	15
frederick	15
elizabeth	15
samuel	13
richard	13
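
The table above appears to count individual lowercased words inside the Name strings rather than whole names, since all 891 names are unique. Here is a rough sketch of that kind of count. The exact tokenizing rules WTFcsv uses are not stated, so the regular expression is an assumption, and the three sample names are made up for illustration rather than taken from the file.

```python
import re
from collections import Counter


def word_frequencies(names):
    """Count lowercased alphabetic words across a list of name strings."""
    counts = Counter()
    for name in names:
        # Assumption: a "word" is any run of letters; punctuation is dropped.
        counts.update(re.findall(r"[a-z]+", name.lower()))
    return counts


# Hypothetical sample names in a "Surname, Title. Given names" layout.
sample_names = [
    "Smith, Mr. John",
    "Smith, Mrs. John (Mary Jones)",
    "Brown, Miss. Mary",
]
name_counts = word_frequencies(sample_names)
print(name_counts.most_common(5))
```

Titles such as mr, miss, and mrs dominate a count like this because nearly every name carries one, which is why they top the table.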

Sex

This column is full of text
The unique values in this column are:
- male (577)
- female (314)

value	frequency
male	577
female	314

Age

This column is full of numbers
The smallest number is 0.42
The biggest number is 80.0
The total is 21205.17
The average is 29.7
The median is 28.0
The standard deviation is 14.52
There are 177 rows of missing data
There are 88 unique values

value	frequency
0 - 8	54
8 - 16	46
16 - 24	177
24 - 32	169
32 - 40	118
40 - 48	70
48 - 56	45
56 - 64	24
64 - 72	9
72 - 80	1
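
The Age buckets above are ten equal-width ranges from 0 to 80, and the 177 missing rows are left out of them. Here is a minimal sketch of that kind of bucketing. How WTFcsv treats values that fall exactly on a bucket edge is not stated, so the edge handling here is an assumption, and apart from the reported smallest and biggest values the sample ages are made up.

```python
def equal_width_histogram(values, low, high, bins=10):
    """Count values into equal-width buckets, skipping missing (None) entries."""
    present = [v for v in values if v is not None]
    width = (high - low) / bins
    counts = [0] * bins
    for v in present:
        # Assumption: each bucket includes its lower edge; the top value
        # (v == high) is placed in the last bucket.
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    missing = len(values) - len(present)
    return counts, missing


# Hypothetical sample: None stands for a row with no recorded age.
sample_ages = [0.42, 22.0, None, 28.0, 80.0, None, 35.0]
age_counts, age_missing = equal_width_histogram(sample_ages, low=0, high=80)
print(age_counts, age_missing)
```

Counting the missing rows separately keeps them from being mistaken for an age of zero, which would distort the first bucket.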

SibSp

This column is full of numbers
The most frequent values in this column are:
- 0.0 (608)
- 1.0 (209)
- 2.0 (28)
- 3.0 (16)
- 4.0 (18)
- 5.0 (5)
- 8.0 (7)

value	frequency
0.0	608
1.0	209
2.0	28
3.0	16
4.0	18
5.0	5
8.0	7

Parch

This column is full of numbers
The most frequent values in this column are:
- 0.0 (678)
- 1.0 (118)
- 2.0 (80)
- 3.0 (5)
- 4.0 (4)
- 5.0 (5)
- 6.0 (1)

value	frequency
0.0	678
1.0	118
2.0	80
3.0	5
4.0	4
5.0	5
6.0	1

Ticket

This column is full of numbers
The smallest number is 693.0
The biggest number is 3101298.0
The total is 172070561.0
The average is 260318.55
The median is 3101265.0
The standard deviation is 471252.39
There are 514 unique values

value	frequency
693 - 310754	389
310754 - 620814	256
620814 - 930874	0
930874 - 1240935	0
1240935 - 1550996	0
1550996 - 1861056	0
1861056 - 2171116	0
2171116 - 2481177	0
2481177 - 2791238	0
2791238 - 3101298	15

Fare

This column is full of numbers
The smallest number is 0.0
The biggest number is 512.3292
The total is 28693.95
The average is 32.2
The median is 14.45
The standard deviation is 49.67
There are 248 unique values

value	frequency
0 - 51	732
51 - 102	106
102 - 154	31
154 - 205	2
205 - 256	11
256 - 307	6
307 - 359	0
359 - 410	0
410 - 461	0
461 - 512	0

Cabin

This column is full of text
The most frequent values in this column are:
- B96 B98 (4)
- C23 C25 C27 (4)
- G6 (4)
- C22 C26 (3)
- D (3)
The longest string has 15 characters
There are 687 rows of missing data
There are 147 unique values

value	frequency
B96 B98	4
C23 C25 C27	4
G6	4
C22 C26	3
D	3
Other	186

Embarked

This column is full of text
The unique values in this column are:
- S (644)
- C (168)
- Q (77)
There are 2 rows of missing data

value	frequency
S	644
C	168
Q	77

What do I do next?

Understanding the data in your csv file is the first step in analyzing it for stories. Looking at individual columns can help you identify questions that might be fun to ask about your data. For instance, is it surprising that "0.0" is the most frequent value in the SibSp column? Does it make any sense to compare the Sex column to the Survived column? Are there any other datasets you could find to ask interesting questions about the SibSp column?
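
One way to compare two columns such as Sex and Survived is to count how often each pair of values occurs together. Here is a small sketch using only the Python standard library. The function name cross_tabulate is hypothetical, and the six rows are made up so the example runs without the file; they are not real passengers.

```python
import csv
import io
from collections import Counter


def cross_tabulate(rows, first, second):
    """Count how often each (first, second) pair of column values occurs."""
    return Counter((row[first], row[second]) for row in rows)


# Made-up rows using the same column names as the real file.
sample_csv = io.StringIO(
    "Sex,Survived\n"
    "male,0\n"
    "female,1\n"
    "female,1\n"
    "male,0\n"
    "male,1\n"
    "female,0\n"
)
pair_counts = cross_tabulate(csv.DictReader(sample_csv), "Sex", "Survived")
for (sex, survived), count in sorted(pair_counts.items()):
    print(sex, survived, count)
```

To try this on the real data, replace the in-memory sample with an open file handle for Titanic Passengers.csv; the counts will then show whether survival differed between the two groups.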

Asking these types of questions helps you understand the data you have and what kinds of stories you can find in it. Check out our activity guide for more help on asking questions of data sets.
