This is a manifest of passengers on the doomed Titanic cruise. It was downloaded from the Kaggle Machine Learning Challenge in 2014 by Catherine D'Ignazio.

There are 891 rows of data grouped into 12 columns.
Here's some metadata about each column.

PassengerId

  • This column is full of numbers
  • The smallest number is 1.0
  • The biggest number is 891.0
  • The total is 397386.0
  • The average is 446.0
  • The median is 446.0
  • The standard deviation is 257.21
  • There are 891 unique values
value        frequency
1 - 90       89
90 - 179     89
179 - 268    89
268 - 357    89
357 - 446    89
446 - 535    89
535 - 624    89
624 - 713    89
713 - 802    89
802 - 891    89
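
As a rough illustration of where numbers like these come from, here is a minimal Python sketch that rebuilds the PassengerId summary. It assumes the ids simply run from 1 to 891, which is consistent with the figures above, and the helper name `summarize_numeric` is made up for this example. The standard deviation only matches the reported 257.21 when the population formula (`statistics.pstdev`) is used, so that is what the sketch uses.

```python
import statistics

def summarize_numeric(values):
    """Summarize a numeric column, skipping missing (None) entries.

    Hypothetical helper for illustration; not the tool's actual code.
    """
    present = [v for v in values if v is not None]
    return {
        "smallest": min(present),
        "biggest": max(present),
        "total": sum(present),
        "average": round(statistics.mean(present), 2),
        "median": statistics.median(present),
        # Population standard deviation (divides by n, not n - 1).
        "std_dev": round(statistics.pstdev(present), 2),
        "missing": len(values) - len(present),
        "unique": len(set(present)),
    }

# Assumption: PassengerId is just 1, 2, ..., 891.
passenger_ids = list(range(1, 892))
summary = summarize_numeric(passenger_ids)
print(summary)
```

The same helper could be pointed at any numeric column once it has been read from the file, with blank cells converted to `None` first.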

Survived

  • This column is full of numbers
  • The most frequent values in this column are:
    • 0.0 (549)
    • 1.0 (342)
value    frequency
0.0      549
1.0      342

Pclass

  • This column is full of numbers
  • The most frequent values in this column are:
    • 1.0 (216)
    • 2.0 (184)
    • 3.0 (491)
value    frequency
1.0      216
2.0      184
3.0      491

Name

  • This column is full of text
  • The longest string has 82 characters
  • There are 891 unique values
value        frequency
mr           521
miss         182
mrs          129
william      64
john         44
master       40
henry        34
george       24
james        24
charles      24
thomas       21
mary         20
edward       18
anna         17
joseph       16
johan        15
frederick    15
elizabeth    15
samuel       13
richard      13

Sex

  • This column is full of text
  • The unique values in this column are:
    • male (577)
    • female (314)
value     frequency
male      577
female    314

Age

  • This column is full of numbers
  • The smallest number is 0.42
  • The biggest number is 80.0
  • The total is 21205.17
  • The average is 29.7
  • The median is 28.0
  • The standard deviation is 14.52
  • There are 177 rows of missing data
  • There are 88 unique values
value      frequency
0 - 8      54
8 - 16     46
16 - 24    177
24 - 32    169
32 - 40    118
40 - 48    70
48 - 56    45
56 - 64    24
64 - 72    9
72 - 80    1
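
The Age column is the first one with missing data, so it is worth checking how the blanks affect the average. The short sketch below only uses numbers reported above (the variable names are made up for illustration). It suggests the average is taken over the rows that actually have an age, not over all 891 rows.

```python
total_rows = 891      # rows in the file
missing = 177         # rows with no Age value
age_total = 21205.17  # reported total of the Age column

# Rows that actually contain an age.
present = total_rows - missing

# Average over the rows that have a value.
average = age_total / present

print(present, round(average, 1))  # 714 29.7
```

Dividing by all 891 rows instead would give about 23.8, which does not match the reported average.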

SibSp

  • This column is full of numbers
  • The most frequent values in this column are:
    • 0.0 (608)
    • 1.0 (209)
    • 2.0 (28)
    • 3.0 (16)
    • 4.0 (18)
    • 5.0 (5)
    • 8.0 (7)
value    frequency
0.0      608
1.0      209
2.0      28
3.0      16
4.0      18
5.0      5
8.0      7

Parch

  • This column is full of numbers
  • The most frequent values in this column are:
    • 0.0 (678)
    • 1.0 (118)
    • 2.0 (80)
    • 3.0 (5)
    • 4.0 (4)
    • 5.0 (5)
    • 6.0 (1)
value    frequency
0.0      678
1.0      118
2.0      80
3.0      5
4.0      4
5.0      5
6.0      1

Ticket

  • This column is full of numbers
  • The smallest number is 693.0
  • The biggest number is 3101298.0
  • The total is 172070561.0
  • The average is 260318.55
  • The median is 3101265.0
  • The standard deviation is 471252.39
  • There are 514 unique values
value                frequency
693 - 310754         389
310754 - 620814      256
620814 - 930874      0
930874 - 1240935     0
1240935 - 1550996    0
1550996 - 1861056    0
1861056 - 2171116    0
2171116 - 2481177    0
2481177 - 2791238    0
2791238 - 3101298    15

Fare

  • This column is full of numbers
  • The smallest number is 0.0
  • The biggest number is 512.3292
  • The total is 28693.95
  • The average is 32.2
  • The median is 14.45
  • The standard deviation is 49.67
  • There are 248 unique values
value        frequency
0 - 51       732
51 - 102     106
102 - 154    31
154 - 205    2
205 - 256    11
256 - 307    6
307 - 359    0
359 - 410    0
410 - 461    0
461 - 512    0

Cabin

  • This column is full of text
  • The most frequent values in this column are:
    • B96 B98 (4)
    • C23 C25 C27 (4)
    • G6 (4)
    • C22 C26 (3)
    • D (3)
  • The longest string has 15 characters
  • There are 687 rows of missing data
  • There are 147 unique values
value          frequency
B96 B98        4
C23 C25 C27    4
G6             4
C22 C26        3
D              3
Other          186

Embarked

  • This column is full of text
  • The unique values in this column are:
    • S (644)
    • C (168)
    • Q (77)
  • There are 2 rows of missing data
value    frequency
S        644
C        168
Q        77

What do I do next?

Understanding the data in your CSV file is the first step in analyzing it for stories. Looking at individual columns can help you identify questions that might be fun to ask about your data. For instance, is it surprising that "3.0" is the most frequent value in the Pclass column? Does it make any sense to compare the Age column to the PassengerId column? Are there any other datasets you could find to ask interesting questions about the Pclass column?
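
A question like "which Pclass value is most frequent?" can be checked in a few lines. This is just a sketch using the counts listed in the Pclass section, not the rows of the file itself; the variable names are made up for illustration.

```python
from collections import Counter

# Counts copied from the Pclass section of this summary.
pclass_counts = Counter({1: 216, 2: 184, 3: 491})

total = sum(pclass_counts.values())

# most_common(1) returns a list holding the single (value, count) pair
# with the highest count.
value, count = pclass_counts.most_common(1)[0]
share = round(100 * count / total, 1)

print(f"Most frequent Pclass: {value} ({count} of {total}, {share}%)")
```

With the counts above, this reports class 3 as the most frequent value, covering a little over half of the 891 rows.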

Asking these types of questions is the first step in understanding the data you have, and what kind of stories you can find in it. Check out our activity guide for more help on asking questions of data sets.

Try these other tools to do more full-fledged analysis: