Archive for January, 2012

Lesson 5: Statistics, Humour and Quotations (IV)

January 25, 2012

This is a good joke about the feelings that Statistics often inspires.

Finally, Statistics has provided a great number of quotations that make you think. These are some of my favourites:

“There are three kinds of lies: lies, damned lies, and statistics”.

Benjamin Disraeli
British politician (1804 – 1881)

“USA Today has come out with a new survey – apparently, three out of every four people make up 75% of the population”.

David Letterman (1947 – )

“Statistician: A man who believes figures don’t lie, but admits that under analysis some of them won’t stand up either.”

Evan Esar (1899 – 1995), Esar’s Comic Dictionary

“Statistics: The only science that enables different experts using the same figures to draw different conclusions.”

Evan Esar (1899 – 1995), Esar’s Comic Dictionary

“A single death is a tragedy; a million deaths is a statistic.”

Joseph Stalin (1879 – 1953)

 

 

Lesson 5: Statistics. Mean, Median and Mode (III)

January 18, 2012

4. Averages or central tendency measurements

When you carry out a statistical study, your first task is to organise the data. The second is to interpret the data and draw some conclusions. Averages, or measures of central tendency, are different ways of indicating which value is the central value of the data. We are going to study:

  • Arithmetic mean
  • Median
  • Mode

4.1. Arithmetic mean 

The arithmetic mean, or mean for short, of a set of data is the quotient of the sum of all the data divided by the number of data. If the variable considered is named x, its mean is represented by x̄.

As a consequence of its definition, it can only be calculated for quantitative variables.

To calculate the mean we have to:

  • Add up all the values of the variable. If a value is repeated, include every occurrence in the sum.
  • Divide that sum by the total number of data.
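The two steps above can be sketched in a few lines of Python. The marks are made-up illustrative data, not taken from the lesson, and the variable names are our own.

```python
# Illustrative marks (made-up data); the value 7 is repeated three times
marks = [5, 7, 7, 8, 6, 7, 9]

# Step 1: add up every value, including each repetition
total = sum(marks)

# Step 2: divide the sum by the total number of data
mean = total / len(marks)

print(total, len(marks), mean)
```

Notice that the repeated value 7 enters the sum three times, once for each time it appears.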

Abbreviated calculations for finding the mean

Usually some values of the variable are repeated, so we can calculate their frequencies and make calculations easier.

Let x1, x2, …, xn be the (different) values of the variable, and let f1, f2, …, fn be their corresponding absolute frequencies.

To calculate the mean:

  • Multiply each different value by its frequency.
  • Add up all the products (in a frequency table, this is the total at the bottom of the last column).
  • Divide that sum by the total number of data, N.

This is:

x̄ = (x1·f1 + x2·f2 + … + xn·fn) / N
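As a rough sketch of the abbreviated calculation, here is the same idea in Python. The frequency table is hypothetical, and the names `frequencies`, `n_total` and `weighted_sum` are our own.

```python
# Hypothetical frequency table: each value x_i mapped to its absolute frequency f_i
frequencies = {5: 1, 6: 1, 7: 3, 8: 1, 9: 1}

# N is the sum of the absolute frequencies
n_total = sum(frequencies.values())

# x1*f1 + x2*f2 + ... + xn*fn
weighted_sum = sum(x * f for x, f in frequencies.items())

mean = weighted_sum / n_total
print(n_total, weighted_sum, mean)
```

With this table N = 7 and the sum of the products is 49, so the mean is 49 / 7 = 7.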

 4.2. Median

The median of a set of data, Me, is the central value of the ordered data: there are as many values less than the median as there are greater than it.

It can only be calculated for quantitative variables, it is unique, and it need not coincide with any of the data.

 Calculation of the median

First, we order data from smallest to biggest (in ascending order). There are two cases:

  • If the total number of data is odd, the median is the value that occupies the central position.
  • If the total number of data is even, the median is the arithmetic mean of the two values that occupy the central positions.
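The two cases can be sketched in Python as follows. The function name `median` and the sample lists are our own illustrative choices, and the function assumes the list is not empty.

```python
def median(data):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(data)      # order the data from smallest to biggest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of data: the value in the central position
        return ordered[middle]
    # Even number of data: the mean of the two central values
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_case = median([3, 9, 1, 7, 5])        # ordered: 1 3 5 7 9
even_case = median([3, 9, 1, 7, 5, 11])   # ordered: 1 3 5 7 9 11
print(odd_case, even_case)
```

In the even case the median is (5 + 7) / 2 = 6, a value that does not appear in the data. Python's standard library also offers `statistics.median`, which follows the same rule.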

4.3. Mode

The mode of a set of data, Mo, is the value that occurs most frequently in the set.

This measurement can be calculated with any variable, qualitative or quantitative.

It is important to note that there can be more than one mode and if no number occurs more than once in the set, then there is no mode for that set of numbers.

Calculation of the Mode

  • Given a set of data, the mode is the value that occurs most frequently.
  • If we are given a table of frequencies, the mode is the value of the variable whose absolute frequency is the greatest. If two values share the maximum frequency, the distribution is bimodal; if more than two do, it is multimodal.
  • If we are given a bar chart, the mode is the value corresponding to the highest bar.
  • If we are given a pie chart, the mode is the value corresponding to the circular sector with the largest angle.
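A minimal Python sketch of this calculation is shown below. It follows the convention of this lesson that a set in which no value is repeated has no mode. The function name `modes` and the sample data are our own, and the function assumes the data set is not empty.

```python
from collections import Counter


def modes(data):
    """Return the list of modes of a non-empty data set (empty list if there is none)."""
    counts = Counter(data)                # absolute frequency of each value
    highest = max(counts.values())        # the greatest absolute frequency
    if highest == 1:
        return []                         # no value is repeated: no mode
    return [value for value, count in counts.items() if count == highest]


one_mode = modes(["blue", "brown", "blue", "green"])   # a qualitative variable
two_modes = modes([1, 2, 2, 3, 3])                     # a bimodal set
no_mode = modes([1, 2, 3])
print(one_mode, two_modes, no_mode)
```

The first example shows that the mode also works with qualitative data such as eye colour.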

The links below provide you with some practice:

The video below gives you a quick explanation of the averages by using a song. Try to understand it! Be the first to post your translation and you will get an extra grade!

Lesson 5: Tabulation, Frequencies and Graphs (II)

January 17, 2012

2. Tabulation. Frequency distribution table

In any statistical investigation, the collection of the numerical data is the first and the most important matter to be attended to.

2.1. Counting data: tabulation

Tabulation is the process of condensing the data for convenience in the statistical processing, presentation and interpretation of the information.

A tally mark “|” is put next to a value of the variable each time that value appears.

A frequency distribution is a tally of the number of times (frequency) each score value (or interval of score values) is represented within a group of scores.

A frequency distribution is ungrouped if the frequency of each score value or piece of data is given.

2.2. Absolute frequency and relative frequency

The absolute frequency, or simply frequency, of a statistical datum xi is the number of times this datum appears. It is represented by fi.

fi is the absolute frequency of xi

PROPERTY: The sum of the absolute frequencies of a set of statistical data is the total number of data, N.

f1 + f2 + … + fn = N

The relative frequency of a statistical datum is the frequency of the datum divided by the total number of data. It is often expressed as a percentage. It is represented by hi.

hi = fi / N is the relative frequency of xi

PROPERTY: The sum of the relative frequencies of a set of statistical data is equal to one.

h1 + h2 + … + hn = 1
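To see both kinds of frequency and their properties at work, here is a small Python sketch. The data are invented for illustration, and the names `absolute`, `relative` and `n_total` are our own.

```python
from collections import Counter

# Made-up data: for example, the number of siblings of 8 pupils
data = [2, 3, 2, 1, 3, 2, 4, 2]

n_total = len(data)            # N, the total number of data
absolute = Counter(data)       # f_i: how many times each value appears

# h_i = f_i / N
relative = {x: f / n_total for x, f in absolute.items()}

# One row of the frequency table per value: x_i, f_i, h_i and h_i as a percentage
for x in sorted(absolute):
    print(x, absolute[x], relative[x], f"{relative[x]:.1%}")

# The two properties: the f_i add up to N and the h_i add up to 1
print(sum(absolute.values()), sum(relative.values()))
```

Here the value 2 appears 4 times out of 8, so its relative frequency is 4 / 8 = 0.5, that is, 50%.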

2.3 Cumulative Frequency

The cumulative absolute frequency of a datum xi is the sum of the absolute frequencies of all values less than or equal to xi. It is represented by Fi.

Fi = f1 + f2 + … + fi

The cumulative relative frequency of a datum xi is the sum of the relative frequencies of all values less than or equal to xi. It is represented by Hi.

PROPERTY: The cumulative relative frequency Hi of a datum xi is equal to the cumulative absolute frequency of xi divided by the total number of data.

Hi = h1 + h2 + … + hi = f1/N + f2/N + … + fi/N = Fi/N

To work out the cumulative frequencies the data must be ordered; that is, the statistical variable must be quantitative.
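The cumulative frequencies can be sketched in Python with a running sum. The frequency table below is hypothetical, its values are assumed to be already in ascending order, and the variable names are our own.

```python
from itertools import accumulate

# Hypothetical frequency table, with the values x_i in ascending order
values = [1, 2, 3, 4]
absolute = [1, 4, 2, 1]                      # f_i

n_total = sum(absolute)                      # N

# F_i = f_1 + f_2 + ... + f_i (running sum of the absolute frequencies)
cum_absolute = list(accumulate(absolute))

# h_i = f_i / N, and H_i = h_1 + h_2 + ... + h_i
relative = [f / n_total for f in absolute]
cum_relative = list(accumulate(relative))

for x, big_f, big_h in zip(values, cum_absolute, cum_relative):
    print(x, big_f, big_h)
```

The last cumulative absolute frequency equals N and the last cumulative relative frequency equals 1. Each Hi can also be obtained as Fi / N, as the property says.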

At the links below you can practise with absolute frequencies:

Interpret tables

Create frequency charts

3. Statistical graphs

We can also organize data using graphs. Graphs are a visual method to summarize data and allow the observer to see the relevant features of the statistical study.

3.1. Bar charts

Bar charts, or bar graphs, consist of a vertical axis and a series of labelled vertical bars, one for each value of the variable. The numbers along the vertical axis of the bar graph are called the scale and show the frequency of each value.

They are used when the variable has several different values.

They are used for qualitative variables and discrete quantitative variables (but ungrouped).

To draw a bar chart we follow these steps:

  • We write the values of the variable on the X axis
  • We write the frequency with the appropriate scale on the Y axis.
  • The frequency corresponding to a value is represented by a bar. The heights of the bars are proportional to the frequency.

When the variable is quantitative, the tops of the bars can be joined by line segments to obtain a polygonal line called the frequency polygon.

You can practise with bar charts on the pages below.

The video below tries to teach you how to avoid being misled by bar charts (or bar graphs).

3.2. Pie charts

A pie chart is a circle graph divided into pieces, or circular sectors, each displaying the size of some related piece of information. Pie charts are used to display the sizes of parts that make up some whole.

They can be used for any type of variable (qualitative or quantitative).

The angle in degrees of each circular sector is directly proportional to the frequency of the value of the variable that it represents.

If we set up a direct rule of three, we get the degrees corresponding to each xi:

Number of data              Degrees
      N        ----------    360º
      fi       ----------    aº

Then:

aº = (fi / N) · 360º

In other words, the amplitude of the circular sector in degrees can be obtained by applying this formula:

Degrees of the central angle of the sector = (fi / N) · 360º = hi · 360º

 It is advisable to use the rule of three instead of the formula. You will need a protractor to draw pie charts.
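As a quick check of the angles before drawing, the formula can be sketched in Python. The survey data are hypothetical (how 24 pupils travel to school), and the names `frequencies` and `angles` are our own.

```python
# Hypothetical survey: how 24 pupils travel to school (value -> absolute frequency f_i)
frequencies = {"bus": 12, "car": 6, "bicycle": 4, "on foot": 2}

n_total = sum(frequencies.values())          # N

# Central angle of each sector: (f_i / N) * 360 degrees,
# written as f_i * 360 / N so that the division is done last
angles = {value: f * 360 / n_total for value, f in frequencies.items()}

for value, angle in angles.items():
    print(value, angle)

# The sectors must fill the whole circle
print(sum(angles.values()))
```

For example, the bus is used by 12 of the 24 pupils, half of the total, so its sector is half of the circle: 180º.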

At these links you can practise with pie charts.

The video below shows how to create a pie chart (or pie graph or circle graph!).

Lesson 5: Statistical Variables (I)

January 17, 2012

Vocabulary on Statistics

Lesson 5: Statistics (Notes)

Three Statistics Worksheets

I Have to Know by the  End of this Lesson

 

1. Introduction

The word ‘Statistics’ is derived from the Latin word ‘Status’, which means a “political state.” Clearly, statistics is closely linked with the administrative affairs of a state, such as facts and figures regarding the defence force, population, housing, food, financial resources, etc. What is true about a government is also true about industrial administration units, and even one’s personal life.

The word statistics has several meanings. In the first place, it is a plural noun which describes a collection of numerical data, such as employment statistics, accident statistics, population statistics, and statistics of births and deaths, of income and expenditure, of exports and imports, etc. It is in this sense that the word ‘statistics’ is used by a layman or a newspaper.

Secondly, the word statistics as a singular noun is used to describe a branch of applied mathematics whose purpose is to provide methods of dealing with a collection of data and extracting information from them in compact form by tabulating, summarizing and analyzing the numerical data or a set of observations. In this case it is written “Statistics”, with a capital letter.

The word ‘Statistics’ in the second sense is defined as follows:

“The collection, presentation, analysis and interpretation of numerical data.”

1.1. Statistical terms

Statistics/Estadísticas. Statistics is the use of data to help the decision maker to reach better decisions.

Data/Datos. Data are any group of measurements that interests us. These measurements provide information for the decision maker. Data that reflect non-numerical features or qualities of the experimental units are known as qualitative data. Data that possess numerical properties are known as quantitative data.

Population/Población. Any well-defined set of objects about which a statistical enquiry is being made is called a population or universe. The total number of objects (individuals) in a population is known as the size of the population. This may be finite or infinite.

Individual/Individuo. Each object belonging to a population is called an individual of the population.

Sample/Muestra. A finite set of objects drawn from the population with a particular aim is called a sample. The total number of individuals in a sample is called the sample size.

Statistical variable/Variable estadística. The information required from an individual, from a population or from a sample during the statistical enquiry (survey) is known as the statistical variable. It is either numerical or non-numerical. For example, shoe size is a numerical variable which refers to a quantity, whereas the mother tongue of a person is a non-numerical variable which refers to a quality. Thus we have quantitative and qualitative types of variables.

1.2. Types of statistical variables

Statistical variables may be classified as either quantitative or qualitative:

Type | Definition | Example
Qualitative | A variable that is expressed in categories rather than numbers. Its categories need not have a natural order. | The mother tongue of a person, the colour of the eyes or the hair colour of a person
Quantitative | A variable of an individual which can be expressed numerically. It may take different values at different times, places, or situations. The data measure either how much or how many of something. | The height of a person, the number of members of a family

Quantitative data or quantitative variables can be divided into:

Type | Definition | Example
Discrete variables | The set of all possible values consists only of isolated points, e.g. counting variables (1, 2, 3, …). | The number of children in a class
Continuous variables | The set of all possible values consists of intervals, e.g. 0-9, 10-19, 20-29, etc. | The weight of a person

Qualitative data or qualitative variables can be divided into:

Type | Definition | Example
Nominal variables | Variables with no inherent order or ranking sequence, e.g. numbers used as names (group 1, group 2, …). | Gender, race
Ordinal variables | Variables with an ordered series. Numbers assigned to such variables indicate rank order only; the “distance” between the numbers has no meaning. | “Greatly dislike, moderately dislike, indifferent, moderately like, greatly like”

EXERCISE 

This is a list of statistical variables. Classify them and post your answer.

Individual and characteristic we are studying (variable):

1. A person

  • Waist in cm →
  • Colour of skin →
  • Age →
  • Weight in lbs →
  • Sex →
  • Mother tongue →
  • Marks in statistics →

2. Mr. Brown’s family

  • Number of members →
  • Monthly income in dollars →

3. A washer

  • Diameter and thickness in cm →
  • Defective or non-defective →