Lift and Chi Squared Analysis

Drum roll: for my last trick, here's the last assignment of the term, Lift Analysis and Chi Squared Analysis. First off, here are two lift analysis computations.

  • Lift Analysis

Please calculate the following lift values for the table correlating burgers and chips below (^ means "not", so ^Chips is a transaction without chips):

 

Lift(Burgers, Chips)

Lift(Burgers, ^Chips)

Lift(^Burgers, Chips)

Lift(^Burgers, ^Chips)

 

Please also indicate whether each of your answers suggests independence, positive correlation, or negative correlation.

 

               Chips   ^Chips   Total Row
Burgers          600      400        1000
^Burgers         200      200         400
Total Column     800      600        1400

 

Lift(Burgers, Chips)

P(Burgers ∩ Chips) = 600/1400 = 3/7 = 0.43

P(Burgers) = 1000/1400 = 5/7 = 0.71

P(Chips) = 800/1400 = 4/7 = 0.57

LIFT(Burgers, Chips) = P(Burgers ∩ Chips) / (P(Burgers) × P(Chips)) = (3/7) / ((5/7) × (4/7)) = 21/20 = 1.05

LIFT(Burgers, Chips) > 1 means Burgers and Chips are positively correlated.

 

Lift(Burgers, ^Chips)

P(Burgers ∩ ^Chips) = 400/1400 = 2/7 = 0.29

P(Burgers) = 1000/1400 = 5/7 = 0.71

P(^Chips) = 600/1400 = 3/7 = 0.43

LIFT(Burgers, ^Chips) = (2/7) / ((5/7) × (3/7)) = 14/15 = 0.93

LIFT(Burgers, ^Chips) < 1 means Burgers and ^Chips are negatively correlated.

 

Lift(^Burgers, Chips)

P(^Burgers ∩ Chips) = 200/1400 = 1/7 = 0.14

P(^Burgers) = 400/1400 = 2/7 = 0.29

P(Chips) = 800/1400 = 4/7 = 0.57

LIFT(^Burgers, Chips) = (1/7) / ((2/7) × (4/7)) = 7/8 = 0.88

LIFT(^Burgers, Chips) < 1 means that ^Burgers and Chips are negatively correlated.

 

Lift(^Burgers, ^Chips)

P(^Burgers ∩ ^Chips) = 200/1400 = 1/7 = 0.14

P(^Burgers) = 400/1400 = 2/7 = 0.29

P(^Chips) = 600/1400 = 3/7 = 0.43

LIFT(^Burgers, ^Chips) = (1/7) / ((2/7) × (3/7)) = 7/6 = 1.17

LIFT(^Burgers, ^Chips) > 1 means that ^Burgers and ^Chips are positively correlated.
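
The four lift calculations above can be reproduced with a short Python sketch. The function name `lift` and the dictionary layout are my own choices for illustration, not part of the assignment. It uses exact fractions, so the values can differ slightly from hand calculations that round the intermediate probabilities to two decimals.

```python
from fractions import Fraction

# Observed counts from the Burgers/Chips table: (row, column) -> count
counts = {
    ("Burgers", "Chips"): 600,
    ("Burgers", "^Chips"): 400,
    ("^Burgers", "Chips"): 200,
    ("^Burgers", "^Chips"): 200,
}

def lift(counts, row, col):
    """Lift(row, col) = P(row and col) / (P(row) * P(col))."""
    total = sum(counts.values())
    p_joint = Fraction(counts[(row, col)], total)
    p_row = Fraction(sum(v for (r, _), v in counts.items() if r == row), total)
    p_col = Fraction(sum(v for (_, c), v in counts.items() if c == col), total)
    return p_joint / (p_row * p_col)

for (row, col) in counts:
    value = lift(counts, row, col)
    if value > 1:
        verdict = "positive correlation"
    elif value < 1:
        verdict = "negative correlation"
    else:
        verdict = "independent"
    print(f"Lift({row}, {col}) = {value} ({verdict})")
```

Swapping in the counts from any other 2x2 table is enough to reuse the function.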

 

 

  • Please calculate the following lift values for the table correlating shampoo and ketchup below:

 

  • Lift(Ketchup, Shampoo)
  • Lift(Ketchup, ^Shampoo)
  • Lift(^Ketchup, Shampoo)
  • Lift(^Ketchup, ^Shampoo)

 

Please also indicate whether each of your answers suggests independence, positive correlation, or negative correlation.

 

 

               Shampoo   ^Shampoo   Total Row
Ketchup            100        200         300
^Ketchup           200        400         600
Total Column       300        600         900

 

Lift(Ketchup, Shampoo)

P(Ketchup ∩ Shampoo) = 100/900 = 1/9 = 0.11

P(Ketchup) = 300/900 = 1/3 = 0.33

P(Shampoo) = 300/900 = 1/3 = 0.33

LIFT(Ketchup, Shampoo) = (1/9) / ((1/3) × (1/3)) = (1/9) / (1/9) = 1

LIFT(Ketchup, Shampoo) = 1 means that Ketchup and Shampoo are independent.

 

Lift(Ketchup, ^Shampoo)

P(Ketchup ∩ ^Shampoo) = 200/900 = 2/9 = 0.22

P(Ketchup) = 300/900 = 1/3 = 0.33

P(^Shampoo) = 600/900 = 2/3 = 0.67

LIFT(Ketchup, ^Shampoo) = (2/9) / ((1/3) × (2/3)) = (2/9) / (2/9) = 1

LIFT(Ketchup, ^Shampoo) = 1 means that Ketchup and ^Shampoo are independent.

 

Lift(^Ketchup, Shampoo)

P(^Ketchup ∩ Shampoo) = 200/900 = 2/9 = 0.22

P(^Ketchup) = 600/900 = 2/3 = 0.67

P(Shampoo) = 300/900 = 1/3 = 0.33

LIFT(^Ketchup, Shampoo) = (2/9) / ((2/3) × (1/3)) = (2/9) / (2/9) = 1

LIFT(^Ketchup, Shampoo) = 1 means that ^Ketchup and Shampoo are independent.

 

 

Lift(^Ketchup, ^Shampoo)

P(^Ketchup ∩ ^Shampoo) = 400/900 = 4/9 = 0.44

P(^Ketchup) = 600/900 = 2/3 = 0.67

P(^Shampoo) = 600/900 = 2/3 = 0.67

LIFT(^Ketchup, ^Shampoo) = (4/9) / ((2/3) × (2/3)) = (4/9) / (4/9) = 1

LIFT(^Ketchup, ^Shampoo) = 1 means that ^Ketchup and ^Shampoo are independent.
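
A lift of exactly 1 in every cell is the same thing as every observed count matching the count you would expect under independence, i.e. row total × column total / grand total. Here is a minimal sketch of that check for the Ketchup/Shampoo table; the variable names are illustrative only.

```python
# Observed counts from the Ketchup/Shampoo table
table = {
    "Ketchup": {"Shampoo": 100, "^Shampoo": 200},
    "^Ketchup": {"Shampoo": 200, "^Shampoo": 400},
}

row_totals = {r: sum(cols.values()) for r, cols in table.items()}
col_totals = {}
for cols in table.values():
    for c, n in cols.items():
        col_totals[c] = col_totals.get(c, 0) + n
grand_total = sum(row_totals.values())

# Under independence: count(r, c) * N == row_total(r) * col_total(c).
# Comparing integer products avoids any floating-point rounding.
independent = all(
    table[r][c] * grand_total == row_totals[r] * col_totals[c]
    for r in table
    for c in table[r]
)
print(independent)
```

Because the check only multiplies integers, it does not depend on how the probabilities are rounded.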


OK, now here's how you tackle a question on Chi Squared Analysis.

  • Chi Squared Analysis

Please calculate the following chi squared values for the table correlating burger and chips below (Expected values in brackets).

  • Burgers & Chips
  • Burgers & Not Chips
  • Chips & Not Burgers
  • Not Burgers and Not Chips

 

For the above options, please also indicate whether each of your answers suggests independence, positive correlation, or negative correlation.

 

               Chips        ^Chips       Total Row
Burgers        900 (800)    100 (200)    1000
^Burgers       300 (400)    200 (100)     500
Total Column   1200         300          1500

 

 

Chi Squared: χ² = Σ (observed − expected)² / expected

 

χ² = (900 − 800)²/800 + (100 − 200)²/200 + (300 − 400)²/400 + (200 − 100)²/100

= 100²/800 + (−100)²/200 + (−100)²/400 + 100²/100

= 10000/800 + 10000/200 + 10000/400 + 10000/100

= 12.5 + 50 + 25 + 100 = 187.5

 

Burgers and Chips are correlated because χ² = 187.5 is far greater than 0 (for a 2x2 table, with 1 degree of freedom, the critical value at the 0.05 level is about 3.84).

As the expected value is 800 and the observed value is 900 (observed > expected), we can say that Burgers and Chips are positively correlated.

As the expected value is 200 and the observed value is 100 (observed < expected), we can say that Burgers and ^Chips are negatively correlated.

As the expected value is 400 and the observed value is 300 (observed < expected), we can say that ^Burgers and Chips are negatively correlated.

As the expected value is 100 and the observed value is 200 (observed > expected), we can say that ^Burgers and ^Chips are positively correlated.
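
Here is the same calculation as a minimal Python sketch. The expected counts are recomputed from the row and column totals (row total × column total / grand total) rather than typed in, which reproduces the bracketed values in the table. The function name `chi_squared` is hypothetical, and no continuity correction is applied.

```python
# Observed counts from the Burgers/Chips table (rows: Burgers, ^Burgers)
observed = [
    [900, 100],
    [300, 200],
]

def chi_squared(observed):
    """Return (chi-squared statistic, table of expected counts)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return stat, expected

stat, expected = chi_squared(observed)
print(stat)      # 187.5
print(expected)  # [[800.0, 200.0], [400.0, 100.0]]

# The sign of (observed - expected) gives the direction for each cell
labels = [["Burgers & Chips", "Burgers & ^Chips"],
          ["^Burgers & Chips", "^Burgers & ^Chips"]]
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        diff = o - expected[i][j]
        direction = "positive" if diff > 0 else "negative" if diff < 0 else "independent"
        print(f"{labels[i][j]}: observed {o}, expected {expected[i][j]:.0f} -> {direction}")
```

The statistic itself only says that the two items are related; the direction for each cell comes from comparing observed with expected.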

 

 

 

 

 

  • Chi Squared Analysis

Please calculate the following chi squared values for the table correlating burgers and sausages below (Expected values in brackets).

 

  • Burgers & Sausages
  • Burgers & Not Sausages
  • Sausages & Not Burgers
  • Not Burgers and Not Sausages

 

For the above options, please also indicate whether each of your answers suggests independence, positive correlation, or negative correlation.

 

               Sausages     ^Sausages    Total Row
Burgers        800 (800)    200 (200)    1000
^Burgers       400 (400)    100 (100)     500
Total Column   1200         300          1500

 

χ² = (800 − 800)²/800 + (200 − 200)²/200 + (400 − 400)²/400 + (100 − 100)²/100

= 0²/800 + 0²/200 + 0²/400 + 0²/100 = 0

Burgers and Sausages are independent because χ² = 0.

Burgers and Sausages: the observed and expected values are the same (800), so they are independent.

Burgers and ^Sausages: the observed and expected values are the same (200), so they are independent.

^Burgers and Sausages: the observed and expected values are the same (400), so they are independent.

^Burgers and ^Sausages: the observed and expected values are the same (100), so they are independent.

 

 

 

  • Q: Under what conditions would Lift and Chi Squared analysis prove to be a poor algorithm to evaluate correlation/dependency between two events?

 

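A: Lift and Chi Squared analysis are poor measures when the data contains a large number of null transactions, i.e. transactions that contain neither of the two items being compared. Both measures are affected by the count of null transactions (they are not null-invariant), so a large number of them can distort the result.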

 

Q: Please suggest another algorithm that could be used to rectify the flaw in Lift and Chi Squared?

 

A: There are a number of null-invariant measures that could be used instead, for instance the Jaccard Coefficient, Cosine, AllConf, MaxConf and Kulczynski.
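
To see why null transactions (transactions containing neither item) matter, here is a small sketch comparing lift with the Kulczynski measure, which is the average of P(A|B) and P(B|A). The counts are made up purely for illustration and are not from the assignment; the function names are my own.

```python
def lift(both, a_only, b_only, neither):
    """Lift of A and B from the four cell counts of a 2x2 table."""
    n = both + a_only + b_only + neither
    n_a = both + a_only
    n_b = both + b_only
    return (both / n) / ((n_a / n) * (n_b / n))

def kulczynski(both, a_only, b_only, neither):
    """Average of P(A|B) and P(B|A); `neither` is deliberately unused."""
    n_a = both + a_only
    n_b = both + b_only
    return 0.5 * (both / n_b + both / n_a)

# Hypothetical counts: identical A/B behaviour, only the number of
# null transactions changes between the two runs.
for neither in (100, 100_000):
    cells = (100, 1000, 1000, neither)
    print(f"null={neither}: lift={lift(*cells):.2f}, "
          f"Kulczynski={kulczynski(*cells):.2f}")
```

With these made-up counts the lift moves from below 1 to well above 1 as the null transactions grow, while Kulczynski does not change at all, because the null count never enters its formula.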


Published by

Data Hothead

Student of Data Analytics, and erstwhile Oracle Applications Consultant.
