Drum roll... and for my last trick, here's the last assignment of the term: Lift Analysis and Chi Squared Analysis. First off, here are two lift analysis computations.

**Lift Analysis**

Please calculate the following lift values for the table correlating burger and chips below:

**Lift(Burgers, Chips)**

**Lift(Burgers, ^Chips)**

**Lift(^Burgers, Chips)**

**Lift(^Burgers, ^Chips)**

Please also indicate if each of your answers would suggest independent, positive correlation, or negative correlation?

| | Chips | ^Chips | Total Row |
|---|---|---|---|
| Burgers | 600 | 400 | 1000 |
| ^Burgers | 200 | 200 | 400 |
| Total Column | 800 | 600 | 1400 |

**Lift(Burgers, Chips)**

P(Burgers ∪ Chips) = 600/1400 = 3/7 = 0.43

P(Burgers) = 1000/1400 = 5/7 = 0.71

P(Chips) = 800/1400 = 4/7 = 0.57

LIFT(Burgers, Chips) = (3/7) / ((5/7) × (4/7)) = 21/20 = 1.05

**LIFT(Burgers, Chips) > 1 means Burgers and Chips are positively correlated.**

**Lift(Burgers, ^Chips)**

P(Burgers ∪ ^Chips) = 400/1400 = 2/7 = 0.29

P(Burgers) = 1000/1400 = 5/7 = 0.71

P(^Chips) = 600/1400 = 3/7 = 0.43

LIFT(Burgers, ^Chips) = (2/7) / ((5/7) × (3/7)) = 14/15 = 0.93

**LIFT(Burgers, ^Chips) <1 means Burgers and ^Chips are negatively correlated.**

**Lift(^Burgers, Chips)**

P(^Burgers ∪ Chips) = 200/1400 = 1/7 = 0.14

P(^Burgers) = 400/1400 = 2/7 = 0.29

P(Chips) = 800/1400 = 4/7 = 0.57

LIFT(^Burgers, Chips) = (1/7) / ((2/7) × (4/7)) = 7/8 = 0.875

**LIFT(^Burgers, Chips)<1 means that ^Burgers and Chips are negatively correlated.**

**Lift(^Burgers, ^Chips)**

P(^Burgers ∪ ^Chips) = 200/1400 = 1/7 = 0.14

P(^Burgers) = 400/1400 = 2/7 = 0.29

P(^Chips) = 600/1400 = 3/7 = 0.43

LIFT(^Burgers, ^Chips) = (1/7) / ((2/7) × (3/7)) = 7/6 = 1.17

**LIFT(^Burgers, ^Chips)>1 meaning that ^Burgers and ^Chips are positively correlated.**
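The four hand calculations above can be cross-checked with a short script. This is only a sketch: the function names `lift_table` and `interpret` and the labels `A`/`B` are my own choices for illustration, not part of the assignment. It uses exact fractions, so the results can differ slightly from a hand calculation that rounds each probability to two decimals first.

```python
from fractions import Fraction


def lift_table(both, a_only, b_only, neither):
    """Return the lift of all four cells of a 2x2 contingency table.

    both    = count(A and B)          a_only  = count(A and not B)
    b_only  = count(not A and B)      neither = count(not A and not B)
    """
    n = both + a_only + b_only + neither
    p_a = Fraction(both + a_only, n)   # P(A), from the row total
    p_b = Fraction(both + b_only, n)   # P(B), from the column total
    cells = {
        ("A", "B"): (both, p_a, p_b),
        ("A", "^B"): (a_only, p_a, 1 - p_b),
        ("^A", "B"): (b_only, 1 - p_a, p_b),
        ("^A", "^B"): (neither, 1 - p_a, 1 - p_b),
    }
    # lift = P(X and Y) / (P(X) * P(Y)) for each cell
    return {k: Fraction(c, n) / (px * py) for k, (c, px, py) in cells.items()}


def interpret(lift):
    """Map a lift value to the three outcomes asked for in the question."""
    if lift > 1:
        return "positive correlation"
    if lift < 1:
        return "negative correlation"
    return "independent"


# A = Burgers, B = Chips, counts taken from the table above
burger_chips = lift_table(600, 400, 200, 200)
for pair, value in burger_chips.items():
    print(pair, value, round(float(value), 3), interpret(value))
```

For the burgers and chips table this gives 21/20, 14/15, 7/8 and 7/6, so the two "matching" cells come out above 1 and the two "mismatching" cells below 1.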

**Please calculate the following lift values for the table correlating shampoo and ketchup below:**

**Lift(Ketchup, Shampoo)**

**Lift(Ketchup, ^Shampoo)**

**Lift(^Ketchup, Shampoo)**

**Lift(^Ketchup, ^Shampoo)**

Please also indicate if each of your answers would suggest independent, positive correlation, or negative correlation?

| | Shampoo | ^Shampoo | Total Row |
|---|---|---|---|
| Ketchup | 100 | 200 | 300 |
| ^Ketchup | 200 | 400 | 600 |
| Total Column | 300 | 600 | 900 |

**Lift(Ketchup, Shampoo)**

P(Ketchup ∪ Shampoo) = 100/900 = 1/9 = 0.11

P(Ketchup) = 300/900 = 1/3 = 0.33

P(Shampoo) = 300/900 = 1/3 = 0.33

LIFT(Ketchup, Shampoo) = (1/9) / ((1/3) × (1/3)) = (1/9) / (1/9) = 1

**LIFT(Ketchup, Shampoo) = 1 meaning that Ketchup and Shampoo are independent.**

**Lift(Ketchup, ^Shampoo)**

P(Ketchup ∪ ^Shampoo) = 200/900 = 2/9 = 0.22

P(Ketchup) = 300/900 = 1/3 = 0.33

P(^Shampoo) = 600/900 = 2/3 = 0.67

LIFT(Ketchup, ^Shampoo) = (2/9) / ((1/3) × (2/3)) = (2/9) / (2/9) = 1

**LIFT(Ketchup, ^Shampoo) = 1 meaning that Ketchup and ^Shampoo are independent.**

**Lift(^Ketchup, Shampoo)**

P(^Ketchup ∪ Shampoo) = 200/900 = 2/9 = 0.22

P(^Ketchup) = 600/900 = 2/3 = 0.67

P(Shampoo) = 300/900 = 1/3 = 0.33

LIFT(^Ketchup, Shampoo) = (2/9) / ((2/3) × (1/3)) = (2/9) / (2/9) = 1

**LIFT(^Ketchup, Shampoo) = 1 meaning that ^Ketchup and Shampoo are independent.**

**Lift(^Ketchup, ^Shampoo)**

P(^Ketchup ∪ ^Shampoo) = 400/900 = 4/9 = 0.44

P(^Ketchup) = 600/900 = 2/3 = 0.67

P(^Shampoo) = 600/900 = 2/3 = 0.67

LIFT(^Ketchup, ^Shampoo) = (4/9) / ((2/3) × (2/3)) = (4/9) / (4/9) = 1

**LIFT(^Ketchup, ^Shampoo) = 1 meaning that ^Ketchup and ^Shampoo are independent.**
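All four ketchup and shampoo lifts come out as exactly 1, and that is easy to confirm without any rounding. A lift equals 1 precisely when the cell count times the grand total equals the row total times the column total, so the check can be done with whole numbers only. The helper name `all_lifts_equal_one` below is made up for this sketch.

```python
def all_lifts_equal_one(table):
    """Return True if every cell of a table of counts has a lift of exactly 1.

    lift = (cell / n) / ((row / n) * (col / n)) == 1
    is the same condition as  cell * n == row * col,
    which needs only integer arithmetic.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return all(
        table[i][j] * n == row_totals[i] * col_totals[j]
        for i in range(len(table))
        for j in range(len(table[0]))
    )


# Rows: Ketchup, ^Ketchup; columns: Shampoo, ^Shampoo
ketchup_shampoo = [[100, 200], [200, 400]]
# Rows: Burgers, ^Burgers; columns: Chips, ^Chips
burgers_chips = [[600, 400], [200, 200]]

print(all_lifts_equal_one(ketchup_shampoo))
print(all_lifts_equal_one(burgers_chips))
```

The ketchup and shampoo table passes the check, while the burgers and chips table from the first exercise does not.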

OK, now here's how you tackle a question on Chi Squared Analysis.

**Chi Squared Analysis**

Please calculate the following chi squared values for the table correlating burger and chips below (Expected values in brackets).

- Burgers & Chips
- Burgers & Not Chips
- Chips & Not Burgers
- Not Burgers and Not Chips

For the above options, please also indicate if each of your answers would suggest independent, positive correlation, or negative correlation.

| | Chips | ^Chips | Total Row |
|---|---|---|---|
| Burgers | 900 (800) | 100 (200) | 1000 |
| ^Burgers | 300 (400) | 200 (100) | 500 |
| Total Column | 1200 | 300 | 1500 |

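$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

$$
\begin{aligned}
\chi^2 &= \frac{(900-800)^2}{800} + \frac{(100-200)^2}{200} + \frac{(300-400)^2}{400} + \frac{(200-100)^2}{100} \\
&= \frac{100^2}{800} + \frac{(-100)^2}{200} + \frac{(-100)^2}{400} + \frac{100^2}{100} \\
&= \frac{10000}{800} + \frac{10000}{200} + \frac{10000}{400} + \frac{10000}{100} \\
&= 12.5 + 50 + 25 + 100 = 187.5
\end{aligned}
$$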

Burgers and Chips are correlated because $\chi^2 = 187.5$ is far greater than 0. The direction for each cell comes from comparing the observed value with the expected value.

As the expected value is 800 and the observed value is 900 (observed > expected) we can say that Burgers and Chips are positively correlated.

As the expected value is 200 and the observed value is 100 (observed < expected) we can say that Burgers and ^Chips are negatively correlated.

As the expected value is 400 and the observed value is 300 (observed < expected) we can say that ^Burgers and Chips are negatively correlated.

As the expected value is 100 and the observed value is 200 (observed > expected) we can say that ^Burgers and ^Chips are positively correlated.

**Chi Squared Analysis**

Please calculate the following chi squared values for the table correlating burger and sausages below (Expected values in brackets).

- Burgers & Sausages
- Burgers & Not Sausages
- Sausages & Not Burgers
- Not Burgers and Not Sausages

For the above options, please also indicate if each of your answers would suggest independent, positive correlation, or negative correlation.

| | Sausages | ^Sausages | Total Row |
|---|---|---|---|
| Burgers | 800 (800) | 200 (200) | 1000 |
| ^Burgers | 400 (400) | 100 (100) | 500 |
| Total Column | 1200 | 300 | 1500 |

$$
\begin{aligned}
\chi^2 &= \frac{(800-800)^2}{800} + \frac{(200-200)^2}{200} + \frac{(400-400)^2}{400} + \frac{(100-100)^2}{100} \\
&= \frac{0^2}{800} + \frac{0^2}{200} + \frac{0^2}{400} + \frac{0^2}{100} = 0
\end{aligned}
$$

Burgers and Sausages are independent because $\chi^2 = 0$.

Burgers and Sausages – the observed and expected values are the same (800) and are independent.

Burgers and ^Sausages – the observed and expected values are the same (200) and are independent.

^Burgers and Sausages – the observed and expected values are the same (400) and are independent.

^Burgers and ^Sausages – the observed and expected values are the same (100) and are independent.
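Both chi squared exercises follow the same recipe, so they can be checked with one small function. This is a sketch under a couple of assumptions: the names `chi_squared` and `cell_direction` are mine, and the expected values are recomputed with the standard formula (row total × column total / grand total) rather than read from the brackets in the tables.

```python
def chi_squared(observed):
    """Return (chi squared statistic, expected counts) for a table of counts."""
    n = sum(sum(row) for row in observed)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    # Expected count under independence: row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return statistic, expected


def cell_direction(o, e):
    """Compare one observed count with its expected count."""
    if o > e:
        return "positive"
    if o < e:
        return "negative"
    return "independent"


# Rows: Burgers, ^Burgers; columns: Chips, ^Chips
burgers_chips = [[900, 100], [300, 200]]
# Rows: Burgers, ^Burgers; columns: Sausages, ^Sausages
burgers_sausages = [[800, 200], [400, 100]]

chips_stat, chips_expected = chi_squared(burgers_chips)
sausage_stat, sausage_expected = chi_squared(burgers_sausages)

print(chips_stat, chips_expected)
print(sausage_stat, sausage_expected)
for obs_row, exp_row in zip(burgers_chips, chips_expected):
    print([cell_direction(o, e) for o, e in zip(obs_row, exp_row)])
```

The recomputed expected counts match the bracketed values in both tables, and the per-cell direction is simply whether the observed count sits above or below its expected count.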

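Q: Under what conditions would Lift and Chi Squared analysis prove to be a poor algorithm to evaluate correlation/dependency between two events?

A: Lift and Chi Squared analysis are poor choices when there are too many null transactions, i.e. transactions that contain neither of the two items being examined. Both measures depend on the total number of transactions, so a large number of null transactions distorts the result.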

Q: Please suggest another algorithm that could be used to rectify the flaw in Lift and Chi Squared?

A: There are a number of other measures that could be used, for instance the Jaccard Coefficient, Cosine, AllConf, MaxConf and Kulczynski. None of these is affected by the number of null transactions.
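To see the null transaction problem in numbers, here is a rough sketch that computes lift next to the five alternative measures, once for the first burgers and chips table and once for the same table with the "neither item" cell inflated. The function name `measures` and the figure of 200,000 null transactions are invented for the illustration; the formulas are the usual textbook definitions based on the support counts of A, B and both together.

```python
from math import sqrt


def measures(both, a_only, b_only, neither):
    """Compute lift and five null-invariant measures from 2x2 counts."""
    n = both + a_only + b_only + neither   # only lift uses this total
    sup_a = both + a_only                  # transactions containing A
    sup_b = both + b_only                  # transactions containing B
    conf_b_given_a = both / sup_a          # P(B | A)
    conf_a_given_b = both / sup_b          # P(A | B)
    return {
        "lift": (both / n) / ((sup_a / n) * (sup_b / n)),
        "jaccard": both / (sup_a + sup_b - both),
        "cosine": both / sqrt(sup_a * sup_b),
        "all_conf": both / max(sup_a, sup_b),
        "max_conf": max(conf_a_given_b, conf_b_given_a),
        "kulczynski": (conf_a_given_b + conf_b_given_a) / 2,
    }


# A = Burgers, B = Chips; 200 null transactions as in the first lift table
few_nulls = measures(600, 400, 200, 200)
# Same burger/chips counts, but with a made-up 200,000 null transactions
many_nulls = measures(600, 400, 200, 200_000)

for name in few_nulls:
    print(name, round(few_nulls[name], 3), round(many_nulls[name], 3))
```

Only the lift changes between the two runs, because it is the only formula here that uses the total number of transactions. The other five are built purely from the counts involving A or B, which is why the null transactions cannot move them.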