Lift and Chi-Squared Analysis

Drum roll… and for my last trick, here's the last assignment of the term – Lift Analysis and Chi-Squared Analysis. First off, here are two worked lift analysis computations.

  • Lift Analysis

Please calculate the following lift values for the table correlating burgers and chips below:

 

Lift(Burgers, Chips)

Lift(Burgers, ^Chips)

Lift(^Burgers, Chips)

Lift(^Burgers, ^Chips)

 

Please also indicate whether each of your answers suggests independence, positive correlation, or negative correlation.

 

             Chips   ^Chips   Row Total
Burgers        600      400       1000
^Burgers       200      200        400
Column Total   800      600       1400

 

Lift(A, B) = P(A ∪ B) / (P(A) × P(B)), where P(A ∪ B) is the probability that a transaction contains both A and B.

Lift(Burgers, Chips)

P(Burgers ∪ Chips) = 600/1400 = 3/7 ≈ 0.43

P(Burgers) = 1000/1400 = 5/7 ≈ 0.71

P(Chips) = 800/1400 = 4/7 ≈ 0.57

Lift(Burgers, Chips) = 0.43 / (0.71 × 0.57) = 0.43/0.41 ≈ 1.05

Lift(Burgers, Chips) > 1 means Burgers and Chips are positively correlated.

 

Lift(Burgers, ^Chips)

P(Burgers ∪ ^Chips) = 400/1400 = 2/7 ≈ 0.29

P(Burgers) = 1000/1400 = 5/7 ≈ 0.71

P(^Chips) = 600/1400 = 3/7 ≈ 0.43

Lift(Burgers, ^Chips) = 0.29 / (0.71 × 0.43) = 0.29/0.31 ≈ 0.93

Lift(Burgers, ^Chips) < 1 means Burgers and ^Chips are negatively correlated.

 

Lift(^Burgers, Chips)

P(^Burgers ∪ Chips) = 200/1400 = 1/7 ≈ 0.14

P(^Burgers) = 400/1400 = 2/7 ≈ 0.29

P(Chips) = 800/1400 = 4/7 ≈ 0.57

Lift(^Burgers, Chips) = 0.14 / (0.29 × 0.57) = 0.14/0.16 ≈ 0.88

Lift(^Burgers, Chips) < 1 means ^Burgers and Chips are negatively correlated.

 

Lift(^Burgers, ^Chips)

P(^Burgers ∪ ^Chips) = 200/1400 = 1/7 ≈ 0.14

P(^Burgers) = 400/1400 = 2/7 ≈ 0.29

P(^Chips) = 600/1400 = 3/7 ≈ 0.43

Lift(^Burgers, ^Chips) = 0.14 / (0.29 × 0.43) = 0.14/0.12 ≈ 1.17

Lift(^Burgers, ^Chips) > 1 means ^Burgers and ^Chips are positively correlated.
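For anyone who wants to check these by machine, here's a minimal Python sketch of the same computation (the lift() function name is my own; the counts are hard-coded from the table above):

    # Lift for a 2x2 contingency table:
    # lift(A, B) = P(A and B) / (P(A) * P(B))
    def lift(joint, total_a, total_b, grand_total):
        """joint = number of transactions containing both A and B."""
        p_joint = joint / grand_total
        p_a = total_a / grand_total
        p_b = total_b / grand_total
        return p_joint / (p_a * p_b)

    # Burgers/Chips table, grand total 1400
    print(lift(600, 1000, 800, 1400))  # Lift(Burgers, Chips)   = 1.05 -> positive
    print(lift(400, 1000, 600, 1400))  # Lift(Burgers, ^Chips)  ~ 0.93 -> negative
    print(lift(200, 400, 800, 1400))   # Lift(^Burgers, Chips)  ~ 0.88 -> negative
    print(lift(200, 400, 600, 1400))   # Lift(^Burgers, ^Chips) ~ 1.17 -> positive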

 

 

  • Please calculate the following lift values for the table correlating shampoo and ketchup below:

 

  • Lift(Ketchup, Shampoo)
  • Lift(Ketchup, ^Shampoo)
  • Lift(^Ketchup, Shampoo)
  • Lift(^Ketchup, ^Shampoo)

 

Please also indicate whether each of your answers suggests independence, positive correlation, or negative correlation.

 

 

             Shampoo   ^Shampoo   Row Total
Ketchup          100        200         300
^Ketchup         200        400         600
Column Total     300        600         900

 

Lift(Ketchup, Shampoo)

P(Ketchup ∪ Shampoo) = 100/900 = 1/9 ≈ 0.11

P(Ketchup) = 300/900 = 1/3 ≈ 0.33

P(Shampoo) = 300/900 = 1/3 ≈ 0.33

Lift(Ketchup, Shampoo) = 0.11 / (0.33 × 0.33) = 0.11/0.11 = 1

Lift(Ketchup, Shampoo) = 1 means Ketchup and Shampoo are independent.

 

Lift(Ketchup, ^Shampoo)

P(Ketchup ∪ ^Shampoo) = 200/900 = 2/9 ≈ 0.22

P(Ketchup) = 300/900 = 1/3 ≈ 0.33

P(^Shampoo) = 600/900 = 2/3 ≈ 0.67

Lift(Ketchup, ^Shampoo) = 0.22 / (0.33 × 0.67) = 0.22/0.22 = 1

Lift(Ketchup, ^Shampoo) = 1 means Ketchup and ^Shampoo are independent.

 

Lift(^Ketchup, Shampoo)

P(^Ketchup ∪ Shampoo) = 200/900 = 2/9 ≈ 0.22

P(^Ketchup) = 600/900 = 2/3 ≈ 0.67

P(Shampoo) = 300/900 = 1/3 ≈ 0.33

Lift(^Ketchup, Shampoo) = 0.22 / (0.67 × 0.33) = 0.22/0.22 = 1

Lift(^Ketchup, Shampoo) = 1 means ^Ketchup and Shampoo are independent.

 

 

Lift(^Ketchup, ^Shampoo)

P(^Ketchup ∪ ^Shampoo) = 400/900 = 4/9 ≈ 0.44

P(^Ketchup) = 600/900 = 2/3 ≈ 0.67

P(^Shampoo) = 600/900 = 2/3 ≈ 0.67

Lift(^Ketchup, ^Shampoo) = 0.44 / (0.67 × 0.67) = 0.44/0.44 = 1

Lift(^Ketchup, ^Shampoo) = 1 means ^Ketchup and ^Shampoo are independent.
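Plugging this table into the lift() sketch above gives 1.0 for all four combinations – for example, lift(100, 300, 300, 900) returns exactly 1.0 – confirming that the two items are independent.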


OK, now here's how you tackle a question on chi-squared analysis.

  • Chi-Squared Analysis

Please calculate the following chi-squared values for the table correlating burgers and chips below (expected values in brackets).

  • Burgers & Chips
  • Burgers & Not Chips
  • Chips & Not Burgers
  • Not Burgers and Not Chips

 

For the above options, please also indicate whether each of your answers suggests independence, positive correlation, or negative correlation.

 

              Chips        ^Chips       Row Total
Burgers       900 (800)    100 (200)         1000
^Burgers      300 (400)    200 (100)          500
Column Total  1200          300              1500

 

 

Chi-squared: χ² = Σ (observed − expected)² / expected, summed over all four cells. Each expected value is derived from the marginals as (row total × column total) / grand total – for example, for Burgers & Chips, 1000 × 1200 / 1500 = 800.

 

χ² = (900−800)²/800 + (100−200)²/200 + (300−400)²/400 + (200−100)²/100

= 100²/800 + (−100)²/200 + (−100)²/400 + 100²/100

= 10000/800 + 10000/200 + 10000/400 + 10000/100

= 12.5 + 50 + 25 + 100 = 187.5

 

Burgers and Chips are correlated because χ² > 0 (here χ² = 187.5).

As the expected value is 800 and the observed value is 900, we can say that Burgers and Chips are positively correlated (observed > expected).

As the expected value is 200 and the observed value is 100, we can say that Burgers and ^Chips are negatively correlated (observed < expected).

As the expected value is 400 and the observed value is 300, we can say that ^Burgers and Chips are negatively correlated (observed < expected).

As the expected value is 100 and the observed value is 200, we can say that ^Burgers and ^Chips are positively correlated (observed > expected).
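Here's a short Python sketch of the same calculation, deriving the expected counts from the marginals (the chi_squared() function name is my own; the observed counts are hard-coded from the table above):

    # Chi-squared statistic for a 2x2 table of observed counts.
    def chi_squared(table):
        """table = 2x2 nested list of observed counts."""
        row_totals = [sum(row) for row in table]
        col_totals = [sum(col) for col in zip(*table)]
        grand = sum(row_totals)
        chi2 = 0.0
        for i, row in enumerate(table):
            for j, observed in enumerate(row):
                # expected = row total * column total / grand total
                expected = row_totals[i] * col_totals[j] / grand
                chi2 += (observed - expected) ** 2 / expected
        return chi2

    # Burgers/Chips observed counts
    print(chi_squared([[900, 100], [300, 200]]))  # 187.5 -> correlated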

 

 

 

 

 

  • Chi-Squared Analysis

Please calculate the following chi-squared values for the table correlating burgers and sausages below (expected values in brackets).

 

  • Burgers & Sausages
  • Burgers & Not Sausages
  • Sausages & Not Burgers
  • Not Burgers and Not Sausages

 

For the above options, please also indicate whether each of your answers suggests independence, positive correlation, or negative correlation.

 

              Sausages     ^Sausages    Row Total
Burgers       800 (800)    200 (200)         1000
^Burgers      400 (400)    100 (100)          500
Column Total  1200          300              1500

 

χ² = (800−800)²/800 + (200−200)²/200 + (400−400)²/400 + (100−100)²/100

= 0²/800 + 0²/200 + 0²/400 + 0²/100 = 0

Burgers and Sausages are independent because χ² = 0.

Burgers and Sausages – the observed and expected values are the same (800), so they are independent.

Burgers and ^Sausages – the observed and expected values are the same (200), so they are independent.

^Burgers and Sausages – the observed and expected values are the same (400), so they are independent.

^Burgers and ^Sausages – the observed and expected values are the same (100), so they are independent.
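Running the chi_squared() sketch above on this table confirms it: chi_squared([[800, 200], [400, 100]]) returns 0.0.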

 

 

 

  • Q: Under what conditions would lift and chi-squared analysis prove to be poor measures for evaluating correlation/dependency between two events?

 

A: Lift and chi-squared analysis are not the best measures to use when there are too many null transactions. Neither measure is null-invariant: transactions containing neither of the two items still influence the result, so a large number of null transactions can distort the apparent correlation between the items.

 

Q: Please suggest another measure that could be used to rectify this flaw in lift and chi-squared analysis.

 

A: There are a number of null-invariant measures that could be used instead – for instance, the Jaccard coefficient, Cosine, AllConf, MaxConf and Kulczynski.
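As a rough illustration, here's a Python sketch of three of those measures computed from support counts (the formulas are the standard definitions; the function names are my own). Note that none of them reference the grand total, which is exactly why null transactions cannot distort them:

    import math

    # Null-invariant interestingness measures; sup_a and sup_b are the
    # support counts of A and B, sup_ab the count of transactions with both.
    def jaccard(sup_a, sup_b, sup_ab):
        return sup_ab / (sup_a + sup_b - sup_ab)

    def cosine(sup_a, sup_b, sup_ab):
        return sup_ab / math.sqrt(sup_a * sup_b)

    def kulczynski(sup_a, sup_b, sup_ab):
        return 0.5 * (sup_ab / sup_a + sup_ab / sup_b)

    # Burgers/Chips counts from the first exercise
    print(jaccard(1000, 800, 600))     # 0.5
    print(cosine(1000, 800, 600))      # ~0.671
    print(kulczynski(1000, 800, 600))  # 0.675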

Getting Predictive Analytics Right – The Challenges


Challenges ahead for businesses implementing Predictive Analytics.

Predictive analytics is one tool that many organisations are turning to in order to enhance their bottom line and gain competitive advantage. In recent years it has moved away somewhat from being solely the domain of mathematicians and statisticians. Modern software has made it more accessible for organisations, and continues to do so, though getting the best from it may still be challenging and a learning process!

According to research conducted by the Ventana Research group, the biggest challenges faced by organisations undertaking predictive analytics today are, firstly, surmounting the difficulties around preparing data for analysis and, secondly, getting the required access to the data in order to prepare it. To get around both of these issues, Ventana report that a number of organisations are moving their data from on-premise to cloud-based storage solutions. This trend suggests that cloud-based big data analysis tools (including predictive tools) will also grow in importance, as scalable predictive analysis solutions will be key while data continues to accumulate.

In a separate post, Ventana discuss the challenge of a skills gap in implementing predictive analytics technology. Their research indicates that there are significant skills deficits, and that the performance of both people and processes inhibits the design and deployment of the technology. For example, there are deficits in maths and statistics, in technical knowledge, and in the ability to integrate predictive analytics systems into broader technological systems and domains. In only half of the surveyed organisations did business users of the predictive analytics output get involved in its creation, because of the complexity of the mathematics or the skills training that's required. In organisations where training was given in using predictive analytics technology to solve business problems, half were very satisfied with their predictive analytics solution; where the training was inadequate, that percentage dropped substantially. As a variety of tools are used in predictive modelling, it follows that training must be given in multiple tools: Ventana point to Excel and SQL being the most commonly used, with R, Java and Python increasing in importance. So, in effect, organisations need to pay more attention to developing a variety of skills, provide training opportunities, and seek to combine business and technical acumen. This means, in essence, the creation of highly skilled cross-functional teams, with members drawn from both business and technology.

A point made by TechTarget is that leading organisations use predictive analysis techniques to make correlations that are of use and deliver genuine value to the business. You need to understand very clearly what business problem you are trying to solve. There is so much data, and so much analysis you can do, that it's easy to get sidetracked into something that doesn't really solve a business problem or deliver value to the business. Having business user input is therefore key to keeping the focus on solving business problems.

To sum up: to really harness the potential of predictive analytics, training deficits need to be identified and addressed. Secondly, cross-functional groups with diverse skill sets drawn from both technology and the business should be put together to identify how predictive analytics technology can deliver real business value, and to put it into practice in areas such as Customer Relationship Management.

References:

Data Preparation is Essential for Predictive Analytics

Skills Gap Challenges Potential of Predictive Analytics

http://searchbusinessanalytics.techtarget.com/feature/Business-focus-is-key-when-applying-predictive-analytics-models

 

Big Data – Putting Digital Insights into Action

I found a blog post by a Forrester analyst which quoted a statistic that organisations' satisfaction with big data actually decreased between 2014 and 2015, despite huge investments. In their research, Forrester started to talk about 'digital insights' as being the thing that organisations investing in big data were really interested in, along with the actions inspired by that new-found knowledge. Interestingly, they found that what differentiated leading companies from the rest was that they were able to create consistent action by building into their organisations what is termed "systems of insight", using people, processes and technology. Most organisations instead operate a one-way street, where it is merely hoped – with no defined process – that analytics insights will turn into real, demonstrable action.

Forrester conclude that, to be most effective, there needs to be a closed loop in which teams of people explore, test and implement the insights, with a return loop of feedback coming from measurement of outcomes. In other words, the data (insight) is connected to an outcome, then feedback is obtained and fed back in a continuous loop of learning and optimisation. If more organisations were to create systematic ways of generating action from digital insight, and then feeding the outcomes back in, they could take advantage of this process of continuous learning and gain competitive advantage.

 

http://blogs.forrester.com/brian_hopkins/15-04-27-systems_of_insight_will_power_digital_business

 

 

A modern-day Oracle MIS, aka SaaS, PaaS, DaaS, IaaS or even XaaS

 


According to Wikipedia, "A management information system (MIS) focuses on the management of information systems to provide efficiency and effectiveness of strategic decision making." Until fairly recently, enterprise MIS systems were generally "on premise" (held on computers on the organisation's premises), but over the last number of years we have witnessed organisations starting to transition from enterprise to cloud computing.

In this short blog, I want to look at the offerings of one of the big boys of MIS computing, namely Oracle.

Oracle is one of the older generation of computer companies. Established in 1977, it is best known for its database technology, though it has an extensive repertoire of enterprise business applications encompassing ERP/SCM/CRM/HCM/EPM product lines, using the Oracle RDBMS as a backend and Oracle middleware (mid-tier) products. Traditionally, the Oracle eBusiness Suite (eBS) product line, up to and including the Release 12 "on premise" software, was expensive to buy and maintain, but was business critical! It required highly skilled and often diverse IT teams to implement, support, extend and upgrade. The modular Oracle eBS ERP system encompassed many lines of organisation business, including Financial Management, Project Management, SCM, CRM, HCM, and Business Intelligence Applications. As the modules were integrated, they were built from the ground up to talk to one another: 'Person' information held in HCM flows to Purchasing when a buyer is set up, PO information transfers to Accounts Payable when a payables invoice is created, and on to the General Ledger module, with all data flows taking place via standard product interfaces. For some (though not all) companies, the adoption of Oracle eBS unfortunately went hand in hand with a rather long list of application customisations, implemented rather than reengineering business processes to suit standard application functionality. The eBS application was then no longer an out-of-the-box implementation that Oracle would support; the in-house customisation meant that the customer had to support a mission-critical system themselves! It also led to difficult and expensive upgrades when Oracle decided to desupport the old version, as in-house customisations would need to be made compatible with a new code base or with changes to database table structures.

Then one day the cloud came along…

Initially, Oracle senior management were in denial about the 'threat' posed by newer cloud companies such as Workday (primarily HCM/Finance), NetSuite and Salesforce (CRM), to name a few. However, by 2012 it was clear that Oracle were reinventing themselves and had shifted their corporate strategy to the cloud. A lot of work was done to get all their product lines up on the cloud. At the time of writing, a wide portfolio of cloud services is available, including SaaS (Software as a Service), DaaS (Data as a Service), PaaS (Platform as a Service) and IaaS (Infrastructure as a Service). In terms of SaaS, Oracle continue to enhance an integrated suite of Financial Management, HCM, SCM, EPM and Data Analytics applications, all with built-in social networking functionality. That's a huge breadth of integrated, modular application product lines in the cloud, built from scratch to talk to one another. That is in fact one of the major value propositions that Oracle push: they see themselves as a one-stop shop for database, middleware, applications and analytics for every type of business, unlike other cloud companies, which they consider to be "niche players".

As well as the SaaS offering, Oracle IaaS offers "elastic", scalable storage and compute in the cloud, while Oracle PaaS offers the capability to handle your (big) data applications, plus other tools to handle the seamless integration of your Oracle and non-Oracle applications in the cloud. The PaaS offering also includes data analytics, with ways to prepare data as well as business intelligence and visualisation tools. The Oracle PaaS cloud platform is the technology which underpins and provides the foundation for their SaaS applications.

Sounds pretty impressive – are there no drawbacks?

There are pros and cons to cloud computing versus an on premise Oracle solution.

To date, whilst Oracle Cloud applications are mostly available, they are not as comprehensive as Release 12 of Oracle eBS, particularly in the SCM modules, though new functionality continues to come on stream. So if you are in the supply chain or manufacturing business, you should verify, for now, whether the cloud can do the same as eBusiness Suite. On the other hand, if you will only be using the Financial modules, Oracle cloud functionality may be closer to meeting your requirements. If your business requires application customisation, this will be easier if you hold your applications on premise, though there is more scope for customisation in a single-tenant rather than a multi-tenant SaaS deployment, where the instance may be shared by multiple customers.

Implementation speeds are impressive for an Oracle cloud application implementation compared to on premise, and implementation costs are therefore lower. In general, on-premise costs are higher overall due to ongoing infrastructure and staffing costs. The cost of upgrading is also much lower on the cloud, since you don't need to be concerned with new equipment or additional staff to undertake upgrades – on the cloud they are ongoing and automatic.

As with any move to the cloud, organisations must consider the loss of control involved in outsourcing the entire system that runs their business. Security must also be considered – though a Tier 1 managed services cloud provider may well offer superior security to an in-house system in a data centre that is under-patched and under-resourced. Below is a summary of the pros and cons of cloud versus on premise to be reflected upon.

[Image: summary of the pros and cons of cloud versus on-premise deployment]

And what exactly is XaaS? XaaS is an acronym for "everything as a service". Oracle offer PaaS, SaaS, IaaS and DaaS, so I think they can be included under the XaaS banner.

References:

https://cloud.oracle.com/home

http://www.oracle.com/us/corporate/press/2313818

http://blog.appsassociates.com/top-5-considerations-for-oracle-cloud-erp

The Pros and Cons of SaaS vs On-premises Deployment

Create a heat map using Google fusion tables

[Image: heat map of Ireland's population, 2011 census]

Our first assignment of the term was to create an image of a Google Fusion Table outlining an Irish population heat map, based on 2011 census data from the Central Statistics Office (CSO).

I was tasked with creating a distribution of counties based on population density, describing the following:

– how to achieve a heat map

– what information could be gleaned from a heat map

– what other ideas/concepts could be represented in the heat map

After downloading the population data from the CSO, the first step was to clean it up into 26 distinct counties, keeping the province data separate. After this the Excel worksheet was loaded to Google Fusion Tables, and the KML data was loaded separately. The spreadsheet and KML tables were then merged to create a new file. (Note that you need a Google account to load anything up to Google Fusion Tables.)

From the 'Location' geometry I configured the heat map to divide the 26 counties of Ireland into 5 custom buckets of population. I adjusted the default colours to shades of blue and adjusted the suggested default population values to make more sense. I've included a screenshot below of the steps to do this.

[Screenshot: configuring the population buckets]
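The bucketing logic the Fusion Tables UI applies behind the scenes amounts to something like the following Python sketch (the boundary values here are illustrative, not the exact ones I configured):

    # Assign a county to one of 5 population buckets.
    # BOUNDS holds illustrative upper bounds for buckets 1-4.
    BOUNDS = [60_000, 100_000, 150_000, 250_000]

    def bucket(population):
        for i, upper in enumerate(BOUNDS):
            if population < upper:
                return i + 1
        return len(BOUNDS) + 1  # bucket 5: everything above the last bound

    print(bucket(32_000))     # a small county -> bucket 1
    print(bucket(1_270_000))  # a Dublin-scale population -> bucket 5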

To add the county boundaries in black, under border color I selected 'Use one color', as in the following screenshot:

[Screenshot: setting the county border colour]

I also appended the legend and updated the title of the legend to be more descriptive, as shown in the screenshot below:

[Screenshot: editing the map legend]

Also, under 'Change info window layout' I selected county and total persons only, so that when you click on a county you can see its name and population.

What information could be gleaned from the heat map

In general, it's instantly clear that the counties in and around the main cities (namely Dublin, Kildare, Galway and Cork) have the highest populations in the state. Heat maps are useful because they allow us to put big data into context through a visualisation. So rather than looking up and down through a spreadsheet to pick out the counties with the largest populations, the heat map visualisation makes it clear in a matter of seconds.

You can also use the filter option to, for example, show just those counties that have populations in the region of 100,000–150,000 people. In my map as currently configured, these counties could fall under two colour buckets.

One thing to note is that the source KML file has no geometry data for County Carlow. Also, the geometry for County Cavan is incorrect in the KML file (it contains coordinates located in County Carlow for Cavan), and therefore the population for Cavan is displayed within the Carlow county boundaries. Similarly, the KML coordinates for Clare are located in Cavan; as a result the Clare population, whilst correctly shown in Clare, is also shown within the Cavan county boundaries.

What other ideas/concepts could be represented in the heat map

If CSO information on age profiles were available, this could also be represented in the map, enabling an analysis of the location of younger versus older members of the population. This could help with the provision of state services, and/or help a non-profit organisation like the SVP, as previously discussed.

One more idea: rather than colour coding the counties, you could represent the population using markers. However, colour coding the counties is, in my view, much easier to read as a visual representation.

https://www.google.com/fusiontables/DataSource?docid=1jBZt05BS1EnUt9-gFlgAEj4b5TVBnWmvBRv2il7S

Intrepid first steps in R …

Here is a screenshot of the R language course completion proof:

[Screenshot: R course completion certificate]

I was hoping that for the R project it would be simple enough to install R and RStudio on my beloved Mac. Luckily (unlike the MS SQL Server installation) it proved to be straightforward if you followed the guidelines on CRAN – https://cran.r-project.org
