Getting Predictive Analytics Right – The Challenges


Challenges ahead for businesses implementing Predictive Analytics.

Predictive Analytics is one tool that many organisations are turning to in order to improve their bottom line and gain a competitive advantage. In recent years it has moved away from being solely the domain of mathematicians and statisticians. Modern software has made, and continues to make, it more accessible to organisations, though getting the best from it can still be challenging and very much a learning process!

According to research conducted by Ventana Research, the two biggest challenges organisations face when undertaking predictive analytics are, first, preparing data for analysis and, second, getting the access to data needed to prepare it. To get around both issues, Ventana report that a number of organisations are moving their data from on-premises to cloud-based storage. This trend suggests that cloud-based big data analysis tools (including predictive tools) will also grow in importance, since scalable predictive analysis will be key as data continues to accumulate.

In a separate post, Ventana discuss the skills gap that holds back the implementation of predictive analytics technology. Their research indicates significant skills deficits, and shortcomings in both people and processes hinder the design and deployment of the technology. For example, there are deficits in maths and statistics, in technical knowledge, and in the ability to integrate predictive analytics into broader technology systems and domains. In only half of the organisations surveyed were the business users of predictive analytics output involved in creating it, because of the complexity of the mathematics or the skills training required. In organisations that provide training in using predictive analytics to solve business problems, half were very satisfied with their predictive analytics solution; where training was inadequate, the percentages dropped substantially. Because a variety of tools are used for predictive modelling, it follows that training must cover multiple tools. Ventana point to Excel and SQL as the most commonly used, with R, Java and Python increasing in importance. In effect, organisations need to pay more attention to developing a range of skills, provide training opportunities and seek to combine business and technical acumen. In essence, this means creating highly skilled cross-functional teams with members drawn from both the business and technology.

A point made by TechTarget is that leading organisations use predictive analysis to find correlations that are genuinely useful and deliver real value to the business. You need to understand very clearly which business problem you are trying to solve. There is so much data, and so much analysis you could do, that it's easy to get sidetracked into work that doesn't solve a business problem or deliver value to the business. Having business users involved is therefore key to keeping the focus on solving business problems.

To sum up: to really harness the potential of predictive analytics, training deficits first need to be identified and addressed. Second, cross-functional groups with diverse skill sets from both technology and the business should be brought together to identify how predictive analytics technology can deliver real business value, and to put it into practice in Customer Relationship Management.


Data Preparation is Essential for Predictive Analytics

Skills Gap Challenges Potential of Predictive Analytics


Big Data – Putting Digital Insights into Action

I found a blog post by a Forrester analyst which quoted a statistic that organisations’ satisfaction with big data actually decreased between 2014 and 2015, despite huge investments. In their research, Forrester started to talk about ‘digital insights’, and the actions inspired by newfound knowledge, as what organisations investing in big data are really interested in. Interestingly, they found that what differentiated leading companies from the rest was their ability to create consistent action by building what are termed “systems of insight” into their organisations, using people, processes and technology. Most organisations, by contrast, work on a one-way street: analytics insights are produced in the hope that they will be turned into real, demonstrable action, but there is no defined process for doing so.

Forrester conclude that, to be most effective, there needs to be a closed loop: teams of people explore, test and implement the insights, and a return loop carries back feedback from measuring the outcomes. In other words, the insight from the data is connected to an outcome, and feedback on that outcome is fed back in a continuous loop of learning and optimisation. The hope is that if more organisations created systematic ways of turning digital insights into action and then feeding back the outcomes, they could take advantage of this process of continuous learning and gain a competitive advantage.



Create a heat map using Google Fusion Tables

HeatMap Ireland Population 2011

Our first assignment of the term was to create a Google Fusion Tables heat map of the Irish population by county, based on 2011 census data from the Central Statistics Office (CSO).

I was tasked with mapping how the population is distributed across the counties, and with describing:

– how to create a heat map

– what information could be gleaned from a heat map

– what other ideas/concepts could be represented in the heat map.

After downloading the population data from the CSO, the first step was to clean it up into the 26 distinct counties, keeping the province data separate. The Excel worksheet was then uploaded to Google Fusion Tables, and the KML data was uploaded separately. The spreadsheet and KML tables were then merged to create a new table. (Note that you need a Google account to upload anything to Google Fusion Tables.)
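Fusion Tables has since been retired by Google, so for anyone wanting to reproduce the clean-up and merge outside it, here is a minimal Python sketch of those two steps. It assumes the CSO rows have already been read into (area, persons) pairs and the KML has been parsed into a dictionary of geometries keyed by county name. The AREA_TO_COUNTY mapping is illustrative and incomplete, and the numbers are placeholders, not census figures. The merge is written as an inner join on county name that also reports any names found in only one source.

```python
# Sketch: clean CSO rows into per-county totals, then join them to KML geometries.
# ASSUMPTIONS: rows are (area, persons) pairs; geometries is {county: geometry}.
# The mapping below is illustrative and incomplete, not the CSO's actual labels.
AREA_TO_COUNTY = {
    "Dublin City": "Dublin",
    "Dún Laoghaire-Rathdown": "Dublin",
    "Fingal": "Dublin",
    "South Dublin": "Dublin",
    "Cork City": "Cork",
    "Cork County": "Cork",
}
PROVINCES = {"Leinster", "Munster", "Connacht", "Ulster (part of)"}


def clean(rows):
    """Return (county_totals, province_totals), summing sub-county areas per county."""
    counties, provinces = {}, {}
    for area, persons in rows:
        if area in PROVINCES:
            provinces[area] = persons  # keep province data separate
        else:
            county = AREA_TO_COUNTY.get(area, area)
            counties[county] = counties.get(county, 0) + persons
    return counties, provinces


def merge(counties, geometries):
    """Inner join on county name; also list names present in only one input."""
    merged = [(name, counties[name], geometries[name])
              for name in sorted(counties) if name in geometries]
    only_population = sorted(set(counties) - set(geometries))
    only_geometry = sorted(set(geometries) - set(counties))
    return merged, only_population, only_geometry


# Placeholder values only, to show the shape of the output.
counties, provinces = clean([("Leinster", 100), ("Dublin City", 10),
                             ("Fingal", 5), ("Carlow", 3)])
merged, only_population, only_geometry = merge(
    counties, {"Dublin": "<dublin polygon>", "Cavan": "<cavan polygon>"})
print(merged)           # [('Dublin', 15, '<dublin polygon>')]
print(only_population)  # ['Carlow']
print(only_geometry)    # ['Cavan']
```

Listing the unmatched names is the design choice that matters here: a join that silently drops them would hide exactly the kind of missing-county problem noted later in this post.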

Using the ‘Location’ geometry, I configured the heat map to divide the 26 counties of Ireland into 5 custom population buckets. I changed the default colours to shades of blue and adjusted the suggested default population values so the ranges made more sense. I’ve included a screenshot below of the steps to do this.
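The bucketing step can be sketched in Python too. The post doesn't list the actual bucket boundaries or colours, so BOUNDARIES and BLUES below are hypothetical; the sketch simply maps each county's population to one of 5 buckets with a binary search over the bucket edges, which is one straightforward way to reproduce a bucketed colour scheme.

```python
from bisect import bisect_right

# HYPOTHETICAL bucket edges: the lower bound of buckets 2 to 5 (5 buckets in total).
BOUNDARIES = [50_000, 100_000, 150_000, 250_000]
# Five shades of blue, lightest for the least populous bucket (also hypothetical).
BLUES = ["#deebf7", "#9ecae1", "#6baed6", "#3182bd", "#08519c"]


def bucket_index(population):
    """Return 0-4: bucket i covers [BOUNDARIES[i-1], BOUNDARIES[i])."""
    return bisect_right(BOUNDARIES, population)


def bucket_colour(population):
    """Return the fill colour for a county with the given population."""
    return BLUES[bucket_index(population)]


print(bucket_colour(40_000))     # #deebf7 (first bucket)
print(bucket_colour(100_000))    # #6baed6 (a value on an edge goes into the higher bucket)
print(bucket_colour(1_000_000))  # #08519c (last bucket)
```

Using bisect_right means a population exactly on an edge lands in the higher bucket; swap in bisect_left if you'd rather edges belong to the lower one.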

[Screenshot: configuring the population buckets and colours]

To draw the county boundaries in black, under Border color I selected ‘Use one color’, as in the following screenshot:

[Screenshot: border colour setting]

I also added a legend and gave it a more descriptive title, as shown in the screenshot below:

[Screenshot: legend settings]

Also, under ‘Change info window layout’, I selected only the county and total persons fields, so that when you click on a county you can see its name and population.

What information could be gleaned from the heat map

In general, it's instantly clear that the counties around the main cities (namely Dublin, Kildare, Galway and Cork) have the highest populations in the state. Heat maps are useful because they let us put large amounts of data into context through visualisation: rather than scanning up and down a spreadsheet to pick out the counties with the largest populations, you can see them on the heat map in a matter of seconds.

You can also use the filter option to, for example, show just the counties with populations between 100,000 and 150,000. In my map as currently configured, these counties could fall under two different colours, because that range spans more than one of my buckets.
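That filter is essentially a range query. A minimal Python equivalent is below; the county names and values are placeholders rather than census figures, and treating both bounds as inclusive is my assumption.

```python
def counties_in_range(populations, low, high):
    """Return (county, population) pairs with low <= population <= high, largest first."""
    matches = [(county, pop) for county, pop in populations.items()
               if low <= pop <= high]
    return sorted(matches, key=lambda item: item[1], reverse=True)


# Placeholder names and values for illustration only.
populations = {"County A": 120_000, "County B": 90_000,
               "County C": 150_000, "County D": 100_000}
print(counties_in_range(populations, 100_000, 150_000))
# [('County C', 150000), ('County A', 120000), ('County D', 100000)]
```

Sorting the matches largest first also gives the "which counties are biggest" view that would otherwise mean scanning the spreadsheet.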

One thing to note is that the source KML file has no geometry data for County Carlow. The geometry for County Cavan is also incorrect: the KML file contains coordinates located in County Carlow for Cavan, so Cavan's population is shown where Carlow should be. Similarly, the KML coordinates for Clare are located in Cavan, so the Clare population, while correctly shown in Clare, is also shown within the Cavan county boundaries.
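Problems like these can be caught before publishing a map. The Python sketch below flags counties with no geometry, and uses a ray-casting point-in-polygon test to flag counties whose polygon doesn't contain a reference point known to lie inside that county (its county town, say). The square polygons and coordinates are made-up toy values on an arbitrary grid, not real KML coordinates; they're arranged to reproduce the Carlow and Cavan problems just described.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count edge crossings of a ray from (x, y) towards +x."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # crossing lies to the right of the point
                inside = not inside
    return inside


def audit_geometry(populations, geometries, reference_points):
    """Return (counties with no geometry, counties whose reference point is outside)."""
    missing = sorted(set(populations) - set(geometries))
    misplaced = sorted(
        county for county, (x, y) in reference_points.items()
        if county in geometries and not point_in_polygon(x, y, geometries[county]))
    return missing, misplaced


def square(x0, y0):
    """Toy 1x1 square polygon with its lower-left corner at (x0, y0)."""
    return [(x0, y0), (x0 + 1, y0), (x0 + 1, y0 + 1), (x0, y0 + 1)]


# TOY data: Carlow has no geometry and Cavan's polygon sits where Carlow's should be.
populations = {"Carlow": 1, "Cavan": 1, "Dublin": 1}  # values irrelevant here
geometries = {"Cavan": square(5, 1), "Dublin": square(6, 3)}
reference_points = {"Carlow": (5.5, 1.5), "Cavan": (4.5, 6.5), "Dublin": (6.5, 3.5)}
print(audit_geometry(populations, geometries, reference_points))
# (['Carlow'], ['Cavan'])
```

Ray casting is fine for simple outlines like these; real county boundaries with islands or multi-part polygons would be better handled by a proper geometry library.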

What other ideas/concepts could be represented in the heat map

If CSO information on age profiles were available, it could also be represented on the map, enabling an analysis of where younger and older members of the population live. This could help with planning state services, or help a non-profit organisation such as SVP, as previously discussed.

Another idea is to represent the population using markers rather than colour-coding the counties; however, in my view colour-coded counties are a much clearer visual representation.

Intrepid first steps in R …

Here is a screenshot of my R Language Course completion proof:


I was hoping that installing R and RStudio on my beloved Mac would be simple enough. Luckily (unlike installing MS SQL Server), it proved to be straightforward if you followed the guidelines on CRAN –
