Welcome to Data Mining, Down Under
September 26th, 2008 · General
I’ve decided to start afresh with a new blog, Data Mining Down Under. While I have blogged previously on my personal site, I thought it would be good to begin again with a new focus and more regular posts. This site will therefore not carry personal posts; it will be a journal of my thinking on various data mining topics. Broadly, it will cover everything from the latest research and development efforts to industry trends and best practices. Hopefully it makes for some interesting reading!
→ No Comments · Tags: welcome
Data Mining the Financial Markets
April 25th, 2008 · Industry, Tips & Tutorials
Thomas A. Rathburn has written a series of three articles on data mining the financial markets. Rathburn takes a detailed look at the successes and failures of his efforts in the markets, and with 10-year US bonds in particular. You can read it in part 1, part 2, and part 3. The articles are also available as podcasts: 1, 2, 3.
[via KDnuggets]
→ No Comments · Tags: bonds·finance
Experian Bolsters Data With Hitwise Acquisition
May 4th, 2007 · Industry
Tim O’Reilly points to the news that Experian has made a significant move to improve the quality of its online and demographic data with the acquisition of Hitwise for US$240 million. Hitwise collects user traffic from ISPs in several countries, including Australia, and uses that information to give companies insight into their online market share. Although not mentioned in the press release, the Hitwise data will likely be a huge boon for Experian’s marketing services, and will probably allow it to develop more accurate geo-demographic profiles.
→ No Comments · Tags: advertising·big brother
Winning the DARPA Grand Challenge
September 17th, 2006 · Research
Sebastian Thrun of Stanford Racing gives a great talk on what it took to build an autonomous vehicle to win the DARPA Grand Challenge. There are lots of cool technical details on how machine learning was used to achieve this. You can watch it on Google Video here.
→ No Comments · Tags: darpa
In-cell Graphing
August 11th, 2006 · Tips & Tutorials
The guys from Juice Analytics have put together an interesting series on in-cell graphing (parts 1, 2, & 3). This is a feature that is due in the upcoming Excel 2007; however, the technique the Juice guys use works across all versions of Excel and is quite visually appealing too. As an added bonus, I can confirm it works in OpenOffice.org, Gnumeric and even Google Spreadsheets (all to varying degrees).
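The post does not spell out the technique, so the following is only a sketch of one common way to draw in-cell bars: repeating a character in proportion to the value, which is what the spreadsheet function `REPT` does. Whether this matches the Juice series exactly is an assumption, and the function name `in_cell_bar` and its parameters are hypothetical.

```python
def in_cell_bar(value, max_value, width=20, char="|"):
    """Return a text bar whose length is proportional to value / max_value.

    Hypothetical helper: mimics a spreadsheet formula along the lines of
    =REPT("|", A1) scaled to a fixed maximum width.
    """
    if max_value <= 0:
        return ""
    # Clamp the value into [0, max_value] so the bar never overflows the cell.
    clamped = max(0, min(value, max_value))
    filled = round(width * clamped / max_value)
    return char * filled


if __name__ == "__main__":
    # Illustrative data only, not taken from the articles.
    sales = {"North": 120, "South": 45, "East": 80}
    top = max(sales.values())
    for region, amount in sales.items():
        print(f"{region:<6} {in_cell_bar(amount, top)} {amount}")
```

Because the bar is ordinary text, it should display in any spreadsheet that can repeat a string, which fits the observation that the approach works to varying degrees across different applications.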
→ No Comments · Tags: excel·graphs
Article: HCF gets a helping hand from predictive analytics
June 13th, 2006 · Industry
From the ComputerWorld article:
Private health insurer HCF has implemented a predictive analytics suite to help weed out fraudulent claims, target individual members and streamline the monotonous labour of data analysis.
→ No Comments · Tags: customer analytics·fraud·insurance
Data Mining with Oracle
May 30th, 2006 · Software, Tips & Tutorials
If you are interested in data mining and haven’t already seen the Oracle Data Mining and Analytics blog, it is worth checking out. It has some great how-tos, including time series forecasting (parts 1, 2, 3) and real-time scoring & model management (parts 1, 2, 3).
→ No Comments · Tags: oracle·sql
Smart SPAM & Fighting it
May 13th, 2006 · Research
For any machine-learning-based SPAM filter, such as the popular Bayesian methods, the key to success is the training data: the body of previously identified SPAM and HAM (valid emails). To trick the filter, a spammer must make their messages more HAM-like. The way to beat this is to give your email classifier as much training data as possible, and to keep updating it. Learning only from your own company’s emails is probably not foolproof when you consider the volume and variety of SPAM on the net. Web-based email, on the other hand, like Gmail and the hosted version, should never have this problem because the filter learns from thousands of users’ SPAM folders.
Researchers from the University of Calgary claim that the next evolution of SPAM will be smart SPAM, which will infiltrate your computer via spyware or viruses and ‘mine’ your emails. By creating emails based on the actual messages you’ve previously sent, the spammers hope they will be more believable to readers.
I would argue, however, that such a situation would merely make services like Gmail more attractive. Firstly, because they have a truly massive body of knowledge to use to fine-tune their spam filters, and secondly, because it is unlikely such spyware could infiltrate a web-based system. Even if a program were distributed that waited for someone to log on and then took over, Google could have it effectively neutralised in a matter of hours.
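To make the role of training data concrete, here is a minimal sketch of a Bayesian filter, assuming a multinomial naive Bayes model with add-one smoothing. The class name `NaiveBayesSpamFilter`, the tokenizer and the sample messages are all hypothetical; real filters use far richer features and far more data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a message and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Illustrative multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Add one labelled message; call again later to keep the model current."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Return the estimated probability that text is SPAM."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Smoothed log prior for the class.
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                count = self.word_counts[label][word]
                # Smoothed per-word likelihood; the extra +1 covers unseen words.
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = score
        # Normalise in log space to avoid underflow on long messages.
        peak = max(log_scores.values())
        weights = {k: math.exp(v - peak) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])


if __name__ == "__main__":
    f = NaiveBayesSpamFilter()
    f.train("cheap pills buy now", "spam")
    f.train("win money now", "spam")
    f.train("meeting agenda for monday", "ham")
    f.train("lunch on monday?", "ham")
    print(f.spam_probability("buy cheap pills"))
    print(f.spam_probability("monday meeting"))
```

Every call to `train` shifts the word statistics, which is why a filter fed by many users’ SPAM folders has an advantage over one trained on a single mailbox.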
→ 1 Comment · Tags: bayes·google·spam
Data Mining Cup 2006
May 5th, 2006 · Research
The Data Mining Cup (DMC2006) has launched for 2006. This year the competition focuses on eBay auctions. The target is to predict, for each new auction, whether the actual sales revenue is higher than the average sales revenue of its product category.
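As a rough sketch of that target, the binary label can be derived by comparing each auction’s revenue with the mean revenue of its category. The actual DMC2006 data layout is not described here, so the `(category, revenue)` tuples and the function name `label_auctions` are assumptions, as is computing the average from the same set of auctions being labelled.

```python
from collections import defaultdict


def label_auctions(auctions):
    """Return 1 for each auction whose revenue exceeds its category mean, else 0.

    `auctions` is a list of (category, revenue) pairs; this layout is assumed
    for illustration and is not the competition's real schema.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for category, revenue in auctions:
        totals[category] += revenue
        counts[category] += 1
    means = {category: totals[category] / counts[category] for category in totals}
    return [1 if revenue > means[category] else 0 for category, revenue in auctions]


if __name__ == "__main__":
    # Made-up figures purely to show the calculation.
    sample = [("camera", 100.0), ("camera", 200.0), ("phone", 50.0), ("phone", 50.0)]
    print(label_auctions(sample))
```

A competition entry would then train a classifier to predict this label from information available when the auction is listed, since the revenue itself is only known afterwards.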
→ No Comments
DARPA Grand Challenge
May 4th, 2006 · Research
Start your engines: the DARPA Grand Challenge is on again, only this time it’s an urban challenge! The last two competitions were races for autonomous vehicles through a desert, with the 2005 winner, Stanford, taking home a US$2 million prize.
Stanford’s software in action: input from GPS and many sensors feeds the algorithms that determine the safe path (see tech report).
