Data Mining, Down Under

Welcome to “Data Mining, Down Under”, a blog by Aussie data miner Shane Butler.


Welcome to Data Mining, Down Under

September 26th, 2008 · General

I’ve decided to start afresh with a new blog, Data Mining, Down Under. While I have blogged previously on my personal site, I thought it would be good to begin again with a new focus and more regular posts. This site will therefore not carry personal posts; it will be a journal of my thinking on various data mining topics. Generally speaking, the blog will cover a wide range of topics, from the latest research and development efforts to trends and best practices in industry. Hopefully it makes for some interesting reading!

→ No Comments

Data Mining the Financial Markets

April 25th, 2008 · Industry, Tips & Tutorials

Thomas A. Rathburn has written a series of three articles on data mining the financial markets. Rathburn takes a detailed look at the successes and failures of his efforts in the markets, and with 10-year US bonds in particular. You can read them here: part 1, part 2, and part 3. The articles are also available as a podcast here: 1, 2, 3.

[via KDnuggets]

→ No Comments

Experian Bolsters Data With Hitwise Acquisition

May 4th, 2007 · Industry

Tim O’Reilly points to the news that Experian has made a significant move to improve the quality of its online and demographic data with the acquisition of Hitwise for US$240 million. Hitwise collects user traffic from ISPs in several countries, including Australia, and uses that information to give companies insight into their online market share. Although not mentioned in the press release, the Hitwise data will likely be a huge boon for Experian’s marketing services, and will probably allow them to develop more accurate geo-demographic profiles.

→ No Comments

Winning the DARPA Grand Challenge

September 17th, 2006 · Research

Sebastian Thrun of Stanford Racing gives a great talk on what it took to build an autonomous vehicle to win the DARPA Grand Challenge. There are lots of cool technical details on the use of machine learning to achieve this. You can watch it on Google Video here.

→ No Comments

In-cell Graphing

August 11th, 2006 · Tips & Tutorials

The guys from Juice Analytics have put together an interesting series on in-cell graphing (parts 1, 2, & 3). This feature is due in the upcoming Excel 2007; however, the technique the Juice guys use works across all versions of Excel and is quite visually appealing too. As an added bonus, I can confirm it works in OpenOffice.org, Gnumeric and even Google Spreadsheets (all to varying degrees).

→ No Comments

Article: HCF gets a helping hand from predictive analytics

June 13th, 2006 · Industry

From the ComputerWorld article:

Private health insurer HCF has implemented a predictive analytics suite to help weed out fraudulent claims, target individual members and streamline the monotonous labour of data analysis.

→ No Comments

Data Mining with Oracle

May 30th, 2006 · Software, Tips & Tutorials

If you are interested in data mining and haven’t already seen the Oracle Data Mining and Analytics blog, it is worth checking out. It has some great how-tos, including time series forecasting (parts 1, 2, 3) and real-time scoring & model management (parts 1, 2, 3).

→ No Comments

Smart SPAM & Fighting it

May 13th, 2006 · Research

For any machine-learning-based SPAM filter, such as the popular Bayesian methods, the key to success is the training data: the body of previously identified SPAM and HAM (valid emails). To trick the filter, a spammer must make their messages more HAM-like. The way to beat this is to give your email classifier as much training data as possible and to keep updating it. Learning only from your company’s emails is probably not foolproof when you consider the volume and variety of SPAM on the net. Web-based email, on the other hand, like Gmail and the hosted version, should never have this problem because the filter learns from thousands of users’ SPAM folders.
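To make the role of training data concrete, here is a minimal sketch of a Bayesian filter of the kind described above. It is a toy multinomial naive Bayes with add-one smoothing, not the method any particular mail service uses; the class name, method names and the sample messages in the usage note are all made up for illustration.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Toy multinomial naive Bayes text classifier (illustrative only)."""

    LABELS = ("spam", "ham")

    def __init__(self):
        # Per-label word frequencies and message counts learned so far.
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.message_counts = {label: 0 for label in self.LABELS}

    def train(self, text, label):
        """Add one previously identified message to the training data."""
        self.word_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def _log_score(self, words, label):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(self.word_counts[label].values())
        total_messages = sum(self.message_counts.values())
        # Smoothed prior, so a label with no training messages never hits log(0).
        score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
        for word in words:
            # Add-one smoothing; the extra +1 reserves mass for unseen words.
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab) + 1))
        return score

    def classify(self, text):
        """Return the label with the higher log-probability score."""
        words = text.lower().split()
        return max(self.LABELS, key=lambda label: self._log_score(words, label))
```

Calling `train("cheap pills buy now", "spam")` and `train("lunch on monday", "ham")` a few times and then `classify(...)` on a new message shows the point of the paragraph above: the filter only knows the words it has seen, so the more labelled SPAM and HAM it is fed, the harder it is for a message to pass as HAM.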

Researchers from the University of Calgary claim that the next evolution of SPAM will be smart SPAM, which will infiltrate your computer via spyware/viruses and ‘mine’ your emails. By creating emails based on the actual messages you’ve previously sent, the spammers hope they will be more believable to readers.

I would argue, however, that such a situation would merely make services like Gmail more attractive: firstly because they have a truly massive body of knowledge with which to fine-tune their spam filters, and secondly because it is unlikely such spyware could infiltrate a web-based system. Even if a program were distributed that waited for someone to log on and then took over, Google could have it effectively neutralised in a matter of hours.

→ 1 Comment

Data Mining Cup 2006

May 5th, 2006 · Research

The Data Mining Cup (DMC2006) has launched for 2006. This year the competition focuses on eBay auctions. The task is to predict, for each new auction, whether the actual sales revenue is higher than the average sales revenue of the product category.
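As a rough illustration of that target, the sketch below derives the binary label from a list of auctions. It assumes each auction is a `(category, revenue)` pair and that the category average is taken over the same list; both are my assumptions for the example, not the competition’s actual data format or definition.

```python
from collections import defaultdict


def label_auctions(auctions):
    """Return 1 for each auction whose revenue exceeds its category average, else 0.

    `auctions` is a list of (category, revenue) pairs; this layout is hypothetical.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for category, revenue in auctions:
        totals[category] += revenue
        counts[category] += 1

    # Average sales revenue per product category.
    averages = {category: totals[category] / counts[category] for category in totals}

    return [1 if revenue > averages[category] else 0 for category, revenue in auctions]
```

A classifier for the competition would then be trained to predict this 0/1 label from whatever is known about an auction before it closes.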

→ No Comments

DARPA Grand Challenge

May 4th, 2006 · Research

Start your engines: the DARPA Grand Challenge is on again, only this time it’s an urban challenge! The last two competitions were to race an autonomous vehicle through a desert, with the 2005 winner, Stanford, taking home a US$2 million prize.

Stanford’s software in action: Input from GPS and many sensors feeds the algorithms to determine the safe path (see tech report).

→ No Comments