<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Data Mining, Down Under &#187; Research</title>
	<atom:link href="http://www.dataminingdownunder.com/category/research/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dataminingdownunder.com</link>
	<description>Welcome to "Data Mining, Down Under", a blog by Aussie data miner Shane Butler.</description>
	<lastBuildDate>Tue, 23 Feb 2010 09:34:28 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>AusDM 09 &amp; Analytic Challenge</title>
		<link>http://www.dataminingdownunder.com/2009/07/ausdm09-2/</link>
		<comments>http://www.dataminingdownunder.com/2009/07/ausdm09-2/#comments</comments>
		<pubDate>Tue, 07 Jul 2009 13:47:52 +0000</pubDate>
		<dc:creator>Shane Butler</dc:creator>
				<category><![CDATA[Australia]]></category>
		<category><![CDATA[Industry]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[ausdm]]></category>

		<guid isPermaLink="false">http://www.dataminingdownunder.com/?p=282</guid>
		<description><![CDATA[The Australian Data Mining Conference (AusDM09) will be held in Melbourne next December, and Dr Phil Brierley of Tiberius Data Mining has put out the call for proposals for an analytic challenge to accompany the conference.  Competitions are quite popular in data mining circles and provide a good training ground for new practitioners to get access [...]]]></description>
			<content:encoded><![CDATA[<p>The Australian Data Mining Conference (AusDM09) will be held in Melbourne next December, and Dr Phil Brierley of <a href="http://www.tiberius.biz/" target="_blank">Tiberius Data Mining</a> has put out the call for proposals for an analytic challenge to accompany the conference.  Competitions are quite <a href="http://www.kdnuggets.com/datasets/competitions.html">popular</a> in data mining circles and provide a good training ground for new practitioners to get access to real data and solve real problems.  They also often have surprising results, such as the team who used a <a href="http://www.cybaea.net/Blogs/Data/How-to-win-the-KDD-Cup-Challenge-with-R-and-gbm.html">laptop with 2GB RAM</a> to beat IBM&#8217;s mighty clusters.</p>
<p>For businesses, this is a great opportunity to find out what is available by having others suggest new ideas and methods, or even to test your internally deployed models against the best of the best. <strong>So if you&#8217;re a business that has data, please consider being involved!</strong> For further details, see the <a href="http://ausdm09.togaware.com/competition.html">competition webpage</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataminingdownunder.com/2009/07/ausdm09-2/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Winning the DARPA Grand Challenge</title>
		<link>http://www.dataminingdownunder.com/2006/09/grand-challenge-video/</link>
		<comments>http://www.dataminingdownunder.com/2006/09/grand-challenge-video/#comments</comments>
		<pubDate>Sun, 17 Sep 2006 04:21:05 +0000</pubDate>
		<dc:creator>Shane Butler</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[darpa]]></category>

		<guid isPermaLink="false">http://sbutler.com/blog/2006/09/grand-challenge-video/</guid>
		<description><![CDATA[Sebastian Thrun of Stanford Racing gives a great talk on what it took to build an autonomous vehicle to win the DARPA Grand Challenge. There are lots of cool technical details on the use of machine learning to achieve this. You can watch it on Google Video here.
]]></description>
			<content:encoded><![CDATA[<p>Sebastian Thrun of <a href="http://www.stanfordracing.org/">Stanford Racing</a> gives a great talk on <a href="http://video.google.com/videoplay?docid=8594517128412883394">what it took to build an autonomous vehicle</a> to win the <a href="http://www.darpa.mil/grandchallenge/index.asp">DARPA Grand Challenge</a>. There are lots of cool technical details on the use of machine learning to achieve this. You can watch it on Google Video <a href="http://video.google.com/videoplay?docid=8594517128412883394">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataminingdownunder.com/2006/09/grand-challenge-video/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Smart SPAM &amp; Fighting it</title>
		<link>http://www.dataminingdownunder.com/2006/05/smart-spam/</link>
		<comments>http://www.dataminingdownunder.com/2006/05/smart-spam/#comments</comments>
		<pubDate>Sat, 13 May 2006 02:26:28 +0000</pubDate>
		<dc:creator>Shane Butler</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[bayes]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[spam]]></category>

		<guid isPermaLink="false">http://sbutler.com/blog/2006/05/smart-spam/</guid>
		<description><![CDATA[For any machine learning based SPAM filter, such as the popular Bayesian methods, the key to success is the body of previously identified SPAM and HAM (valid emails), i.e. the training data. In order for the spammer to trick the filter, they must try to be more HAM-like. The way to beat this is by giving [...]]]></description>
			<content:encoded><![CDATA[<p>For any machine learning based SPAM filter, such as the popular Bayesian methods, the key to success is the body of previously identified SPAM and HAM (valid emails), i.e. the training data. In order for the spammer to trick the filter, they must try to be more HAM-like. The way to beat this is by giving your email classifier as much training data as possible, and continually updating it. Just learning from your company&#8217;s emails is probably not fool-proof when you consider the volume and variety of SPAM on the net. Web-based email, on the other hand, like <a href="http://mail.google.com">Gmail</a> and <a href="https://www.google.com/hosted">the hosted version</a>, should never have this problem because the filter learns from thousands of users&#8217; SPAM folders.</p>
<p>Researchers from the University of Calgary <a href="http://pharos.cpsc.ucalgary.ca/Dienst/UI/2.0/Describe/ncstrl.ucalgary_cs/2006-808-01">claim</a> that the next evolution of SPAM will be smart SPAM, which will infiltrate your computer via spyware/viruses and <a href="http://arstechnica.com/news.ars/post/20060502-6726.html">&#8216;mine&#8217; your emails</a>. By creating emails based on the actual messages you&#8217;ve previously sent, the spammers hope they will be more believable to readers.</p>
<p>I would argue, however, that such a situation would merely make services like Gmail more attractive: firstly because they have a truly massive body of knowledge to use to fine-tune their spam filters, and secondly because it is unlikely such spyware could infiltrate a web-based system. Even if a program was distributed that waited for someone to log on and then took over, Google could have it effectively neutralised in a matter of hours.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataminingdownunder.com/2006/05/smart-spam/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Data Mining Cup 2006</title>
		<link>http://www.dataminingdownunder.com/2006/05/data-mining-cup-2006/</link>
		<comments>http://www.dataminingdownunder.com/2006/05/data-mining-cup-2006/#comments</comments>
		<pubDate>Fri, 05 May 2006 03:02:11 +0000</pubDate>
		<dc:creator>Shane Butler</dc:creator>
				<category><![CDATA[Research]]></category>

		<guid isPermaLink="false">http://sbutler.com/blog/2006/05/data-mining-cup-2006/</guid>
		<description><![CDATA[The Data Mining Cup (DMC2006) has launched for 2006. This year the competition focuses on eBay auctions. The target is to predict for each new auction whether the actual sales revenue is higher than the average sales revenue of the product category.
]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://www.data-mining-cup.com/2006/Wettbewerb/Aufgabe/1146583837/">Data Mining Cup</a> (DMC2006) has launched for 2006. This year the competition focuses on eBay auctions. The target is to predict for each new auction whether the actual sales revenue is higher than the average sales revenue of the product category.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataminingdownunder.com/2006/05/data-mining-cup-2006/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>DARPA Grand Challenge</title>
		<link>http://www.dataminingdownunder.com/2006/05/darpa-urban-challenge/</link>
		<comments>http://www.dataminingdownunder.com/2006/05/darpa-urban-challenge/#comments</comments>
		<pubDate>Thu, 04 May 2006 02:18:23 +0000</pubDate>
		<dc:creator>Shane Butler</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[automotive]]></category>
		<category><![CDATA[autonomous]]></category>
		<category><![CDATA[darpa]]></category>
		<category><![CDATA[defence]]></category>

		<guid isPermaLink="false">http://sbutler.com/blog/2006/05/darpa-urban-challenge/</guid>
		<description><![CDATA[Start your engines, the DARPA Grand Challenge is on again, only this time it&#8217;s an urban challenge! The last two competitions were to race an autonomous vehicle through a desert, with the 2005 winner, Stanford, taking home a US$2 million prize.
 
 Stanford&#8217;s software in action: Input from GPS and many sensors feeds the [...]]]></description>
			<content:encoded><![CDATA[<p><em>Start your engines</em>, the <a href="http://www.darpa.mil/grandchallenge">DARPA Grand Challenge</a> is on again, only this time it&#8217;s an urban challenge! The last two competitions were to race an autonomous vehicle through a desert, with the 2005 winner, <a href="http://www-cs.stanford.edu/group/roadrunner/">Stanford</a>, taking home a US$2 million prize.</p>
<p><a class="imagelink" title="stanford1.png" href="http://sbutler.com/blog/wp-content/uploads/stanford1.png"><img id="image132" src="http://sbutler.com/blog/wp-content/uploads/stanford1.thumbnail.png" alt="stanford1.png" /></a> <a class="imagelink" title="stanford2.png" href="http://sbutler.com/blog/wp-content/uploads/stanford2.png"><img id="image133" src="http://sbutler.com/blog/wp-content/uploads/stanford2.thumbnail.png" alt="stanford2.png" /></a><br />
<strong>Stanford&#8217;s software in action:</strong> Input from GPS and many sensors feeds the algorithms to determine the safe path (see <a href="http://www.darpa.mil/grandchallenge05/TechPapers/Stanford.pdf">tech report</a>).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataminingdownunder.com/2006/05/darpa-urban-challenge/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What&#8217;s in a name?</title>
		<link>http://www.dataminingdownunder.com/2006/04/whats-in-a-name/</link>
		<comments>http://www.dataminingdownunder.com/2006/04/whats-in-a-name/#comments</comments>
		<pubDate>Wed, 05 Apr 2006 09:19:27 +0000</pubDate>
		<dc:creator>Shane Butler</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[dns]]></category>
		<category><![CDATA[internet]]></category>

		<guid isPermaLink="false">http://sbutler.com/blog/2006/04/whats-in-a-name/</guid>
		<description><![CDATA[Dennis Forbes gives a fantastic analysis of one of the biggest databases on the Internet &#8211; the DNS records. His analysis includes insights into domain name length, personal and family name usage and other characteristics.  For example, did you know that all 2- and 3-letter domains are taken? Dennis is planning a second part [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.yafla.com/dforbes/">Dennis Forbes</a> gives a fantastic analysis of one of the biggest databases on the Internet &#8211; the <a href="http://www.yafla.com/dforbes/2006/03/29.html">DNS records</a>. His analysis includes insights into domain name length, personal and family name usage and other characteristics.  For example, did you know that all 2- and 3-letter domains are taken? Dennis is planning a second part, so keep a lookout for that too.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataminingdownunder.com/2006/04/whats-in-a-name/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Got Zeitgeist? Mining Online Trends</title>
		<link>http://www.dataminingdownunder.com/2006/03/mining-online-trends/</link>
		<comments>http://www.dataminingdownunder.com/2006/03/mining-online-trends/#comments</comments>
		<pubDate>Mon, 06 Mar 2006 04:12:31 +0000</pubDate>
		<dc:creator>Shane Butler</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[search]]></category>

		<guid isPermaLink="false">http://sbutler.com/blog/2006/03/mining-online-trends/</guid>
		<description><![CDATA[Each week, Google provides a taste of the top search queries on a site called Google Zeitgeist. At the end of each year, they compile a more comprehensive report of what people have been searching for. The 2005 Zeitgeist has been out since December and provides some interesting insights into online trends over the past year. [...]]]></description>
			<content:encoded><![CDATA[<p>Each week, Google provides a taste of the top search queries on a site called <a href="http://www.google.com/intl/en/press/zeitgeist.html">Google Zeitgeist</a>. At the end of each year, they compile a more comprehensive report of what people have been searching for. The <a href="http://www.google.com/press/zeitgeist2005.html">2005 Zeitgeist</a> has been out since December and provides some interesting insights into online trends over the past year. My favourites were <a href="http://www.google.com/press/zeitgeist2005/worldaffairs.html">world affairs</a> and <a href="http://www.google.com/press/zeitgeist2005/nature.html">nature</a>.</p>
<p>Beyond just being interesting, companies such as <a href="http://www.buzzmetrics.com/">BuzzMetrics</a> and <a href="http://www.blogpulse.com/">BlogPulse</a> have realised that analysis of Internet activity will be a useful tool for many companies. They produce tools that mine blogs in a bid to capture consumer sentiment on particular product(s), for example to improve product marketing.  <a href="http://datamining.typepad.com/">Matthew Hurst of BlogPulse has an interesting blog</a> with the odd post covering Internet blogging activity, such as <a href="http://datamining.typepad.com/data_mining/oscars/index.html">this pre-Oscars analysis</a>.</p>
<p>Another interesting data mining application is the one pioneered by <a href="http://www.majesticresearch.com/">Majestic Research</a>. They provide stock research and earnings forecasts to analysts before actual company information is released. Using web-based data mining, they <a href="http://today.reuters.com/news/articleinvesting.aspx?type=fundsFundsNews&amp;storyid=2006-02-14T191858Z_01_N14387237_RTRIDST_0_FINANCIAL-MAJESTIC-HEDGE.XML">track the sales of the top consumer-sensitive web companies, and then use this information to infer the company&#8217;s performance</a>. Nice!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataminingdownunder.com/2006/03/mining-online-trends/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Profiling Amazon Users</title>
		<link>http://www.dataminingdownunder.com/2006/01/profiling-amazon-users/</link>
		<comments>http://www.dataminingdownunder.com/2006/01/profiling-amazon-users/#comments</comments>
		<pubDate>Thu, 19 Jan 2006 05:57:30 +0000</pubDate>
		<dc:creator>Shane Butler</dc:creator>
				<category><![CDATA[Australia]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[amazon]]></category>
		<category><![CDATA[big brother]]></category>

		<guid isPermaLink="false">http://sbutler.com/blog/2006/01/profiling-amazon-users/</guid>
		<description><![CDATA[Here&#8217;s an interesting read. Data Mining 101: Finding Subversives with Amazon Wishlists takes a look at just how much information we can extract from publicly available data such as Amazon.com&#8217;s Wish List service. The wish list allows a user to bookmark items they would like, either to come back and purchase at a later date [...]]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s an interesting read. <a href="http://www.applefritter.com/bannedbooks" target="_blank">Data Mining 101: Finding Subversives with Amazon Wishlists</a> takes a look at just how much information we can extract from publicly available data such as Amazon.com&#8217;s Wish List service. The wish list allows a user to bookmark items they would like, either to come back and purchase at a later date or to tell people about as gift ideas. Thus it was the ideal database from which to extract Amazon users&#8217; political interests. More and more, the term data mining seems to be used in the USA to refer to <a href="http://en.wikipedia.org/wiki/Data_Mining#Privacy_concerns">privacy concerns</a> about a government tool for spying on its citizens. That&#8217;s certainly the tone this article takes, but it is still interesting to see what can be achieved using freely available data.</p>
<p>Searching on political and religious keywords, the author generated a list of users and the books they were interested in. Although the users&#8217; address information is hidden, he was able to use Yahoo! People Search to find their location and display the highlighted users on a customized <a href="http://local.google.com">Google Map</a>. Anyway, it&#8217;s worth the read, <a href="http://www.applefritter.com/bannedbooks">so check it out</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataminingdownunder.com/2006/01/profiling-amazon-users/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
