Posted: April 21st, 2013 | Author: Mark | Filed under: Projects | Tags: analysis, data, hottest100, music, R, triplej | No Comments »
At the end of 2012 I set myself a goal to try & predict the 2012 Hottest 100. Little did I realise what a daunting, complex task I had set myself. After screwing around with a pile of hand-rolled scripts, 3 virtual machines, about a dozen SSIS packages, around 6 SSAS prediction models, a PostgreSQL database, a SQL Server OLTP DB, a SQL Server OLAP DB and more knowledge of the Fuzzy Lookup transform than I will ever admit to, I decided to start over with simpler, incremental goals.
Eventually I still want to predict a Hottest 100 chart but to get there I’m starting with smaller data sets and asking simpler questions. This page will be my diary.
H100.1: Is there a correlation between play counts in the year and Hottest 100 success?
H100.2: Is there a correlation between play counts during certain times of the year and Hottest 100 success? (in progress)
H100.3: Is there a correlation between artist nationality & Hottest 100 success? (planned)
Pick some of the more useful factors and build a more complex model (clustering & decision trees are about all I can do at the moment).
- YouTube Plays
- Social Media mentions
- Try and extract genre from MusicBrainz
- Play data from ARIA & other non-JJJ sources
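For a question like H100.1, a rank correlation is one possible starting point, since the outcome is itself a rank. The sketch below is plain Python rather than R (where `cor(x, y, method = "spearman")` does the same job), and the play counts and chart positions are made-up illustration values, not real data.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical data: plays during the year and final chart position.
plays = [310, 120, 95, 240, 60]
chart_position = [3, 41, 77, 12, 98]
print(spearman(plays, chart_position))
```

Because position 1 is the best result, a strongly negative value here would mean that more plays go with a better chart position.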
I’m doing this using the R statistical programming language and the RStudio IDE. I’ll include all of my exploratory files as well as the clean write-ups, R markdown files and data when I upload something. It’s a pretty cool tool & I’m fairly new at this stage. If you know a better way to do something I’ve done, please tell me!
I’m trying to predict two outcomes from my data: a binary outcome defining whether the song is in the top 100 or not and, given that a song IS in the top 100, its rank. I’m hoping to generate a probability model for the former and a continuous variable for the latter. This means prediction takes two steps, or two models, but it makes the problem easier to solve. In theory the body of eligible songs is ranked by JJJ from 1 to N; everything between 101 and N is hidden from us.
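To make the two-step idea concrete, here is a minimal sketch in Python of how the two targets and the two-stage prediction could be laid out. The function names, the 0.5 threshold and the stand-in models are all assumptions for illustration; nothing here is the actual model.

```python
def build_targets(eligible_songs, published_ranks):
    """Derive both training targets for each eligible song.

    published_ranks maps song -> chart position for songs that made the
    countdown. Songs outside the published chart have an unknown rank,
    so it is recorded as None rather than guessed.
    """
    rows = []
    for song in eligible_songs:
        rank = published_ranks.get(song)
        rows.append({"song": song, "in_chart": rank is not None, "rank": rank})
    return rows


def predict(features, chart_probability, rank_given_chart, threshold=0.5):
    """Two-step prediction: P(in chart) first, then rank only if likely in.

    chart_probability and rank_given_chart are placeholder callables
    standing in for whatever models end up being fitted.
    """
    p = chart_probability(features)
    if p < threshold:
        return p, None
    return p, rank_given_chart(features)


# Hypothetical usage with toy stand-in models
rows = build_targets(["Song A", "Song B", "Song C"], {"Song A": 1, "Song C": 57})
result = predict({"plays": 300}, lambda f: 0.9, lambda f: 12.0)
print(rows, result)
```

Keeping the rank as `None` for songs that missed out reflects the point above: positions 101 to N exist in theory but are never published.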
My wife & I were talking about moving to a new place and one potential candidate came up that didn’t seem to solve my #1 priority of spending less time in transit. It was closer to our nearest freeway but at an exit further away than our current place. (Pro tip: skip the article and just download the code right here).
She seemed pretty keen on the place but I couldn’t quite justify a new place that didn’t save me any time. I figured there are a few places I visit on a pretty regular basis, so I needed some way to work out if, on average, I’d be spending less time travelling.
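The "on average" comparison boils down to a weighted average: each regular destination's travel time, weighted by how often I go there. A small Python sketch of that arithmetic follows; the visit counts and minutes are invented placeholders, not my real numbers.

```python
def weighted_average_minutes(trips):
    """trips is a list of (visits_per_month, one_way_minutes) pairs."""
    total_visits = sum(visits for visits, _ in trips)
    if total_visits == 0:
        raise ValueError("need at least one visit to average over")
    return sum(visits * minutes for visits, minutes in trips) / total_visits


# Hypothetical figures: (visits per month, minutes each way)
current_place = [(20, 35), (4, 15), (2, 50)]
new_place = [(20, 25), (4, 20), (2, 55)]

print(weighted_average_minutes(current_place))
print(weighted_average_minutes(new_place))
```

A place can be further from some destinations and still win overall if it is closer to the ones visited most often.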
A quick scan of the Google Maps API showed that I can pretty easily pull out the distance & duration between two addresses (as simple strings). Now to get the data into some form that can be manipulated easily.
Excel 2010 provides a few ways of importing data; the Google API supports XML so I checked that first. Unfortunately the XML import isn’t live: it’s only a once-off import, and I want to be able to update a value and have it update my sheet straight away.
Excel also supports a “Web Query” for importing data, which is pretty cool: you give it a URL and it will take a <table/> out of a web page and bring it in for you to manipulate. Cool, but not helpful here.
Since it didn’t look like Excel could do it natively using its default tools, I got my hands dirty. This question on Stack Overflow had the goods: a class library in VBA that pulls a string data feed from a URL and a link to this project that provides a VBA class for parsing JSON string data.
With those two modules neatly embedded in my sheet, all it took was a few lines in a custom function that retrieved the JSON API result and parsed it down to the single value I needed. Since I don’t want to get black-listed from the API, I popped a (very) simple cache in place. Don’t get too confident though: it only caches to memory, so it will still need to re-fetch every value each time you load the sheet.
' In-memory cache so repeated lookups don't hit the API again
Dim DistCache As New Scripting.Dictionary

Function CalculateDistance(startAddress As String, endAddress As String)
    Dim key As String
    key = startAddress & "|" & endAddress

    If DistCache.Exists(key) Then
        ' Already looked this pair up since the sheet was opened
        v = DistCache(key)
    Else
        ' Fetch the directions as a JSON string
        Dim request As New SyncWebRequest
        request.AjaxGet ("http://maps.googleapis.com/maps/api/directions/json?origin=" & startAddress & "&destination=" & endAddress & "&sensor=false")
        Dim json As String
        json = request.Response

        ' Walk down to the distance value of the first leg of the first route
        Dim parser As New JSONLib
        Set result = parser.parse(json)
        Set routes = result("routes")
        Set route = routes(1)
        Set legs = route("legs")
        Set leg = legs(1)
        Set dist = leg("distance")
        v = dist("value")
        DistCache(key) = v
    End If

    CalculateDistance = v
End Function
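For anyone who would rather not live in VBA, the same cache-then-parse idea can be sketched in Python. The URL and the routes/legs/distance path come from the function above; the helper names and the injectable `fetch` callable are assumptions made so the example can run without touching the network.

```python
import json
from urllib.parse import urlencode

DIRECTIONS_URL = "http://maps.googleapis.com/maps/api/directions/json"


def build_url(origin, destination):
    """Build the request URL, URL-encoding both address strings."""
    query = urlencode({"origin": origin, "destination": destination, "sensor": "false"})
    return DIRECTIONS_URL + "?" + query


def first_leg_value(payload, field="distance"):
    """Pull the numeric value for the first leg of the first route."""
    data = json.loads(payload)
    # Python lists are 0-indexed, unlike the 1-indexed VBA collections
    return data["routes"][0]["legs"][0][field]["value"]


def make_cached_lookup(fetch, field="distance"):
    """Wrap a fetch(url) -> JSON string callable with an in-memory cache."""
    cache = {}

    def lookup(origin, destination):
        key = (origin, destination)
        if key not in cache:
            cache[key] = first_leg_value(fetch(build_url(origin, destination)), field)
        return cache[key]

    return lookup
```

Passing `field="duration"` reads the travel time instead of the distance. As with the VBA version, the cache only lives in memory, so it starts empty on every run.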
I saved the spreadsheet (with non-personal data) – if you want to check it out you can download it here. Remember you will need to enable macros before it will work. To check out the source, hit the “Developer” tab and click “Visual Basic”. It looks like this place will save me a heap of time after all – now that the hard bit is done we just have to buy it.
Being ever frustrated with the world’s misspelling of the words ‘a lot’, and being inspired by Hyperbole and a Half’s clever coping mechanisms, I thought I’d enhance the web with a Greasemonkey script that will show our friend, the Alot, on screen when his name is mentioned.
Here is the little critter:
You can install the plugin yourself from the page on userscripts.org.
Posted: May 1st, 2011 | Author: Mark | Filed under: Projects | Tags: hehpic, images, tools | No Comments »
For a while now I’ve been working on an image-sharing site called hehpic. It’s a work in progress & still has some fairly rough edges but it’s online & available for all to use.
- Browser extensions for easy sharing
- Public API (there’s a private API now but it’s changing too frequently to be released)
- More search features
- Sex up the browsing interface
- Tag combos (works now, just has to be done manually, use a comma)
- Paging/Searching/Tag counts
- Dedicated hosting environment (I love my Dreamhost but it’s probably not going to scale well – I’ll deal with that when/if it happens)
- Feature poor
- IE rendering is awful
Check it out. Upload something. Make an account. Subscribe to some feeds. Comment here.
Posted: May 27th, 2010 | Author: Mark | Filed under: Projects | Tags: android, bit.ly, link shrink, mobile, tools | No Comments »
Just a quick one this time: Link Shrink 0.2 is in the market. Should hit the “100 downloads” mark soon, woo!
- Links can now be shared out through the clipboard (i.e. copy & paste)
- App handles already shortened URLs gracefully
- Slightly more sensible logic around the API credential validation
Posted: May 18th, 2010 | Author: Mark | Filed under: Projects | Tags: android, bit.ly, link shrink, mobile, tools | No Comments »
I’ve just put together a 0.000000001 version of my bit.ly URL shortening tool for Android, called “Link Shrink”.
It integrates into the Android OS fairly tightly, at the moment only through the “Share Page” actions (technically speaking, android.intent.action.SEND + android.intent.category.DEFAULT + text/plain, copied straight from the Android browser). When browsing (or doing anything else that supports URL sharing) you can hit share, then hit Link Shrink; it will generate a bit.ly URL and send it back through the Share Page “intent” again so you can pass it on to SMS, email, delicious or whatever tickles your fancy.
Since it’s only a 0.1, it only supports the basics.
- URL shortening using bit.ly (and only bit.ly)
- Users can optionally provide their own bit.ly login & API key to add URLs to their account
- The only option for what to do with the shortened URL is re-sharing
There are plans for the future though and I’m kinda sweet on this tiny little app so it’ll probably be sooner rather than later.
- UI improvements
- Better error handling (e.g. detect when attempting to shorten an already shortened URL)
- More shortening services (if anyone has preferences please let me know, I only use bit.ly)
- More things to do with shortened URLs, such as:
- Provide text boxes with long & short URLs for copy & paste
- Send directly to the clipboard
- Open bit.ly stats page
- Prompt user every time for one of the above
- Better configuration screen to support the above
- Your idea here!
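The "already shortened URL" check from the list above could be as simple as comparing the link's hostname against a set of known shortener domains. This is a rough Python sketch of that idea, not the app's actual code; the function name and the default host list (just bit.ly) are assumptions.

```python
from urllib.parse import urlparse

# Assumed list of shortener domains; only bit.ly is supported so far
SHORTENER_HOSTS = frozenset({"bit.ly"})


def is_already_shortened(url, hosts=SHORTENER_HOSTS):
    """Return True if the URL's host is one of the known shortener domains."""
    # Shared text often lacks a scheme, which urlparse needs to find the host
    candidate = url if "://" in url else "http://" + url
    host = (urlparse(candidate).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host in hosts


print(is_already_shortened("http://bit.ly/abc123"))
print(is_already_shortened("http://example.com/some/long/path"))
```

Checking the hostname rather than searching the whole string avoids false positives on long URLs that merely mention bit.ly somewhere in their path.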
Ultimately though, the most important feature (URL shortening) is done & working fairly solidly on the emulator & my phone so I think I’ll let it loose on the market. Please whack a comment in or email me if you have any feedback, problems, ideas or suggestions.
As always, the source is on Github, plus you can get the latest APK from my Dropbox, plus it’s on the market (scan the barcode below or click here if you’re on an Android phone).
Posted: December 31st, 2009 | Author: Mark | Filed under: Projects | Tags: android, augmented reality, iphone, layar, meat in a park, mobile | 2 Comments »
BBQs are awesome. Augmented reality is awesome. Imagine (if you will, nay, can) the awesomeness unleashed by combining the two. Well brace yourself, it’s been done. Combining the oh-so-great Meat in a Park API with the augmented reality browser Layar for Android (and yes, iPhone too), I present “the meat in a park layar” (thanks, I know it’s a great name). Use it to find a BBQ near you (although the MiaP service only has data for Australia – if this changes then I’ll lift the country restrictions).
It’s been approved, so first get Layar from the Android Market or the iPhone App Store and you can find it listed under ‘Eating and Dining’. As always the code is on github.
Posted: December 19th, 2009 | Author: Mark | Filed under: Projects | Tags: algerian, awesome algerian, rails, ruby, typography | No Comments »
Wow… my first blog post. I never thought the day would come where I had reason to tell the world something in ink as I’ve always had trouble finding “content” worthy of publishing, despite the constant stream of rubbish I spill out on twitter & other social forums. My New Year’s resolution for 2009 was to get writing code & put something out there and in my typical style, I’ve left it to the very last minute. My day job these days is particularly mundane & repetitive so I’ve been finding mental stimulation elsewhere. This week I spent my evenings learning Ruby & Rails, building the “Hello World” of web development – a single-serve site.
Announcing: Awesome Algerian, the world’s smallest in-joke & a showcase of the finest use of the Algerian font across the globe. I built the functionality of the site including a posting model & user management using this YouTube tutorial and the guide for Authlogic. My good friend dos4gw provided the hilarious foundation, designed the layout and some content. User registration is currently open while those privy to the joke sign up & posting is restricted to these users.
Things I’ve learnt:
- Ruby looks and feels so much more elegant than Perl
- Ruby on Rails makes it super-easy to pull together a quick, functional website
- Typeface.js is so much nicer than sIFR
- Syndicate the content (RSS) – done!
- Write tests
- Clean up the routes
- Try and break it