Return a limited but random selection of rows per category

I manage my MP3 collection with my own hand-written tool because I already have a big database with information about the bands and songs.

One of the features of the playlist manager is to specify custom SQL code to select any of the 30,000+ songs in the database for the playlist.

This allows me to have playlists like:

Any songs released in 1986
Any songs by bands from Sweden before 1993
Any songs released on "Sarah Records" on 10" vinyl.
Songs by bands added to the database in the last 12 months

You get the idea.

For my next big trip I was thinking of which songs to take along. I only have about 40 Gig space for music on my player, which is enough for roughly 10,000 files.

One idea was to take songs from my all-time favourite bands, but the filter already returns over 10,000 songs and I wanted to add some other tracks as well. So I said, lets take just 50 songs from each of my 250 favourite bands. Which songs? I don't care, they can be random, but I don't want a song twice, regardless of the fact that I have duplicate MP3s because they have been released on multiple albums or compilations. The SQL to do this was not super simple, so that's why I documented here.

To keep things simpler, I first created two views, the first one filters out duplicate song titles:

CREATE VIEW UniqueSongs 
AS 
SELECT MIN(FileID) AS FileId, Title, BandKey  
FROM SoundFiles 
GROUP BY Title, BandKey

the second one returns my favourite bands:

CREATE VIEW FavouriteBands
AS 
SELECT BandKey FROM Bands WHERE Rank > 8

Now to the main query, which uses a Common Table Expression:

WITH CTE AS
(
SELECT Row_Number() OVER (Partition BY b.BandKey ORDER BY ABS(CAST(CAST(NEWID() AS VARBINARY) AS INT))) As RowNo, 
u.FileId FROM UniqueSongs u INNER JOIN FavouriteBands b ON b.BandKey = u.BandKey
)
SELECT FileName, RootFolder + Location as Location, BandName, Title, Duration 
FROM SoundFiles 
WHERE FileId IN 
(
SELECT FileId FROM CTE WHERE RowNo <= 50
)

The first SELECT gets all the MP3s for my favourite bands,

ABS(CAST(CAST(NEWID() AS VARBINARY) AS INT)))

just returns a random number which we order by, so we get a different set of up to 50 songs each time we run the query.
The really cool part here is:

Row_Number() OVER (Partition BY b.BandKey ORDER BY

We partition our data by the BandKey in a random order and each row gets an new row number one higher than the previous one. For each new band, the row_number is reset to 1.

Here's how that looks for 3 songs per band:

The second SELECT gets the actual data required to build a playlist, the interesting part is in the sub-select in the WHERE clause:

SELECT FileId FROM CTE WHERE RowNo <= 50

Here we use our common table expression from above to get the first 50 fileIDs per band. Remember the RowNo is a continues increasing number, so filtering for anything less 51, gets us the first 50. Cool!

Return a limited but random selection of rows per category

Pages in this section

Categories