Creating partitioned record IDs/ranks in Alteryx Designer
by Nathan Purvis
Background
We’ve probably all come across the need to assign a grouped record ID or rank at some point whilst working on various analytics projects. Wondering what exactly I mean by this? Well, if we have a dataset containing the population of various counties (if you’re on the other side of the pond then think of these like states) in the North, East, South and West regions of England, we may want to rank these based on their size. However, instead of doing so for the entire dataset, we might want to keep them separate and do so within their respective regions.
For any SQL users amongst us, the syntax will naturally vary but you’ll know that ought to look something at least similar to this:
ROW_NUMBER() OVER (PARTITION BY <Grouping field> ORDER BY <Ordering field>)
Likewise for any Python users out there, you might use something along the lines of:
df.groupby(<Grouping field>)[<Value field>].rank(method=<method>, ascending=<bool>)
In Alteryx Designer, we just need to use a couple of tools to achieve the result and there are two easy/common options to choose from! It’s worth noting here that we’ll only be covering an ordinal rank/ID.
Option 1 - The Multi-Row Formula approach
In this example, we’re looking at some mock workplace injury data and we want to add a record ID to each row, grouped by [Location code] and in order of [Injury Date]. Now, in Alteryx Designer, there’s no way to imply the sort order within the tool that’s assigning the ID. Therefore, we need to do this before we pass in our data, using the Sort tool - in this case, by [Location code] (this isn’t mandatory but helps us keep track of what’s happening in the workflow) and [Injury Date], ascending here as we want things in chronological order:
Once we’ve handled our ordering (ORDER BY or ascending=), we can then conduct the grouping (PARTITION BY or df.groupby) and assign our IDs. The first method involves using a Multi-Row Formula tool and the configuration here is very simple. All we need to do is
Create a New Field and give it a name
In the Group By (Optional) box, tick the grouping/partitioning field(s) - our [Location code] in this demonstration
Use the following expression: [Row-1:<NewFieldName>]+1
Obviously replace <NewFieldName> with your actual column header!
This gives us the output we’re looking for - records grouped into their [Location code] and assigned an ID in ascending [Injury Date] order!
Note: When using the AMP engine, especially with groupings in the Multi-Row Formula, your data will often come out in a different order and you may therefore want to use another Sort tool downstream to rectify this.
Option 2 - The Tile tool approach
As mentioned above, there is a second method of achieving this result and it actually involves even less effort! Much like option 1, we need to apply our ordering first and so the first step with the Sort tool is exactly the same. When we get to the Tile tool itself, the configuration is an incredibly simple 2-step process:
Choose Tile Method: Unique Value
Under Unique Column, tick your grouping/partition field(s)
… and that’s it! However, this time, you’ll likely want to add a Select tool afterwards in order to get rid of the ‘Tile_Num’ column (essentially an index of the partition itself), and rename the ‘Tile_SequenceNum’ (which is the grouped record ID/rank we want). See the screenshot below for reference:
Hopefully you found this to be a quick and useful guide to replicate a fairly common data preparation request using Alteryx Designer! As usual, any feedback, criticism or suggestions for future content are most welcome - don’t forget to check out our other blogs as well!