How to beat 10:00 (and maybe even 3:47)…

We hope you took our 3:47 data challenge before reading this post!  Even if you didn’t you’re welcome to learn how to use Excel to process your data at least 10x faster. We know that if you use the steps we suggest, you will complete this data challenge in less than 10:00 min.

Power Query

To complete the solution in under 10:00 minutes, you’re definitely going to need Power Query.  If you’re using Excel 2010 or Excel 2013, you’ll probably need to download and install Power Query as an add-in since it is not part of the native Excel software.  There’s a great guide to installing Power Query which can be found here if you’re getting stuck.  Excel 2016 has Power Query as part of the standard menu ribbon (called Get&Transform).

Screenshot (35)

If you’re using a Mac, I’m sorry…

Apple OS are not supported for these amazing tools.  A good place to vote for this change is on this Power BI feature request.

The 3:47 Solution

 

Are you ready?  Let’s start the stopwatch (click on images to enlarge if you need to see the screenshots)…

Start by locating the folder into which the source data files were downloaded.  Copy and paste the file path to save time.

Using Power Query or Get&Transform, create a “New Query”>”From File”>”From Folder”

Screenshot (8)

Paste the folder path of the source data (or browse if you don’t have it copied)

Screenshot (9)

Select “Edit” to open the Query Editor

Screenshot (10)

Right-click on the “Content” column and select “Remove Other Columns”

Screenshot (11)

This leaves only the “Content” column

Screenshot (12)

On the menu ribbon, select “Add Column”>”Add Custom Column” and then type “=Excel.Workbook([Content])” and click “OK”.

Screenshot (13)

This adds a column that interprets the binary content (1’s and 0’s) as the meta-data of each spreadsheet (will make sense in the next screen)

Screenshot (14)

Click on the “Expand” button in the column header of the “Custom” column and then deselect all except for “Data”

Screenshot (15)

Now click “OK” to expand only the “Data” column of the meta-data (other meta data will show the name and type of information contained in the spreadsheets together with some other details)

Screenshot (17)

Remove the “Content” column similar to before but click and select “Remove” this time

Screenshot (18)

Click on the expand button of the “Data” column and click “OK”

Screenshot (19)

On the menu ribbon click “Home”> “Use First Row as Headers” to promote the top row as headers

Screenshot (20)

To remove all the heading rows of the other spreadsheets we’ve just combined…

(oh…did you not yet realise that you’ve already combined 20 spreadsheets with your first click.  If not, take 15 precious seconds to breathe deeply and think about the implications of this for your current work in Excel)

…Click on the Menu button of the “RunNumber” column.  Because Power Query uses only a preview of the data and not the complete dataset, we need to click “Load More” to be able to see the rest of the data in this column.

Screenshot (21)

We de-select “RunNumber” which removes each row in the query that has this value (i.e. header rows)

Screenshot (22)

We now load this to our spreadsheet by selecting “Close & Load”

Screenshot (23)

The query takes a couple of seconds to load the 32,928 rows of data into our spreadsheet (“slightly” faster than copy-paste  😉)

Screenshot (24)

From the “Table” menu we select “Summarize with Pivot Table”

Screenshot (25)

The default is the table we are in so we can just click “OK”

Screenshot (26)

In our pivot table we drag and drop “RunNumber” in the Rows section of our report

Screenshot (27)

We drop “parkrunner” into the Values section which defaults to a count aggregation (which is what we want to calculate).  In this format, the count of “parkrunner” is the total for each “RaceNumber” so we’re nearly there!

Screenshot (28)

We now drop “parkrunner” again into the Filter section of the report

Screenshot (29)

Copy and Paste Values of the Count of “parkrunner” (i.e. the Totals)

Screenshot (30)

Now that the values are here we can change the report filter

Screenshot (31)

In the report filter search bar type “Unknown” and then select this value by clicking on it

Screenshot (32)

The values have updated to give us the Count of “Unknown” runners

Screenshot (33)

We find the number of “Known” runners by difference (use a regular Excel formula)

Screenshot (34)

Copy and paste the formula next to the pivot table and voilà: we have the number of Known runners so you can stop your watch!

Power to You

We hope that this challenge gave you some new ideas about how to generate reports and other routine Excel tasks faster.  It’s helped us immensely and we’d love to continue to share with those keen to learn.  Please interact with us through this blog (in the comments section) so we can understand what’s valuable to you about what we’re doing.

Leave a Reply