Many of the reports I run are a mashup of data from multiple sources which may contain millions of data points. Combining these reports often locked my computer up for several minutes making it impossible to do anything while they ran so I needed to look for any way I could cut down the processing time.
The first step seems obvious and that was to cut out any data I didn’t need, but it was also important to be sure this was one of the first steps so that I wasn’t loading, moving or saving more data than needed. Cutting out extra data may be possible with better SQL queries, but often times my data was pre-run during batch or came from third party vendors. As a result I would typically use excel to sort, filter and remove anything I didn’t need.
The second part of this process involved looking at where calculations were being done. It was important to make sure that I wasn’t running calculations each time I added new data. This may mean using a macro to fill in formulas all the data had been combined. It was also important to make sure I wasn’t entering 100,000 lines of formulas when I only had 30,000 lines on the report.
It is also important to look at the order in which formulas are entered. If one formula uses data from another formula take a look at the order in which they are entered into the spread sheet to avoid recalculations.
Reports change over time as different managers request new data points or decide they don’t need some information after all. I found that many old reports contained steps and calculations which were no longer needed but were never removed.
The final step comes as I gain knowledge, which involves looking to see if things can be done in a better way. I have often looked at reports I built 6 months ago and realized how many inefficient methods I used. In excel replacing VLOOKUP with INDEX MATCH can be faster, and in some cases they have been used to check if a piece of data appears on another sheet when COUNTIF would have been much faster.
When asked to pull an eight digit account number out of a random string there are many methods that can be used. I will attempt to explain my method and some of the reasons why I chose it, but first I think I should mention more about the problem.
The strings in question were made up of user typed sentence which should include the account number but did not contain a consistent format. This was made more difficult by the fact that the account number could be continuous, contain hyphens or spaces and in some cases may not exist at all. I did know that an account number should be eight digits and could not begin with a zero. Some examples of the problem are shown below:
“Please transfer 500.00 to 12345678”
“For the benefit of 1234-5678, John”
“Please send $1000, 12/19 to 1234 5678 for Christmas.”
“12345678 Mary Smith, please call with any questions (555)-555-5555”
As the examples indicate other numbers may be present and the account could be anywhere in the string. Removing all non numbers would occasionally leave strings integers with no clue which belonged to the account. Removing just spaces and special characters could have similar problems in cases such as the one above with $1000 12/19 which would become 10001219 without spaces or special characters.
I decided to start with a search using regular expressions to find the digit 1-9 followed by nine more digits. I also had to check that the nine digit number did not contain an added number before or after the string I found. It was possible that a person included a phone number or SSN which could be picked up in error if I found my string but did not check for either nothing before or after or at least a non digit.
The regular expression check wouldn’t work if the number contained a space or hyphen so the next step was to then remove these likely account spacings and run the search again. Removing the characters after the first run helped to reduce the chance of picking up some other number incase two other numbers happened to form an eight digit number.
Once I had completed the two rounds of searches I was typically left with a usable account number if one was provided. If an account could not be found the case would have to be manually reviewed.
Hello, I plan to use this site to explain how I work through some of the projects I have done professionally as well as independently. I hope to explain some of the thought process as well as well as some code or at least psudo code in the case of larger concepts.
This is your very first post. Click the Edit link to modify or delete it, or start a new post. If you like, use this post to tell readers why you started this blog and what you plan to do with it.
This is a featured content post. Click the Edit link to modify or delete it, or start a new post.