So currently I'm at work waiting for this heinous script to finish running. Sadly, certain constraints forced me to write some pretty inefficient code when I knew there was a better way to do it and it's just killing me to watch my CPU run at 75-85% because of something I wrote.
To get an idea of the problem and the need for the monstrosity, KPMG is currently assisting our internal audit group (I suppose I ought to be careful about how much I let on here) develop standards for their own auditing of our disaster recovery procedures. Part of that means we need robust and effective calling trees to ensure smooth communication in the event of an emergency. We use a system called eBRP which is supposed to manage our employee information and generate calling trees from the data. Unfortunately, that seems to have been wishful thinking, so we have to come up with another solution. Suddenly my status as the only person on my team with programming experience becomes painfully obvious.
I recently got read-only rights to the eBRP database (internally hosted SQL Server) and I subsequently set up an ODBC connection. Since I don't have a development machine, I can't really install the majority of software publicly available and I don't think my needs justified submitting a request for SQL Management Studio. What I did instead was use Excel's data import tool to consume the ODBC data. This is kind of a neat trick, and it helped me understand the database schema a little better. However, once the request came down to figure out how to generate calling trees for KPMG, the beauty of the workaround lost its luster.
So the basic problem goes like this: for each employee, identify any employees that may report to him or her. For each of those employees, look up their contact information and record it in another Excel workbook. Seems simple enough, except this problem scales up to over 5,000 employee records linked across multiple worksheets (it did come from a database).
Again, since I'm not on a development machine, I had to figure out what language I would do this in. The requirements were that it needed to be able to run on Windows with minimal supporting installations and it needed to be able to access Excel's object models to manipulate and move data from spreadsheet to spreadsheet. VBScript came to mind as the most likely solution. I've never used VBScript before, and I really had no desire to ever try it, however it was easy enough to learn as I went and there were many examples online. I used SciTE which is a great multi-purpose editor and software deemed acceptable by the powers that be in InfoSec.
Now on to the performance issues. This is basically a search and compare problem, so I need to be able to iterate through the data and make comparisons. VBScript doesn't have access to Excel's search functions, so I had to loop through the data. Now, if you have any algorithm analysis experience, searching records iteratively for this solution would require one loop to go through each employee, another loop to identify their direct reports, and another loop to find contact information for each identified employee. So that's two levels of nested loops from and an n3 algorithm. It's not very pretty.
So with all that being said, I ran this script over an hour ago and it's still cooking. Some ideas popped into my head in my last meeting for a way to make the search algorithm a little smarter, and it's nice to be able to sort the data before I execute. Either way, it would've been great to just write a script to fire SQL queries and make this go a lot faster.
Friday, June 27, 2008
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment