control, and governance
July 2007
Extracting Information From Spreadsheet Records
Spreadsheet lookup functions enable internal auditors to perform important processing tasks, such as extracting information from entire record sets or finding unmatched records within them.
Mark G. Simkin, PHD
College Of Business Administration
University of Nevada
A common task in many auditing environments is to extract selected information from large sets of records — for example, selecting those employee records satisfying specific criteria or displaying a limited amount of information from particular vendor records. A second common task is to find those records in one record set that have no matching records in a second record set. If the initial records for such operations reside in a Microsoft Excel spreadsheet, auditors can easily perform both of these tasks by treating the underlying records as large lookup tables.
Although data-extraction tasks are commonly performed using database or auditing software, this article demonstrates how to perform such jobs easily using readily available spreadsheet tools. These techniques can save auditors time because the datasets need not be reloaded into secondary software packages and are useful auditing tools when such software is not conveniently available. Because spreadsheets are self documenting, such data extraction techniques also enable auditors share formulas, enhance their understanding of spreadsheet tools, and leverage their understanding of spreadsheet capabilities to other applications.
USING SPREADSHEETS AS LARGE LOOKUP TABLES
A key internal audit task is sampling — that is, selecting a limited number of records from a large set of them. For data residing in an Excel spreadsheet, auditors can use several methods for this task. Two of the most useful are (1) the Autofilter tool and (2) the VLOOKUP function. To illustrate, imagine that an auditor of the fictitious MGS Company wishes to verify that all salaried employees have proper employee codes. To start, assume that this means extracting all employee records with code D from the record set of Figure 1. The auditor begins with the spreadsheet in Figure 1, which contains the names and addresses of all MGS employees. The desired records could be identified manually by searching for code D in all the records in the spreadsheet and then copying the desired ones, one by one, into another worksheet. But this would be a daunting task if there were thousands of records. An easier way is to use Excel's AutoFilter tools to extract the desired employee information.
|
|
To use the AutoFilter tool, first highlight the entire set of records in the spreadsheet, including headings, and then select Data/Filter/AutoFilter from the main menu. This will cause Excel to display arrows next to each heading in the record set as shown in Figure 1. Next, click on the arrow next to the heading for Employee Code and select D from the choices provided. Excel will then filter the entire list of records and display only those records with this code. Users can now copy this list to another worksheet if they like. Auditors should note that they can also extract records that satisfy more than one requirement, such as employees with code D and with zip code 12345, by sequentially clicking on these choices using the filter arrows in the appropriate columns. Finally, the user can turn off the filter operation by selecting Data/Filter/AutoFilter once again, which toggles the filter to Off and restores the view of the entire set of spreadsheet records.
A potential problem with this solution is that each employee record might contain extraneous information that may not be needed or desired as part of the final sample. This is because Excel's AutoFilter displays complete rows of data that satisfies the specific selection criteria from the drop-down lists. But what if the user only wants a small subset of the information from each row or, alternatively, the list of employees needed is not limited to those with the same code?
The requirements of this new task can easily be satisfied using VLOOKUP functions. To use this function, the first step is to create formulas for the primary cells in row 6 of the spreadsheet, which can be copied into the remaining cells in the spreadsheet. Recall that the general form of the VLOOKUP function is:
= VLOOKUP(LookupValue, TableArray, ColumnOffset).
In this formula, Lookup Value is the value that needs to be found in the table, which is usually a reference to another cell; TableArray refers to the range of cells containing the original data; and ColumnOffset is the number of the column within the table that will be used. Thus, the formula for cell B6 in Figure 2 is:
=VLOOKUP($A6, 'Employee N&A Records’!$A$6:$H$18, 2).
The first entry (i.e., $A6) is the LookupValue, which contains a dollar sign in front of the A so that users can freeze the reference to the employee number and copy the formula into the cells to the right. The next entry in the lookup formula is the TableArray reference, which in this case refers to all the cells in the Employee N&A Records worksheet — that is, the employee records in Figure 1 treated as one large lookup table. Finally, the value of 2 is the ColumnOffset.
This VLOOKUP formula will cause Excel to display the employee code shown in the lower portion of Figure 2 for employee number 525 (i.e., the employee code for Anderson). After users have created this first formula in cell B6, they can copy it into the other primary cells in row 6 to the right of cell B6. The lookup formula for these cells are similar, except that the ColumnOffset for the cells in the first row will be 4 for the employee's first name, 6 for the employee's City address, and so forth.
After users have entered the formulas for the top row (i.e., row 6) of the spreadsheet, they can copy these formulas into the remaining cells. The lower portion of Figure 2 shows the results. Excel will perform the desired table lookups instantly. The end result is a complete set of new records showing only the columns needed and only the information required in the desired records.
Auditors might be wondering how long it takes Excel to search for information in large record sets. To answer this question, a 1,000-row database was used to perform the same query. The result — Excel was able to perform the lookups for 50 records from this database as fast as the formulas were entered into the spreadsheet.
FINDING UNMATCHED RECORDS IN SPREADSHEETS
Another common auditing task is to identify the records in one record set that do not have corresponding records in a second record set. Examples include identifying employee payroll checks with no corresponding records in a payroll master file, vendor payments with no corresponding supplier records in a vendor file, or refunds to customers with no corresponding record of payments in a payments file. Again, auditors can use Excel's table lookup capabilities to accomplish this task.
|
|
To program the cells in column F in Figure 3, users can employ the VLOOKUP function to find the employee's last name in the Employee N&A Records worksheet (i.e., the worksheet in Figure 1 that contains the names of all legitimate employees). The appropriate formula for cell F6 is:
=VLOOKUP(A6, 'Employee N&A Records'!$A$6:$H$18, 3, FALSE).
In this formula, the first three entries should be familiar. The reference to cell A6, for example, is the value to be found in the table (e.g., employee number 123 in cell A6). The second entry, 'Employee N&A Records'!$A$6:$H$18, references the worksheet containing the master list of employees and the range of cells within this worksheet that contains the data (i.e., the range A6:H18). Again, please note that users must include the name of the worksheet as well as the cells within the worksheet for this. The third entry is the column offset for the VLOOKUP formula (i.e., 3), which is the column offset required to reference the employee's last name. The last entry in the formula, FALSE, is an optional parameter for an Excel lookup formula that instructs Excel to only look for exact matches. If this parameter were set to TRUE or omitted, Excel would find either exact matches or records with employee numbers less than or equal to the search value. Inasmuch as the application requires exact matches, the entry must be FALSE for this parameter.
Once users have created the formula for cell F6, they can copy it to the other cells in column F of Figure 3. As shown in Figure 3, Excel will display the last name of the employee corresponding to the employee number, if it can find the number in the record set, or use #N/A, which stands for not found, if it cannot. Figure 3 contains four instances of this not-found condition — that is, there are four unmatched records. Auditors can then use Excel's AutoFilter option as discussed earlier to isolate the four records by turning on the AutoFilter option and selecting #N/A from the list of options in the drop down list for column F.
After the filtered list of unmatched records is created, users can copy the information to a new worksheet using Paste Special/Values for further analysis. For example, it might be useful to find out how many unmatched payroll records came from the same department in the example described here or how many unmatched vendor payments came from the same purchasing agent. Such tasks would simply require users to extract the required information from the underlying master tables first, after which simple filtering operations make it easy to extract the desired information.
LOOKING FORWARD
Some of the most useful spreadsheet tools are those that allow auditors to extract useful information from large data sets. One such tool is Excel's AutoFilter capability. Auditors will find this feature useful when searching for selected records that satisfy specific search criteria — for example, for finding those individuals working in a particular corporate division who are also salaried employees or those credit customers lacking phone-number entries in their files.
A second spreadsheet extraction tool is the ability to treat an entire set of spreadsheet records as a large lookup table. As explained in the article, this capability not only allows auditors to select specific information from selected records, but also to identify unmatched records. As illustrated here, one useful application of this feature is to identify those employees who received paychecks but who did not have matching records in a payroll masterfile. Other examples include the ability to identify vendor payments with no matching supplier records in a vendor masterfile or customers who purchased goods on credit but who lacked approval codes in a customer masterfile.
Spreadsheets are not the only tools that can perform the data extraction tasks described here, but there are several reasons why using spreadsheets for such tasks might make sense. One is the likelihood that a particular data set already resides in a spreadsheet, or can be easily downloaded into one. Another reason is that an auditor may not be familiar with alternate database or auditing software, in contrast to spreadsheet software which is widely available and commonly understood. A third reason is that such an approach might save time because an auditor does not have to reload data into alternate software. A fourth reason is that spreadsheets are self documenting and therefore allow users to share formulas with colleagues. Finally, understanding how data-extraction tools work in spreadsheets enhances an understanding of spreadsheet capabilities in general, enables an auditor to leverage these skills for other tasks, and increases an auditor's effectiveness.
FOR FURTHER READING
For more information on using Excel's AutoFilter feature, see:
http://www.contextures.com/xlautofilter01.html
http://office.microsoft.com/en-us/excel/HA011127901033.aspx
For more information on using VLookUp functions, see:
http://office.microsoft.com/en-us/excel/HP052093351033.aspx
http://www.techonthenet.com/excel/formulas/vlookup.php
http://faculty.ucc.edu/business-greenbaum/C100SS_HVLOOKUP.htm
Mark G. Simkin, PHD, is a professor of Computer Information Systems at the University of Nevada, Reno. His research in end-user computing, computer education, and computer crime appears in 100 academic journal articles, including Decision Sciences, The Decision Sciences Journal of Innovative Education, The Journal of Accountancy, Communications of the ACM, and Communications of the Association for Information Systems. Simkin is also the author of 15 books, the most recent of which is Core Concepts of Accounting Information Systems (John Wiley and Sons, 2008) with co-authors Nancy A. Bagranoff and Carolyn Norman Strand.
Internal Auditor is pleased to provide you an opportunity to share your thoughts about the articles posted on this site. Some comments may be reprinted elsewhere, online, or offline. We encourage lively, open discussion and only ask that you refrain from personal comments and remarks that are off topic. Internal Auditor reserves the right to edit/remove comments.