October 2006
Seek and Ye Shall Find
Several types of online search tools can facilitate auditors' information retrieval efforts.
Jerry Grenough, CPA
Principal, Taylor Forensics
Locating necessary documents for use in compliance, forensics, or investigative audits can be a daunting task. Fortunately, today's auditors have the Internet, which puts a wealth of resources at their fingertips. Auditors can access hundreds of millions of individual documents online, including an abundance of freely available information to facilitate audit work.
Navigating Web content and locating desired information, however, requires more than just a Web browser and an Internet connection. Unlike traditional research tools, such as library catalogs, searches conducted via the Web are not always well structured. And although the Web offers a host of search engines, choosing one arbitrarily can affect the quality of information retrieved. Auditors need the right tools to achieve optimal results.
Conventional search engines, meta-search engines, subject directories, and deep search tools each help users surf the Web with greater accuracy and ease. They provide much-needed structure to data searches and help bring Web content into focus. Understanding how each of these tools works can enhance internal auditors' ability to sift through Internet clutter and pinpoint the resources they need.
Of all search engines, Google appears to have the largest database of Web documents and pages. Google uses Boolean logic, which enables users to combine search terms using operators such as AND, OR, and NOT. AND requires the search to produce records with all terms specified, OR retrieves records with either term, and NOT excludes records containing the terms specified. In addition, parentheses can be used to group terms and control the order in which the operators are applied. For example, when researching the U.S. Sarbanes-Oxley Act of 2002, entering "(Sarbanes OR Oxley) AND (2002 OR Act)" as search criteria would point to records that include Sarbanes or Oxley, but only when they also include either 2002 or Act. Enclosing multiple words in quotation marks returns documents with the exact phrase specified.
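The Sarbanes-Oxley example can be sketched in a few lines of Python to show how the grouped expression is evaluated. This is only an illustration of the Boolean logic, not a description of how Google processes queries; the helper names and the sample records are hypothetical.

```python
import re

def words(text):
    """Return the set of lowercase words found in a record."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def matches(text):
    """Evaluate (Sarbanes OR Oxley) AND (2002 OR Act) against one record."""
    w = words(text)
    return ("sarbanes" in w or "oxley" in w) and ("2002" in w or "act" in w)

# Hypothetical sample records, invented for illustration.
records = [
    "Sarbanes-Oxley Act of 2002 compliance guide",
    "Senator Sarbanes biography",
    "Clean Air Act amendments",
]

hits = [r for r in records if matches(r)]
print(hits)
```

Only the first record satisfies both parenthesized groups; the second matches the first group alone, and the third matches the second group alone.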
|
Fraud Auditing
Anti-fraud resources abound on the Web, and many can be harnessed for audit work via search tools. For example, suppose your company is in need of fraud auditing programs. With the right approach, you should be able to locate numerous practical resources in a matter of hours.
First, suppose you need a printed handbook for fraud auditing as well as ready-to-use checklists. Searching for the keywords risk management, computer security, and fraud, with secondary keywords checklist and book, returns a hardcover publication that covers financial statement fraud, corporate security, fraud investigation, procurement fraud, and sector-by-sector fraud prevention checklists.
The engagement may also call for specific guidance on building a fraud audit program. A search on Google's government site using the keywords fraud auditing manual points to a PDF copy of a book published by the U.S. Department of Defense for contract auditors. Although the publication is geared toward government auditing, its sections contain generic indicators that are not government-specific, making it applicable to nongovernment practitioners as well.
In addition, you may need to focus particular attention on the accounts payable function, given its key role in many fraud investigations. Auditing this area typically involves four main processes: developing a strategy, organizing the data collection process, recognizing red flags, and using specialized statistical tools for the audit. Using a conventional search engine, auditors can enter these items as keywords, combined with the word database, to mine deep Web book directories. The search yields information about a handbook geared toward accounts payable that covers all of these topics.
Finally, suppose you also need to obtain internal control questionnaires and an electronic audit program related to cash assets. A natural language search returns information about a book published by the Association of Credit Union Internal Auditors on this precise topic. |
With Google's Advanced Search feature, users do not have to provide the logic terms. Clicking on the link for Advanced Search produces separate entry prompts, including "all the words," "exact phrase," "at least one of the words," and "without the words." Each of these prompts enables users to narrow the search criteria. Users also can look for a specific file format such as Word, Excel, or portable document format (PDF), as well as specify how recently the pages were last updated (within the past three, six, or 12 months). Additionally, users can specify whether the search terms should appear in the title, text, Web address, or anywhere on the page.
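The Advanced Search prompts correspond roughly to operators that can be typed into a single query. The sketch below assembles such a query in Python; the `build_query` function and its parameter names are hypothetical, and the operator syntax (quoted phrases, `OR`, a leading minus sign, `filetype:`) is assumed from commonly documented Google usage and may change over time.

```python
def build_query(all_words=(), exact_phrase="", any_words=(),
                without_words=(), file_type=""):
    """Assemble one query string from Advanced Search style fields."""
    parts = list(all_words)                        # "all the words"
    if exact_phrase:
        parts.append(f'"{exact_phrase}"')          # "exact phrase"
    if any_words:
        parts.append("(" + " OR ".join(any_words) + ")")  # "at least one of the words"
    parts.extend(f"-{w}" for w in without_words)   # "without the words"
    if file_type:
        parts.append(f"filetype:{file_type}")      # restrict to one file format
    return " ".join(parts)

query = build_query(
    all_words=["fraud"],
    exact_phrase="audit program",
    any_words=["checklist", "questionnaire"],
    without_words=["training"],
    file_type="pdf",
)
print(query)
```

Building the string from named fields keeps each criterion visible, which makes it easier to document in working papers how a search was narrowed.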
Google also offers a Book Search feature that enables users to search the full text of books and learn where to buy or borrow them. When Google finds a book whose content contains a match for the specific search item, it links this information to the search results. Clicking on the link enables the user to see basic information about that title.
Entering the keywords "fraud auditing" in Google Book Search, for example, returns an extensive list of books on the topic. Each entry contains links directly to online bookstores where the book can be purchased, and many of the listings provide sample pages and a table of contents.
Google also has a U.S. government/military search engine that will return only .gov or .mil URLs. For example, entering the keyword fraud produces a listing of sites that includes the U.S. Internet Fraud Complaint Center, U.S. Dept. of Justice identity theft guidance, the U.S. Secret Service, U.S. Securities and Exchange Commission (SEC) information on Internet fraud, the U.S. Federal Bureau of Investigation, and information from the U.S. Food and Drug Administration on health fraud. The site provides access to extensive government content, including the ability to locate details on SEC indictments or other fraud-related information available through government sources.
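The same kind of restriction can be applied to any list of results an auditor has already collected. The following minimal sketch keeps only addresses whose host name ends in .gov or .mil; the `is_gov_or_mil` helper and the sample URLs are illustrative assumptions, not part of Google's service.

```python
from urllib.parse import urlparse

def is_gov_or_mil(url):
    """Return True when the URL's host name ends in .gov or .mil."""
    host = urlparse(url).hostname or ""
    return host.endswith((".gov", ".mil"))

# Sample addresses for illustration only.
urls = [
    "https://www.sec.gov/",
    "https://www.example.com/fraud",
    "https://www.army.mil/",
    "https://notgov.example.org/page.gov",
]

official = [u for u in urls if is_gov_or_mil(u)]
print(official)
```

Checking the parsed host name rather than the raw string avoids false matches such as the last sample address, where ".gov" appears only in the path.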
META-SEARCH ENGINES
Meta-search engines enable users to access multiple search indexes simultaneously. They send user requests to several individual engines and databases and return the results from each one. In theory, the process enables users to search more of the Web in less time.
Meta-searches can provide greater scope and breadth of results than a conventional search, though sometimes at the expense of depth. Typically, meta-search engines only provide about 10 percent of the search results from any of the individual search engines used. Still, for quick searches geared toward obtaining an overview of a topic, meta-search engines can be an effective tool.
Ixquick, one of the largest meta-search indexes, returns top-10 results from multiple search engines. It awards sites one star for each search engine that placed the specified search item in its top 10 listings for that topic, indicating the quality of the results. Ixquick understands Boolean logic, and it also recognizes wild cards — or asterisks placed after a word where the spelling is uncertain — and returns results with close spelling matches. The index translates specified logic terms and forwards searches to the engines that can respond to them. Moreover, Ixquick can search in 18 different languages, exploring local engines where the specified language is spoken.
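The star-rating idea can be sketched as follows. This is not Ixquick's actual implementation; it is a minimal Python illustration in which each hypothetical engine returns a ranked list and a result earns one star for every engine that places it in its top 10. The engine names and result addresses are invented.

```python
from collections import Counter

def star_rank(results_by_engine, top_n=10):
    """Award each result one star per engine that lists it in its top_n."""
    stars = Counter()
    for ranked_urls in results_by_engine.values():
        for url in set(ranked_urls[:top_n]):
            stars[url] += 1
    # Most stars first; ties are broken alphabetically for a stable order.
    return sorted(stars.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical ranked results from three engines.
results_by_engine = {
    "engine_a": ["a.example/fraud", "b.example/audit", "c.example/checklist"],
    "engine_b": ["b.example/audit", "a.example/fraud"],
    "engine_c": ["b.example/audit", "d.example/manual"],
}

ranking = star_rank(results_by_engine)
for url, count in ranking:
    print("*" * count, url)
```

A result that several independent engines rank highly rises to the top, which is the intuition behind treating the star count as a rough quality signal.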
Subject directories, another type of search tool, organize Internet sites by subject, enabling users to choose a subject of interest and then browse a list of resources in that category. Subject directory catalogs are organized by humans rather than generated automatically. Users conduct their searches by selecting a series of progressively narrower search terms from several lists of descriptors provided in the directory.
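Browsing a subject directory amounts to walking down a hierarchy one category at a time. The sketch below models that with nested Python dictionaries; the categories, resource titles, and the `browse` helper are all hypothetical and do not reflect any particular directory's catalog.

```python
# A tiny, invented directory: categories nest until a list of resources is reached.
directory = {
    "Business": {
        "Accounting": {
            "Auditing": ["Fraud auditing handbook", "Internal control questionnaires"],
            "Taxation": ["Corporate tax guide"],
        },
        "Economics": {"Forecasting": ["Time-series primer"]},
    },
    "Government": {"Regulation": ["Securities enforcement actions"]},
}

def browse(tree, path):
    """Follow progressively narrower subjects and return what is found there."""
    node = tree
    for subject in path:
        node = node[subject]
    return node

# At each level the user sees the available narrower subjects...
choices = sorted(browse(directory, ["Business"]))
# ...and eventually reaches a list of resources.
resources = browse(directory, ["Business", "Accounting", "Auditing"])
print(choices)
print(resources)
```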
Infomine, a subject directory developed by university librarians, is one of the largest engines in this category. It mines several types of sources, including databases, electronic journals, electronic books, bulletin boards, articles, and directories of researchers. Sites returned by Infomine have been carefully evaluated by editors and organized into hierarchical subject categories, including business, economics, and government.
At sldirectory.com, users can find an array of international subject directories and search engines organized by country. A link on the site's home page enables users to narrow listings specifically to subject directories, filtering out the conventional search engines.
Another international directory site, Search Engine Colossus, also enables users to view numerous search engines worldwide. The site contains 2,100 listings from several hundred countries and territories around the world, organized by individual country. Moreover, an "express button" feature enables users to link directly to search engines available in one of three specified languages — French, Spanish, or German.
|
Sarbanes-Oxley Compliance
Used correctly, Web index tools can help auditors conduct searches that incorporate a host of criteria for Sarbanes-Oxley projects. For example, suppose your company needs a white paper on the best generic audit programs for Sarbanes-Oxley compliance. Specifically, assume you need a publication that contains actual documented work programs, can be viewed both in print and on CD-ROM, and meets the following additional criteria:
In less than 10 minutes of searching, a Google Book Search points to a book that meets all of these requirements. A visit to Ixquick also provides access to the publication's table of contents and sample excerpts.
Next, suppose you need 50-plus generic programs in Word or Excel that cover all phases of a Sarbanes-Oxley compliance engagement. Desired components include a framework for evaluating exceptions, a segregation of duties evaluator, and a testing template based on the Committee of Sponsoring Organizations of the Treadway Commission's Internal Control–Integrated Framework. A conventional Web search using these criteria returns a link for AuditNet, where a paid premium subscription provides access to 72 programs that fit these requirements exactly.
Suppose you also need a checklist for management's discussion and analysis (MD&A). Because MD&A deficiencies were found at both Enron and WorldCom, the two main catalysts for Sarbanes-Oxley, MD&A activity may merit particular attention. A deep Web search for management disclosure returns a PDF copy of a report titled MD&A Checklist. The checklist covers application of critical accounting policies, results of operations, liquidity and capital resources, and special topics with broad effects such as pensions, environmental issues, off-balance sheet arrangements, and market risk disclosures.
Each of these searches can return knowledge bases available at no immediate cost. Users' only expenditure is the time spent conducting research. |
THE INVISIBLE WEB
The surface Web, also known as the visible Web or indexable Web, comprises Internet content that is indexed by conventional search engines and subject directories. The invisible Web — also called the deep Web — consists of information that search engines as a rule do not have the algorithms to locate. It contains, by some estimates, more than 900 billion pages of information — significantly more than the visible Web.
The reasons that some sites remain hidden to conventional search engines are varied. For example, content from specialized databases often does not have links to existing Web pages, making it irretrievable by most engines. Password protection can also keep search engines from accessing content. For some search engines, PDF files, as well as word processing, spreadsheet, and other documents without any hypertext markup language (HTML) content, may be invisible, though Google now provides the ability to search the full text of PDF files by converting them to text and encasing the text in HTML.
There are several ways to access invisible Web pages. One method is to use a conventional search engine to locate a searchable database by entering the desired subject term together with the word database. If the database's own pages contain the word database, the search engine will likely be able to find it.
Complete Planet, powered by search company BrightPlanet, also enables users to access deep Web content. It uses a tool called Deep Query Manager, designed to harvest data from thousands of deep Web databases and search engines at one time. Users can enter a query and receive a Web page created dynamically for their search. These dynamic Web pages are not linked, however, since they do not exist before the query and cease to exist after being sent to the user.
To optimize deep search queries, some sites use natural language technology. Whereas keyword searching can often return ambiguous results, a natural language search involves typing a phrase or string of keywords for more targeted searches. Yahoo has integrated natural language search functions into its Yahoo Finance portal site, aiming to give professional and personal investors easier access to site content and services. Search engines AltaVista and InfoSeek use natural language searching as their basic search method.
BUSINESS INTELLIGENCE, DATA MINING
Business intelligence and data mining resources are available on the Web through each of the search techniques described. A search for information related to business intelligence, for example, yields two useful sites. Accurint, a locate-and-research tool available to government, law enforcement, and commercial customers, uses proprietary technology to fulfill search requests. It delivers reports on people, businesses, assets, and other items in real time. Accurint is used by a wide variety of professionals, including private investigators and lawyers, and would likely be helpful to government and law enforcement auditors.
At freeware site Download.com, auditors can access another data mining tool called Alyuda Forecaster XL — a Microsoft Excel add-in. Forecaster can be used for regression forecasting, time-series forecasting, and predictive classification. It also offers a combination of time-series and standard regression forecasting that enables users to forecast next-period values of output columns without entering corresponding values for that period's input columns. Forecaster can generate actual versus forecasted graphs, scatter plots, error distribution tables, and input column importance charts.
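To make the idea of regression forecasting concrete, the sketch below fits a straight-line trend by ordinary least squares and projects the next period. It is a generic textbook technique written in Python, not the method used by Forecaster XL, and the payment figures are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical monthly payment totals (in thousands) for periods 1 to 5.
periods = [1, 2, 3, 4, 5]
payments = [100.0, 110.0, 120.0, 130.0, 140.0]

slope, intercept = fit_line(periods, payments)
forecast = slope * 6 + intercept  # projected value for period 6
print(forecast)
```

An auditor might compare such a projection with the actual figure for the period and treat a large gap as a prompt for further inquiry.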
THE POWER OF THE WEB
By using available search tools, auditors can obtain a host of relevant, useful information for their audit work. Web-based research delivers a cost-benefit outcome and breadth of knowledge far superior to its paper-based equivalent. Putting search tools to effective use can enhance audit efficiency and add a new dimension to the auditor's research capabilities.
To comment on this article, e-mail the author at jerry.grenough@theiia.org.