Knowledge is power! We have all heard that phrase before, and though it may be a cliché, there is a lot of truth in it, particularly where marketing is concerned. No matter the industry, marketing gurus are paid (often more than their peers) to find untapped niches and to push their companies’ products to the top. The task is never easy and always involves copious research.
If you are involved in marketing, then you probably do most of your secondary research on the web.
You may spend time looking at competitors’ websites and comparing their product offerings to your own. Perhaps you need to know how they are approaching pricing or what promotions they offer. Or, it may be that you need to compare service or product details and descriptions from one site to another in order to fine-tune your own advertising plan. In other cases, you may wish to establish who your customers are and what needs they have by perusing market- or product-specific forums and blogs.
The problem is, information, just like old bread, gets stale and has to be replaced, and that is never more true than for information on the web. Stock prices, weather information, home listings, government records, insurance quotes, and product prices can all change on a daily basis. In fact, almost all information worth having changes on a regular basis.
But if so much of marketing involves having the most up-to-date information, and most of marketing research is done on the Internet, then how do we approach the problem of having our statistics, descriptions, details, and other facts change so frequently? Who has the time to get online every day and update the previous day’s research? Fortunately, in this “information age” of ours, some very smart computer people have done some work toward reaching a solution for that very problem. By harnessing the power that computers have to perform the same mundane tasks over and over, researchers can do what has never before been possible.
Using specialized software, even individuals with limited programming experience can create to-do lists for computers to perform online: a list of sites and the data to retrieve from each of them. Imagine pointing your computer to a website, and an hour later you have a spreadsheet ready for you to manipulate. The process goes by a myriad of terms, such as screen-scraping, web-data extraction, and page harvesting, which make it sound more like manual labor than anything, but all you need is sitting right in front of you. Your own computer and the right software will do the job.
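To make the "website in, spreadsheet out" idea concrete, here is a minimal sketch in Python using only the standard library. It is not how any particular screen-scraping package works. The sample page, its tag and class names, and the names ProductParser and page_to_csv are all made up for illustration, and a real scrape would download the page rather than read it from a string.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical markup: a real site will use its own tags and class names.
SAMPLE_PAGE = """
<ul>
  <li class="product"><span class="name">Widget</span><span class="price">$19.99</span></li>
  <li class="product"><span class="name">Gadget</span><span class="price">$24.50</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collects the text inside <span class="name"> and <span class="price">."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished (name, price) pairs
        self._field = None   # which field we are currently inside, if any
        self._current = {}   # fields gathered for the product being read

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("name", "price"):
            self._field = css_class

    def handle_data(self, data):
        # Only keep text that appears inside one of the spans we care about.
        if self._field:
            self._current[self._field] = self._current.get(self._field, "") + data

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current:
            # End of one product: store it as a spreadsheet row.
            self.rows.append((
                self._current.get("name", "").strip(),
                self._current.get("price", "").strip(),
            ))
            self._current = {}


def page_to_csv(html):
    """Return the products found in `html` as CSV text with a header row."""
    parser = ProductParser()
    parser.feed(html)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "price"])
    writer.writerows(parser.rows)
    return out.getvalue()


if __name__ == "__main__":
    print(page_to_csv(SAMPLE_PAGE))
```

The CSV text that comes out can be saved to a file and opened in any spreadsheet program. Dedicated screen-scraping software wraps this same pattern (fetch a page, pick out the fields, record a row) in a friendlier interface.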
Beyond just the obvious benefits of automating repetitive tasks, screen-scraping offers better quality information than you might get otherwise.
Because your computer is doing all of the recording, fewer errors are made in documenting the information. Computers don’t make typos, and they won’t transpose digits, even after hours of flipping through pages of the same website. With most screen-scraping software, you can even write simple scripts that perform data normalization. That means that you can format the information in a uniform manner to make it more usable. Maybe one site gives prices in US dollars and another site gives prices in British pounds. A simple script could convert the pounds to dollars (or vice versa) to allow you to compare apples to apples. Some software packages have additional features that allow for files (images, music, documents, etc.) to be downloaded from visited web pages. Others allow for emails to be sent to a list of addresses.
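The pounds-to-dollars example might look something like the following Python sketch. The exchange rate is an assumed figure for illustration only, and the function name to_usd is hypothetical. A real script would look up the current rate and would likely need to handle more currencies and price formats.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumed rate for illustration only; a real script would fetch the current one.
USD_PER_GBP = Decimal("1.25")


def to_usd(price_text):
    """Convert a scraped price such as '$19.99' or '£15.00' to US dollars."""
    # Tidy up the raw text: trim spaces and drop thousands separators.
    text = price_text.strip().replace(",", "")
    if text.startswith("$"):
        amount = Decimal(text[1:])
    elif text.startswith("£"):
        amount = Decimal(text[1:]) * USD_PER_GBP
    else:
        raise ValueError(f"Unrecognized currency in {price_text!r}")
    # Round to whole cents so every price has the same format.
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    for raw in ["$19.99", "£15.00", "$1,299.00"]:
        print(raw, "->", to_usd(raw))
```

Once every price has passed through a function like this, the figures from both sites sit in the same currency and the same format, so they can be sorted and compared side by side.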
And what about really big websites? You may have the patience and stamina to run through a basic website and record details of a subset of products, but some websites have huge databases sitting behind them. That’s where page scraping really shines. Scrapes can run for anywhere from a few minutes to several days. You can put your computer to work while you are away from the office, or even while you are asleep!
The fact is, if you aren’t using screen-scraping to do your work, you are passing up a time-saving resource. Screen-scraping software allows you to define the work to be done and tell your computer to do it for you. Website size is a non-issue, and the end result is tidier, more reliable, more valuable information.