11312

Module 4: Discovering ASEAN on the Internet

Reading Text & Presentation

Search engines

4.2.1 What are search engines?

Search engines are huge databases of web page files that have been assembled automatically by machine. For the best results, one should know the basic characteristics of their search engines or web search tools, and how to most effectively enter their search, given the requirements of that site. Although Google is by far the most popular search engine, several have noted that a Google search, despite being able to “see” a trillion websites with unique addresses, was only scratching the surface of the information that actually exists on the web. (Pagliery, 2014; Wright, 2009)

4.2.2 How do search engines work?

Text from University of South Carolina. Beaufort Library. (2006). Bare Bones Lesson 1: Search engines.

(Source: http://www.sc.edu/beaufort/library/pages/bones/lesson1.shtml retrieved 1/4/2014)

Search engines compile their databases by employing "spiders" or "robots" ("bots") to crawl through web space from link to link, identifying and perusing pages. Sites with no links to other pages may be missed by spiders altogether. Once the spiders get to a web site they typically index most of the words on the publicly available pages at the site. Web page owners may submit their URLs to search engines for "crawling" and eventual inclusion in their databases. The whole process is shown in Figure 6.

Figure 6 How a search engine works

Sharma, T. (2014, January 24). SEO: Get Enlighten With Search Engine Spiders – Crawler, Index & Cache.

(Source: http://blog.seoppctraining.in/ retrieved 1/4/2014)

Whenever you search the web using a search engine, you are asking the engine scan its index of sites and match your keywords and phrases with those in the texts of documents within the engine's database.

It is important to remember that when you are using a search engine, you are NOT searching the entire web as it exists at this moment. You are actually searching a portion of the web, captured in a fixed index created at an earlier date.

How much earlier? It's hard to say. Spiders regularly return to the web pages they index to look for changes. When changes occur, the index is updated to reflect the new information. However, the process of updating can take a while, depending upon how often the spiders make their rounds, and then how promptly the information they gather is added to the index. Until a page has been both "spidered" AND "indexed," you will not be able to access the new information.

Figure 7 Google, one of the most widely-used search engines
(Source: http://www.google.com)

4.2.3 What are the pros and cons of search engines?

Pros:
Search engines provide access to a fairly large portion of the publicly available pages on the web, which itself is growing exponentially.

Search engines are the best means devised yet for searching the web. They allow users to search on any topics of interest in several languages with ease, convenience and quick response.

Cons:

On the down side, the sheer number of words indexed by search engines increases the likelihood that they will return hundreds of thousands of retrieved items to simple search requests, albeit typographical errors. Although these items are ranked by the degree of relevancy, a great majority of users only grasp the first twenty results, leaving behind a larger number of results uninvestigated only because they are not on the top twenty. Additionally, many of these responses will be irrelevant to your search.

4.2.4 Are search engines all the same?

Search engines use selected software programs to search their indexes for matching keywords and phrases, presenting their findings to you in some kind of relevance ranking. Although software programs may be similar, no two search engines are exactly the same in terms of size, speed and content; no two search engines use exactly the same ranking schemes, and not every search engine offers you exactly the same search options. Therefore, your search is going to be different on every engine you use. The difference may not be a lot, but it could be significant. Recent estimates put search engine overlap at approximately 60 percent and unique content at around 40 percent.

In ranking web pages, search engines follow a certain set of rules. These may vary from one engine to another. Their goal, of course, is to return the most relevant pages at the top of their lists. To do this, they look for the location and frequency of keywords and phrases in the web page document and, sometimes, in the HTML META tags. They check out the title field and scan the headers and text near the top of the document. Some of them assess popularity by the number of links that are pointing to sites; the more links, the greater the popularity, i.e., value of the page.

Activities

Activity 2