Tuesday, January 6, 2015

How search engines work (EPP LESSON)

The easiest way to find information on the Internet is to use a search engine. A search engine is an Internet tool that looks for Internet sites containing the words you type in as a search term.
For example, if you want to see if there are any Information and Communication Technology lesson plans that you can use, you can type in “ICT lesson plans” as the search term. After a brief wait, you can then choose from a list of web pages with those particular words.
Search engines do the following basic tasks:
1.     They search the Internet or select web pages based on important words.
2.     They keep an index of the words they find and where they find them.
3.     They allow users to look for words or phrases found in that index.
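The three tasks above can be sketched in a few lines of Python. This is only a minimal illustration, not how any real search engine is written: the sample pages, the example.com addresses, and the function names build_index and search are all made up for the lesson.

```python
from collections import defaultdict

# Hypothetical sample pages standing in for documents a search engine has collected.
PAGES = {
    "http://example.com/a": "ICT lesson plans for grade five",
    "http://example.com/b": "Lesson plans for science teachers",
    "http://example.com/c": "How search engines work",
}

def build_index(pages):
    """Task 2: map each lowercase word to the set of URLs where it appears."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Task 3: return the URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

index = build_index(PAGES)
print(sorted(search(index, "ICT lesson plans")))
```

Notice that search only looks inside the index. It never reads the pages again, which is why a lookup can be answered quickly.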
Early search engines held an index of a few hundred thousand pages and documents and were visited only about a thousand times a day. Today, a top search engine like Google indexes hundreds of millions of pages and is visited tens of millions of times per day.

How do search engines do it?

Search engines do not search the Internet itself; they search a database of information about the Internet. A document placed on the Internet can be found only if the information in it has already been recorded in the search engine’s database.
To find information on numerous web pages, search engines use special software robots called spiders to build lists of the words found on websites. This process is called “web crawling.” In order to build and maintain a useful list of words, a search engine’s spiders look at a lot of pages.
How do spiders do it? The usual starting points are lists of heavily used servers and very popular pages. A spider begins with a popular site, indexing the words on its pages and following every link found within the site. This way, the spiders spread out across the most widely used portions of the web.
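The crawling described above can be sketched as follows. To keep the example runnable without a network connection, it assumes a tiny made-up "web" stored in a dictionary; the addresses, the WEB table, and the crawl function are hypothetical and only illustrate the idea of starting at a popular page and following every link.

```python
from collections import deque

# Hypothetical miniature "web": each address maps to (page text, outgoing links).
WEB = {
    "popular.example/home": ("welcome to the popular site",
                             ["popular.example/news", "other.example/"]),
    "popular.example/news": ("latest news today", ["popular.example/home"]),
    "other.example/": ("another site entirely", ["other.example/about"]),
    "other.example/about": ("about this site", []),
    "isolated.example/": ("nobody links here", []),
}

def crawl(seeds, web, max_pages=100):
    """Visit the seed pages, follow every link once, and index the words found."""
    index = {}
    visited = set()
    queue = deque(seeds)
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited or url not in web:
            continue
        visited.add(url)
        text, links = web[url]
        for word in text.split():
            index.setdefault(word, set()).add(url)
        # Queue the links on this page so the spider spreads outward.
        queue.extend(link for link in links if link not in visited)
    return visited, index

visited, index = crawl(["popular.example/home"], WEB)
print(sorted(visited))
```

In this sketch the page at isolated.example/ is never visited because no other page links to it, so none of its words reach the index.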

How Google works

Google began as an academic search engine and is now the most widely used search engine. Its spiders work quickly. Google’s developers built their initial system to use multiple spiders, about three at a time.
Each spider could keep about 300 connections to web pages open at a time. At its best, using four spiders, the system could crawl over 100 pages per second, generating up to 600 kilobytes of data each second.
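To get a feel for these figures, here is some rough arithmetic. It treats the approximate numbers quoted above as exact values, so the results are ballpark estimates only, and the variable names are made up for this example.

```python
# Figures quoted in the lesson, treated as exact for a rough estimate.
SPIDERS = 3
CONNECTIONS_PER_SPIDER = 300
PAGES_PER_SECOND = 100
KILOBYTES_PER_SECOND = 600

# Connections open at once when three spiders are running.
open_connections = SPIDERS * CONNECTIONS_PER_SPIDER

# Average amount of data generated for each crawled page.
kb_per_page = KILOBYTES_PER_SECOND / PAGES_PER_SECOND

# Pages crawled in one day if the top speed were kept up the whole time.
pages_per_day = PAGES_PER_SECOND * 60 * 60 * 24

print(open_connections, kb_per_page, pages_per_day)
```

Under these assumptions, three spiders hold about 900 connections open, each page yields roughly 6 kilobytes of data, and a full day at top speed would cover about 8.6 million pages.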
When the Google spider looks at an HTML page, it notices two things:
1.     The words in the page
2.     Where the words are located
Google spiders take note of titles, subtitles, meta tags and other positions of relative importance, indexing every significant word on a page.
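The idea of recording both a word and where it appears can be sketched with Python's built-in html.parser module. This is an illustration only: the WEIGHTS values, the sample page, and the WordLocator class are assumptions made for this lesson, not Google's actual method or real ranking values.

```python
import re
from html.parser import HTMLParser

# Assumed weights for illustration only; the lesson's sources give no real values.
WEIGHTS = {"title": 3, "h1": 2, "h2": 2, "meta": 2, "body": 1}

def tokens(text):
    """Split text into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

class WordLocator(HTMLParser):
    """Collect (word, location) pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.current = None   # the title or heading tag we are inside, if any
        self.found = []       # list of (word, location) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            if attrs.get("name") in ("keywords", "description"):
                for word in tokens(attrs.get("content") or ""):
                    self.found.append((word, "meta"))
        elif tag in ("title", "h1", "h2"):
            self.current = tag

    def handle_endtag(self, tag):
        if tag == self.current:
            self.current = None

    def handle_data(self, data):
        location = self.current or "body"
        for word in tokens(data):
            self.found.append((word, location))

PAGE = """<html><head><title>ICT Lesson Plans</title>
<meta name="keywords" content="ICT, lessons">
</head><body><h1>Search Engines</h1><p>Spiders index words.</p></body></html>"""

parser = WordLocator()
parser.feed(PAGE)

# Add up the assumed weights so words in important positions score higher.
scores = {}
for word, location in parser.found:
    scores[word] = scores.get(word, 0) + WEIGHTS[location]

print(parser.found)
print(scores)
```

In this sketch, a word that appears in the title and again in a meta tag ends up with a higher score than a word that appears only once in the body text.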
Sources:

FAQ#1: What's an Internet search engine? from
http://www.cln.org/searching_faqs.html#FAQ1
Franklin, Curt. “How Internet Search Engines Work.” from
http://computer.howstuffworks.com/internet/basics/search-engine1.htm
