The easiest way to find information on the Internet is to use a search engine. A search engine is an online tool that finds web pages containing the words you type in as a search term.
For example, if you want to see whether there are any Information and Communication Technology lesson plans you can use, you can type “ICT lesson plans” as the search term. After a brief wait, you can choose from a list of web pages containing those particular words.
Search engines do the following basic tasks (the short sketch after this list illustrates them):
1. They search the Internet, or selected parts of it, based on important words.
2. They keep an index of the words they find and where they find them.
3. They allow users to look for words or phrases found in that index.
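To make these three tasks concrete, here is a minimal Python sketch. The pages, URLs and words in it are made up for illustration; a real search engine works at an enormously larger scale, but the shape is the same: gather words from pages, record where each word was found, and answer queries from that record.

```python
# A made-up set of "web pages": URL -> text on the page.
pages = {
    "http://example.com/ict": "ICT lesson plans for teachers",
    "http://example.com/math": "math lesson plans and worksheets",
    "http://example.com/news": "daily news headlines",
}

# Tasks 1 and 2: visit the pages and keep an index of each word
# and where it was found (word -> set of URLs).
index = {}
for url, text in pages.items():
    for word in text.lower().split():
        index.setdefault(word, set()).add(url)

# Task 3: let a user look for words in that index. A page matches
# only if it contains every word in the search term.
def search(query):
    hits = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*hits) if hits else set()

print(search("lesson plans"))      # both lesson-plan pages
print(search("ICT lesson plans"))  # only the ICT page
```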
Early search engines held an index of a few hundred thousand pages and documents, and were visited perhaps a thousand times a day. Today, a top search engine like Google indexes hundreds of millions of pages and is visited tens of millions of times per day.
How do search engines do it?
Search engines do not search the Internet itself; they search a database of information about the Internet. When a document is placed on the Internet, it can only be found once the information in that document has been recorded in the search engine’s database.
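A small sketch of what this means in practice: a page that has just been placed on the Internet is invisible to searches until the crawler has recorded it in the index. The URLs below are invented for the example.

```python
# An index that already knows about one page.
index = {"lesson": {"http://example.com/ict"},
         "plans": {"http://example.com/ict"}}

print(index.get("lesson", set()))   # the already-indexed page is found

# A brand-new page now exists on the web, but the index knows nothing
# about it, so searches for "lesson" still return only the old page.
new_url, new_text = "http://example.com/new-lessons", "new lesson plans"

# Only after the crawler records the new page does it become findable.
for word in new_text.split():
    index.setdefault(word, set()).add(new_url)

print(index.get("lesson", set()))   # now both pages are found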
To find information on numerous web pages, search engines use special software robots called spiders to build lists of the words found on websites. This process is called “web crawling.” In order to build and maintain a useful list of words, a search engine’s spiders have to look at a great many pages.
How do spiders do it?
The usual starting points are lists of heavily used servers and very popular pages. A spider begins with a popular site, following every link found within the site and indexing the words on its pages. In this way, the spiders spread out across the most widely used portions of the web.
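Here is a minimal spider sketch in Python, using only the standard library. It starts from a seed list of URLs, follows the links it finds, and records the words on each page. The seed list, the page limit and the simple word handling are all assumptions made for the example; this is a rough illustration of the idea of crawling, not how Google’s spiders are actually written.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen
import re

class LinkAndTextParser(HTMLParser):
    """Collects the links and the visible text found in an HTML page."""
    def __init__(self):
        super().__init__()
        self.links, self.text = [], []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text.append(data)

def crawl(seed_urls, max_pages=10):
    frontier = deque(seed_urls)   # pages waiting to be visited
    seen = set(seed_urls)         # URLs already discovered
    index = {}                    # word -> set of URLs where it appears
    crawled = 0
    while frontier and crawled < max_pages:
        url = frontier.popleft()
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "ignore")
        except Exception:
            continue              # skip pages that cannot be fetched
        crawled += 1
        parser = LinkAndTextParser()
        parser.feed(html)
        # Index every word found on the page.
        for word in re.findall(r"[a-z0-9]+", " ".join(parser.text).lower()):
            index.setdefault(word, set()).add(url)
        # Follow every link found within the page.
        for link in parser.links:
            absolute = urljoin(url, link)
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return index

# index = crawl(["http://example.com/"])   # the seed URL here is a placeholder
```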
How Google works
Google began as an academic search engine and is now the most widely used search engine. Google’s spiders work quickly. Its developers built the initial system to use multiple spiders, usually three at a time.
Each spider could keep about 300 connections to web pages open at a time. At its peak, using four spiders, the system could crawl over 100 pages per second, generating around 600 kilobytes of data each second.
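Taken together, those two figures suggest an average of roughly six kilobytes of indexed data per page, as this quick back-of-the-envelope check shows (the per-page figure is inferred from the numbers above, not a figure Google states):

```python
pages_per_second = 100          # crawl rate quoted above
kilobytes_per_second = 600      # data rate quoted above
print(kilobytes_per_second / pages_per_second)   # 6.0 KB of data per page, on average
```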
When the Google spider looks at an HTML page, it notices two things:
1. The words in the page
2. Where the words were located
Google spiders take note of titles, subtitles, meta tags and other positions of relative importance, indexing every significant word on a page.
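A small sketch of that second point: when parsing a page, a spider can weight a word more heavily if it appears in the title, a heading or a meta tag than if it appears in ordinary body text. The tags and weights below are invented for illustration; Google’s actual scoring is not public.

```python
from html.parser import HTMLParser
import re

# Assumed weights: words in more important positions score higher.
WEIGHTS = {"title": 3.0, "h1": 2.0, "h2": 1.5, "meta": 2.0}
DEFAULT_WEIGHT = 1.0

class WeightedWordParser(HTMLParser):
    """Records each word on the page with the best weight it was seen at."""
    def __init__(self):
        super().__init__()
        self.current_tag = None
        self.word_weights = {}    # word -> highest weight seen

    def handle_starttag(self, tag, attrs):
        self.current_tag = tag
        if tag == "meta":   # meta tags carry their words in an attribute
            self._note_words(dict(attrs).get("content", ""), WEIGHTS["meta"])

    def handle_endtag(self, tag):
        self.current_tag = None

    def handle_data(self, data):
        self._note_words(data, WEIGHTS.get(self.current_tag, DEFAULT_WEIGHT))

    def _note_words(self, text, weight):
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            if weight > self.word_weights.get(word, 0.0):
                self.word_weights[word] = weight

parser = WeightedWordParser()
parser.feed("<html><head><title>ICT lesson plans</title>"
            "<meta name='keywords' content='ICT teaching'></head>"
            "<body><h1>Lesson plans</h1><p>Plans for ICT classes.</p></body></html>")
print(parser.word_weights)   # words from the title carry the highest weight
```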
Sources:
“FAQ #1: What's an Internet search engine?” from http://www.cln.org/searching_faqs.html#FAQ1
Franklin, Curt. “How Internet Search Engines Work.” from http://computer.howstuffworks.com/internet/basics/search-engine1.htm