The term "search engine" is often used generically to describe both crawler-based search engines and human-powered directories. These two types of search engines gather their listings in radically different ways.
Everything the crawler finds goes into the second part of the search engine: the index. The indexer takes every word on a web page, logs it, categorizes it and stores the result in a huge database. Indexing every word lets most search engines go beyond simple keyword matching and support proximity searching, which finds words that appear near one another.
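To make the idea concrete, here is a minimal sketch of a positional inverted index in Python. The data layout and function names are assumptions for illustration, not any engine's actual implementation; real indexes are vastly larger and more sophisticated.

```python
from collections import defaultdict

def build_index(pages):
    """Map each word to {page_id: [positions]} so proximity can be checked later."""
    index = defaultdict(dict)
    for page_id, text in pages.items():
        for position, word in enumerate(text.lower().split()):
            index[word].setdefault(page_id, []).append(position)
    return index

def near(index, word_a, word_b, max_gap=3):
    """Return pages where word_a and word_b occur within max_gap words of each other."""
    hits = []
    for page_id in index.get(word_a, {}).keys() & index.get(word_b, {}).keys():
        pos_a, pos_b = index[word_a][page_id], index[word_b][page_id]
        if any(abs(i - j) <= max_gap for i in pos_a for j in pos_b):
            hits.append(page_id)
    return hits

pages = {"p1": "search engines index every word on a page",
         "p2": "every page is crawled before the index sees a word"}
index = build_index(pages)
print(near(index, "index", "word"))  # pages where the two words appear close together
```

Because the index records where each word occurs, not just that it occurs, the engine can answer "find these words near each other" without rescanning any pages.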
Some indexers also record the HTML coding, which lets the search engine restrict a search to particular parts of a page, such as the URL or the title. Most of these special search features can be found in the advanced search areas of nearly all the major search engines, and the Help section of each search tool explains how to get the best results from that specific engine.
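Field-restricted searching of this kind, in the spirit of the intitle: or inurl: operators some engines offer, might be sketched as follows; the field names and page layout here are hypothetical, chosen purely for illustration.

```python
# Hypothetical field-aware store: words are kept per field (title, url, body),
# so a query can be limited to one part of the page.
pages = {
    "p1": {"title": "Search Engine Basics", "url": "example.com/search", "body": "How crawlers work"},
    "p2": {"title": "Cooking at Home", "url": "example.com/recipes", "body": "Search for recipes"},
}

def search_field(pages, field, word):
    """Return ids of pages whose given field contains the word (case-insensitive)."""
    word = word.lower()
    return [pid for pid, fields in pages.items()
            if word in fields[field].lower().split()]

print(search_field(pages, "title", "search"))  # ['p1']: only the title is consulted
```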
There is a time lag between when a web page is crawled and when it is indexed. Until a page is indexed, it is unavailable to search engine users: it exists in the engine's system but is not yet accessible to you. This is why you should be skeptical of some of the boasts search tools make. For example, when Google announced in February 2004 that it had increased its total number of pages to 4.28 billion, it did not mention that a portion of those pages had been crawled but not yet indexed. You still had access to billions of Google's pages, just not all 4.28 billion!
If a crawler finds changes on a web page, it updates the index to include the new information. The word "index" implies categorization and classification, activities that call for human assessment and interpretation. In reality, a search engine's indexing is done by software, and the rankings of the responses, or hits, are likewise calculated by mathematical formulas.
To improve performance, many search engines eliminate certain common words like "is," "and," "or," and "of." These "stop words" add no real benefit to a search. Search engines also take other steps to focus their searches, such as eliminating punctuation and converting all letters to lowercase. It is important to remember that each search engine has its own rules and ways of working.
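A rough sketch of this kind of normalization might look like the following; the stop-word list and the exact rules vary from engine to engine, and this small list is purely illustrative.

```python
import string

# A tiny illustrative stop-word list; real engines use larger, tuned lists.
STOP_WORDS = {"is", "and", "or", "of", "the", "a"}

def normalize(text):
    """Lowercase the text, strip punctuation, and drop stop words."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [word for word in text.split() if word not in STOP_WORDS]

print(normalize("The Life and Times of a Crawler!"))
# ['life', 'times', 'crawler']
```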
Query Process
The third part of a search engine is its query processor, the most complicated part of the process. The search engine takes the query, searches the index, and weighs all kinds of different factors to decide what is relevant and what is not, all before the results are returned. The exact process differs with every search engine, and the search engine companies closely guard the specific mathematical algorithms used in their calculations. The big difference between engines is the way relevance is calculated.
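Because the real formulas are trade secrets, any example can only be a toy. The sketch below ranks pages by weighting term frequency in the body and giving extra credit to title matches; the weights and the choice of factors are invented for illustration, not taken from any actual engine.

```python
def score(page, query_words, title_weight=2.0):
    """Toy relevance score: term frequency in the body plus a bonus for title hits.
    Real engines combine many secret, carefully tuned signals."""
    body = page["body"].lower().split()
    title = page["title"].lower().split()
    total = 0.0
    for word in query_words:
        total += body.count(word)                   # frequency in the body text
        total += title_weight * title.count(word)   # title matches count extra
    return total

pages = [
    {"title": "Search Engine Basics", "body": "how a search engine builds its index"},
    {"title": "Gardening Tips", "body": "search for the right soil and seeds"},
]
query = ["search", "engine"]
ranked = sorted(pages, key=lambda p: score(p, query), reverse=True)
print([p["title"] for p in ranked])  # most relevant first
```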
Crawler-Based Search Engines
Crawler-based search engines, such as Google, create their listings automatically. They "crawl" or "spider" the web, and then people search through what they have found. If you change your web pages, crawler-based search engines eventually find those changes, and that can affect how you are listed. Page titles, body copy and other elements all play a role.
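To give a feel for what "crawling" involves, here is a minimal breadth-first crawler sketch. It assumes the third-party requests and beautifulsoup4 packages, and it omits the politeness a real crawler needs, such as honoring robots.txt, throttling requests, and staying within permitted sites.

```python
from urllib.parse import urljoin
import requests
from bs4 import BeautifulSoup

def crawl(start_url, max_pages=10):
    """Follow links breadth-first from start_url, collecting page titles."""
    seen, queue, results = set(), [start_url], {}
    while queue and len(results) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = requests.get(url, timeout=5).text
        except requests.RequestException:
            continue  # skip pages that fail to load
        soup = BeautifulSoup(html, "html.parser")
        results[url] = soup.title.string if soup.title else ""
        for link in soup.find_all("a", href=True):
            queue.append(urljoin(url, link["href"]))  # resolve relative links
    return results
```

Revisiting pages on a schedule is how such a crawler would notice the changes mentioned above and feed them back to the indexer.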
Human-Powered Directories
A human-powered directory, such as the Open Directory, depends on humans for its listings. You submit a short description of your entire site to the directory, or editors write one for sites they review. A search then looks for matches only in those descriptions.