Monday, March 6, 2017

How search engines work


Internet
a global computer network providing a variety of information and communication facilities, consisting of interconnected networks using standardized communication protocols.

There are about one billion websites, and the size of the internet is roughly 1.2 million terabytes (one terabyte is 1,000 gigabytes). Google is the search engine used by about 80% of the world. So how does Google give you results in just milliseconds from this huge database? The answer is the search engine, and search engines answer tens of millions of queries every day. So how do these search engines work?

When you search for something on Google, you are not really searching the web itself; you are searching Google's index of the web. That index covers some 60 trillion web pages and grows every second, and Google indexes every web page whose owner has allowed it.

So let's see how this works.

The most important measures for a search engine are search performance, the quality of the results, and the ability to crawl and index the web efficiently. The primary goal is to provide high-quality search results over a rapidly growing World Wide Web. Some efficient and widely recommended search engines are Google, Yahoo and Teoma, which share some common features and are standardized to some extent.

There are three basic stages for a search engine (a toy end-to-end sketch follows the list):
  • Crawling - collecting the data from web pages.
  • Indexing - analysing the collected data and storing it on servers.
  • Retrieval - delivering results for a search query.
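
Here is a rough, end-to-end Python sketch of those three stages. The page contents, function names and the tiny in-memory index are invented for the illustration; a real search engine does all of this at enormous scale.

# Toy end-to-end sketch: crawl (stubbed), index, retrieve.

def crawl():
    # Stage 1 (stubbed): pretend these two pages were downloaded from the web.
    return {
        "https://example.com/a": "web crawlers collect pages from the web",
        "https://example.com/b": "the index maps words to the pages containing them",
    }

def build_index(pages):
    # Stage 2: map every word to the set of URLs it appears on.
    index = {}
    for url, text in pages.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
    return index

def retrieve(index, query):
    # Stage 3: return only the URLs that contain every word of the query.
    results = None
    for word in query.lower().split():
        urls = index.get(word, set())
        results = urls if results is None else results & urls
    return results or set()

index = build_index(crawl())
print(retrieve(index, "the web"))   # -> {'https://example.com/a'}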


What is crawling of a website or a page?

A web crawler is a computer program that travels the web automatically, downloading and storing web pages, often for a web search engine. Crawlers, or spiders, build lists of the words found on websites; when a spider is building its lists, the process is called web crawling. (WebCrawler itself, by contrast, is a metasearch engine that blends the top search results from Google Search and Yahoo! Search.)

Crawling a website means acquiring data from it. Crawling is done by computer software called spiders or bots; Google's crawler is also known as Googlebot. Web crawlers are fast and can scan hundreds of pages in milliseconds.

There is a URL server that sends lists of URLs to be fetched to the crawlers. The web pages that are fetched are then sent to the store server, which compresses and stores the web pages in a repository.
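
A rough sketch of that fetch-and-store flow, assuming a hand-written URL list in place of the URL server and a plain Python dict in place of the repository:

import urllib.request
import zlib

# Stand-in for the URL server: a list of URLs handed to the crawler.
urls_to_fetch = ["https://example.com/"]

# Stand-in for the repository kept by the store server.
repository = {}

for url in urls_to_fetch:
    # The crawler fetches the page...
    with urllib.request.urlopen(url, timeout=10) as response:
        html = response.read()
    # ...and the store server compresses it and files it away.
    repository[url] = zlib.compress(html)

print(len(repository), "page(s) stored,",
      len(repository["https://example.com/"]), "bytes after compression")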

Crawlers visit a page much as we do: they scan everything on the web page, including the page title, keywords and page links. Modern crawlers also look at the page layout, the advertising space on the page, and so on.

The first thing crawlers do when they visit a website is look for a file named "robots.txt" (the robots exclusion protocol). The robots.txt file tells crawlers which pages may be crawled and which pages must not be; well-behaved crawlers skip the disallowed pages, so those pages never make it into Google's index.

Example of a robots.txt file:

User-agent: *
Disallow: /yoursite/temp/

User-agent: searchengine
Disallow:
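
Python's standard library can parse rules like the example above and tell a crawler whether a URL may be fetched; the example.com URLs below are just placeholders:

from urllib import robotparser

# Parse the example robots.txt shown above directly from a string.
rules = """
User-agent: *
Disallow: /yoursite/temp/

User-agent: searchengine
Disallow:
""".strip().splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# A generic crawler ("*") must skip the disallowed directory...
print(rp.can_fetch("*", "http://example.com/yoursite/temp/page.html"))   # -> False
# ...but may fetch everything else.
print(rp.can_fetch("*", "http://example.com/index.html"))                # -> True
# The crawler named "searchengine" has no restrictions at all.
print(rp.can_fetch("searchengine", "http://example.com/yoursite/temp/page.html"))  # -> True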

Have a look at Facebook's robots.txt file: www.facebook.com/robots.txt

or at my blog's: techtysechyblog.blogspot.in/robots.txt


Google's crawlers fetch an entire page, then fetch the links available on that page, then the links available on those pages, and so on until the crawler decides to stop.
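
A simplified sketch of that follow-the-links behaviour, where the crawler "decides to stop" after a fixed number of pages; the seed URL and the five-page limit are arbitrary choices for the sketch:

import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    # Collects the href of every <a> tag on a page.
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed, max_pages=5):
    # Fetch the seed page, then the pages it links to, and so on,
    # until max_pages pages have been visited.
    frontier, visited = [seed], set()
    while frontier and len(visited) < max_pages:
        url = frontier.pop(0)
        if url in visited:
            continue
        try:
            with urllib.request.urlopen(url, timeout=10) as response:
                html = response.read().decode("utf-8", errors="ignore")
        except Exception:
            continue
        visited.add(url)
        extractor = LinkExtractor()
        extractor.feed(html)
        # Queue the links found on this page for later fetching.
        frontier.extend(urljoin(url, link) for link in extractor.links)
    return visited

print(crawl("https://example.com/"))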

Every search engine, whether it is Google, Yahoo, Bing, Baidu or another, uses crawlers.


What is indexing of a webpage?

When you search for something on Google, you are actually searching Google's index of the web; it is the search engine index that provides the results for your query. Without a search engine index it would not be possible for search engines to return results within a second; it would take far more time and effort.


The search engine index is the place where all the data collected by the crawlers is stored on servers. The indexer is a program that reads the data collected by the web crawlers and decides what each web page is about.


Web indexing includes back-of-book-style indexes for individual websites or web documents and the creation of metadata (subject keywords and description tags) to provide a more useful vocabulary for Internet search engines.



Once the crawling process is over, the search engine compiles a massive index of all the words it has seen and stores this data on Google's servers, remembering the location of each web page. The stored data is then organised and interpreted by the search engine's algorithm to measure each page's importance compared to similar pages.
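
In code terms, that compiled index can be pictured as an inverted index: a map from each word to the pages it occurs on and where on those pages it appears, so a query becomes a quick lookup instead of a rescan of every page. The sample pages below are made up for the illustration:

from collections import defaultdict

# Made-up crawled pages standing in for data collected by the crawler.
pages = {
    "https://example.com/a": "search engines crawl and index the web",
    "https://example.com/b": "an index maps every word to the pages containing it",
}

# Inverted index: word -> {url: [positions of the word on that page]}
inverted_index = defaultdict(lambda: defaultdict(list))
for url, text in pages.items():
    for position, word in enumerate(text.lower().split()):
        inverted_index[word][url].append(position)

# Answering a one-word query is now a single dictionary lookup.
print(dict(inverted_index["index"]))
# -> {'https://example.com/a': [4], 'https://example.com/b': [1]}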


How are Google results ordered?


So after crawling and indexing, how does Google answer the search query? The result of your search query is based on web page ranking. The ranking of a web page is decided by at least 200 factors, and the pages that rank well are the ones that end up at the top of the search results.


The ranking algorithm is set up by humans, but no human can manually adjust the ranking of a particular web page.

Some of the factors deciding page rank (a toy scoring sketch appears below):





  • Keyword usage
  • Site structure
  • Site speed
  • Time spent on site
  • Number of inbound links
  • Quality of inbound links
Almost all search engines use this method for crawling, indexing and answering queries; Google has been taken as the example here because most of the world uses the Google search engine.
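
As a toy illustration of how factors like those listed above might be combined into a ranking score: the weights and per-page values below are invented, and a real engine weighs hundreds of signals.

# Invented weights for a handful of the factors listed above.
WEIGHTS = {
    "keyword_usage": 3.0,
    "site_speed": 1.5,
    "time_on_site": 1.0,
    "inbound_links": 2.0,
    "inbound_link_quality": 2.5,
}

# Invented, already-normalised scores (0 to 1) for two imaginary pages.
pages = {
    "https://example.com/fast-and-popular": {
        "keyword_usage": 0.8, "site_speed": 0.9, "time_on_site": 0.7,
        "inbound_links": 0.9, "inbound_link_quality": 0.8,
    },
    "https://example.com/slow-and-obscure": {
        "keyword_usage": 0.6, "site_speed": 0.2, "time_on_site": 0.3,
        "inbound_links": 0.1, "inbound_link_quality": 0.2,
    },
}

def rank_score(signals):
    # Weighted sum of the made-up ranking factors.
    return sum(WEIGHTS[name] * value for name, value in signals.items())

# Order the results the way a results page would: highest score first.
for url in sorted(pages, key=lambda u: rank_score(pages[u]), reverse=True):
    print(round(rank_score(pages[url]), 2), url)
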
If you find this post useful, please share.
If you have any recommendations, please comment below.
References:
www.wikipedia.com
http://www.totallycommunications.com/latest/search-engine-basics-crawling-indexing-ranking/
http://www.brickmarketing.com/define-search-engine-index.htm
http://searchsoa.techtarget.com/definition/crawler
http://www.slideshare.net/sanchitsaini/working-of-a-web-crawler
http://computer.howstuffworks.com/internet/basics/search-engine1.htm
http://www.makeuseof.com/tag/how-do-search-engines-work-makeuseof-explains/

PDF:



How Search Engines Work and a Web Crawler Application. Monica Peshave, Department of Computer Science, University of Illinois at Springfield, Springfield, IL 62703, mpesh01s@uis.edu. Advisor: Kamyar Dezhgosha, University of Illinois at Springfield, One University Plaza, MS HSB137, Springfield, IL 62703-5407, kdezh1@uis.edu.
DRAFT! © April 1, 2009 Cambridge University Press
J. Pei: Information Retrieval and Web Search -- Web Crawling

