
Google’s First Production Server ... with the hair pulled back, revealing a rack of cheap networked PCs, circa 1999.
Each level has a couple of PC boards slammed in there, partially overlapping. This approach reflects a presumption that the cheap hardware would quickly become obsolete and would not need to be repaired. Several of the PCs never worked, and the system design was optimized to tolerate multiple computer failures.
According to Larry and Sergey, the beta system used Duplo blocks for the chassis because generic-brand plastic blocks were not rigid enough.
Original hardware
The original hardware (ca. 1998) that Google used while it was located at Stanford University included:
- Sun Ultra II with dual 200 MHz processors and 256 MB of RAM. This was the main machine for the original Backrub system.
- 2 x 300 MHz dual Pentium II servers donated by Intel; together they included 512 MB of RAM and 9 x 9 GB hard drives. The main search ran on these.
- F50 IBM RS/6000 donated by IBM, which included 4 processors, 512 MB of memory, and 8 x 9 GB hard drives.
- Two additional boxes holding 3 x 9 GB and 6 x 4 GB hard drives respectively (the original storage for Backrub). These were attached to the Sun Ultra II.
- IBM disk expansion box with another 8 x 9 GB hard drives, donated by IBM.
- Homemade disk box containing 10 x 9 GB SCSI hard drives.
Google's server infrastructure is divided into several types of server, each assigned to a different purpose (rough sketches of the request flow and of query coordination follow the list):
- Google load balancers take the client request and forward it to one of the Google web servers via the Squid proxy servers.
- Squid proxy servers take the client request from the load balancers and return the result if it is present in their local cache; otherwise they forward the request to a Google web server.
- Google web servers coordinate the execution of queries sent by users, then format the result into an HTML page. The execution consists of sending queries to index servers, merging the results, computing their rank, retrieving a summary for each hit (using the document server), asking for suggestions from the spelling servers, and finally getting a list of advertisements from the ad server.
- Data-gathering servers are permanently dedicated to spidering the Web; Google's web crawler is known as GoogleBot. These servers update the index and document databases and apply Google's algorithms to assign ranks to pages.
- Each index server contains a set of index shards. Given a query word, they return a list of document IDs ("docids") identifying the documents that contain that word. These servers need less disk space, but bear the greatest CPU workload.
- Document servers store documents. Each document is stored on dozens of document servers. When performing a search, a document server returns a summary for the document based on query words. They can also fetch the complete document when asked. These servers need more disk space.
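The front-end part of this flow can be pictured as a small pipeline: a load balancer picks a caching proxy, which answers from its local cache when it can and otherwise asks a web server. The sketch below is illustrative only; the round-robin balancer, the query-keyed cache, and every class name are assumptions for the example, not Google's actual software.

```python
# Illustrative sketch of the front-end flow: load balancer -> Squid-style
# caching proxy -> web server. All names here are invented for the example.
import hashlib
import itertools


class WebServer:
    """Stand-in for a Google web server that builds the result page."""

    def __init__(self, name):
        self.name = name

    def handle(self, query):
        # The real web server would fan out to index, document, spelling,
        # and ad servers; here we just return a placeholder page.
        return f"<html>results for {query!r} from {self.name}</html>"


class CachingProxy:
    """Squid-style proxy: serve from the local cache, else forward upstream."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.cache = {}

    def handle(self, query):
        key = hashlib.md5(query.encode()).hexdigest()
        if key in self.cache:                    # cache hit: no upstream work
            return self.cache[key]
        page = self.upstream.handle(query)       # cache miss: ask a web server
        self.cache[key] = page
        return page


class LoadBalancer:
    """Round-robin over the proxy tier (real balancers are smarter)."""

    def __init__(self, proxies):
        self._next = itertools.cycle(proxies)

    def handle(self, query):
        return next(self._next).handle(query)


if __name__ == "__main__":
    proxies = [CachingProxy(WebServer(f"gws{i}")) for i in range(3)]
    balancer = LoadBalancer(proxies)
    # Round-robin over 3 proxies: the 4th identical request lands on the
    # first proxy again and is answered from its cache.
    for _ in range(4):
        print(balancer.handle("google first production server"))
```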
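The coordination step done by the web servers, and the split between index shards and document servers, can likewise be sketched in miniature. The inverted-index layout, the word-count "rank", and the snippet logic below are simplifications invented for illustration, not Google's algorithms.

```python
# Illustrative query coordination: scatter the query to index shards for
# docids, merge and rank, then ask a document server for a summary per hit.
from collections import defaultdict


class IndexShard:
    """Holds a slice of the inverted index: word -> set of docids."""

    def __init__(self, postings):
        self.postings = postings                 # {"word": {docid, ...}}

    def lookup(self, word):
        return self.postings.get(word, set())


class DocumentServer:
    """Stores full documents and produces query-specific summaries."""

    def __init__(self, docs):
        self.docs = docs                         # {docid: full text}

    def summarize(self, docid, words):
        text = self.docs[docid]
        # Crude snippet: a short window around the first query word found.
        for w in words:
            pos = text.lower().find(w)
            if pos != -1:
                return text[max(0, pos - 20):pos + 40]
        return text[:60]


def search(query, shards, doc_server):
    words = query.lower().split()
    # Scatter: ask every shard which documents contain each query word.
    hits = defaultdict(int)
    for shard in shards:
        for w in words:
            for docid in shard.lookup(w):
                hits[docid] += 1                 # toy rank: count of matched words
    # Merge + rank, then gather summaries from the document server.
    ranked = sorted(hits, key=hits.get, reverse=True)
    return [(docid, doc_server.summarize(docid, words)) for docid in ranked]


if __name__ == "__main__":
    shards = [
        IndexShard({"google": {1, 2}, "server": {1}}),
        IndexShard({"rack": {2}, "server": {3}}),
    ]
    docs = DocumentServer({
        1: "Google's first production server was a rack of cheap PCs.",
        2: "The rack at Google held corkboard trays of PC boards.",
        3: "A server built from commodity parts.",
    })
    for docid, snippet in search("google server", shards, docs):
        print(docid, snippet)
```

Even in this toy form, the division of labor described above is visible: the index lookup and merge/rank step is the CPU-heavy part, while the document servers need the disk space to hold the full documents used for snippets.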