AdelPlex

Was Not Soccer !!



- Preserve the spirit of sportsmanship around the world, while revealing the barbarity of those who seek to destroy the moral fiber of team spirit and spectator sports.

- Counter the carefully crafted campaigns to twist facts and propagate lies, especially those waged by Algerians during the qualifying matches leading up to the 2010 South Africa World Cup.

- Call on civilized people to take a stand by expressing their condemnation of the unprecedented sports violence in which soccer stadiums and their surroundings are occupied by hooligans, vigilantes and mobs waving weapons, assaulting and intimidating fans.

Website: http://www.wasnotsoccer.com

Egypt launches Arabic web domain

Egypt will launch the world's first Arabic-language internet domain.

Dr. Tarek Kamel, Egypt's Minister of Communications and Information Technology, said the new domain name would be ".masr", written in the Arabic alphabet.
It translates as ".Egypt".

"It is a great moment for us... The internet now speaks Arabic," Dr. Kamel said.

Last month, internet regulator Icann voted to allow non-Latin web addresses. Domain names can now be written in Arabic, Chinese and other scripts.

Dr. Kamel said the new domain would "offer new avenues for innovation, investment and growth" in the Arabic-speaking world.

Internet Governance Forum, 09

Live streaming of the Internet Governance Forum from Sharm El Sheikh, Egypt:

http://mcit.gov.eg/livestreaming.aspx




Jerry Yang, co-founder and former CEO of Yahoo, addressing the IGF ... Yahoooo!!

Open Source Business Intelligence

Open source business applications have started to mature into robust platforms, serving sales, finance and operational needs. Now, open source business intelligence (OSBI) platforms are also gaining attention, as owners of proprietary BI applications are navigating market consolidation, product roadmap changes and ever-increasing licensing costs.

OSBI platforms are typically marketed as commercial open source software (COSS), similar to the model popularized by Red Hat. COSS companies generate revenue from support, subscriptions and training. Most CIOs feel it is critical to have an identified commercial entity standing behind their business infrastructure rather than relying on the promise of community support alone.




Jaspersoft provides high quality software and conveniently packaged services for Open Source and Professional edition customers. Use this guide to determine which software edition is right for you. Businesses can choose between Open Source or Professional for internal business use. Developers can choose between Open Source or Professional OEM for publicly distributed applications.

http://www.jaspersoft.com/sites/default/files/downloads/Choosing%20the%20Right%20JasperSoft%20BI%20Edition-2009.pdf




Pentaho addresses our reporting and data integration needs, provides tremendous flexibility, and offers far better value than proprietary alternatives. Pentaho was also an attractive partner because of the quality of their team, and the size and activity of the Pentaho community.
Deployment Overview

Key Challenges
* Integrating multiple streams of consumer purchasing information

Products
* Pentaho Reporting Enterprise Edition
* Pentaho Data Integration Enterprise Edition

Environment
* Debian Linux, MySQL database

Why Pentaho
* Technology maturity
* Market leadership
* Value vs. proprietary
* Open source advantages - community, openness, standards support

http://demo.pentaho.com/pentaho/Home

Business Intelligence White Papers

The Business Intelligence resource for business and technical professionals covering a wide range of topics including Performance Management, Data Warehouse, Analytics, Data Mining, Reporting, Customer Relationship Management and Balanced Scorecard.

You can download your selection at the following locations:

http://www.businessintelligence.com/fwp/Defining_Business_Analytics.pdf
http://www.businessintelligence.com/fwp/Uncovering_Insight_Hidden_In_Text.pdf
http://www.businessintelligence.com/fwp/EI_In_Search_Of_Clarity.pdf
http://www.businessintelligence.com/fwp/Making_Business_Relevant_Information.pdf
http://www.businessintelligence.com/fwp/Expanding_BI_Role_By_Including_Predictive_Analytics.pdf
http://www.businessintelligence.com/fwp/All_Information_All_People_One_Platform.pdf
http://www.businessintelligence.com/fwp/Business_Intelligence_Now_More_Than_Ever.pdf
http://www.businessintelligence.com/fwp/Business_Intelligence_Standardization.pdf
http://www.businessintelligence.com/fwp/BI_for_Decision_Makers.pdf
http://www.businessintelligence.com/fwp/Business_Intelligence_The_Definitive_Guide.pdf
http://www.businessintelligence.com/fwp/Leveraging_Solutions.pdf

600 million unique visits per month: Yahoo open-sources Traffic Server

Today, Yahoo moved its open source cloud computing initiatives up a notch with the donation of its Traffic Server product to the Apache Software Foundation. Traffic Server is used in-house at Yahoo to manage its own traffic and it enables session management, authentication, configuration management, load balancing, and routing for entire cloud computing stacks. We asked the cloud computing team at Yahoo for a series of guest posts about Traffic Server, and you'll find the first one here.

Introducing Traffic Server
By The Yahoo Cloud Computing Team


Today, Yahoo is excited to open source Traffic Server, software that we rely on extensively. An Apache Incubator project, Traffic Server is an extremely high-performance Web proxy-caching server with a robust plugin API that allows you to modify and extend its behavior and capabilities.

Traffic Server not only ships with an HTTP web proxy and caching solution, but also provides a server framework with which you can build very fast servers for other protocols. As an HTTP web proxy, Traffic Server sits between clients and servers and adds services like caching, request routing, filtering, and load balancing. Web sites frequently use a caching server to improve response times by locally storing web pages, web services, or web objects like images, JavaScript, and style sheets, and to relieve the burden of creating these pages/services from their front- and back-end infrastructure. Corporations and ISPs frequently use forward proxy servers to help protect their users from malicious content and/or speed delivery of commonly requested pages. The Traffic Server code and documentation are available today, and we'll be making a release version in the near future.

Traffic Server is fast. It was designed from the start as a multi-threaded event driven server, and thus scales very well on modern multi-core servers. With a quad core 1.86GHz processor, it can do more than 30,000 requests/second for certain traffic patterns. In contrast, some of the other caching proxy servers we've used max out at around 8,000 requests/second using the same hardware.
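For readers unfamiliar with the event-driven model described above, the sketch below shows its general shape on Linux: a single epoll loop multiplexing many connections, typically replicated across one thread per core. This is a generic illustration only, not Traffic Server code; the port number and buffer size are arbitrary.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    /* Listening socket on an arbitrary port. */
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) != 0 || listen(lfd, 128) != 0) {
        perror("listen");
        return 1;
    }

    /* One event loop; a multi-threaded server runs one of these per core. */
    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = lfd };
    epoll_ctl(ep, EPOLL_CTL_ADD, lfd, &ev);

    for (;;) {
        struct epoll_event events[64];
        int n = epoll_wait(ep, events, 64, -1);
        for (int i = 0; i < n; i++) {
            if (events[i].data.fd == lfd) {
                /* New connection: watch it for readability. */
                int cfd = accept(lfd, NULL, NULL);
                struct epoll_event cev = { .events = EPOLLIN, .data.fd = cfd };
                epoll_ctl(ep, EPOLL_CTL_ADD, cfd, &cev);
            } else {
                /* Request data ready: a real proxy would parse it, check its
                 * cache, and forward to an origin server here. */
                char buf[4096];
                ssize_t r = read(events[i].data.fd, buf, sizeof buf);
                if (r <= 0)
                    close(events[i].data.fd);
            }
        }
    }
}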

It's extensible. It has native support for dynamically loading shared objects that can interact with the core engine. Yahoo! has internal plugins that remap URLs; route requests to different services based on cookies; allow caching of OAuth-authenticated requests; and modify behaviors based on Cache-Control header extensions. We've replaced the default memory cache with a plugin. It's even possible to write plugins to handle other protocols like FTP, SMTP, SOCKS, and RTSP, or to modify the response body. There is documentation for the plugin APIs, and sample plugin code is available today.
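To give a feel for the plugin API, here is a minimal sketch of a plugin skeleton using the TS-prefixed names from the Apache releases. Treat it as illustrative rather than complete: a real plugin also registers itself with the server and does actual header inspection or rewriting in the handler.

#include <ts/ts.h>

/* Called for every transaction at the read-request-header hook. */
static int
handle_read_request(TSCont contp, TSEvent event, void *edata)
{
  TSHttpTxn txnp = (TSHttpTxn)edata;
  (void)contp;
  (void)event;

  /* A real plugin would inspect or rewrite the request headers here,
   * e.g. remap the URL or route on a cookie, as described above. */

  TSHttpTxnReenable(txnp, TS_EVENT_HTTP_CONTINUE);
  return 0;
}

/* Entry point: Traffic Server calls this when it loads the shared object. */
void
TSPluginInit(int argc, const char *argv[])
{
  (void)argc;
  (void)argv;
  TSCont contp = TSContCreate(handle_read_request, NULL);
  TSHttpHookAdd(TS_HTTP_READ_REQUEST_HDR_HOOK, contp);
}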

Traffic Server is serving more than 30 billion Web objects a day across the Yahoo! network, delivering more than 400 terabytes of data per day. It's in use as a proxy or cache (or both) by services like the Yahoo! Front Page, Mail, Sports, Search, News, and Finance. We continue to find new uses for Traffic Server, and it gets more and more ingrained into our infrastructure each day.

At its heart, Traffic Server is a general-purpose implementation that can be used to proxy and cache a variety of workloads, from single site acceleration to CDN deployment and very large ISP proxy caching. It has all the major features you'd expect from such a server, including behavior like cache partitioning. You can dedicate different cache volumes to selected origin servers, allowing you to serve multiple sites from the same cache without worrying about one of them being "pushed" out of the cache by the others.

The current version of Traffic Server is the product of literally hundreds of developer-years. It originated at Inktomi as the Inktomi Traffic Server, and was successfully sold commercially for several years. Chuck Neerdaels, one of the co-authors of Harvest, which became the popular open source Squid proxy caching server, has been integral to Traffic Server's history, managing the early development team and leading the group today. Yahoo! acquired Inktomi in 2003, and has a full-time development team working on the server. We plan to continue active development. For example, we are planning to add IPv6 and 64-bit support, and to improve its performance when dealing with very large files. We'd love to work with the community on these and other efforts.

Of course, the server is neither perfect nor complete. Internally, Yahoo! uses Squid for some caching use cases where we need more fine-grained cache controls like refresh_patterns, stale-if-error, and stale-while-revalidate. By open sourcing Traffic Server, we hope the community will help add the features it needs more quickly than Yahoo! could on its own. In exchange, the public gets access to a server that Yahoo! has found incredibly useful for speeding page downloads and saving back-end resources through caching.

As an Apache Incubator project, we hope Traffic Server will graduate to a full Apache top-level project. We chose the Apache Software Foundation because of our experience with the Hadoop project, its great infrastructure for supporting long-running projects, and its long history of delivering enterprise-class free software that supports large communities of users and developers alike.

Over the next few weeks, look for more detailed posts on plugins, how to get started with the code, and more details on the roadmap and how to get involved in the project. In the meantime, grab the source, browse the documentation, send feedback, and help make the project even better.

Matrix Runs over Microsoft Windows

Google’s First Production Server


Google’s First Production Server ... with the hair pulled back, revealing a rack of cheap networked PCs, circa 1999.

Each level has a couple of PC boards slammed in there, partially overlapping. This approach reflects a presumption that cheap hardware would become obsolete quickly and would not need to be repaired. Several of the PCs never worked, and the system design was optimized to tolerate multiple computer failures.

According to Larry and Sergey, the beta system used Duplo blocks for the chassis because generic brand plastic blocks were not rigid enough.

Original hardware

The original hardware (circa 1998) used by Google while it was located at Stanford University included:

  • Sun Ultra II with dual 200 MHz processors and 256MB of RAM. This was the main machine for the original Backrub system.
  • 2 x 300 MHz dual Pentium II servers donated by Intel; they included 512MB of RAM and 9 x 9GB hard drives between the two. It was on these that the main search ran.
  • IBM RS/6000 F50 donated by IBM, with 4 processors, 512MB of memory and 8 x 9GB hard drives.
  • Two additional boxes included 3 x 9GB hard drives and 6 x 4GB hard drives respectively (the original storage for Backrub). These were attached to the Sun Ultra II.
  • IBM disk expansion box with another 8 x 9GB hard drives donated by IBM.
  • Homemade disk box which contained 10 x 9GB SCSI hard drives.

Google's server infrastructure is divided into several types, each assigned to a different purpose:

  • Google load balancers take the client request and forward it to one of the Google Web Servers via Squid proxy servers.
  • Squid proxy servers take the client request from the load balancers and return the result if it is present in the local cache; otherwise they forward it to a Google Web Server.
  • Google web servers coordinate the execution of queries sent by users, then format the result into an HTML page. The execution consists of sending queries to index servers, merging the results, computing their rank, retrieving a summary for each hit (using the document server), asking for suggestions from the spelling servers, and finally getting a list of advertisements from the ad server.
  • Data-gathering servers are permanently dedicated to spidering the Web. Google's web crawler is known as GoogleBot. They update the index and document databases and apply Google's algorithms to assign ranks to pages.
  • Each index server contains a set of index shards. They return a list of document IDs ("docid"), such that documents corresponding to a certain docid contain the query word. These servers need less disk space, but suffer the greatest CPU workload. (A toy sketch of this lookup follows the list.)
  • Document servers store documents. Each document is stored on dozens of document servers. When performing a search, a document server returns a summary for the document based on query words. They can also fetch the complete document when asked. These servers need more disk space.
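To make the index-server role concrete, here is a toy sketch of the lookup the list describes: each term maps to a sorted posting list of docids, and a multi-word query intersects those lists. The terms and docids here are made up; real shards are vastly larger and more elaborate.

#include <stdio.h>

/* A posting list: the sorted docids of the documents containing one term. */
typedef struct {
    const char *term;
    const int  *docids;
    size_t      len;
} posting_list;

/* Intersect two sorted posting lists: docids whose documents contain both terms. */
static size_t intersect(const posting_list *a, const posting_list *b,
                        int *out, size_t out_cap)
{
    size_t i = 0, j = 0, n = 0;
    while (i < a->len && j < b->len && n < out_cap) {
        if (a->docids[i] < b->docids[j])
            i++;
        else if (a->docids[i] > b->docids[j])
            j++;
        else {
            out[n++] = a->docids[i];
            i++;
            j++;
        }
    }
    return n;
}

int main(void)
{
    /* Hypothetical shard contents for a two-word query. */
    static const int cats_docs[] = {2, 5, 9, 14};
    static const int dogs_docs[] = {5, 7, 14, 20};
    posting_list cats = {"cats", cats_docs, 4};
    posting_list dogs = {"dogs", dogs_docs, 4};

    int hits[8];
    size_t n = intersect(&cats, &dogs, hits, 8);
    for (size_t i = 0; i < n; i++)
        printf("docid %d contains both query words\n", hits[i]);
    return 0;
}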

The 2009 Linux Kernel Summit Report

The 2009 Linux Kernel Summit was held in Tokyo, Japan on October 19 and 20. Jet-lagged developers from all over the world discussed a wide range of topics.


The sessions held on the first day of the summit were:

  • Mini-summit readouts; reports from various mini-summit meetings which have happened over the last six months.

  • The state of the scheduler, the kernel subsystem that everybody loves to complain about.

  • The end-user panel, wherein Linux users from the enterprise and embedded sectors talk about how Linux could serve them better.

  • Regressions. Nobody likes them; are the kernel developers doing better at avoiding and fixing them?

  • The future of perf events; a discussion of where this new subsystem is likely to go next.

  • LKML volume and related issues. A session slot set aside for lightning talks was really mostly concerned with the linux-kernel mailing list and those who post there.

  • Generic device trees. The device tree abstraction has proved helpful in the creation of generic kernels for embedded hardware. This session talked about what a device tree is and why it's useful.

The discussions on the second day were:

  • Legal issues; a lawyer visits the summit to talk about the software patent threat and how to respond to it.

  • How Google uses Linux: the challenges faced by one of our largest and most secretive users.

  • Performance regressions: is the kernel getting slower? How do we know, and where are the problems coming from?

  • Realtime: issues related to the merging of the realtime preemption tree into the mainline.

  • Generic architecture support: making it easier to port Linux to new processor architectures.

  • Development process issues, including linux-next, staging, merge window rules, and more.

The kernel summit closed with a general feeling that the discussions had gone well. It was also noted that our Japanese hosts had done an exceptional job in supporting the summit and enabling everything to happen; it would not be surprising to see developers agitating for the summit to return to Japan in the near future.

Gerrit: Google-style code review meets git

Gerrit, a Git-based system for managing code review, is helping to spread the popular distributed revision control system into Android-using companies, many of which have heavy quality assurance, management, and legal processes around software. HTC, Qualcomm, TI, Sony Ericsson, and Android originator Google are all running Gerrit, project leader Shawn Pearce said in a talk at the October 2009 GitTogether event, hosted at Google in Mountain View. The full report, for subscribers only, is by Don Marti.

How Google uses Linux

There may be no single organization which runs more Linux systems than Google. But the kernel development community knows little about how Google uses Linux and what sort of problems are encountered there. Google's Mike Waychison traveled to Tokyo to help shed some light on this situation; the result was an interesting view on what it takes to run Linux in this extremely demanding setting.

Mike started the talk by giving the developers a good laugh: it seems that Google manages its kernel code with Perforce. He apologized for that. There is a single tree that all developers commit to. About every 17 months, Google rebases its work to a current mainline release; what follows is a long struggle to make everything work again. Once that's done, internal "feature" releases happen about every six months.

This way of doing things is far from ideal; it means that Google lags far behind the mainline and has a hard time talking with the kernel development community about its problems.

There are about 30 engineers working on Google's kernel. Currently they tend to check their changes into the tree, then forget about them for the next 18 months. This leads to some real maintenance issues; developers often have little idea of what's actually in Google's tree until it breaks.

And there's a lot in that tree. Google started with the 2.4.18 kernel - but they patched over 2000 files, inserting 492,000 lines of code. Among other things, they backported 64-bit support into that kernel. Eventually they moved to 2.6.11, primarily because they needed SATA support. A 2.6.18-based kernel followed, and they are now working on preparing a 2.6.26-based kernel for deployment in the near future. They are currently carrying 1208 patches to 2.6.26, inserting almost 300,000 lines of code. Roughly 25% of those patches, Mike estimates, are backports of newer features.

There are plans to change all of this; Google's kernel group is trying to get to a point where they can work better with the kernel community. They're moving to git for source code management, and developers will maintain their changes in their own trees. Those trees will be rebased to mainline kernel releases every quarter; that should, it is hoped, motivate developers to make their code more maintainable and more closely aligned with the upstream kernel.

Linus asked: why aren't these patches upstream? Is it because Google is embarrassed by them, or is it secret stuff that they don't want to disclose, or is it a matter of internal process problems? The answer was simply "yes." Some of this code is ugly stuff which has been carried forward from the 2.4.18 kernel. There are also doubts internally about how much of this stuff will be actually useful to the rest of the world. But, perhaps, maybe about half of this code could be upstreamed eventually.

As much as 3/4 of Google's code consists of changes to the core kernel; device support is a relatively small part of the total.

Google has a number of "pain points" which make working with the community harder. Keeping up with the upstream kernel is hard - it simply moves too fast. There is also a real problem with developers posting a patch, then being asked to rework it in a way which turns it into a much larger project. Alan Cox had a simple response to that one: people will always ask for more, but sometimes the right thing to do is to simply tell them "no."

In the area of CPU scheduling, Google found the move to the completely fair scheduler to be painful. In fact, it was such a problem that they finally forward-ported the old O(1) scheduler and can run it in 2.6.26. Changes in the semantics of sched_yield() created grief, especially with the user-space locking that Google uses. High-priority threads can make a mess of load balancing, even if they run for very short periods of time. And load balancing matters: Google runs something like 5000 threads on systems with 16-32 cores.
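The sched_yield() grief is easier to picture with a sketch of the kind of user-space lock involved: spin on an atomic flag and yield the CPU while contended, so that any change in yield semantics directly changes how waiters and lock holders get scheduled. This is a generic illustration, not Google's locking code.

#include <sched.h>
#include <stdatomic.h>

/* A user-space lock that yields the CPU while contended. */
typedef struct { atomic_flag locked; } yield_lock;

static void yield_lock_acquire(yield_lock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        sched_yield();   /* give up the CPU in the hope the lock holder runs next */
}

static void yield_lock_release(yield_lock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

int main(void)
{
    yield_lock l = { ATOMIC_FLAG_INIT };
    yield_lock_acquire(&l);
    /* ... critical section ... */
    yield_lock_release(&l);
    return 0;
}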

On the memory management side, newer kernels changed the management of dirty bits, leading to overly aggressive writeout. The system could easily get into a situation where lots of small I/O operations generated by kswapd would fill the request queues, starving other writeback; this particular problem should be fixed by the per-BDI writeback changes in 2.6.32.

As noted above, Google runs systems with lots of threads - not an uncommon mode of operation in general. One thing they found is that sending signals to a large thread group can lead to a lot of run queue lock contention. They also have trouble with contention for the mmap_sem semaphore; one sleeping reader can block a writer which, in turn, blocks other readers, bringing the whole thing to a halt. The kernel needs to be fixed to not wait for I/O with that semaphore held.

Google makes a lot of use of the out-of-memory (OOM) killer to pare back overloaded systems. That can create trouble, though, when processes holding mutexes encounter the OOM killer. Mike wonders why the kernel tries so hard, rather than just failing allocation requests when memory gets too tight.
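For context, per-process OOM-killer behavior on kernels of that era was tuned through /proc/<pid>/oom_adj, ranging from -17 (never kill) to +15 (kill first); later kernels replaced it with oom_score_adj. A minimal sketch of shielding the current process:

#include <stdio.h>

int main(void)
{
    /* -17 was OOM_DISABLE on kernels of that era; newer kernels use
     * /proc/<pid>/oom_score_adj with a -1000..1000 range instead. */
    FILE *f = fopen("/proc/self/oom_adj", "w");
    if (f == NULL) {
        perror("oom_adj");
        return 1;
    }
    fprintf(f, "-17\n");
    fclose(f);
    return 0;
}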

So what is Google doing with all that code in the kernel? They try very hard to get the most out of every machine they have, so they cram a lot of work onto each. This work is segmented into three classes: "latency sensitive," which gets short-term resource guarantees, "production batch" which has guarantees over longer periods, and "best effort" which gets no guarantees at all. This separation of classes is done partly through the separation of each machine into a large number of fake "NUMA nodes." Specific jobs are then assigned to one or more of those nodes. One thing added by Google is "NUMA-aware VFS LRUs" - virtual memory management which focuses on specific NUMA nodes. Nick Piggin remarked that he has been working on something like that and would have liked to have seen Google's code.

There is a special SCHED_GIDLE scheduling class which is a truly idle class; if there is no spare CPU available, jobs in that class will not run at all. To avoid priority inversion problems, SCHED_GIDLE processes have their priority temporarily increased whenever they sleep in the kernel (but not if they are preempted in user space). Networking is managed with the HTB queueing discipline, augmented with a bunch of bandwidth control logic. For disks, they are working on proportional I/O scheduling.
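SCHED_GIDLE itself is Google-internal, but the closest mainline analogue, SCHED_IDLE, gives a feel for the idea: a task in that class runs only when the CPU would otherwise be idle. A minimal sketch:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    /* Move the calling process into the mainline SCHED_IDLE class; its
     * priority must be 0. It will now run only on otherwise-idle CPUs. */
    struct sched_param param = { .sched_priority = 0 };
    if (sched_setscheduler(0, SCHED_IDLE, &param) != 0) {
        perror("sched_setscheduler");
        return 1;
    }
    /* ... best-effort background work goes here ... */
    return 0;
}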

Beyond that, a lot of Google's code is there for monitoring. They monitor all disk and network traffic, record it, and use it for analyzing their operations later on. Hooks have been added to let them associate all disk I/O back to applications - including asynchronous writeback I/O. Mike was asked if they could use tracepoints for this task; the answer was "yes," but, naturally enough, Google is using its own scheme now.

Google has a lot of important goals for 2010; they include:

  • They are excited about CPU limits; these are intended to give priority access to latency-sensitive tasks while still keeping those tasks from taking over the system entirely.

  • RPC-aware CPU scheduling; this involves inspection of incoming RPC traffic to determine which process will wake up in response and how important that wakeup is.

  • A related initiative is delayed scheduling. For most threads, latency is not all that important. But the kernel tries to run them immediately when RPC messages come in; these messages tend not to be evenly distributed across CPUs, leading to serious load balancing problems. So threads can be tagged for delayed scheduling; when a wakeup arrives, they are not immediately put onto the run queue. Instead, they wait until the next global load balancing operation before becoming truly runnable.

  • Idle cycle injection: high-bandwidth power management so they can run their machines right on the edge of melting down - but not beyond.

  • Better memory controllers are on the list, including accounting for kernel memory use.

  • "Offline memory." Mike noted that it is increasingly hard to buy memory which actually works, especially if you want to go cheap. So they need to be able to set bad pages aside. The HWPOISON work may help them in this area.

  • They need dynamic huge pages, which can be assembled and broken down on demand.

  • On the networking side, there is a desire to improve support for receive-side scaling - directing incoming traffic to specific queues. They need to be able to account for software interrupt time and attribute it to specific tasks - networking processing can often involve large amounts of softirq processing. They've been working on better congestion control; the algorithms they have come up with are "not Internet safe" but work well in the data center. And "TCP pacing" slows down outgoing traffic to avoid overloading switches.

  • For storage, there is a lot of interest in reducing block-layer overhead so it can keep up with high-speed flash. Using flash for disk acceleration in the block layer is on the list. They're looking at in-kernel flash translation layers, though it was suggested that it might be better to handle that logic directly in the filesystem.

Mike concluded with a couple of "interesting problems." One of those is that Google would like a way to pin filesystem metadata in memory. The problem here is being able to bound the time required to service I/O requests. The time required to read a block from disk is known, but if the relevant metadata is not in memory, more than one disk I/O operation may be required. That slows things down in undesirable ways. Google is currently getting around this by reading file data directly from raw disk devices in user space, but they would like to stop doing that.

The other problem was lowering the system call overhead for providing caching advice (with fadvise()) to the kernel. It's not clear exactly what the problem was here.
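For reference, the caching advice in question is issued through posix_fadvise(); the sketch below advises the kernel about a sequential, one-pass read. The concern raised in the session was the per-call overhead of issuing such advice frequently, not how to call it.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    /* Tell the kernel we will read sequentially, so it can tune readahead. */
    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    /* ... read and process the file ... */
    /* Tell the kernel we are done with these pages; it may drop them early. */
    posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    close(fd);
    return 0;
}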

All told, it was seen as one of the more successful sessions, with the kernel community learning a lot about one of its biggest customers. If Google's plans to become more community-oriented come to fruition, the result should be a better kernel for all.

 
