Content Processing C. Edward Chow Department of Computer ...

Introduction to Content Switch
C. Edward Chow
Department of Computer Science
University of Colorado at Colorado Springs
[email protected]
This tutorial is available at http://cs.uccs.edu/~chow/pub/agere/contentswitch.ppt (login: agere, password: ag2003ere).
4/11/2003 Edward Chow Content Switch 1

Outline of the Talk
Overview of Content Delivery Network and Linux Virtual Server technologies
Overview of content switching concepts
TCP delayed binding and its improvement
Conflict detection in content switching rule sets
Persistence issues
Problems encountered in content processing and their solutions
Specific implementations and their performance: achieving high availability with a content switch

Content Delivery Network (CDN)
[Figure: clients reach a single host server across ISP networks (@Home, PSINet, Sprint, QWest, UUnet, Mind Spring, Gloobix); the huge number of requests causes slow responses and server crashes.]

Content Delivery Problems
http://www.akamai.com

Use Client Cache / Client-Side Cache Server
[Figure: client caches and a client-side cache server in front of the ISP networks give clients fast responses and send fewer requests to the host server.]

Use Mirror Sites
Needs improvement: guide the selection of mirror servers with server load and network bandwidth measurements.
[Figure: mirror sites placed in the ISP networks give clients fast responses and send fewer requests to the host server.]

Edge Network Cache Servers
[Figure: cache servers at the edge of each ISP network, combined with client caches, a client-side cache server, and mirror sites.]

Content Delivery Problem
Cache location problem: Where to put cache servers? How many are needed?
When, where, and how to push/deliver the content?

How about dynamic content?

Akamai Edge Delivery Service
Date      # of Edge Servers   # of Networks   # of Countries
11/2000   6000                335             54
6/2001    9700                650             56
Peering bottleneck problem: access traffic is evenly spread over 7400+ networks (no one over 5%; most << 1%), so edge servers need to be put in many networks.
11/2000: 4 billion bits/day for 2800 sites.
Source: http://www.akamai.com

Caching Dynamic Content at Web Proxies
Active Cache Project [PeiCao 98], Univ. Wisconsin: cache a Java applet to be executed at proxies. Choice of passing the request to the server, delivering the cached copy, or generating the content dynamically.
Edge Side Include (ESI): XML tag to specify an ESI fragment in a web page. Each ESI fragment can have different cache/

Edge Side Include Example
http://www.esi.org/

Solution to First Mile Problem
First mile problem: huge requests at the web site of the CDN.
High-bandwidth connection
Caching
  End system cache: client cache
  Client site: proxy cache server
  Mirror site caches
  Cache servers in Internet: hierarchical cache servers, e.g., Squid/Harvest/Adaptive Web; edge servers of Akamai
Faster server/server farm (server-side caching + cluster)
  Layer 4 load balancer + real servers
  Content switch + real servers
  Distributed Packet Rewrite

Web Server Cluster
Load balancer can run at
  Application level: reverse proxy
  Kernel level: Linux Virtual Server
Load balancer can distribute requests based on
  Layer 3-4 info: fixed fields, fast hash
  Layer 7 info: variable length, slow parsing
[Figure: a load balancer or content switch in front of a group of real servers.]
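The difference between the two kinds of decision can be sketched in a few lines. This is only an illustration, not code from LVS or any switch: the server names reuse the RIP1..RIP3 labels from the slides, and the routing policy in pick_layer7 (images to one server, POST requests to another) is an assumption made up for the example.

```python
import zlib

# Hypothetical real-server names, reused from the RIP1..RIP3 labels in the slides.
SERVERS = ["RIP1", "RIP2", "RIP3"]


def pick_layer4(src_ip, src_port, dst_ip, dst_port):
    """Layer 3-4 decision: hash fixed-size header fields, no payload parsing."""
    key = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}".encode()
    return SERVERS[zlib.crc32(key) % len(SERVERS)]


def pick_layer7(request: bytes):
    """Layer 7 decision: the variable-length request line must be parsed first."""
    request_line = request.split(b"\r\n", 1)[0].decode("latin-1")
    method, url, _version = request_line.split(" ", 2)
    if url.endswith((".gif", ".jpg")):
        return "RIP3"  # assumed image server
    if method == "POST":
        return "RIP1"  # assumed transaction server
    return "RIP2"      # assumed default server
```

The Layer 4 function never looks past the addresses and ports, while the Layer 7 function cannot decide until the request data has arrived and been scanned, which is the cost the slide refers to as slow parsing.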

Comparison of Load Balancers
Reverse proxy runs as an application process and requires more memory and packet copying.
Linux Virtual Server runs in the kernel, with no memory copying.

Name                                  Type   Level          Layer Info
Reverse Proxy/Apache/Tomcat/Servlet   SW     Application    3-7
Linux Virtual Server                  SW     Kernel         3-4
Linux Content Switch                  SW     Kernel/Appl.   3-7
Layer4 Switch (narrow def.)           HW     Embedded OS    3-4
Content/Web Switch                    HW     Embedded OS    3-7

Linux Virtual Server (LVS)
A virtual server is a highly scalable and highly available server built on a cluster of real servers. The architecture of the cluster is transparent to end users, who see only a single virtual server with a virtual IP address (VIP).
http://www.linuxvirtualserver.org/
CIP: client IP address
VIP: virtual IP address
RIP: real server IP address
[Figure: a client (CIP) reaches the load balancer/director (a Linux box holding the VIP) over the Internet; the director connects over a WAN/LAN to Real Server1 (RIP1), Real Server2 (RIP2), and Real Server3 (RIP3).]

LVS-NAT Configuration (Network Address Translation)
All return traffic goes through the director: slow.
The director modifies IP address, port number, and checksum.
Director and real servers are on the same LAN.
No modification is needed on the real servers.
Port remapping: the real web server can run on port 8080.

[Figure: client (CIP), Internet, director (VIP), switch, and Real Server1 (RIP1), Real Server2 (RIP2), Real Server3 (RIP3).]

LVS-NAT Configuration Step 2. Director routes packet
Based on CIP, source port number, VIP, and destination port number, the director selects one of the real servers, using the LVS routing/scheduling rules set with the ipvsadm command.
It changes the destination IP address or port number of the packet.

LVS-NAT Configuration Step 3. Real server replies
The real server processes the request and produces the response.
All real servers set their default gateway to the director, as in any other NAT or IP masquerade setup, so the reply packet is sent back to the director.

LVS-NAT Configuration Step 4. Director rewrites reply
The director changes the source IP address (RIP1) of the reply packet to VIP.
It modifies the port number if needed, recomputes the checksum, and sends the packet back.

LVS-NAT Configuration (Network Address Translation): packet flow
1. Request: CIP -> VIP
2. Scheduling/rewrite packet: CIP -> RIP1
3. Real server processes the request; reply: RIP1 -> CIP
4. Director rewrites reply: VIP -> CIP
5. Client receives reply

LVS-NAT Setup Commands
# Make the director forward the masquerading packets
echo 1 > /proc/sys/net/ipv4/ip_forward
ipchains -A forward -j MASQ -s 172.16.0.0/24 -d 0.0.0.0/0
# Add virtual services and link a scheduler to each
ipvsadm -A -t 202.103.106.5:80 -s wlc    # Weighted Least-Connection scheduling
ipvsadm -A -t 202.103.106.5:21 -s wrr    # Weighted Round Robin scheduling
# Add real servers and select forwarding method and weight
ipvsadm -a -t 202.103.106.5:80 -R 172.16.0.2:80 -m
ipvsadm -a -t 202.103.106.5:80 -R 172.16.0.3:8000 -m -w 2
ipvsadm -a -t 202.103.106.5:21 -R 172.16.0.2:21 -m
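To make the effect of "-s wlc" and "-w 2" concrete, here is a simplified sketch of weighted least-connection selection. It is not the kernel IPVS code: it assumes the scheduler simply picks the server with the smallest ratio of active connections to weight, and it assumes a default weight of 1 for the server added without -w. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RealServer:
    address: str
    weight: int = 1   # assumed default when no -w option is given
    active: int = 0   # number of active connections


def pick_wlc(servers):
    """Return the server with the lowest active/weight ratio (simplified WLC)."""
    candidates = [s for s in servers if s.weight > 0]
    if not candidates:
        raise ValueError("no real server with positive weight")
    return min(candidates, key=lambda s: s.active / s.weight)


# The two real servers of the port-80 virtual service in the commands above.
servers = [RealServer("172.16.0.2:80"), RealServer("172.16.0.3:8000", weight=2)]
for _ in range(6):
    pick_wlc(servers).active += 1   # each new connection stays open
```

With six connections that all stay open, the server with weight 2 ends up holding twice as many connections as the server with weight 1.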

LVS-Tunnel Configuration (IP Tunneling)
Real servers need to handle IP-over-IP packets.
Real servers can be geographically separated, and return traffic can go through different routes.
Security implication!
[Figure: 1. the request (CIP -> VIP) reaches the load balancer (Linux box, VIP, RIP0); 2. scheduling, the packet is put in an IP tunnel to a real server (RIP1, RIP2, or RIP3); 3. the real server processes the request; 4. the client receives the reply (VIP -> CIP).]

LVS-Tunnel Setup Commands
# The load balancer (LinuxDirector), kernel 2.2.14
echo 1 > /proc/sys/net/ipv4/ip_forward
ipvsadm -A -t 172.26.20.110:23 -s wlc
ipvsadm -a -t 172.26.20.110:23 -r 172.26.20.112 -i
# The real server 1, kernel 2.2.14
echo 1 > /proc/sys/net/ipv4/ip_forward
# Insert the ipip module if it is compiled as a module
insmod ipip
ifconfig tunl0 172.26.20.110 netmask 255.255.255.255 broadcast 172.26.20.110 up
route add -host 172.26.20.110 dev tunl0
echo 1 > /proc/sys/net/ipv4/conf/all/hidden
echo 1 > /proc/sys/net/ipv4/conf/tunl0/hidden

LVS-DR Configuration (Direct Routing)
Real servers need to configure a non-ARP alias interface with the virtual IP address, and that interface must share the same physical segment with the load balancer.

Only the director's interface replies to ARP requests for the VIP.
The director only rewrites the MAC address to that of the selected real server; the IP packet is not changed. Fast!
GMAC: gateway MAC address
[Figure: 1. the request (GMAC -> VMAC, CIP -> VIP) arrives at the director through the router/switch; 2. scheduling/rewrite packet: the director forwards the frame to the MAC address of the selected real server (RMAC1, RMAC2, or RMAC3) with CIP -> VIP unchanged.]

LVS-DR Configuration Step 3. Process Request
The real server returns the reply. The reply goes directly through the switch/router, not through the director.
[Figure: 3. Real Server3 processes the request and sends the reply (RMAC3 -> GMAC, VIP -> CIP) through the switch; 4. the client receives the reply.]

LVS-DR Setup Commands
# The load balancer (LinuxDirector), kernel 2.2.14 or later
echo 1 > /proc/sys/net/ipv4/ip_forward
ipvsadm -A -t 172.26.20.110:23 -s wlc
ipvsadm -a -t 172.26.20.110:23 -r 172.26.20.112 -g
# The real server 1, 172.26.20.112, kernel 2.2.14 or later
echo 1 > /proc/sys/net/ipv4/ip_forward
ifconfig lo:0 172.26.20.110 netmask 255.255.255.255 broadcast 172.26.20.110 up
route add -host 172.26.20.110 dev lo:0
echo 1 > /proc/sys/net/ipv4/conf/all/hidden
echo 1 > /proc/sys/net/ipv4/conf/lo/hidden

Performance of LVS-based Systems
"We ran a very simple LVS-DR arrangement with one PII-400 (2.2.14 kernel) directing about 20,000 HTTP requests/second to a bank of about 20 Web servers answering with tiny identical dummy responses for a few minutes. Worked just fine." Jerry Glomph Black, Director, Internet & Technical Operations, RealNetworks.
"I had basically (1024) four class-Cs of virtual servers which were load-balanced through a LinuxDirector (two, actually -- I used redundant directors) onto four real servers which each had the four different class-Cs aliased on them." Ted Pavlic

LVS Usage Survey, 2/15/2001, Lorn Key
Clusters: 20, 1, 2, 2, 2
Directors per cluster: 2, 2, 2, 2, 2
Total real servers: 170, 12, 4, 15, 6
Routing methods: DR/NAT, DR, NAT, DR, NAT
Scheduling methods: RR/WLC, WRR, LC, WLC, WLC
Types of real servers: RH6.2 Linux Win Linux Linux Solaris RH
Services offered: WWW WWW/other WWW DB WWW SMTP WWW
File system replication: rsync rsync Coda NFS Custom rsync custom
Monitoring software: Heartbeat ldirectord Nanny/Pulse Heartbeat Mon Nanny Pulse Heartbeat

C. Edward Chow
Department of Computer Science
University of Colorado at Colorado Springs
Sponsored by Computer Comm. Lab/ITRI

Content Switch Topics
What is a content switch?
What services can it provide?
Content switch example
Related technologies
Content switch architecture and basic operations
TCP delayed binding and related improvements
Content switch rules and conflict detection
Conclusion

Content Switch (CS)
Routes packets based on high-layer (Layer 5/7) headers and content.
Examples:
  Direct Web traffic based on patterns of URLs, cookies, and XML tag values (URL switching, Web switching).
  Route incoming email based on email address; connect POP/IMAP based on login.
Web switches and the Intel XML Director/Accelerator are special cases of content switches.

What Services It Can Provide
Enabling premium services for e-commerce, ISP, and Web hosting providers
Load balancing and highly available server clusters: Web, e-commerce, email, computing, file, SAN
Policy-based networking, differential/QoS services
Firewall, strengthened DoS protection, cache/firewall load balancing
"Flash-crowd" management
Email spam protection, virus detection/removal
Applet authentication/filtering

F5 VRM Solution
[Figure: F5 deployment with three sites (Site I newyork.domain.com, Site II losangeles.domain.com, Site III tokyo.domain.com) reached over the Internet by a user at london.domain.com; components shown: router, 3-DNS, BIG-IP, local DNS, GLOBAL-SITE, webmaster, server array.]

ServerIron 100 Web Switch
Integrated Layer 2 through Layer 7 switching
Support for up to 7,000,000 concurrent sessions and 20 Gbps of throughput
High-availability server load balancing with active/active configuration and stateful fail-over
Industry's most powerful content switching capabilities, including URL, cookie, and SSL session ID based switching
Content-aware cache switching
High-performance VPN/firewall load balancing
Robust protection against Denial of Service (DoS) attacks
Most comprehensive global server load balancing with DNS proxy and client proximity measurements

Cisco CSS11000 Content Service Switch
Comprises four high-speed RISC processors with 512 MB of memory and 20.0 Gbps of throughput.
Distributed flow forwarding engines feature up to 16 port-level network processors with up to 128 MB of memory for wire-speed delivery of Web content.
Support for "sticky" connections based on IP address, Secure Socket Layer (SSL) session ID, and cookies ensures reliability and security for e-commerce transactions.
The unique Cisco content replication technology enables dynamic expansion of site capacity in response to sudden "flash crowds" for "hot" content or seasonal peaks in traffic that can overwhelm servers.

Nortel Alteon Web Switch
Provides wire-speed Layer 2/3 Ethernet switching, plus high-speed processing based on Layer 4 through 7 information (TCP ports, URLs, HTTP headers and cookies, SSL session ID, etc.).
Processes hundreds of thousands of concurrent sessions each second on eight multi-rate Ethernet ports (rate selectable per port), with one Gigabit or 100/1000 Mbps Ethernet uplink port.
Performs local and global server load balancing, application redirection, content filtering, streaming media load balancing, wireless Internet load balancing, and content-aware Layer 7 switching.
Filters packets based on up to 2048 filtering rules (224 filtering rules for Alteon AD3/180e Web Switches), uniquely definable per switch and per port.
Meters, controls, and accounts for bandwidth use by client, server farm, virtual service, application, user class, content type, and other traffic classes, and supports guaranteed minimum, metered available, and maximum burst bandwidth rates.

Intel Netstructure XML Director 7280
Example of rule: Server1: create */order.asp & //Amount[Value >= 10000]

Phobos In-Switch
Only load balancing switch in a PCI card form factor
Plugs directly into any server PCI slot
Supports up to 8,192 servers, ensuring availability and maximum performance
Six different algorithms are available for optimum performance: Round Robin, Weighted Percentage, Least Connections, Fastest Response Time, Adaptive, and Fixed.
Provides failover to other servers for high availability of the web site
U.S. retail $1995.00

E-Commerce Example: 1. Client

The client submits via HTTP POST (or SOAP) the following purchase in XML:
CCL 111222333
309121544 IBM Thinkpad T21 5000 10 50000
309121538 Intel wireless LAN PC Card 200 10 2000
52000

E-Commerce Example: 2. Content Switch
The content switch receives the packet.
It recognizes an HTTP POST request from the HTTP request line: POST /purchase.cgi HTTP/1.1
It recognizes an XML document from the meta header: content-type: TEXT/XML
It parses the XML content and extracts the values of the tag sequences:
  purchase/totalAmount    52000
  purchase/customerName   CCL
Rule 1 is matched and the packet is routed to one of the highSpeedServers.
Rule 1: if (xml.purchase/totalAmount > 5000) routeTo(highSpeedServers);
Rule 2: if (xml.purchase/customerName == CCL) routeTo(specialCustomerServers);

No Free Lunch: Penalty of Having Content Switch
                           Layer 4 Switching    Layer 7 Switching
Packet header extraction   fixed short fields   varying-length long fields
Switch rule matching       hash table lookup    pattern matching
Increased packet processing time. An XML Director/Accelerator needs to parse the XML document and match tag sequences. 1-3? order of processing time

Size of XML Document (Bytes)   XML Content Extract Time (ms)
600                            14
7000                           21
67104                          53

Related Technologies
Application-level solutions: proxy server; Apache/Tomcat/Servlet; Microsoft NLB

Kernel-level Layer 4 load balancing solution: http://www.linuxvirtualserver.org/
  Joseph Marks presentation
  LVS-NAT (Network Address Translation) web page
  LVS-IP Tunnel web page
  LVS-DR (Direct Routing) web page
Hardware solutions: Cisco 11000, F5 (BIG-IP), Alteon Web Systems, Foundry Networks (ServerIron)
Excellent information in: Foundry ServerIron Installation and Configuration Guide, May 2000. http://www.foundrynet.com/services/documentation/siug/

Basic Operations of Content Switching
[Figure: incoming packets go through packet classification, header/content extraction, the CS rule matching algorithm, and packet routing (load balancing), and are then forwarded to the servers. CS rules come from the CS rule editor; network path information and server load status feed the routing step. CS: content switching.]

Content Switch Architecture
Apostolopoulos, Infocom 2000

Content Switch Architecture
Case A: The controller finds an entry in its hash table and routes the request to the outgoing port of the sticky connection.
Case B:
Step 1. The controller finds no entry in the hash table and routes the request to the content switch processor.
Step 2. The CS processor (a) extracts content and matches CS rules, (b) routes the request, and (c) sets up sequence number modification on the server-side port (packet modification information).
Step 3. At the server-side port, return packets are modified (sequence number, IP address, checksum) and routed back to the client.

Efficient Content Switching Architecture
Tasks: millions of packets, with thousands of rules to match and load balancing algorithms to run.
How to assign tasks to the (network) processors and threads?
  Packet extraction (understand header formats, XML parsing)
  Content switching rule matching
  Packet routing (load balancing, bandwidth control)
How much packet processing should controllers do? What can a controller do?
A typical parallel processing problem?

TCP Delay Binding (Splicing)
[Figure: message sequence among client, content switch, and server, labeled step1 through step11.]
Client <-> content switch: SYN(CSEQ); SYN(DSEQ) ACK(CSEQ+1); ACK(DSEQ+1); DATA(CSEQ+1) ACK(DSEQ+1); DATA(DSEQ+1) ACK(CSEQ+lenR+1); ACK(DSEQ+lenD+1); DATA(?) 2nd request, ACK(?)
Content switch <-> server: SYN(CSEQ); SYN(SSEQ) ACK(CSEQ+1); ACK(SSEQ+1); DATA(CSEQ+1) ACK(SSEQ+1); DATA(SSEQ+1) ACK(CSEQ+lenR+1); ACK(SSEQ+lenD+1)
lenR: size of HTTP request. lenD: size of returned document.

Improve Content Switching
Set up CS-real server connections ahead of time (persistent HTTP connections). NetScale. Reduces TCP 3-way handshake time.
Pre-allocate server scheme (guess the real server based on the TCP SYN).
Sequence number modification is needed on every return packet; the checksum must be recomputed as well.
Filter scheme (offload sequence number modification/rule matching to real servers).
Buffering/pipelining (aggregating) requests.

Pre-Allocate Server Scheme
[Figure: message sequence among client, content switch, and pre-allocated server, labeled step1 through step6.]
Client <-> content switch: SYN(CSEQ); SYN(SSEQ) ACK(CSEQ+1); ACK(SSEQ+1); DATA(CSEQ+1) ACK(SSEQ+1); DATA(SSEQ+1) ACK(CSEQ+LenR+1); ACK(SSEQ+lenD+1)
Content switch <-> pre-allocated server: SYN(CSEQ); SYN(SSEQ) ACK(CSEQ+1); ACK(SSEQ+1); DATA(CSEQ+1) ACK(SSEQ+1); DATA(SSEQ+1) ACK(CSEQ+lenR+1); ACK(SSEQ+lenD+1)
Guess the routing decision based on IP/port number/history.
Advantages: faster than TCP delayed binding; possible direct route between client and server; reduced session processing overhead; no need to convert server sequence numbers.

Degenerated to TCP Delayed Binding If Guess is Wrong
[Figure: message sequence among client, content switch, pre-allocated server, and right server, labeled step1 through step12. The handshake and request go to the pre-allocated server as before (SYN(CSEQ); SYN(SSEQ)/ACK(CSEQ+1); ACK(SSEQ+1); DATA(CSEQ+1)/ACK(SSEQ+1)). The pre-allocated server sends HTTP 404 (DATA(SSEQ+1)) and the connection to it is closed (FIN(CSEQ+lenR+1)). A connection to the right server is opened (SYN(CSEQ); SYN(RSEQ)/ACK(CSEQ+1); ACK(RSEQ+1)), the request is sent again (DATA(CSEQ+1)/ACK(RSEQ+1)), and the reply DATA(RSEQ+1)/ACK(CSEQ+lenR+1) reaches the client as DATA(SSEQ+1)/ACK(CSEQ+LenR+1); the client's ACK(SSEQ+lenD+1) is passed on as ACK(RSEQ+lenD+1). Sequence number conversion is now needed for the right server.]

Filter Process Scheme
[Figure: message sequence among client, content switch, and server, labeled step1 through step10; a filter process runs on the server. Client handshake with the content switch: SYN(CSEQ); SYN(DSEQ)/ACK(CSEQ+1); ACK(DSEQ+1); request DATA(CSEQ+1)/ACK(DSEQ+1). The content switch migrates (Data, CSEQ, DSEQ) to the filter process, which sets up the connection on the server: SYN(CSEQ); SYN(SSEQ)/ACK(CSEQ+1); ACK(SSEQ+1); DATA(CSEQ+1)/ACK(SSEQ+1). The reply DATA(SSEQ+1) ACK(CSEQ+lenR+1) reaches the client as DATA(DSEQ+1) ACK(CSEQ+LenR+1); the client's ACK(DSEQ+lenD+1) corresponds to ACK(SSEQ+lenD+1).]

Pre-allocate Performance Plot
Plot of response time (microseconds) vs. document size

[Figure 3. Performance of Pre-allocate Server Scheme: response time (microseconds, axis from 0 to 500000) versus document size (bytes, axis from 0 to 40000), four series.]
Series 1: basic scheme with no rule matching module inserted, i.e., using default IPVS.
Series 2: basic scheme with the rule matching module inserted.
Series 3: pre-allocate scheme with all hits, i.e., all pre-allocate guesses were correct.
Series 4: pre-allocate scheme with all misses, i.e., all pre-allocate guesses were wrong.

Handling Multiple Requests in a Keep-Alive Connection
Determine when a new request arrives:
  Verify that the previous request has been completely received.
  Request data size is > 0.
  Key assumption: the client sends only one outstanding request at a time, i.e., requests are not pipelined.
Reuse connections:
  Store each connection's control information in a hash table keyed by real server address, once it is established.
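One piece of control information such a hash table entry would plausibly hold is the sequence number offset introduced by TCP delayed binding: the client has seen the switch's initial sequence number (DSEQ), while the real server chose its own (SSEQ). The sketch below is an illustration under that assumption, not the implementation described in the talk; the class name, the table layout, and the example numbers are hypothetical.

```python
MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


class SpliceRecord:
    """Per-connection control information for a spliced connection (sketch)."""

    def __init__(self, dseq, sseq):
        # Offset between the sequence space the client sees (DSEQ)
        # and the one the real server actually uses (SSEQ).
        self.delta = (dseq - sseq) % MOD

    def server_to_client_seq(self, seq):
        """Rewrite the sequence number of a packet returned by the server."""
        return (seq + self.delta) % MOD

    def client_to_server_ack(self, ack):
        """Rewrite the acknowledgment number of a packet sent by the client."""
        return (ack - self.delta) % MOD


# Hash table keyed by real server address, as the slide describes.
connections = {("172.16.0.2", 80): SpliceRecord(dseq=4_000_000_000, sseq=1_000)}
record = connections[("172.16.0.2", 80)]
```

The client's own sequence numbers need no translation here because the switch replays the SYN to the server with the same CSEQ. After either field is rewritten the TCP checksum has to be recomputed, which this sketch leaves out.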

Quiz
The web server keeps the TCP connection alive, expecting the browser to return for images and in-line media files.
How many keep-alive connections are set up by IE5 and Netscape 4.7 for a web page with many .jpg/.gif images?
Can these image requests be pipelined from the client browser to the web server?

Multiple HTTP Requests from One TCP Connection (NAT approach)
[Figure: the client sends requests for index.htm, uccs.gif, cs.jpg, and rocky.mid over one connection to the content switch, which distributes them among server1, server2, ..., server9.]
A keep-alive TCP connection may include multiple HTTP GET requests.
The content switch examines each GET request and makes a new routing decision.
The content switch establishes another connection with a different server based on the routing decision.
The HTTP responses from different servers need to be interleaved and seen by the user as if they came from the same server.
Solutions: in-order delivery (buffer requirement); out-of-order delivery (sequence number tracking)?
Problem: should we throw away earlier HTML requests if we receive later requests?

Multiple HTTP Requests from One TCP Connection
[Figure: the client sends requests for uccs.jpg, cs.gif, and rocky.mid through the content switch to server1, server2, ..., server9.]
Can servers return documents directly to the client in the keep-alive session case?
Can an equivalent of VS-Tunnel or VS-DR be implemented using a content switch?

Content Switch Rule Survey
The survey shows that existing switches support rules in
  basic (condition action) or (action condition) form; some define the condition as a class, then specify the action in a separate statement or command
  simple single conditional terms
  a command-line interface (to facilitate incremental update?)
Actions can include reject, forward, and put in queue (for bandwidth control, scheduling).

Content Switch Rule Design
The rule syntax is generic, to support all intended features.
It uses simple C if-statement syntax: rule: if (condition) { action }
  Easy to read
  Allows optimization using a C compiler
The condition consists of multiple terms of the form variable relational_operator value, e.g.
  xml.purchase/totalAmount > 50000
  smtp.to == [email protected]
  cookie.name == servlet1
  bitmatch(64, 8, 0xff) == 64    # means TTL=64; idea from the netfilter universal filter
  suffix(variable, string), e.g. suffix(url, gif)
  regex(variable, pattern), e.g. regex(url, /purchase)
The action consists of
  reject,
  forward(server | queue),
  loadBalance(serverGroup, loadBalancingAlgorithm)

Efficient CS Rule Matching
Brute force, strict priority: rules are executed sequentially.
Efficient rule matching method: organize rules so that rules can be skipped based on existing content types; utilize compiler optimization techniques.
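The brute-force, strict-priority strategy can be sketched as a loop over (condition, action) pairs. This is an illustrative sketch only: the rule names, the server group names, and the dictionary of extracted content are assumptions, and suffix() and regex() are simple stand-ins for the condition helpers named in the rule design, not the authors' implementations.

```python
import re


def suffix(value, ending):
    """True if the extracted value ends with the given string."""
    return value is not None and value.endswith(ending)


def regex(value, pattern):
    """True if the pattern occurs anywhere in the extracted value."""
    return value is not None and re.search(pattern, value) is not None


# Rules in priority order: (name, condition, target server group).
# The server group names are hypothetical.
RULES = [
    ("r1", lambda c: c.get("xml.purchase/totalAmount", 0) > 50000, "highSpeedServers"),
    ("r2", lambda c: suffix(c.get("url"), "gif"), "imageServers"),
    ("r3", lambda c: regex(c.get("url"), r"/purchase"), "purchaseServers"),
]


def match(content, rules=RULES, default="defaultServers"):
    """Brute force, strict priority: the first rule whose condition holds wins."""
    for name, condition, target in rules:
        if condition(content):
            return name, target
    return None, default
```

For example, match({"url": "/img/cs.gif"}) selects rule r2. The more efficient method on the slide would avoid evaluating rules such as r1 at all when the request carries no XML content.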

Simple CS Rule Editor GUI

Conflict Detection on Content Switching Rules
Detect conflicts among rules or rule sets.
Absolute conflict type:
  r1: if (xml.purchase/customerName == CCL) {routeTo(r1)}
  r2: if (xml.purchase/customerName == CCL) {routeTo(r2)}
Potential conflict type:
  r1: if (xml.purchase/totalAmount > 5000) {routeTo(quickServers)}
  r2: if (xml.purchase/totalAmount > 20000) {routeTo(superServers)}
Algorithm: build a tree for rules with the same variable, check operator and value to see whether they are the same or lead to a potential conflict, and compare actions to decide the conflict type or duplication.
A conflict detection algorithm was developed for rules with multiple-term conditions.
It can be applied to policy-based rule conflict detection.
The editor can build these trees while a user enters rules and warn about conflicts right away.
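For single-term conditions, the idea of grouping rules by variable and then comparing operator, value, and action can be sketched as follows. This is a simplified illustration, not the algorithm developed by the authors: it handles only the operators >, <, and ==, treats each rule as one term, and uses a flat grouping instead of a tree. The rule tuples and function names are hypothetical.

```python
import operator
from collections import defaultdict
from itertools import combinations

OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq}


def overlaps(op1, v1, op2, v2):
    """True if some value can satisfy both single-term conditions."""
    if op1 == "==":
        return OPS[op2](v1, v2)
    if op2 == "==":
        return OPS[op1](v2, v1)
    if op1 == op2:
        return True                      # both ">" or both "<"
    low, high = (v1, v2) if op1 == ">" else (v2, v1)
    return low < high                    # x > low and x < high


def classify(r1, r2):
    """Rules are (name, variable, op, value, action) tuples."""
    _, _, op1, v1, act1 = r1
    _, _, op2, v2, act2 = r2
    if op1 == op2 and v1 == v2:
        return "duplicate" if act1 == act2 else "absolute conflict"
    if act1 != act2 and overlaps(op1, v1, op2, v2):
        return "potential conflict"
    return None


def find_conflicts(rules):
    by_variable = defaultdict(list)      # stands in for the per-variable tree
    for rule in rules:
        by_variable[rule[1]].append(rule)
    found = []
    for group in by_variable.values():
        for r1, r2 in combinations(group, 2):
            kind = classify(r1, r2)
            if kind:
                found.append((r1[0], r2[0], kind))
    return found


RULES = [
    ("r1", "xml.purchase/customerName", "==", "CCL", "routeTo(r1)"),
    ("r2", "xml.purchase/customerName", "==", "CCL", "routeTo(r2)"),
    ("r3", "xml.purchase/totalAmount", ">", 5000, "routeTo(quickServers)"),
    ("r4", "xml.purchase/totalAmount", ">", 20000, "routeTo(superServers)"),
]
```

Running find_conflicts(RULES) on the two slide examples reports r1/r2 as an absolute conflict and r3/r4 as a potential conflict. An editor could run the same check each time a rule is entered.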

XML Tag Value Extraction
A function, xmlContentExtract(), was built to extract the tag values of a list of unique tag sequences.
It is based on Clark Cooper's expat 1.0 XML parser.
Its arguments include a pointer to an XML document, a pointer to the array of strings (unique XML tag sequences; we follow the XSL selector syntax), and the number of sequences.
It returns a list of structure nodes, each with the tag sequence, its attribute, and its value.
Currently it supports one attribute, and each tag sequence needs to be unique.

Persistence Handling in LVS
Some network applications require packets from the same users/sessions to be routed to the same real servers.
  For consistent treatment?
  For fast performance, e.g., servers maintain persistent data/info for sessions.
A Tomcat web server returns a cookie value so that returning client requests can be routed to the same Tomcat web server. But the cookie value is in the HTTP header, which is Layer 7 information; a Layer 4 switch cannot access it.
This is the so-called persistence handling problem.
One solution: sticky connections. The same IP address is served by the same server.

Persistence Handling Problems
FTP case: normally FTP uses port 21 for control and port 20 for data. For passive FTP, however, the server tells the client the port it listens on, and the client initiates the data connection to that port. With LVS/TUN and LVS/DR, LinuxDirector is only on the client-to-server half of the connection, so it cannot get the data port from the packet that goes directly to the client.
SSL session case: port 443 for secure Web servers and port 465 for secure mail servers. The key for the connection must be chosen/exchanged, and only the initial real server has the key. A persistent or sticky connection is needed.

Persistent Connection Solution
When the client first accesses the service, LinuxDirector creates a template between the given client and the selected server, then creates an entry for the connection in the hash table.
Connections from the client to any port will be sent to that server until the template expires.
The template expires after a configurable time, but not before all its connections have expired.
The timeout of persistent templates can be configured by users; the default is 300 seconds.
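The template behaviour can be sketched with a small table keyed by client address. This is a simplified model, not the LinuxDirector code: the class and method names are hypothetical, time is passed in explicitly to keep the example deterministic, and refreshing the timer on every new connection is an assumption made for the sketch.

```python
class PersistenceTable:
    """Sketch of persistent (sticky) connection templates keyed by client IP."""

    def __init__(self, timeout=300):          # 300 s is the default on the slide
        self.timeout = timeout
        self.templates = {}                   # client_ip -> [server, expires_at, open_conns]

    def route(self, client_ip, now, pick_server):
        entry = self.templates.get(client_ip)
        # A template stays valid while it has open connections or has not timed out.
        if entry is not None and (entry[2] > 0 or now < entry[1]):
            server = entry[0]
        else:
            server = pick_server()            # normal scheduling decision
            entry = [server, 0, 0]
            self.templates[client_ip] = entry
        entry[1] = now + self.timeout         # assumed: timer refreshed per connection
        entry[2] += 1
        return server

    def close(self, client_ip):
        entry = self.templates.get(client_ip)
        if entry is not None and entry[2] > 0:
            entry[2] -= 1
```

A client that returns within the timeout, on any port, is sent to the server recorded in its template. A client that returns after the timeout with no open connections goes through scheduling again.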

Problems Encountered in the Design of Linux-based Content Switch
Handle a request contained in multiple packets
Handle different data encoding methods
Allow referencing specific XML tags
Handle long transactions in SSL and email network services

Handle a Request Contained in Multiple Packets
For a long request, the headers and content are carried by multiple packets due to the packet size limitation. We have observed Netscape 4.7 splitting a short request (<1000) into two packets.
Due to interleaving with other sessions, packets of the same session may not be allocated consecutive memory.
Even when packets of the same session arrive without being interleaved with packets of other sessions, application-level data will be fragmented in kernel packet buffers such as skbuf.
Matching application data patterns in the kernel is tricky.

Example: Determine Content Length
TCP segment n contains:
POST /cgi-bin/cs622/purchase.pl HTTP/1.0\r\n
Referer: http://archie.uccs.edu/~acsd/lcs/xmldemo.html\r\n
Connection: Keep-Alive\r\n
User-Agent: Mozilla/4.75 [en] (X11; U; Linux 2.2.16-22enterprise i686) \r\n
Host: viva.uccs.edu\r\n
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*\r\n
Accept-Encoding: gzip\r\n
Accept-Language: en\r\n
Accept-Charset: iso-8859-1,*,utf-8\r\n
Content-type: application/x-www-form-urlencoded\r\n
Content-length: 7
TCP segment n+1 contains:
53\r\n
data (753 bytes)

Potential Solutions
Allocate the application data of a session in consecutive memory: requires major rework of most kernel packet buffer allocation schemes.
Use carry-lookahead memory hardware.
Write more complicated pattern matching code that can match patterns over fragmented data.
Use application-level content switching and bear the overhead of copying data from kernel to application level.

Handle Different Data Encoding Methods

- XML data can be passed as text/plain. When submitted with a form, the XML request data are encoded using the application/x-www-form-urlencoded method.
- When extracting XML data for rule matching, the encoding method must be detected through the Content-type header.

An E-Commerce XML Example
Client submits via HTTP POST (or SOAP) the following purchase in XML:
(XML tags not preserved; values in document order)
    CCL 111222333
    309121544 IBM Thinkpad T21 5000 10 50000
    309121538 Intel wireless LAN PC Card 200 10 2000
    52000

Allow Referencing Specific XML Tags
- An ambiguous XML tag sequence specification can match multiple instances.
- To avoid that and to speed up matching, we propose an XML tag sequence specification that identifies the exact tag instance.
- For example, to specify a rule based on the subTotal value in the second item tag within the first purchase tag, the rule condition is written as purchase:1.item:2.subTotal > 5000.
- As another example, purchase:2.totalAmount < 15000 specifies a rule condition based on the totalAmount tag within the second purchase tag.
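A tag sequence specification such as purchase:1.item:2.subTotal can be resolved by narrowing a search window one step at a time. The sketch below is a simplified illustration rather than the LCS implementation: `narrow` and `xml_path_value` are hypothetical names, and it assumes tags without attributes, no self-closing tags, and no element nested inside another element of the same name.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Find the idx-th (1-based) <tag>...</tag> inside [*beg, *end).
 * On success, narrow the window to that element's content and return 1. */
static int narrow(const char **beg, const char **end, const char *tag, int idx)
{
    char otag[64], ctag[64];
    const char *p = *beg;

    snprintf(otag, sizeof otag, "<%s>", tag);
    snprintf(ctag, sizeof ctag, "</%s>", tag);
    while (idx-- > 0) {
        const char *s = strstr(p, otag);
        if (s == NULL || s >= *end)
            return 0;
        s += strlen(otag);
        const char *e = strstr(s, ctag);
        if (e == NULL || e + strlen(ctag) > *end)
            return 0;
        if (idx == 0) {          /* this is the requested instance */
            *beg = s;
            *end = e;
            return 1;
        }
        p = e + strlen(ctag);    /* skip this instance, look for the next */
    }
    return 0;
}

/* Resolve a specification such as "purchase:1.item:2.subTotal" against xml
 * and copy the selected element's text into out. A step without ":n"
 * is treated as ":1". Returns 1 on success, 0 if no such element exists. */
int xml_path_value(const char *xml, const char *spec, char *out, size_t outsz)
{
    const char *beg = xml, *end = xml + strlen(xml);
    char buf[256];

    if (strlen(spec) >= sizeof buf)
        return 0;
    strcpy(buf, spec);
    for (char *step = strtok(buf, "."); step; step = strtok(NULL, ".")) {
        int idx = 1;
        char *colon = strchr(step, ':');
        if (colon) {
            *colon = '\0';
            idx = atoi(colon + 1);
        }
        if (!narrow(&beg, &end, step, idx))
            return 0;
    }
    size_t n = (size_t)(end - beg);
    if (n >= outsz)
        return 0;
    memcpy(out, beg, n);
    out[n] = '\0';
    return 1;
}
```

A rule condition such as purchase:1.item:2.subTotal > 5000 then becomes a call to `xml_path_value` followed by a numeric comparison on `atoi(out)`. Because each step only searches inside the window left by the previous step, a second item in a different purchase cannot be matched by mistake.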

Handle Long Transactions in SSL and Email Network Services
- Some of the packet processing functions are better handled at the application level.
- For example, there are many packages for detecting and removing email viruses, including McAfee's uvscan, AMaViS scanmail, and mutt (to recombine email components), but almost all of them are implemented at the application level and interact with the sendmail program.
- Rewriting them as kernel modules would require significant effort.
- The same observations apply to SSL processing.

Web Switching/SSL Processing Overhead and Performance Differences between Prefork and Dynamic Fork
[Chart: Overall WebBench Requests/Second. Y axis: requests per second, 0 to 300. X axis: clients, 1 to 56. Series: Prefork NonSSLProxy, Dynamic NonSSLProxy, Apache NonSSL, Prefork SSLProxy, Dynamic SSLProxy, Apache SSL.]
- Significant SSL processing overhead: 240 req/sec vs. 38 req/sec.
- Content switching processing overhead may reduce performance below that of a single web server.
- What do we gain here? How can we improve it?

IXP1200-based Content Switch
- We have ported OpenSSL and our Linux Secure Web System to run on the IXP12EB with VxWorks, using Wind River's Tornado II IDE.
- The preliminary version runs purely on the StrongARM core.
- We are currently working on offloading the header extraction and rule matching code to run as hardware threads on the microengines.

Intel IXP1200 NP and IXP12EB
- The IXP1200 Network Processor
- The IXP12EB Evaluation Board: a PCI form factor board based on the IXP1200 Network Processor
  - eight 10/100 Mbps ports
  - two Gigabit Ethernet ports
  - PCI backplane and an Ethernet Network Interface Card (NIC)

IXP1200 Network Processor
[Figure]

Packets Receiving & Transmitting
[Figure]

Agere Network Processor
The following figures are from Douglas Comer's new text, Network Systems Design Using Network Processors.

Figure slides (one architecture diagram each):
- Agere's FPP
- Agere's RSP
- Alchemy's Au1000
- Applied Micro Circuits Corp nP7510
- Cisco Parallel eXpress Forwarding (PXF)
- Cognigine's Reconfigurable Communication Unit (RCU)
- EZChip NP-1
- IBM PowerNP
- IBM NP Embedded Processor Complex
- Motorola's C-Port
- Motorola Single CP
- Packet Flow and IXP2400
- Intel IXP2400

HA-LVS Configuration
[Diagram: Client (CIP), Internet, Linux Director and Backup Director linked by heartbeat, MON, Real Server1, Real Server2, Real Server3]
High availability:
1. When the Backup Director detects a Linux Director failure through the heartbeat protocol, it gracefully negotiates the takeover of the VIP, providing fault tolerance.
2. Monitor the server processes running on the real servers. Route requests to server processes that are alive. Initiate restart/repair.

High Available Web Server Cluster
[Diagram: Client (CIP), Internet, Web Switch1 and Web Switch2 linked by heartbeat, MON, Real Server1, Real Server2, Real Server3]
1. A web switch detects the failure of the other web switch and takes over the processing of routing requests.
2. The web switch monitors the server processes running on the real servers. When they die, it routes requests to server processes that are alive, rewrites the web switching rules, and initiates restart/repair.

Status of UCCS ACSD Project

- Two versions of the Linux kernel-based content switch (LCS), LCS01 and LCS02, were developed.
- A Linux application-level secure web switch (LSWS) was developed using the OpenSSL package.
- LSWS has been ported to run on the Intel IXP12EB and IXP1200 network processor with Wind River VxWorks.
- Part of the above research projects is sponsored by CCL/ITRI.
- The current release, LCS02, is based on Linux-2.2.16-3. It is being ported to Linux-2.4.18 and integrated with KTCPVS.
- ip_forward.c, ip_masq.c, and ip_vs.c are modified to implement basic TCP delayed binding.
- ip_cs.c is added for most of the content switching functions, including HTTP header extraction and XML content extraction.
- A simple Java-based ruleEdit program was created for rule editing and conflict detection.
- A C-based program can detect conflicts among rules with regular expressions in their condition expressions.
- A rule translation program converts the rule set into a Linux kernel module and allows dynamic replacement of rules without restarting the system.
- We are currently working on integrating KTCPVS and providing a unified configuration/monitoring command.
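The idea behind rule conflict detection can be shown on a much simpler rule shape than the regular-expression conditions handled by the project's C-based detector. The sketch below is an illustration under stated assumptions, not that program: the `range_rule` struct and both functions are hypothetical, each rule tests one numeric field against an inclusive range, and only rules on the same field are compared.

```c
#include <limits.h>
#include <string.h>

/* Hypothetical rule shape: "lo <= field value <= hi  =>  route to server". */
struct range_rule {
    const char *name;    /* rule label */
    int field;           /* index of the field the rule tests */
    long lo, hi;         /* inclusive bounds on the field value */
    const char *server;  /* routing target */
};

/* Returns 1 when some value satisfies both conditions but the two rules
 * route it to different servers. Only rules on the same field are compared. */
int rules_conflict(const struct range_rule *a, const struct range_rule *b)
{
    if (a->field != b->field)
        return 0;
    if (a->hi < b->lo || b->hi < a->lo)
        return 0;                /* ranges are disjoint */
    return strcmp(a->server, b->server) != 0;
}

/* Counts conflicting pairs in a rule set by checking every pair once. */
int count_conflicts(const struct range_rule *rules, int n)
{
    int conflicts = 0;
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            conflicts += rules_conflict(&rules[i], &rules[j]);
    return conflicts;
}
```

For instance, a rule routing values of 50000 and above to one server and a rule routing values from 1 to 49999 to another do not conflict, while a third rule covering 40000 to 60000 conflicts with whichever of the two sends its overlapping values to a different server. The pairwise check is quadratic in the number of rules, which is acceptable for an offline rule editor but is an assumption about rule set size.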

LCS Demo
- We set up viva.uccs.edu as a content switch, with wait and ace as the two real servers.
- URL switching demo:
    http://viva.uccs.edu/~lcs1/ routes to ace.uccs.edu
    http://viva.uccs.edu/~lcs2/ routes to wait.uccs.edu
- XML web switching (e-commerce applications): http://archie.uccs.edu/~acsd/lcs/xmldemo.html
    When the 2nd subTotal tag >= 50000, route to ace.
    When the 2nd subTotal tag < 50000, route to wait.
- Let us know if you have problems accessing them. My students may be working on LCS extensions.

LCS Rule Example
    R4: if (atoi(rule_fields[1].value) >= 50000) {
            return route_to("ace", NON_STICKY, saddr);
        }
    R5: if ((atoi(rule_fields[1].value) > 0) && (atoi(rule_fields[1].value) < 50000)) {
            IP_RULE_MSG("server=wait\n");
            return route_to("wait", NON_STICKY, saddr);
        }
    R10: if (strstr(url, "lcs1") != NULL) {
            IP_RULE_MSG("server=ace\n");
            return route_to("ace", NON_STICKY, saddr);
        }
    R11: if (strstr(url, "lcs2") != NULL) {
            IP_RULE_MSG("server=wait\n");
            return route_to("wait", NON_STICKY, saddr);
        }

Intel 7280 Demo
http://cs.uccs.edu/~chow/pub/master/ycai/doc/csdemo.html

Related Load Balancing Research Results
- Modified the Apache status module to report:
  - total bytes to be transferred by child processes
  - average document transfer speed
- Modified LB-DNS to receive server status and bandwidth probing results. LB-DNS returns the IP address of the best server based on a weight contributed by both server load and bandwidth.
- Modified the WebStone benchmark to test the performance of load-balancing web server clusters.

Load Balancing Systems
[Diagram: Bandwidth Probe Results, Statistics Gathering Daemon, Modified Web Server 1 through Modified Web Server n, Server Delay, Server Ranking, /tmp/StatFile, LBA: Modified DNS, Request for Web pages]

Connection Rate: LBA vs. Round-Robin
Server connection rate for 4 servers, in connections/sec (chart Y axis: 0 to 1000)

    Update for LBA (per sec)   Load balancing system   Round-robin
    1                          418.2                   327.6
    2                          656.6                   327.6
    3                          907.9                   327.6
    4                          420                     327.6
    5                          636.7                   327.6
    6                          322.6                   327.6
    7                          711.6                   327.6
    8                          420.5                   327.6
    9                          638.3                   327.6
    10                         670.6                   327.6
    11                         683.4                   327.6
    12                         899                     327.6

Round-robin was run only once, so the same value is shown in every row.

Conclusion
- Content Delivery Networks improve Internet content retrieval.
- LVS provides a low-cost layer 4 switching service for clusters.
- A Linux Content Switch with generic rules can be easily configured for a wide variety of value-added services:
  - Premium services
  - Load balancing/highly available server farm
  - Firewall
  - Bandwidth control/traffic shaping
- Efficient SW/HW architecture and rule matching algorithms are required to reduce processing overhead.
- Content rule design and conflict detection are important and challenging.
- TCP delayed binding can be improved.

References

- http://www.linuxvirtualserver.org/
- http://www.akamai.com/
- http://cs.uccs.edu/~chow/pub/contentsw/talk/contentswitching.ppt
- [Aron2000] Mohit Aron, Differential and predictable QoS in web server systems, Ph.D. dissertation, Rice University, Oct. 2000.
- [Zhang97] Lixia Zhang, Sally Floyd, and Van Jacobson, Adaptive Web Caching, April 25, 1997. http://www-nrg.ee.lbl.gov/floyd/web.html
- [Esi2001] Edge Side Includes, http://www.esi.org/.
- [Chow2001a] C. Edward Chow and Indira Semwal, Web Load Balancing Through More Accurate Server Report, Proceedings of PDCAT 2001, Taipei, Taiwan.
- [Chow2001b] C. Edward Chow, Ganesh Godavari, and Jianhua Xie, Content Switch Rules and their Conflict Detection, Proceedings of PDCAT 2001, Taipei, Taiwan.
- [Chow2001c] C. Edward Chow and Weihong Wang, The Design and Implementation of Linux LVS-based Content Switch, Proceedings of PDCAT 2001, Taipei, Taiwan.
- [Aversa2000] Luis Aversa and Azer Bestavros, Load Balancing a Cluster of Web Servers: Using Distributed Packet Rewriting, Proceedings of IPCCC 2000.
- [Cao98] Pei Cao, Jin Zhang, and Kevin Beach, Active Cache: Caching Dynamic Contents on the Web, http://www.cs.wisc.edu/~cao/papers/active-cache.ps
