Java News from Friday, July 27, 2007

Friday kicks off with an information-packed keynote from two of eBay's architects, Randy Shoup and Dan Pritchett. I can't type fast enough to keep up with them. I cheated and talked to Randy earlier in the Speaker's lounge. They have a few petabytes of data going back to the site's founding, split across a couple of dozen Oracle databases. They use a custom object-relational mapping layer to manage queries across independent databases. (Please open source that. I need it.)

Dan starts with the database side:

They're on version 3 of the architecture. eBay started with commodity Fry's hardware, Perl, and open source software, with every item stored in a separate file, which limited them to 50,000 files. Now they use 4-way Solaris 2900 boxes, Java, and Oracle. Power and cooling are the biggest current issues.

Version 2 was C++ in IIS. The database was moved to Oracle, and Microsoft Index Server was introduced for search. They used Resonate for load balancing and spread horizontally. Still a single database, though; a failover database was introduced here. In November 1999 they were down for a day and on CNN. This motivated splitting the databases.

In December 2002 the current DB architecture was introduced, with many different databases. The application is out of control though. It's a two-tiered architecture: 2 million+ lines of code in a single ISAPI DLL, with hundreds of developers working on it. Integration nightmare. This motivates a move to J2EE and Java. Presentation logic was moved out of C++ into XSL! They also introduced a developer API.

Version 3 is Java.

The horizontal splitting of the databases is transparent to the application developers; they don't need to know how many physical servers they're talking to. However, with 15,000 app servers at four DB connections per server, connection management is a problem. Oracle can't handle 60,000 connections, so they have to cheat to handle this.
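To make the transparency concrete: a routing layer like their custom O/R mapper presumably maps each ID deterministically to a logical shard, so application code never names a physical server. This is a minimal sketch under my own assumptions (class name, modulo-hash scheme), not eBay's actual code.

```java
// Hypothetical sharding router: maps an item or user ID to one of N
// logical shards. The deterministic mapping is what lets application
// developers ignore how many physical databases exist.
public class ShardRouter {
    private final int shardCount;

    public ShardRouter(int shardCount) {
        this.shardCount = shardCount;
    }

    /** Same ID always lands on the same shard. */
    public int shardFor(long id) {
        // Math.floorMod keeps the result non-negative even if the hash is negative.
        return Math.floorMod(Long.hashCode(id), shardCount);
    }
}
```

A real scheme would likely use a lookup table or consistent hashing so shards can be rebalanced without remapping every row, but the principle is the same: the shard choice lives in one layer, not in application code.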

Different rows have very different characteristics; e.g. a user with one feedback and a user with half a million feedbacks.

Amazon has the exact same rules. Brewer's CAP theorem: pick two out of three of consistency, availability, and partition tolerance. They and Amazon discovered this empirically.

Now Randy starts talking about scaling the application tier (as opposed to the database tier).

They'd like to put more than 4K of data in a cookie, and they believe they can't do multipage flows without cookies. (They're wrong about that one, by the way.) They're in the process of inventing server-side cookies to get around this. If they just threw out the cookies and stored the ID of the server-side record in the URL, they'd be RESTful. They did have some scalability issues where the 4K cookies came back with each of 70 requests for 70 images on one page; they had to reorganize to avoid that. They do try to avoid session state, but they haven't achieved stateless nirvana yet (my opinion, not theirs).
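Here's a sketch of what I mean by putting the server-side record's ID in the URL instead of in a cookie. All the names and the URL shape here are my invention, not eBay's design: the flow state lives server-side, and each page simply links to the next with the state's ID as a query parameter.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Toy illustration of cookie-free multipage flows: state is stored
// server-side and only its ID travels in the URL, so every request
// is self-describing and any app server can handle the next step.
public class FlowStore {
    private final Map<String, Map<String, String>> flows = new ConcurrentHashMap<>();

    /** Persist the flow state and return the ID to embed in links. */
    public String save(Map<String, String> state) {
        String id = UUID.randomUUID().toString();
        flows.put(id, state);
        return id;
    }

    public Map<String, String> load(String id) {
        return flows.get(id);
    }

    /** The next page in the flow is addressed by URL, not by cookie. */
    public String nextPageUrl(String id) {
        return "/checkout/step2?flow=" + id;
    }
}
```

Since the ID rides in the URL rather than a cookie, it also isn't resent with every one of those 70 image requests.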

The 4K limit in cookies is a surprise. I didn't know about that. URLs can hold more than 4K. I'm about to run out of battery power. I think I got most of the critical details down anyway.


I wanted to hear Granville Miller talk about Agile Architecture but that room is just too full. If the fire marshal shows up, we're in trouble. The conference has five classrooms running simultaneously, all about the same size. That has often meant the most popular session or two in any one time slot has people sitting in the aisles, while the others feel relatively empty.


First afternoon session: Anthony Mattei talks about "When Every Click Counts: Building Mission Critical Enterprise Systems". "Fault tolerance is not a new concept." He cites fault-tolerant architectures going back to ENIAC in 1946, with Tandem et al. available in the 1980s.

An SLA is an outward-facing document telling the customer how your system will behave; however, it also has internal value to the engineers. Distinguish between planned and unplanned downtime. Avoid single points of failure. Publish/subscribe has pretty much supplanted point-to-point messaging.
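The pub/sub point is worth a tiny illustration. In a point-to-point queue the sender must know its one receiver; with publish/subscribe the publisher is decoupled from however many consumers exist, so consumers can be added or fail independently. This is an in-process toy of my own, not Mattei's example or any real messaging stack like JMS.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Minimal publish/subscribe topic: the publisher never names its
// consumers, so adding a subscriber requires no publisher changes
// and no single consumer is a point of failure for delivery.
public class Topic<T> {
    private final List<Consumer<T>> subscribers = new CopyOnWriteArrayList<>();

    public void subscribe(Consumer<T> subscriber) {
        subscribers.add(subscriber);
    }

    /** Every current subscriber receives every published message. */
    public void publish(T message) {
        for (Consumer<T> s : subscribers) {
            s.accept(message);
        }
    }
}
```

A production system would add durable subscriptions and delivery guarantees, which is where the planned-vs-unplanned-downtime distinction in the SLA starts to matter.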