CS179G - Fall 2004 - Database Project

Team Members

Course Material

Project Timeline

Complete by Oct 10

Weblog

Sep-29-04

We discussed using floating point numbers, or liberally spaced integers for our durable numbering scheme. I pointed out that counting a node twice is inneficient because it cuts the numbers available in half and any numbering scheme will guarantee that your children have a higher number than their parent, and a lower number than the parents next sibling. By not counting nodes twice you save space, and the child relationship can be captured by:
isChild = parent < node < parent->next

We setup a schedule of mandatory meetings on Tuesdays and Thursdays, with flexibly Mondays and Wednesdays to take care of other classes. We agreed read up on durable numbers and meet on Monday, Oct-4.

Oct-4-04

We met during the TAs lab hours and listened to his lecture. We also discussed an alternative numbering scheme that I was thinking about which uses a bitmask much like subnet masking for networking. In essence, the high order bits indicate the parent id, the middle bits indicate the current node id, and the low order bits are used for children. This would allow parent/child tests to be performed as AND operations and eliminate the need for double counting or range checking. There are some disadvantages to this scheme that may make it useless.

We reviewed the xml document, the log file, and the example source code. We asked the TA some questions regarding the log description format (we could not read the doc file in any of the programs installed in lab) and got a pretty clear idea on how to parse the xpath elements and count matching nodes. We examined the expat parser library and wrote out some pseudo code to process the xml doc. Everyone agreed it would be simpler that we should have seperate functions to decompose an xml document, analyze the decomposition for numbering, and parse the decomposition for xpath queries. The alternatives were to combine parsing in number and queries, but would lead to duplicate code and create potential problems.

Oct-5-04

Oben looked into xml parsing more and found that libxml not only handles xml easier than expat, it also has some xpath support. While we may not be able to use the xpath routines for our database, easier xml parsing is welcome.

We discussed many numbering schemes. Mike suggested weighting the spacing assigned to branches based on the how many children were in each branch, this would give branches with more children extra space compared to branches with fewer nodes. Evalyn suggested using strings like software versions (such as 1.1.1, 1.1.2). Oben thought the weighted spacing might cause problems for frequently used tags. Mike had thought of this and suggested that the number of children be averaged across all nodes of the same tag type.

The team decided the previous ideas were too difficult or radical, and the two ideas today were worth the most attention. We weighed the pros and cons of each method and decided that weighted spans would be the fastest but suffer from limited space, whereas string versions could grow extremely large but be slow to process. Oben suggested creating a combination of the two, and Mike suggested using weighted spans for speed, but using overflow buckets to expand the space as needed. We all agreed to implement the weighted span first, and worry about overflow buckets later.

The TA informed us that the xml parser, database storage, numbering system, and insertion and deletion had to be ready by Monday. Nowhere on the website does it say this, previous labs did not mention this, and the teacher did not mention this. We broke the immediate requirements into three parts: xml analysis and numbering (Mike), database storage (Oben), and update processing (Evalyn). We drew up some diagrams and flow charts, worked out some pseudo code, and sent everyone home to work on their respective parts.