CS179G - Fall 2004 - Database Project
Team Members
Course Material
Project Timeline
Complete by Oct 10
- Course website layout and design
- Subversion repository (privately hosted)
- Weekly code archive (privately hosted)
- Preliminary specification
- Visit professor
- Timeline
Weblog
Sep-29-04
We discussed using floating point numbers, or liberally spaced integers
for our durable numbering scheme. I pointed out that counting a node
twice is inneficient because it cuts the numbers available in half and any
numbering scheme will guarantee that your children have a higher number
than their parent, and a lower number than the parents next sibling. By
not counting nodes twice you save space, and the child relationship can be
captured by:
isChild = parent < node < parent->next
We setup a schedule of mandatory meetings on Tuesdays and Thursdays, with
flexibly Mondays and Wednesdays to take care of other classes. We agreed
read up on durable numbers and meet on Monday, Oct-4.
Oct-4-04
We met during the TAs lab hours and listened to his lecture. We also
discussed an alternative numbering scheme that I was thinking about which
uses a bitmask much like subnet masking for networking. In essence, the
high order bits indicate the parent id, the middle bits indicate the
current node id, and the low order bits are used for children. This would
allow parent/child tests to be performed as AND operations and eliminate
the need for double counting or range checking. There are some
disadvantages to this scheme that may make it useless.
We reviewed the xml document, the log file, and the example source code.
We asked the TA some questions regarding the log description format (we
could not read the doc file in any of the programs installed in lab) and
got a pretty clear idea on how to parse the xpath elements and count
matching nodes. We examined the expat parser library and wrote out some
pseudo code to process the xml doc. Everyone agreed it would be simpler
that we should have seperate functions to decompose an xml document,
analyze the decomposition for numbering, and parse the decomposition for
xpath queries. The alternatives were to combine parsing in number and
queries, but would lead to duplicate code and create potential problems.
Oct-5-04
Oben looked into xml parsing more and found that libxml not only handles
xml easier than expat, it also has some xpath support. While we may not
be able to use the xpath routines for our database, easier xml parsing is
welcome.
We discussed many numbering schemes. Mike suggested weighting the spacing
assigned to branches based on the how many children were in each branch,
this would give branches with more children extra space compared to
branches with fewer nodes. Evalyn suggested using strings like software
versions (such as 1.1.1, 1.1.2). Oben thought the weighted spacing might
cause problems for frequently used tags. Mike had thought of this and
suggested that the number of children be averaged across all nodes of the
same tag type.
The team decided the previous ideas were too difficult or radical, and
the two ideas today were worth the most attention. We weighed the pros
and cons of each method and decided that weighted spans would be the
fastest but suffer from limited space, whereas string versions could grow
extremely large but be slow to process. Oben suggested creating a
combination of the two, and Mike suggested using weighted spans for
speed, but using overflow buckets to expand the space as needed. We all
agreed to implement the weighted span first, and worry about overflow
buckets later.
The TA informed us that the xml parser, database storage, numbering
system, and insertion and deletion had to be ready by Monday. Nowhere on
the website does it say this, previous labs did not mention this, and the
teacher did not mention this. We broke the immediate requirements into
three parts: xml analysis and numbering (Mike), database storage (Oben),
and update processing (Evalyn). We drew up some diagrams and flow charts,
worked out some pseudo code, and sent everyone home to work on their
respective parts.