CS179M: Project in Artificial Intelligence - Winter 2008
Basic Information
Lecture: F 2:10pm - 3:00pm OLMH 1126
Lab: M 6:10pm - 9:00pm ENGR2 135
Instructor: Dimitrios Gunopulos
TA: Shashwati Kasetty
Instructor office hours: TBA
TA office hours: M 2:00pm - 4:00pm ENGR2 368
Overview
The goal of this course is to give each student an understanding of
how to use different machine learning algorithms to solve regression
problems. The project for this course will be done in groups. There
will be two parts to the project. The first part will involve fine-tuning common regression algorithms to solve a
real-world problem as accurately as possible. The second part will
involve creating a web interface in which users can enter queries
about the data. The best projects will include thorough testing,
results, and explanations for the algorithms tested in part one of the
project, as well as a working, easy-to-use web interface for part two.
Presentation and Report
Binder Requirements:
- Proposal Document
- User Manual (with screen shots if helpful)
- Document that contains descriptions of the different techniques used, the
development process, any packages used, and any code developed from
scratch.
- Report containing experiments that show accuracies the user can
expect.
- A CD containing all the code that is part of your project.
Presentation Guidelines:
- Overview of tools used.
- Issues that had to be addressed.
- A demo of the application is expected.
Project Description and Guidelines
Hosting your web application online:
- The database is a PostgreSQL database hosted on eurydice
(localhost). Your database name is cs179m-#, and the database login and
password are the same as for the eurydice server.
- You can use PHP or Python.
- Your app must be hosted on your http://cs179m-#.cs.ucr.edu web space.
- Your regression code must run live online (we will narrow down the
data to search over by entering parameters on your form for your SQL
query).
- It's easier in some cases to write your own regression
algorithm. You can use available code as long as you cite all sources
clearly.
- Remember to use feature selection and instance elimination. You
may want to do cross-validation to avoid overfitting. Use lightweight
methods for all of these; otherwise your algorithm will run forever.
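As a rough illustration of the cross-validation advice above, here is a minimal sketch in Python using only the standard library. It is not a required approach: the k-nearest-neighbour regressor, the function names (knn_predict, cross_val_mae), and the synthetic rows are all assumptions made for the example, and the features are left unscaled for brevity.

```python
import random


def knn_predict(train, query, k=3):
    """Predict a target as the mean of the k nearest training rows.

    train is a list of (features, target) pairs; features are numeric tuples.
    This is a stand-in for whatever lightweight regressor you choose.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(train, key=lambda row: dist(row[0], query))[:k]
    return sum(target for _, target in nearest) / len(nearest)


def cross_val_mae(rows, k_folds=5, k=3, seed=0):
    """Estimate the mean absolute error of knn_predict by k-fold cross-validation."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    # Deal the shuffled rows into k_folds roughly equal folds.
    folds = [rows[i::k_folds] for i in range(k_folds)]
    errors = []
    for i, held_out in enumerate(folds):
        # Train on every fold except the one being held out.
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        for features, target in held_out:
            errors.append(abs(knn_predict(train, features, k) - target))
    return sum(errors) / len(errors)


# Synthetic (hypothetical) rows: (square feet, bedrooms) -> price.
data = [((float(s), float(b)), 100.0 * s + 5000.0 * b)
        for s in range(800, 2000, 100) for b in (2, 3)]

if __name__ == "__main__":
    print("estimated MAE:", cross_val_mae(data))
```

Comparing the cross-validated error for a few values of k (or for a few feature subsets) is one inexpensive way to pick settings without fitting to a single train/test split.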
Your web application needs to support the following operations:
- The form must list all attributes and have options for value
ranges in the form of checkboxes or
drop-down menus (whichever you find easier). For some fields it's more
appropriate to have text boxes.
- The user will select the values they want, and hit a submit/go
button. Then your app must run a database query that returns the
new table/dataset which will usually be smaller.
- Then the user will enter parameters for a particular house they
are searching for, and you must run regression on the narrowed-down
results from the SQL query above to predict the sale price for that
house.
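The narrowing step above could be sketched as follows in Python. This is only an illustration under stated assumptions: the function name build_query, the table name houses, and the column names are hypothetical, and the %s placeholders assume a PostgreSQL driver such as psycopg2, which uses that parameter style. The function only builds the statement and its parameter list; executing it is left to your database code.

```python
def build_query(table, ranges, choices):
    """Return (sql, params) for a SELECT narrowed by the form input.

    ranges:  {column: (low, high)} taken from text boxes
    choices: {column: [value, ...]} taken from checkboxes or drop-down menus

    Table and column names must come from a fixed list in your own code,
    never from raw user input, because placeholders cannot stand in for
    identifiers. Only the values are passed as parameters.
    """
    clauses, params = [], []
    for col, (low, high) in sorted(ranges.items()):
        clauses.append(f"{col} BETWEEN %s AND %s")
        params.extend([low, high])
    for col, values in sorted(choices.items()):
        if values:  # skip fields where the user selected nothing
            marks = ", ".join(["%s"] * len(values))
            clauses.append(f"{col} IN ({marks})")
            params.extend(values)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    return f"SELECT * FROM {table}{where}", params


if __name__ == "__main__":
    # Hypothetical form input: a square-footage range and two zip codes.
    sql, params = build_query(
        "houses",
        {"sqft": (1000, 2000)},
        {"zip": ["92507", "92521"]},
    )
    print(sql)
    print(params)
```

With a driver like psycopg2, the pair would typically be passed as cursor.execute(sql, params), and the returned rows would then serve as the training set for the regression step.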
Download the datasets here.
Download only the files associated with the Sale Price Prediction task.
Grading
The grading will be based primarily on the final report and
presentation. You will also be graded on other documents submitted
throughout the quarter, such as the project plan due in Week 4.
Resources
- Weka - This package contains several machine learning algorithms for
classification, clustering and other tasks.
- Netlab - This is a toolbox for Matlab that contains several machine
learning algorithms.
- SVM Light - This package contains an implementation of Support Vector
Machines.
- R - This is a language and environment for statistical computing.