CS105 Data Analysis Methods

Instructors

Mariam Salloum (prof)

  • Email: msalloum [at] cs [dot] ucr [dot] edu

  • Office: Bourns A (Room 159B)

  • Office Hours: WF 1-2 and by appointment

Al Amin Hossain (TA)

  • Email: ahoss005 [at] ucr [dot] edu

  • Office Hours: TBD

Announcements

  • 1/6 Website online!!

Course Description

Introduction to data analysis methods, including data statistics, simple data storage types, data acquisition from the web and public APIs, data cleaning, crowdsourcing for data collection and cleaning, supervised and unsupervised learning techniques, and data visualization. The laboratory will also include hands-on exercises on the aforementioned topics in Python.

Course Logistics

iLearn

  • Will be used to post grades

Google Drive

Campus Wire

  • CampusWire will be used for discussions and announcements. Questions about lectures or assignments should be posted to the discussion board rather than emailed to the instructors, so that any instructor or student can respond and fellow students benefit from the answers.

  • Invite Link: https://campuswire.com/p/G62E9D77F Code: 4385

Textbooks

Due to the rapidly evolving nature of the material, there is no single textbook that covers the course in its entirety. We list some indicative textbooks below; however, the class notes will be self-contained, and pertinent references to resources will be provided throughout the course. The following textbooks cover fundamental concepts for 'dealing with data'.

Please note: to access the following books, you must either be on the UCR campus or connected to the VPN.

  • (DS4Business) Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking
    Foster Provost, Tom Fawcett
    Available online via UCR Library: LINK

  • (BD in Practice) Big Data in Practice: How 45 Successful Companies Used Big Data Analytics to Deliver Extraordinary Results
    Bernard Marr, 2016
    Available online via UCR Library: LINK

  • (Doing DS) Doing Data Science
    Cathy O'Neil; Rachel Schutt, 2013
    Available online via UCR Library: LINK

Grade Breakdown

Grades will be weighted as follows:

Item            Percentage
Labs            40%
Midterms (x2)   35% (15% each and 5% to the highest midterm)
Project         25%

  • Labs: This course is designed to be a hands-on learning experience. I believe that students learn better by doing. As part of this philosophy, there will be a series of labs to reinforce the concepts covered in lecture.

  • Midterms: There will be two written in-class midterms during the quarter. Both midterms are closed book/notes. There will be no makeup exams unless you let me know of any conflicts ahead of time and bring a doctor’s note. We will have an in-class review the Tuesday before the midterm. I will usually hand out a study guide the week before the midterm.

    • Midterm 1 on Feb. 3rd

    • Midterm 2 on March 6th

  • Final Project: For the final project, you will be asked to work in teams of 2 (or individually) on a data science project of your choosing. You will be asked to 1) select a data set and formulate a hypothesis to test, 2) perform data cleaning, 3) perform data exploration, 4) perform data analysis, and 5) relay your results through visualizations.
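
As a rough illustration of how the weights in the grade breakdown combine, here is a minimal Python sketch. It assumes one reading of the midterm weighting: each midterm counts 15%, and the extra 5% goes to whichever midterm score is higher, so the higher midterm counts 20% in total. The function name course_grade and the sample scores are hypothetical and are not part of the official grading procedure.

```python
def course_grade(labs, midterm1, midterm2, project):
    """Combine component scores (each on a 0-100 scale) into a course grade.

    Assumed weighting: labs 40%, project 25%, midterms 35% in total,
    split as 15% each plus an extra 5% for the higher of the two midterms.
    """
    higher = max(midterm1, midterm2)
    lower = min(midterm1, midterm2)
    return 0.40 * labs + 0.20 * higher + 0.15 * lower + 0.25 * project


# Hypothetical scores, for illustration only.
print(round(course_grade(labs=90, midterm1=70, midterm2=80, project=85), 2))
```

Under this reading, the order of the two midterms does not matter; only which score is higher does.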

Academic Integrity

Academic integrity is fundamentally about ethical behavior. Appropriate collaboration and research of previous work is an important part of the learning process. However, not all collaboration or use of existing work is ethical. The overarching principles which should guide you when determining whether or not it is appropriate to use a source or collaborate with a classmate involve answering these questions:

Does this fit within the spirit of the assignment/activity?

In any ethical decision there is always judgment involved. Some assignments and activities involve collaborating with a team; in others you are asked to work individually. You are expected to have some common sense and to use it.

Does this help me or someone else in the class to improve our skills and/or understanding of class material?

As a guiding principle, talking about concepts is usually good, talking about specific answers or approaches to problems is usually not.

Does this misrepresent my own (or someone else's) capabilities and understanding of materials for the purpose of grading?

Attribution of sources is a key idea here; if you use work which is not your own, that work should be cited. For this class, citation is not required to be in a specific format, but any citation should clearly identify the author and source of any work which is not your own. Refer to the university policy on plagiarism and cheating.

Have any specific instructions been given for this assignment?

Not all assignments are the same. On some you will be given explicit instructions about what level of collaboration is appropriate, and you are expected to abide by those restrictions even if you disagree with them.

If you are at all uncertain about an action, whether it be working with another student, researching existing code, or something else, you are always welcome to ask the instructor for clarification.

The severity of sanctions imposed for an academic integrity violation will depend on the severity of the transgression and the ascertained intent of the student. Penalties may range from failing the assignment to failing the course. Again, actions will adhere to the Academic Honesty policies of BCOE and UCR.