
Course Syllabus

Course: MATH 3080

Division: Natural Science and Math
Department: Mathematics
Title: Foundations of Data Science

Semester Approved: Fall 2020
Five-Year Review Semester: Fall 2025
End Semester: Spring 2026

Catalog Description: Students will get an introduction to Python programming, data analysis tools, and the statistics necessary to acquire, clean, analyze, explore, and visualize real-life data sets. Using statistics, students will learn to make data-driven inferences and decisions, and to communicate those results effectively.

Semesters Offered:
Credit/Time Requirement: Credit: 3; Lecture: 3; Lab: 0

Prerequisites: Math 2040 with a C or better and Math 1210 with a C or better.

Justification: Data collection and analysis are ubiquitous and are fast becoming a prerequisite to economic success for businesses. This course provides a subset of the tools necessary to leverage data for prediction. This course will support the bachelor’s in software engineering degree by providing relevant mathematics coursework.


Student Learning Outcomes:
Students will acquire data through web scraping and data APIs. Students will be assessed through assignments, quizzes, exams and/or class discussion – instructor will provide feedback.

Students will clean and reshape messy datasets. Students will be assessed through assignments, quizzes, exams and/or class discussion, and projects – instructor will provide feedback.

Students will learn to use statistical software to deploy statistical methods including generalized linear regression, cluster analysis, and classification. Students will be assessed through assignments, class projects, quizzes, exams and/or class discussion – instructor will provide feedback.

Students will apply dimensionality reduction and perform basic analysis of network data.  Students will be assessed through assignments, quizzes, exams and/or class discussion – instructor will provide feedback.

Students will evaluate outcomes, make decisions based on data, and effectively communicate those results. Students will be assessed through assignments, class projects, quizzes, exams and/or class discussion – instructor will provide feedback.

Students will understand and be able to apply the theoretical foundations underlying the methods applied throughout the course. Students will be assessed through assignments, class projects, quizzes, exams and/or class discussion – instructor will provide feedback.


Content:
This course will include: an introduction to data analysis tools in Python; descriptive statistics; data structures with NumPy and pandas; introductory hypothesis testing and statistical inference; web scraping and data acquisition via APIs; generalized linear regression; classification methods, including logistic regression, k-nearest neighbors, decision trees, support vector machines, and neural networks; data visualization; clustering methods; dimensionality reduction, including principal component analysis; network analysis; rating, ranking, and elections; cleaning and reformatting messy datasets using regular expressions or dedicated tools such as OpenRefine; natural language processing; and the ethics of big data.
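
As an illustration of the "cleaning and reformatting messy datasets using regular expressions" topic, the following is a minimal sketch only, not an assignment or required course material. The records, the field layout, and the helper name `clean_record` are hypothetical and chosen purely for illustration; the sketch uses only Python's standard library.

```python
import re

# Hypothetical messy records; not taken from any course material.
raw_records = [
    "  Alice SMITH ; 42 ; $1,200.50 ",
    "bob  jones;n/a;$980",
    "Carol  Diaz ; 37 ; 1,050.00",
]


def clean_record(line):
    """Split one semicolon-delimited record and normalize each field."""
    name, age, amount = [field.strip() for field in line.split(";")]
    # Collapse repeated whitespace and standardize capitalization.
    name = re.sub(r"\s+", " ", name).title()
    # Keep the age only if it is a whole number; otherwise mark it missing.
    age = int(age) if re.fullmatch(r"\d+", age) else None
    # Strip currency symbols and thousands separators before converting.
    amount = float(re.sub(r"[^\d.]", "", amount))
    return {"name": name, "age": age, "amount": amount}


cleaned = [clean_record(line) for line in raw_records]
for row in cleaned:
    print(row)
```

In this sketch a missing age is recorded as `None` rather than guessed, so later analysis can decide how to treat missing values. In practice, an instructor might use pandas or a dedicated tool such as OpenRefine for the same task.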

This course supports an inclusive learning environment where diverse perspectives are recognized, respected and seen as a source of strength. The consideration of a diverse set of problems using real data will help support this goal.

Key Performance Indicators:
Student learning will be evaluated through:

Attendance / Participation 0 to 15%

Class Group Activities 10 to 15%

Computer Projects 20 to 50%

Quizzes 0 to 20%

Homework 5 to 25%

Midterm Exams / Tests 20 to 40%

Final Exam 15 to 35%


Representative Text and/or Supplies:
McKinney, W. (current edition). Python for data analysis: Data wrangling with pandas, NumPy, and IPython. Sebastopol, CA: O'Reilly Media.

Géron, A. (current edition). Hands-on machine learning with Scikit-Learn and TensorFlow: Concepts, tools, and techniques to build intelligent systems. Beijing; Boston; Farnham; Sebastopol; Tokyo: O'Reilly.

A computer and statistical software are required for this course. Free software such as Python or R is recommended, but subscription software (e.g., SAS, SPSS) may be used at the discretion of the instructor.


Pedagogy Statement:
John Dewey stated that “education should not revolve around the acquisition of a pre-determined set of skills, but rather the realization of one’s full potential and the ability to use those skills for the greater good.” Applying this idea to the pedagogy of this course, the teacher will help students learn both theory and application in a modern curriculum. By the end of the course, students should know how to use technology to apply specific skills and to analyze the results of their work.

This course supports an inclusive learning environment where diverse perspectives are recognized, respected and seen as a source of strength. This environment is supported by activities that consider data from a diverse set of sources. Moreover, students will interact in groups and will be encouraged to think critically in the face of data that may disagree with their own beliefs.


Instructional Mediums:
Lecture

Hybrid

Maximum Class Size: 25
Optimum Class Size: 20