Courses

Non-Thesis Master Program Structure 

The Data Science Non-Thesis Master program consists of:

  • At least 30 credits
  • At least 10 courses
  • DASC 591 – Non-Thesis Master Term Project
  • ENGL 500 – Academic Writing
  • ETHR 500 – Scientific Research Methods and Research and Publication Ethics

Program Courses

A general introduction to programming using the Python programming language. Problem solving with ordered steps, algorithmic thinking, and implementation.

Frequently used mathematical concepts in data science: Probability, statistics, linear algebra, calculus, regression, hypothesis testing, functions, equations and simple graphs.

Basics of data manipulation and visualization with relevant Python libraries, different types of plots, vector/matrix representations, linear algebra operations, probability/statistics operations, and data analysis applications.

A broad introduction to machine learning covering regression, classification, clustering, and dimensionality reduction methods; supervised and unsupervised models; linear and nonlinear models; parametric and nonparametric models; combinations of multiple models; comparisons of multiple models and model selection.

Introduction to system dynamics and systems thinking; theory and applications to support strategic decision making. Current topics in health policy and management, mapping tools for system dynamics, crisis/pandemic management, case studies, sustainability and management simulations. Concepts of systems thinking and modeling for better decision making and analysis.

Applications of data loading, pre-processing, visualization, exploratory data analysis. Using various models for regression and classification. Practical applications of evaluating learning performance, pipelining and model selection. Off-the-shelf dimensionality reduction and clustering methods.

This course gives students a hands-on introduction to working with data, including fundamental statistical data analysis and data visualization. The Python tasks are designed to give students practical data processing and visualization skills on real data.

Big data characteristics, distributed systems, HDFS, MapReduce, Spark, Dask, using streaming data, data visualization, performing scalable data processing operations, big data infrastructure, data transfer and integration, and the relationships and concepts connecting big data, data science, and machine learning.

Neural networks, their basic building blocks, the need for deep networks and their training. Convolutional, recurrent, and embedding layers, multi-input and multi-output layered architectures and their usage. Other practical considerations, matching data with models/architectures and common use-cases.

What is big data? Value creation with big data. Data sources and extraction from unstructured sources. Learning tasks and statistical learning. Fundamental concepts and their operationalization; overfitting vs. generalization; curse of dimensionality; correlation vs. causation; data collection strategy and biases; security, privacy, and ethical considerations.

Study of computational models of visual perception and their implementation in computer systems. Topics include image formation; edge, corner, and boundary extraction; segmentation, matching, pattern recognition, and classification techniques; 3D vision: projection geometry, camera calibration, shape from stereo/silhouette/shading, model-based 3D object recognition; color, texture, radiometry, and BRDF; motion analysis.

Introduction to distributed computing, an overview of operating systems, process synchronization and deadlocks, threads and thread synchronization, communication protocols, synchronization in distributed systems, management of time, causality, logical clocks, consistent global states, distributed mutual exclusion, distributed deadlock detection, election algorithms, agreement protocols, consensus, multicast communication, distributed transactions, replication, shared-memory model, scheduling, distributed file systems, fault tolerance in distributed systems, distributed real-time systems.

Fundamental concepts of concurrency, non-determinism, atomicity, race-conditions, synchronization, mutual exclusion. Overview of parallel architectures, multicores, distributed memory. Parallel programming models and languages, multithreaded, message passing, data-driven, and data-parallel programming. Design of parallel programs, decomposition, granularity, locality, communication, load balancing. Patterns for parallel programming, structural, computational, algorithm strategy, concurrent execution patterns. Performance modeling of parallel programs, sources of parallel overheads.

Overview of Computer Security Techniques, Conventional Encryption, Public-Key Cryptography, Key Management, Message Authentication, Hash Functions and Algorithms, Digital Signatures, Authentication Protocols, Access Control Mechanisms, Network Security Practice, TCP/IP Security, Web Security, SSL (Secure Sockets Layer), Denial-of-Service Attacks, Intrusion Detection, Viruses.

Applications of artificial intelligence in user interfaces. Design, implementation, and evaluation of user interfaces that use machine learning, computer vision, and pattern recognition technologies. Supporting tools for classification, regression, multi-modal information fusion. Gaze-tracking, gesture recognition, object detection, tracking, haptic devices, speech-based and pen-based interfaces.

Basic linear models for classification and regression; stochastic gradient descent (backpropagation) learning; multi-layer perceptrons, convolutional neural networks, and recurrent neural networks; recent advances in the field; practical examples from machine translation, computer vision; practical experience in programming, training, evaluating and benchmarking deep learning models.

Advanced topics in data structures, algorithms, and their computational complexity. Asymptotic complexity measures. Graph representations, topological order, and algorithms. Forests and trees. Minimum spanning trees. Bipartite matching. Union-find data structure. Heaps. Hashing. Amortized complexity analysis. Randomized algorithms. Introduction to NP-completeness and approximation algorithms. Shortest path methods. Network flow problems.

Algorithms, models, representations, and databases for collecting and analyzing biological data to draw inferences. Overview of available molecular biological databases. Sequence analysis, alignment, database similarity searches. Phylogenetic trees. Discovering patterns in protein sequences and structures. Protein 3D structure prediction: homology modeling, protein folding, representation for macromolecules, simulation methods. Protein-protein interaction networks, regulatory networks, models and databases for signaling networks, data mining for signaling networks.

Convex analysis, optimality conditions, linear programming model formulation, simplex method, duality, dual simplex method, sensitivity analysis; assignment, transportation, and transshipment problems.

Combinatorial optimization, structure of integer programs, pure integer and mixed-integer programming problems, branch and bound methods, cutting plane and polyhedral approach, convexity, local and global optima, Newton-type and conjugate gradient methods for unconstrained optimization, Karush-Kuhn-Tucker conditions for optimality, algorithms for constrained nonlinear programming problems, applications in combinatorial and nonlinear optimization.

Tools, techniques, and skills needed to analyze decision-making problems characterized by uncertainty, risk, and conflicting objectives. Methods for structuring and modeling decision problems and applications to problems in a variety of managerial decision-making contexts. Structuring decision problems: Decision trees, model building, solution methods and sensitivity analysis; Bayes’ rule, the value of information, and using decision analysis software. Uncertainty and its measurement: Probability assessment. Utility Theory: Risk attitudes, single- and multiattribute utility theory, and risk management. Decision making with multiple objectives.

Theory and practice of dynamic programming, sequential decision making over time; the optimal value function and Bellman’s functional equation for finite and infinite horizon problems; introduction to solution techniques: policy iteration, value iteration, and linear programming; general stochastic formulations, Markov decision processes; application of dynamic programming to network flow, resource allocation, inventory control, equipment replacement, scheduling, and queueing control.

Overview of industrial engineering and operations research applications in healthcare. Capacity planning and management in hospitals. Evaluating the effects of interventions on the spread of infectious diseases. Analyzing the effects of resource allocation policies. Analysis of screening policies and their effects. Developing medical decision modeling to build decision support systems.

Hypothesis Testing, Signal Detection, Parameter Estimation, Cramér-Rao Lower Bound, Maximum Likelihood/Maximum a Posteriori Estimation, Stochastic Least Squares Estimation, and Kalman Filtering.

Fundamental concepts of sensors, driving/readout electronics; rational design of sensors, state-of-the-art commercial sensing, remote sensing, Internet of Things; Strain, piezoelectric, pressure, temperature, GPS, chemical, biological, resistive, capacitive sensors, photodetectors, accelerometers, gyroscopes; optical, magnetic sensors, RF sensors, power electronics, self-powered sensors; Sensor networks based on WiFi, 4G, Bluetooth; Human/Animal sensory systems; Feedback control systems; Microfabrication of sensors.

Review of sensing fundamentals, materials, mechanisms, read-out circuits, details of ADCs and DACs, feedback control, real-time operating systems and implementations, mobile/wireless connectivity, mobile app development, predictive data analytics, machine learning implementations, an example of the embedded systems and board design process, constraint-driven designs (power savings, bio-implantable sensing, long-range wireless connectivity, mission-critical life-support systems). State-of-the-art sensor system design examples in the multimedia, security, healthcare, energy, and consumer electronics tracks. Course project, optional labs.

Review of probability and statistics: random variables, univariate and joint probability distributions, expectations; bivariate normal; sampling distributions; introduction to asymptotic theory; estimation; inference. Linear regression: conditional expectation function; multiple regression; classical regression model, inference and applications.

Departures from the standard assumptions: specification tests; a first look at time series; generalized regression; nonlinear regression; simultaneous equations, identification, instrumental variables. Extensions and applications: ML, GMM, VAR, GARCH, panel data.

The course focuses on empirical applications and tests of macroeconomic and/or microeconomic theories. Students learn to analyze standard econometric applications.

The principles and computational methods to study the biological data generated by genome sequencing, gene expressions, protein profiles, and metabolic fluxes. Application of arithmetic, algebraic, graph, pattern matching, sorting and searching algorithms and statistical tools to genome analysis. Applications of Bioinformatics to metabolic engineering, drug design, and biotechnology.

Examination of the technologies, environmental impacts, and economics of the main energy sources of today and tomorrow, including fossil fuels, nuclear power, biomass, geothermal energy, hydropower, wind energy, and solar energy. Comparison of different energy systems within the context of sustainability. Hydrogen economy and fuel cells.

Crude oil and biomass refining technologies. Fractionation, catalytic and thermal cracking, gasoline and diesel upgrading, and other side processes in crude oil refining; gasification, pyrolysis, transesterification, and condensation processes in biomass refining; economic and environmental factors in refining.

Drug design consists of identifying a target (DNA, RNA, proteins) that is known to cause a certain disease and selectively inhibiting or modifying its activity by binding a drug molecule to a specified location on that target. In this course, computational techniques for designing such a drug molecule will be taught. The topics to be covered are: Identification of the active part. Forces involved in drug-receptor interactions. Screening of drug libraries. Use of different software to determine binding energies. Identifying a lead molecule. Methods of refining a lead molecule for better suitability. Case studies: A survey of known drugs, success and failure stories.

Introduction to the principles of structural biology and the computational techniques used to investigate the structure, dynamics, and function of biological systems. Description of theoretical and computational tools to investigate relevant problems in the domain of biophysics and biochemistry. Fundamentals of structure determination techniques, energy functions, molecular dynamics simulations, molecular docking, and techniques to predict protein structure and protein-protein interactions.

Business value creation with big data. Data sources. Data mining tasks and supervised/unsupervised learning. Evaluation criteria. Generalizing from data versus overfitting. Data mining process. Data collection strategy. Security, privacy, and ethical considerations. Case studies.

Fundamentals of architecture and representation of information. The journey from data to information with regard to relation, grouping, and hierarchy, as well as mental information processing. Forming, representing, and visualizing information for different types of media. Case studies such as instructional design, time series, spatio-temporal data, comparison, and big data. Exploration of tangible, gestural, and device-based interaction with information.

An introduction to important topics in biostatistical concepts and reasoning. Tools for describing central tendency and variability in data; methods for performing inference on population means and proportions via sample data; statistical hypothesis testing and its application to group comparisons. Several statistical methods such as linear regression, ANOVA, logistic regression, survival analysis, nonparametric methods, and ROC analysis, commonly used to study biological problems. In-lab practices on computers and software for statistical analysis, to provide students with the skills to generate, read, and interpret the results in their fields of study.