Introduction to Databases — Course Overview

Table of Contents:

Course Components
Standard DBMS Features
Past Examples
Navigation Strategies
Duke Community Standard
Data Organization

Introduction to Databases: Course Snapshot

This course offers a balanced, practice-oriented introduction to modern database systems. It pairs core theory—data models, relational design, and query processing—with hands-on techniques for building and tuning real-world data solutions. Learners follow a clear progression from conceptual modeling and normalization to writing efficient SQL, understanding storage and indexing choices, and applying transaction and concurrency controls that preserve correctness under load. Applied modules introduce basic ETL patterns, analytics-oriented schemas, and scaling considerations for larger datasets.

What you will learn

Students gain both the vocabulary and practical skills needed to design, implement, and maintain dependable data systems. Key learning outcomes include:

Relational modeling and normalization techniques that reduce redundancy and support data integrity.
Writing and optimizing SQL for retrieval and updates, including joins, aggregations, window functions, and query refactoring strategies.
Foundational relational algebra concepts that explain how queries are executed and optimized.
Database internals: storage layouts, index structures (B-tree and hash), composite (multi-column) indexes, and how execution plans affect performance.
Transaction management and concurrency control: isolation levels, locking strategies, and recovery methods aligned with ACID properties.
An introduction to semi-structured data and when to combine relational and non-relational approaches for flexibility and performance.
Practical ETL and analytics patterns: designing data-mart schemas, simple reporting pipelines, and initial strategies for parallel or distributed processing.
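
To make the relational algebra outcome concrete, the following minimal sketch models relations as Python lists of dictionaries and implements selection, projection, and natural join as plain functions. The table names, column names, and sample rows are hypothetical, and a real DBMS evaluates these operators very differently (with indexes, join algorithms, and an optimizer); the sketch only shows how a query decomposes into operators.

```python
# Relations are modeled as lists of dicts (one dict per tuple).
# All relation and column names below are hypothetical examples.

def select(relation, predicate):
    """Selection (sigma): keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, columns):
    """Projection (pi): keep only the named columns and drop duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(columns, key)))
    return result

def natural_join(left, right):
    """Natural join: combine rows that agree on every shared column name."""
    if not left or not right:
        return []
    shared = [c for c in left[0] if c in right[0]]
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[c] == r[c] for c in shared)
    ]

students = [
    {"student_id": 1, "name": "Ada"},
    {"student_id": 2, "name": "Grace"},
]
enrollments = [
    {"student_id": 1, "course": "Databases"},
    {"student_id": 1, "course": "Algorithms"},
    {"student_id": 2, "course": "Databases"},
]

# pi_name( sigma_{course = 'Databases'} ( students JOIN enrollments ) )
db_students = project(
    select(natural_join(students, enrollments), lambda r: r["course"] == "Databases"),
    ["name"],
)
print(db_students)
```

The nested-loop join here is the simplest possible strategy. Applying the selection to enrollments before the join gives the same result with less work, which is the kind of rewrite a query optimizer performs.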

Topics and practical focus

The material emphasizes example-driven explanations and step-by-step demonstrations. Expect guided SQL walkthroughs that start with simple SELECTs and progress to multi-join queries and window functions. ER-style modeling exercises help you translate requirements into normalized schemas, while case studies show how index selection and storage organization influence common query patterns and throughput.
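
The progression described above, from a simple SELECT to a multi-join query to a window function, might look like the following sketch. It uses Python's built-in sqlite3 module with an in-memory database; the library schema and sample rows are assumptions made for illustration, not course material. The window-function step assumes the underlying SQLite library is version 3.25 or newer.

```python
import sqlite3

# In-memory database with a small, hypothetical library schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE member (member_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE book   (book_id   INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE loan (
    loan_id   INTEGER PRIMARY KEY,
    member_id INTEGER NOT NULL REFERENCES member(member_id),
    book_id   INTEGER NOT NULL REFERENCES book(book_id),
    loan_date TEXT NOT NULL
);
INSERT INTO member VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO book VALUES (10, 'SQL Basics'), (11, 'Indexing'), (12, 'Transactions');
INSERT INTO loan VALUES
    (100, 1, 10, '2024-01-05'),
    (101, 1, 11, '2024-02-01'),
    (102, 2, 10, '2024-01-20');
""")

# Step 1: a simple SELECT over one table.
titles = conn.execute("SELECT title FROM book ORDER BY title").fetchall()

# Step 2: a multi-join query with aggregation (loans per member).
loans_per_member = conn.execute("""
    SELECT m.name, COUNT(*) AS n_loans
    FROM member AS m
    JOIN loan AS l ON l.member_id = m.member_id
    JOIN book AS b ON b.book_id = l.book_id
    GROUP BY m.member_id, m.name
    ORDER BY m.name
""").fetchall()

# Step 3: a window function that numbers each member's loans in date order.
ranked = conn.execute("""
    SELECT m.name, b.title,
           ROW_NUMBER() OVER (
               PARTITION BY m.member_id ORDER BY l.loan_date
           ) AS loan_seq
    FROM loan AS l
    JOIN member AS m ON m.member_id = l.member_id
    JOIN book AS b ON b.book_id = l.book_id
    ORDER BY m.name, loan_seq
""").fetchall()

print(titles)
print(loans_per_member)
print(ranked)
```

Unlike GROUP BY, the window function keeps one output row per loan while still computing a per-member sequence number.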

Concurrency and recovery topics are explored through realistic scenarios that reveal trade-offs between isolation, latency, and application correctness. Applied modules cover extracting and transforming source data, loading analytical schemas, and assembling lightweight reporting pipelines. Comparative discussions clarify when a single-node DBMS suffices and when distributed or parallel solutions are warranted for larger data volumes.
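
As one small illustration of transactional correctness, the sketch below shows atomicity (the "A" in ACID) on a single SQLite connection: a transfer between two accounts either applies both updates or neither. The account table, the CHECK constraint, and the transfer helper are hypothetical names chosen for this example. It does not demonstrate isolation levels or locking between concurrent sessions, which need multiple connections and depend on the specific DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        account_id INTEGER PRIMARY KEY,
        balance    INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
with conn:
    conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates commit together or not at all."""
    try:
        # The connection context manager commits on success and rolls back
        # the whole transaction if an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE account_id = ?",
                (amount, dst),
            )
            # This update violates the CHECK constraint if src lacks funds.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    """Return {account_id: balance} for all accounts."""
    return dict(conn.execute("SELECT account_id, balance FROM account"))

ok = transfer(conn, 1, 2, 30)        # succeeds
after_ok = balances(conn)
failed = transfer(conn, 1, 2, 500)   # overdraws account 1, so it is rolled back
after_failed = balances(conn)

print(ok, after_ok)
print(failed, after_failed)
```

The credit to the destination account is deliberately issued first, so the failing case shows that the rollback undoes work already done inside the transaction.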

Hands-on projects and study tips

Hands-on exercises reinforce theory with practical tasks: design a normalized application schema (for example, a library or user-account system), implement constraints, load representative data, profile queries, and apply indexing and query rewrites to improve performance. A recommended workflow: sketch an ER model, implement the schema and constraints, populate representative datasets, measure query plans and latencies, then iterate, adding indexes or rewriting queries where measurements indicate bottlenecks.
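
The workflow above can be sketched end to end with SQLite's EXPLAIN QUERY PLAN statement: create a schema with constraints, load synthetic data, inspect the plan, add an index, and inspect the plan again. The user_account table, the idx_user_country index name, and the generated rows are assumptions for illustration. The exact plan text varies between SQLite versions, so the sketch only reads the final "detail" column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Steps 1-2: schema with constraints (hypothetical user-account table).
conn.execute("""
    CREATE TABLE user_account (
        user_id INTEGER PRIMARY KEY,
        email   TEXT NOT NULL UNIQUE,
        country TEXT NOT NULL
    )
""")

# Step 3: populate a synthetic dataset.
with conn:
    conn.executemany(
        "INSERT INTO user_account (email, country) VALUES (?, ?)",
        [(f"user{i}@example.com", "US" if i % 2 else "DE") for i in range(10_000)],
    )

QUERY = "SELECT COUNT(*) FROM user_account WHERE country = ?"

def plan(conn, sql, params):
    """Return the detail column of EXPLAIN QUERY PLAN, joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

# Step 4: measure before changing anything.
plan_before = plan(conn, QUERY, ("US",))

# Step 5: change one variable (add an index), then measure again.
conn.execute("CREATE INDEX idx_user_country ON user_account (country)")
plan_after = plan(conn, QUERY, ("US",))

count_us = conn.execute(QUERY, ("US",)).fetchone()[0]

print("before:", plan_before)
print("after: ", plan_after)
print("rows matching:", count_us)
```

Before the index exists, the plan reports a full table scan; afterwards it reports a search using idx_user_country. Plan changes do not guarantee lower latency, so on a real dataset it is worth timing the query as well (for example with time.perf_counter) before keeping an index.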

Use the included glossary and worked examples to bridge conceptual gaps. Focus on reproducible experiments (change one variable at a time), and document performance findings to build intuition about optimizer behavior and storage trade-offs.

Who benefits most

This course is well suited to undergraduate students in computer science or information systems, software developers building data-backed applications, and data analysts who need a deeper understanding of storage and query behavior. Novices will find clear explanations and glossary support for essential terms; intermediate learners will benefit from performance-focused sections and project prompts that translate theory into deployable solutions.

How to use this overview

Start with the conceptual chapters to establish a common vocabulary, then work through the example-driven modules and projects to apply the concepts. Treat the hands-on sections as reproducible experiments: measure, adjust, and iterate. The course follows Jun Yang's practical approach, which combines clear theory with incremental, measurable practice to build robust database skills that apply across OLTP and introductory analytics contexts.