Relational Databases
A collection of data items with predefined relationships between them, organized as a set of tables with columns and rows.
- Relational databases are the backbone of modern data management systems.
- They enable efficient storage, retrieval, and manipulation of structured data.
Key Components of Relational Databases
- Tables:
- The fundamental building blocks of a relational database, where data is stored in rows and columns.
- Each table represents an entity (e.g., Students, Courses) and contains attributes (e.g., Name, Age).
- Primary Keys: Unique identifiers for each record in a table, ensuring data integrity.
- Foreign Keys: Attributes that establish relationships between tables by referencing primary keys in other tables.
- Relationships: Connections between tables, such as one-to-many, many-to-many, or one-to-one.
Normalization
Normalization
A process used to organize data in a database to reduce redundancy and improve data integrity.
- As discussed before, there are three forms:
- First Normal Form (1NF): Ensures that each attribute contains only atomic (indivisible) values.
- Second Normal Form (2NF): Requires that all non-key attributes are fully functionally dependent on the primary key.
- Third Normal Form (3NF): Ensures that no transitive dependencies exist, meaning non-key attributes depend only on the primary key.
- Consider a table with student data:
| StudentID | Name | Course1 | Course2 |
|---|---|---|---|
| 1 | Alice | Math | Science |
| 2 | Bob | Math | English |
- This table violates 1NF because it contains repeating groups (Course1, Course2).
- To normalize it, we create two tables:
- Students Table:
| StudentID | Name |
|---|---|
| 1 | Alice |
| 2 | Bob |
- Courses Table:
| StudentID | Course |
|---|---|
| 1 | Math |
| 1 | Science |
| 2 | Math |
| 2 | English |
- Now, each attribute contains atomic values, and the data is organized efficiently.
Designing a Relational Database
Step 1: Identify Entities and Attributes
- Determine the entities (e.g., Students, Courses) and their attributes (e.g., Name, Age).
- Define the primary key for each entity.
Step 2: Establish Relationships
- Identify the relationships between entities (e.g., Students enroll in Courses).
- Use foreign keys to establish these relationships.
Step 3: Normalize the Data
- Apply 1NF to ensure atomicity.
- Apply 2NF to eliminate partial dependencies.
- Apply 3NF to remove transitive dependencies.
Step 4: Implement the Database
- Create Tables: Define the structure of each table, including attributes and data types.
- Insert Data: Populate the tables with records.
- Define Relationships: Use foreign keys to link tables.
- When designing a database, always start with a clear understanding of the data requirements and relationships.
- This will help you create an efficient and scalable database structure.
School Database
Entities and Attributes
- Students: StudentID (Primary Key), Name, Age
- Courses: CourseID (Primary Key), CourseName, Credits
- Enrollments: EnrollmentID (Primary Key), StudentID (Foreign Key), CourseID (Foreign Key), Grade
Relationships
- Students-Enrollments: One-to-Many (A student can have multiple enrollments).
- Courses-Enrollments: One-to-Many (A course can have multiple enrollments).
Normalization
Unnormalized:
| StudentID | StudentName | Age | Course1 | Course2 | Grades |
|---|---|---|---|---|---|
| 1 | Alice | 20 | Math | History | A, B |
| 2 | Bob | 21 | Math | - | A |
Problems:
- Repeating groups (Course1, Course2)
- Multiple values in a single column (Grades = "A, B")
1NF: Each table contains atomic values.
| StudentID | StudentName | Age | Course | Grade |
|---|---|---|---|---|
| 1 | Alice | 20 | Math | A |
| 1 | Alice | 20 | History | B |
| 2 | Bob | 21 | Math | A |
Problems:
- Redundant data: Student name and age repeat.
- Mix of different data types in one table (students and courses)
2NF: All non-key attributes are fully dependent on the primary key.
- Remove partial dependencies (where non-key attributes depend on part of a composite key).
| EnrollmentID (PK) | StudentID | StudentName | CourseID | CourseName | Grade |
|---|---|---|---|---|---|
| 1 | 1 | Alice | 101 | Math | A |
| 2 | 1 | Alice | 102 | History | B |
| 3 | 2 | Bob | 101 | Math | A |
3NF: No transitive dependencies exist.
| StudentID (PK) | StudentName | Age |
|---|---|---|
| 1 | Alice | 20 |
| 2 | Bob | 21 |
| CourseID (PK) | CourseName |
|---|---|
| 101 | Math |
| 102 | History |
| EnrollmentID (PK) | StudentID (FK) | CourseID (FK) | Grade |
|---|---|---|---|
| 1 | 1 | 101 | A |
| 2 | 1 | 102 | B |
| 3 | 2 | 101 | A |
These tables are in 3NF because:
- Each attribute contains atomic values.
- All non-key attributes (StudentID, CourseID, Grade) are fully dependent on the primary key (EnrollmentID).
- No transitive dependencies exist.
Querying Relational Databases
- Relational databases use Structured Query Language (SQL) to interact with data.
- SQL allows users to query, insert, update, and delete information efficiently.
Basic SQL Commands
- SELECT: Retrieves data from tables.
- INSERT: Adds new records to tables.
- UPDATE: Modifies existing records.
- DELETE: Removes records from tables.
- To retrieve all students enrolled in a specific course, use the following SQL query:
SELECT Students.Name FROM Students JOIN Enrollments ON Students.StudentID = Enrollments.StudentID WHERE Enrollments.CourseID = 101;
Advantages of Relational Databases
- Data Integrity: Ensured through primary and foreign keys.
- Scalability: Easily accommodates growing data.
- Flexibility: Supports complex queries and relationships.
- Security: Provides robust access control mechanisms.
Challenges and Considerations
- Complexity: Designing a normalized database can be time-consuming.
- Performance: Highly normalized databases may require more complex queries, impacting performance.
- Maintenance: Regular updates and backups are essential to ensure data integrity.
Social and Ethical Implications
- Data Privacy: Relational databases often store sensitive information, raising concerns about unauthorized access.
- Data Security: Ensuring data is protected from breaches is critical.
- Ethical Use: Organizations must use data responsibly, respecting user consent and privacy.
- A common mistake is to denormalize data for simplicity, leading to redundancy and potential data anomalies.
- Always strive for a balance between normalization and performance.