Set Union
The union operation in relational algebra combines all tuples from two relations (i.e., tables) to form a new relation. The union operator is typically denoted by ∪. Its mathematical definition is as follows:
Formal Definition:
Given two relations R(A1,A2,…,An) and S(A1,A2,…,An), their union R∪S is defined as:
R∪S={t∣t∈R or t∈S}
where t represents a tuple. R and S must be union-compatible, meaning they have the same attributes (columns).
Note: The union operation removes duplicate tuples, ensuring that the tuples in the resulting relation are unique.
Assume we have two relations R and S with the following structure and data:
Relation R:
ID123NameAliceBobCarol
Relation S:
ID234NameBobCarolDave
The union of relations R and S, denoted as R∪S, results in the following relation:
Relation R∪S:
ID1234NameAliceBobCarolDave
In this example, the tuples (2,Bob) and (3,Carol) that are present in both relations R and S appear only once in the resulting relation R∪S. The union result includes all distinct tuples from both relations.
Set Difference
The difference operation in relational algebra is used to remove tuples present in one relation from another relation. The difference operator is typically denoted by −.
Formal Definition:
Given two relations R(A1,A2,…,An) and S(A1,A2,…,An), their difference R−S is defined as:
R−S={t∣t∈R and t∈/S}
where t represents a tuple. The result of the difference operation contains only those tuples that are present in R but not in S.
Assume we have two relations R and S with the following structure and data:
Relation R:
ID123NameAliceBobCarol
Relation S:
ID234NameBobCarolDave
The difference operation R−S results in the following relation:
Relation R−S:
ID1NameAlice
In this example, the tuple (1,Alice) exists only in R and not in S, so it appears in the resulting relation R−S. The other tuples (2,Bob) and (3,Carol) also exist in S, so they are excluded from the difference result.
The projection operation in relational algebra is used to select specific columns (attributes) from a relation to create a new relation. The projection operator is typically denoted by π (Greek letter pi).
Formal Definition:
Given a relation R(A1,A2,…,An) and a subset of attributes {Ai,Aj,…,Ak}, the projection operation πAi,Aj,…,Ak(R) is defined as:
πAi,Aj,…,Ak(R)={{Ai,Aj,…,Ak}(t)∣t∈R}
where {Ai,Aj,…,Ak}(t) represents the values of attributes Ai,Aj,…,Ak extracted from tuple t. The projection operation removes duplicate tuples in the result.
Assume we have a relation R with the following structure and data:
Relation R:
ID123NameAliceBobCarolAge302528
Now, performing projection on relation R, selecting attributes Name and Age:
πName,Age(R)
Resulting Relation:
NameAliceBobCarolAge302528
In this example, the projection operation selects the Name and Age columns from R, resulting in a new relation that includes only the data for these two attributes.
If the projection result contains duplicate tuples, the projection operation will automatically remove these duplicates, ensuring that the tuples in the resulting relation are unique.
The selection operation in relational algebra is used to select tuples from a relation that satisfy a specific condition. The selection operator is typically denoted by σ (Greek letter sigma).
Formal Definition:
Given a relation R(A1,A2,…,An) and a condition θ (such as a comparison or logical expression), the selection operation σθ(R) is defined as:
σθ(R)={t∣t∈R and t satisfies condition θ}
where t represents a tuple and θ is a Boolean expression defining the selection condition.
Assume we have a relation R with the following structure and data:
Relation R:
ID1234NameAliceBobCarolDaveAge30252822
Now, performing selection on relation R, selecting tuples where Age is greater than 25:
σAge>25(R)
Resulting Relation:
ID13NameAliceCarolAge3028
In this example, the selection operation filters the tuples in R where Age is greater than 25. The resulting relation includes all tuples that meet this condition.
The selection operation allows us to extract specific data from a relation by specifying conditions, which is very common in database queries.
The renaming operation in relational algebra is used to change the name of a relation or its attributes (columns). The renaming operator is typically denoted by ρ (Greek letter rho).
Formal Definition:
Given a relation R(A1,A2,…,An) and a new name S(B1,B2,…,Bn), the renaming operation ρS(B1,B2,…,Bn)(R) is defined as:
ρS(B1,B2,…,Bn)(R)=S(B1,B2,…,Bn)
where S(B1,B2,…,Bn) represents the new relation name and attribute names.
Assume we have a relation R with the following structure and data:
Relation R:
ID123NameAliceBobCarolAge302528
Now, performing the renaming operation on relation R, changing the relation name to Person, and renaming attributes ID to PersonID, Name to FullName, and Age to Years:
ρPerson(PersonID,FullName,Years)(R)
Resulting Relation:
PersonID123FullNameAliceBobCarolYears302528
In this example, the renaming operation changes the name of the relation R to Person and updates the attribute names. Renaming is useful, especially when you need to alias a relation or adjust the output format of query results.
Natural Join
The natural join operation in relational algebra combines two relations based on their common attributes (i.e., attributes with the same name), creating a new relation. The natural join operator is usually denoted by ⋈.
Formal Definition:
Given two relations R(A1,A2,…,An,B1,B2,…,Bm) and S(B1,B2,…,Bm,C1,C2,…,Ck), their natural join R⋈S is defined as:
R⋈S={t∣t=(r∪s) and r∈R,s∈S and r[Bi]=s[Bi] for all common attributes Bi}
The result of the natural join includes all combinations of common attribute values that are the same in both R and S, while removing duplicate common attributes.
Assume we have two relations R and S with the following structure and data:
Relation R:
ID123NameAliceBobCarolDeptID101102103
Relation S:
DeptID101102104DeptNameHRITFinance
Now, performing the natural join R⋈S on their common attribute DeptID:
R⋈S
Resulting Relation:
ID12NameAliceBobDeptID101102DeptNameHRIT
In this example, only tuples with the same DeptID values in both R and S are combined and appear in the result. The tuple (3,Carol,103) in R does not have a matching DeptID in S, so it does not appear in the result. Similarly, the tuple with DeptID 104 in S has no matching DeptID in R, so it is not included in the result.
The natural join is commonly used in database queries, especially when combining multiple tables based on common fields, as it automatically removes duplicate common attributes, simplifying the query results.
θ-Join
The θ-join (Theta Join) in relational algebra is a more general join operation that allows tuples from two relations to be joined based on any condition (θ). The θ-join is denoted by R⋈θS, where θ is a condition expression that may include comparison operators (such as =, <, >, <=, >=, <>, etc.).
Formal Definition:
Given two relations R(A1,A2,…,An) and S(B1,B2,…,Bm), and a condition θ, the θ-join R⋈θS is defined as:
R⋈θS={t∣t=(r∪s),r∈R,s∈S, and r and s satisfy the condition θ}
In a θ-join, the resulting relation contains all tuples (r,s) that satisfy the condition θ and includes all attributes from R and S.
Assume we have two relations R and S with the following structure and data:
Relation R:
ID123NameAliceBobCarolAge302528
Relation S:
DeptID101102103NameAliceBobDave
Now, performing the θ-join on R and S with the condition R.Name=S.Name:
R⋈R.Name=S.NameS
Resulting Relation:
ID12NameAliceBobAge3025DeptID101102
In this example, the condition for the θ-join is R.Name=S.Name, so only tuples with matching Name values in R and S are joined and appear in the result. Tuples with names that do not match (e.g., Carol and Dave) are not included in the result.
The θ-join is a flexible join operation that can combine two relations based on any condition. Its generality allows it to represent various join operations, including natural joins and equijoins.
Equijoin
As mentioned, the equijoin is a special case of the θ-join where the condition θ is equality =.
Semi Join
The semi-join is a relational algebra operation that selects tuples from one relation that match tuples in another relation. Unlike join operations, a semi-join result includes only attributes from the first relation and does not combine all attributes from both relations.
Formal Definition:
Given two relations R(A1,A2,…,An) and S(B1,B2,…,Bm), the semi-join R⋉S is defined as:
R⋉S={t∈R∣∃s∈S such that t and s satisfy the join condition θ}
where θ is the join condition, typically t[Ai]=s[Bj] (i.e., equality join condition).
Assume we have two relations R and S with the following structure and data:
Relation R:
ID1234NameAliceBobCarolDaveDeptID101102103104
Relation S:
DeptID101102105DeptNameHRITFinance
Now, performing the semi-join R⋉S with the condition R.DeptID=S.DeptID:
R⋉S
Resulting Relation:
ID12NameAliceBobDeptID101102
In this example, the semi-join selects tuples from R where DeptID has a match in S, returning these matching tuples from R. Unlike join operations, the result includes only attributes from R, excluding attributes from S.
Summary: The semi-join is used to extract tuples from one relation that have matches in another relation without including the second relation's attributes. It is useful for query optimization and reducing data redundancy.
Anti Join
The anti-join is a relational algebra operation used to select tuples from one relation that do not have matches in another relation. In other words, an anti-join returns tuples that do not satisfy the join condition.
Formal Definition:
Given two relations R(A1,A2,…,An) and S(B1,B2,…,Bm), the anti-join R⋉S (or R∖(R⋈S)) is defined as:
R⋉S={t∣t∈R and there is no s∈S such that t and s satisfy the join condition θ}
where θ is the join condition, typically t[Ai]=s[Bj] (i.e., equality join condition).
Assume we have two relations R and S with the following structure and data:
Relation R:
ID1234NameAliceBobCarolDaveDeptID101102103104
Relation S:
DeptID101102105DeptNameHRITFinance
Now, performing the anti-join R⋉S with the condition R.DeptID=S.DeptID:
R⋉S
Resulting Relation:
ID34NameCarolDaveDeptID103104
In this example, the anti-join selects tuples from R where DeptID does not have a corresponding match in S. Tuples with DeptID values 103 and 104 are included in the result because they have no matches in S.
Summary: The anti-join is used to find tuples from one relation that do not match any tuples in another relation based on the join condition. It is useful for identifying records that do not meet certain criteria, such as finding entities not included in a specific group or condition.
In relational algebra, the division operation is a higher-level operation used to select tuples from one relation that are associated with all tuples in another relation. In other words, the result of a division operation includes those tuples from the dividend relation that match every tuple in the divisor relation.
Formal Definition:
Given two relations R(A1,A2,…,An) and S(B1,B2,…,Bm), where B1,B2,…,Bm is a subset of A1,A2,…,An (denoted as Bi⊆Ai), the division operation R÷S is defined as:
R÷S={t[A1,A2,…,An−m]∣∀s∈S,∃r∈R, such that r[B1,B2,…,Bm]=s}
The resulting relation includes tuples that represent the values of attributes A1,A2,…,An−m in R that are associated with every tuple in S.
Suppose we have two relations, R and S, with the following structures and data:
Relation R:
StudentAliceAliceBobBobCarolDaveSubjectMathPhysicsMathPhysicsMathMath
Relation S:
SubjectMathPhysics
Now, performing the division operation R÷S yields the students who have taken all the subjects listed in S:
R÷S
Resulting Relation:
StudentAliceBob
In this example, R is a relation that includes students and the subjects they have taken, while S is a list of subjects of interest. The result of the division operation is the set of students who have taken every subject in S, so only Alice and Bob appear in the result, as they have taken both Math and Physics.
Conclusion: The division operation is particularly useful in database queries when looking for tuples that meet all specified conditions. It helps in finding a subset of tuples that fulfill a particular requirement.
In relational algebra, an outer join is an extended join operation that preserves tuples that do not satisfy the join condition. Outer joins are classified into three types: Left Outer Join, Right Outer Join, and Full Outer Join. The result of these operations includes not only the tuples that satisfy the join condition but also the unmatched tuples, with NULL used to fill in missing attribute values.
Left Outer Join
Formal Definition:
Given two relations R(A1,A2,…,An) and S(B1,B2,…,Bm), the Left Outer Join R⟕S is defined as:
R⟕S={t∣t=(r∪s) and r∈R,s∈S satisfy the join condition θ}∪{r∣r∈R and there is no s∈S that satisfies the join condition θ}
The Left Outer Join returns all tuples from R. If a tuple in R does not have a match in S, the corresponding S attributes in the result are filled with NULL.
Relation R:
ID1234NameAliceBobCarolDaveDeptID101102103104
Relation S:
DeptID101102105DeptNameHRITFinance
Left Outer Join R⟕S Result:
ID1234NameAliceBobCarolDaveDeptID101102103104DeptNameHRITNULLNULL
In this example, Carol and Dave have DeptIDs that do not match any DeptID in S, so the DeptName column is NULL for these rows in the result.
Right Outer Join
Formal Definition:
The Right Outer Join is similar to the Left Outer Join, except that it preserves all tuples from the second relation. If a tuple in the second relation has no match in the first relation, the corresponding attributes from the first relation are filled with NULL.
R⟖S={t∣t=(r∪s) and r∈R,s∈S satisfy the join condition θ}∪{s∣s∈S and there is no r∈R that satisfies the join condition θ}
Right Outer Join R⟖S Result:
ID12NULLNameAliceBobNULLDeptID101102105DeptNameHRITFinance
In this example, the tuple with DeptID 105 in S has no matching tuple in R, so the ID and Name columns are NULL for this row in the result.
Full Outer Join
Formal Definition:
A Full Outer Join returns all tuples from both relations, including those that do not match. If a tuple in one relation has no corresponding match in the other, the attributes from the other relation are filled with NULL.
R⟗S=(R⟕S)∪(R⟖S)
Full Outer Join R⟗S Result:
ID1234NULLNameAliceBobCarolDaveNULLDeptID101102103104105DeptNameHRITNULLNULLFinance
In a Full Outer Join, the result includes all tuples from both R and S, with unmatched portions filled with NULL.
Conclusion:
- Left Outer Join preserves all tuples from the left table, filling unmatched parts with NULL.
- Right Outer Join preserves all tuples from the right table, filling unmatched parts with NULL.
- Full Outer Join preserves all tuples from both tables, filling unmatched parts with NULL.
These outer join operations are particularly useful in database queries when we want to retain unmatched records in the result.