/images/pfp.png

Leetcode - Two pointers

Two pointer is a fairly common and useful technique for solving leetcode challenges. When several comparisons of pairs in an array are required, this technique tends to be a solid first approach. It boils down to keeping, as the name suggests, two pointers, either one for the first and another for the second element of the array or one for the first and another for the last element of the array.

SQL - Compare consecutive entries

Problem: given a table my_table with an id column, a date column called my_date and a numerically valued column my_value, return the IDs of the rows whose my_value value is larger than their predecessor. Assume that the table is sorted in descending order by my_date and that each entry corresponds to exactly one day.

Solution:

1
2
3
4
5
SELECT mt1.id
FROM my_table AS mt1
JOIN my_table AS mt2
ON mt1.my_date = mt2.my_date + INTERVAL '1 day'
WHERE mt1.my_value > mt2.my_value

Explanation: We simply join the table on itself by the my_date column while shifting all dates by one day and then use the WHERE clause to filter for the rows we want to keep. This trick is highly localized in the sense that it requires all days to exist, otherwise it would fail.

SQL - Find duplicates

Problem: given a table my_table with two columns: id representing a unique identifier and constituting the table’s primary key and col1 being a text column, we want to write a query that finds all values of col1 of my_table that have one or more duplicates.

Solution:

1
2
3
SELECT col1
FROM my_table
GROUP BY col1 HAVING COUNT(col1) > 1

Explanation: this is pretty straightforward, we simply group by col1 which will cause us to produce a table composed uniquely of the distinct values of col1. By specifying HAVING COUNT(col1) > 1, we are essentially filtering the table of distinct values of col1 to restrict ourselves to the values that occur more than once.

SQL - Second highest distinct value

Problem: given a table my_table with an numeric column col, we want to return the second highest distinct value. In case there is no second highest (for example, when the table only has one row or all the rows in the table have the same value), we want to return null.

Solution:

1
2
3
4
5
6
SELECT MAX(col)
FROM my_table 
WHERE col NOT IN ( 
    SELECT MAX(col) 
        FROM my_table
    );

Explanation: This is a nifty trick. The subquery simply produces a temporary table with one single value, the maximum of the column col of table my_table. Then we again select the maximum value of column col of my_table where the value is not in the temporary table we’ve produced, i.e., the maximum of the values that include every value except the actual maximum is the second highest value. Additionally, if there is no second highest value, there is nothing to select so we naturally produce a null value.

High-level view of storing data in a database

Records, bytes, tables, pages and files

  • Record - a set of information grouped together, where the different pieces of information can have different types;
  • Byte representation of a record - how the record is actually stored in disk as bytes. Since records can have fields of different types, not all fields take up the same amount of bytes. Additionally, fields can be of fixed size, such as char, or of variable size, such as varchar. The specification of the types that compose a record is called a schema;
    • There are many ways a record can be represented in disk. For example, records can be of fixed length or of variable length and this obviously changes how we store and access them. However, the goals are always the same: records should be compact in memory/disk and there should be fast access to their fields;
  • Table - a collection of records sharing a schema;
  • Page - a chunk of sequential bytes that composes a unit of transfer for disk read/write and that can hold multiple records;
    • Much like records, the way page layout is handled is also affected by whether or not the records are of fixed length. Most notably, the records in a page can be packed or unpacked, where packed refers to not having “empty” bytes between records in a page;
  • File - a logical collection of pages that compose a table. Note that tables can span multiple files or even multiple machines;
    • Files themselves in a database can be structured in a variety of ways: unordered heap files (records placed arbitrarily across pages), clustered heap files (records and pages are grouped), sorted files (pages and records are in sorter order), index files (B+ trees, linear hashing, etc., may contain records or point to records in other files), etc.

Note that we’ve used a slotted page as an example, but that is just one out of many possible ways to manage records in a page.