CPSC 350 - Applications of Databases - Spring 2026

Assignment #4

Possible experience: +40XP

Due: Thursday, Mar 12th, midnight

Bulk import

Your mission in this homework is to import all the relevant data in the Stephen's galactic MariaDB database into your own SQLite database. This will certainly involve multiple kinds of impedance mismatches between the two schemas that you will have to overcome.

Connecting to galactic

Log on to the cpsc server, and use mariadb to inspect the galactic database. Use the SHOW TABLES; and SELECT * FROM tablename LIMIT 10; command to inspect everything that's in there. Do some deep, deep thinking about what kinds of information are in galactic, which parts are relevant to your own SQLite database, and where all that information should be stored in SQLite. Do not rush this. Think hard.

On the machine in which you have your SQLite database, make sure that Python has the mariadb package installed. If you use pip, the command will be:

$ pip install mariadb

Sanity check that you can now create the following program using vim (calling it "sanitycheck.py" or some such) and run it in Python:

import mariadb
conn = mariadb.connect(user="theSecretUsername",
    password="theSecretPassword",
    host="cpsc.umw.edu", port=3306, database="galactic")

where theSecretUsername and theSecretPassword should be replaced with the values I wrote on the whiteboard in class.

Assuming this does not error out, you're good. All fiddly obstacles should be out of the way and you can actually write your real program.

Importing

Your goal is now to write a Python program that will read from my MariaDB database and populate your SQLite database accordingly. I will let you figure out how to do that. I'll only mention the following hints:

  1. Be on the lookout for impedance mismatches. The MariaDB database certainly contains some information you don't care about, and it is certainly missing some information that is present in your database tables. If you don't care about something, ignore it. If you are missing something, ignore it (leave it NULL in your SQLite tables).
  2. In particular, you will discover that although every flight plan uses a particular kind of aircraft, that information is present only implicitly through the trips table, not directly through the flightplans table. This in turn means that you will be missing aircraft type information for some of your flight plans — namely, those which have no scheduled flights yet for particular dates. In that case, your SQLite database can and should properly be ignorant of what type of plane is used for that flight plan. In cases where you can know this, however (which is most of the time), your database should have the aircraft type information imported into it.
  3. You will discover that the sales table — which records transactions for every customer that has bought (or canceled and received a refund for) any seat on any flight — has thousands of rows. It is not a good idea for your Python program to read that entire table. This is especially true because you don't need all the information in it: you don't care what the passengers' names are, for example, or when they bought (or canceled) their tickets: you only care how many tickets have been purchased for each flight. Think deeply about how you can extract all the information you need from that table without actually reading all thousands of rows. (Hint: use group by.)
  4. Note that some aircraft are very small (e.g., an X-wing fighter, which can only hold one (small, non-human) passenger plus the pilot) and hence do not have a co-pilot. Obviously, the your co-pilot column should be null for flights using such aircraft.
  5. You should infer pilot certifications from the ships that the various personnel are scheduled to fly. For example: if crew member Martha is the pilot or co-pilot for a United 789 flight on some day, and United 789 uses a Boeing 747 as its aircraft, you should properly deduce that Martha is certified on the Boeing 747, and mark your database as such.
  6. Finally, most rows of the sales are for ticket purchases, but a few are for cancellations where a customer got a refund for a ticket they had previously purchased. It goes without saying that if (say) 15 people bought a ticket for a particular flight, and 3 people returned their ticket for a refund, there will be a total of 12 passengers (not 15, and not 18) booked for that day's flight.

Turning it in

Send me an email with subject line "CPSC/DATA 350 Assignment #4 turn-in" with your SQLite database (.db file) attached. (Double-check that it's actually attached!) Note that I'm not asking for your Python code; I'm asking for your SQLite database into which you have bulk imported all the galactic data using your code.

Getting help

Come to office hours, or send me email with subject line "CPSC 350 Assignment #4 help!!"