CPSC 350 - Applications of Databases - Spring 2026
Assignment #4
Possible experience: +40XP
Due: Thursday, Mar 12th, midnight
Bulk import
Your mission in this homework is to import all the relevant data in the
Stephen's galactic MariaDB database into your own SQLite database.
This will certainly involve multiple kinds of impedance mismatches between the
two schemas that you will have to overcome.
Connecting to galactic
Log
on to the cpsc server, and use mariadb to inspect the
galactic database. Use the SHOW TABLES; and SELECT * FROM
tablename LIMIT 10; command to inspect everything that's in there. Do some
deep, deep thinking about what kinds of information are in galactic,
which parts are relevant to your own SQLite database, and where all that
information should be stored in SQLite. Do not rush this. Think hard.
On the machine in which you have your SQLite database, make sure that Python
has the mariadb package installed. If you use pip, the
command will be:
$ pip install mariadb
Sanity check that you can now create the following program using
vim (calling it "sanitycheck.py" or some such) and run it in
Python:
import mariadb
conn = mariadb.connect(user="theSecretUsername",
password="theSecretPassword",
host="cpsc.umw.edu", port=3306, database="galactic")
where theSecretUsername and theSecretPassword should be
replaced with the values I wrote on the whiteboard in class.
Assuming this does not error out, you're good. All fiddly obstacles should
be out of the way and you can actually write your real program.
Importing
Your goal is now to write a Python program that will read from my
MariaDB database and populate your SQLite database accordingly. I will
let you figure out how to do that. I'll only mention the following hints:
- Be on the lookout for impedance mismatches. The MariaDB database certainly
contains some information you don't care about, and it is certainly missing
some information that is present in your database tables. If you don't care
about something, ignore it. If you are missing something, ignore it (leave it
NULL in your SQLite tables).
- In particular, you will discover that although every flight plan uses a
particular kind of aircraft, that information is present only implicitly
through the trips table, not directly through the flightplans
table. This in turn means that you will be missing aircraft type
information for some of your flight plans — namely, those which
have no scheduled flights yet for particular dates. In that case, your SQLite
database can and should properly be ignorant of what type of plane is used for
that flight plan. In cases where you can know this, however (which is
most of the time), your database should have the aircraft type
information imported into it.
- You will discover that the sales table — which records
transactions for every customer that has bought (or canceled and received a
refund for) any seat on any flight — has thousands of rows. It is not
a good idea for your Python program to read that entire table. This is
especially true because you don't need all the information in it: you
don't care what the passengers' names are, for example, or when they bought (or
canceled) their tickets: you only care how many tickets have been
purchased for each flight. Think deeply about how you can extract all the
information you need from that table without actually reading all thousands of
rows. (Hint: use group by.)
- Note that some aircraft are very small (e.g., an X-wing fighter,
which can only hold one (small, non-human) passenger plus the pilot) and hence
do not have a co-pilot. Obviously, the your co-pilot column should be
null for flights using such aircraft.
- You should infer pilot certifications from the ships that the
various personnel are scheduled to fly. For example: if crew member Martha is
the pilot or co-pilot for a United 789 flight on some day, and United 789 uses
a Boeing 747 as its aircraft, you should properly deduce that Martha is
certified on the Boeing 747, and mark your database as such.
- Finally, most rows of the sales are for ticket purchases, but a
few are for cancellations where a customer got a refund for a ticket they had
previously purchased. It goes without saying that if (say) 15 people bought a
ticket for a particular flight, and 3 people returned their ticket for a
refund, there will be a total of 12 passengers (not 15, and not 18)
booked for that day's flight.
Turning it in
Send me an email with subject line "CPSC/DATA 350 Assignment #4
turn-in" with your SQLite database (.db file) attached.
(Double-check that it's actually attached!) Note that I'm not asking for your
Python code; I'm asking for your SQLite database into which you have bulk
imported all the galactic data using your code.
Getting help
Come to office hours, or send me email with subject line "CPSC 350
Assignment #4 help!!"