CPSC 350 - Applications of Databases - Spring 2025

Assignment #4

Possible experience: +40XP

Due: Sunday, Mar. 2  Sunday, Mar. 9, midnight

Bulk import

Your mission in this homework is to import all the relevant data in the Stephen's galactic MariaDB database into your own SQLite database. This will certainly involve multiple kinds of impedance mismatches between the two schemas that you will have to overcome.

Connecting to galactic

First, Google/ChatGPT to find out how to install the MariaDB client software to your machine. (As I mentioned on the cheat sheet, you do not have to download the entire MariaDB server software. Only the client is needed.

Second, make sure you have the mariadb package installed in your Python environment. If you use pip, the command will be:

$ pip install mariadb

Now, you will attempt to connect to Stephen's MariaDB instance running on the cpsc server. This is the command:

$ mariadb -u yourusername -p -h cpsc.umw.edu -P 3306 galactic

In place of yourusername, of course, you should put your actual username. Note: when you are prompted for a password, nothing will appear while you type the password: not even dots or asterisks!

When this command works, you should be connected to galactic database. Type "show tables;" and make sure it shows some table names.

Finally, you should now be able to create this program using vim (calling it "sanitycheck.py" or some such) and run it in Python:

import mariadb
conn = mariadb.connect(user="yourusername",
    password="yourpassword",
    host="cpsc.umw.edu", port=3306, database="galactic")

Assuming this does not error out, you're good. All fiddly obstacles should be out of the way and you can actually think about databases.

Importing

Your goal is now to write a Python program that will read from my MariaDB database and populate your SQLite database accordingly. I will let you figure out how to do that. I'll only mention the following hints:

  1. Be on the lookout for impedance mismatches. The MariaDB database certainly contains some information you don't care about, and it is certainly missing some information that is present in your database tables. If you don't care about something, ignore it. If you are missing something, ignore it (leave it NULL in your SQLite tables).
  2. In particular, you will discover that although every flight plan uses a particular kind of aircraft, that information is present only implicitly through the trips table, not directly through the flightplans table. This in turn means that you will be missing aircraft type information for some of your flight plans — namely, those which have no scheduled flights yet for particular dates. In that case, your SQLite database can and should properly be ignorant of what type of plane is used for that flight plan. In cases where you can know this, however (which is most of the time), your database should have the aircraft type information imported into it.
  3. You will discover that the sales table — which records transactions for every customer that has bought (or canceled and received a refund for) any seat on any flight — has thousands of rows. It is not a good idea for your Python program to read that entire table. This is especially true because you don't need all the information in it: you don't care what the passengers' names are, for example, or when they bought (or canceled) their tickets: you only care how many tickets have been purchased for each flight. Think deeply about how you can extract all the information you need from that table without actually reading all thousands of rows. (Hint: use group by.)
  4. Note that some aircraft are very small (e.g., an X-wing fighter, which can only hold one (small, non-human) passenger plus the pilot) and hence do not have a co-pilot. Obviously, the your co-pilot column should be null for flights using such aircraft.
  5. You should infer pilot certifications from the ships that the various personnel are scheduled to fly. For example: if crew member Martha is the pilot or co-pilot for a United 789 flight on some day, and United 789 uses a Boeing 747 as its aircraft, you should properly deduce that Martha is certified on the Boeing 747, and mark your database as such.
  6. Finally, most rows of the sales are for ticket purchases, but a few are for cancellations where a customer got a refund for a ticket they had previously purchased. It goes without saying that if (say) 15 people bought a ticket for a particular flight, and 3 people returned their ticket for a refund, there will be a total of 12 passengers (not 15, and not 18) booked for that day's flight.

Turning it in

You will turn this assignment in by attaching a git bundle to an email. (A "git bundle" is essentially a portable, zipped-up git repo.) The subject line of the email should be "CPSC/DATA 350 Assignment #4 turn-in".

To do this, first make sure that all your code is committed to git. Running "git status" and ensuring your workspace is clean is a great way to do that. Then, bundle up your git repo:

$ git  bundle  create  yourUmwUsername.git  --all

(Please do not name it literally "yourUmwUsername.git" Substitute your actual UMW username. For instance, "jsmith7.git".)

Finally, send me the yourUmwUsername.git file as an email attachment with subject line "CPSC/DATA 350 Assignment #4 turn-in". (Double-check that it's actually attached!) If you're using the Cloud (which includes the cpsc department server) for development, then in order to get the file on your local machine to attach it to an email, you will have to download it from the Cloud.

To receive full credit, I must be able to type the following commands in sequence, verbatim, with no variations, to run your program:

$ git  clone  the-name-of-the-repo-you-sent-in-your-email.git  temporary
$ cd  temporary
$ python3  customer_service.py  yourUmwUsername_airline.db

If I have to do any extra fiddling because these three commands, verbatim, in sequence, did not work, it's coming out of your grade.

Hint: test that this works before submitting your assignment. You can test it by creating a little temporary practice directory:

$ mkdir  pretendImStephen
$ cp  the-name-of-your-repo.git  pretendImStephen
$ cd  pretendImStephen

and then typing those three commands, above, to make sure they have the desired effect. (You can then get rid of that directory with "cd .." followed by "rm -rf pretendImStephen".)

Getting help

Come to office hours, or send me email with subject line "CPSC 350 Assignment #4 help!!"