Lab 05a: Persistence

The questions below are due on Thursday March 08, 2018; 09:55:00 PM.


Partners: You have not yet been assigned a partner for this lab.
You are not logged in.

If you are a current student, please Log In for full access to the web site.
Note that this link will take you to an external site (https://oidc.mit.edu) to authenticate, and then you will be redirected back to this page.
A Python Error Occurred:

Error on line 8 of Python tag (line 9 of source):
    kerberos = cs_user_info['username']

KeyError: 'username'

Music for this Lab

 

Goals:

Today you'll (maybe) eventually be working in a team of teams (two teams, so four to five people most likely). We'll get some experience with database operations so as to add persistent storage to our server-side scripting capabilities, allowing us to create stateful web experiences. We'll bring in both of the pieces we developed last week to have a system that reports where you're at, which of your teammates are close by to you, and how many classmates are in the same region of campus as you.

1) Persistent Storage

Our capabilities with scripts that we can run on the server are currently limited to code that operates based solely on user-provided inputs, and not on history. By this we mean every time you visit your final code from Lab 04B and provide the same GPS coordinates, you'll get the exact same response. We have no means of remembering previous queries, or saving posted data. In effect, our system is stateless. We can fix this, however, by providing persistent storage in the form of a database.

There are many different types of databases that exist, and while we do not mean to trivialize them in saying this, they all do pretty much the same thing, which is to provide a means to store and access information efficiently1. In 6.08 we'll use SQLite2 which is a database management system. In order to interact with SQLite, we'll use SQL, which is a language used for managing databases. We will interact with SQLite and generate SQL commands in Python3 using the sqlite3 library which provides a convenient interface.

With SQLite, databases are files with the .db file extension. Within the database are tables, and within those tables are rows of data that we append to, extract from, and do whatever we want with. These files are automatically saved to the actual hard drive so if you have a Python script that messes with them in some way, and you run that Python script, after the script is finished, the changes made will live on into the future in that database, far beyond the lifespan of that Python file's runtime.

2) SQLite

SQLite has a pretty easy-to-use Python interface once you get the basics of it. What's nice is that it is very modular so you can create databases and move/copy them as needed. Let's make a database. You'll want to start a Python process on your local computer. In general our order of operations will be

  1. import sqlite3
  2. specify/create a database
  3. establish a connection to the database
  4. establish a cursor to the database (how we send it SQL commands)
  5. do stuff with the database (via the cursor)
  6. properly close connection to the database

In line with this approach, the snippet of code below will create a database called example.db and create a table within it. Create a file called db_create.py, place the code below in it, and run it:

import sqlite3
example_db = "example.db" # just come up with name of database
conn = sqlite3.connect(example_db)  # connect to that database (will create if it doesn't already exist)
c = conn.cursor()  # make cursor into database (allows us to execute commands)
c.execute('''CREATE TABLE test_table (user text,favorite_number int);''') # run a CREATE TABLE command
conn.commit() # commit commands
conn.close() # close connection to database

Now in your terminal locally run (in the same directory where your Python process was) ls. You should see a file in the list called example.db. What we've done is create a database that includes one table in it. We made this table by using a SQL command. Specifically, the command used was the following:

CREATE TABLE test_table (user text,favorite_number int);

which says: "create a table called 'test_table', and have it contain two columns, one which contains a text entry called 'user' and one which contains an integer entry called 'favorite_number'". After execution and commitment, the structure of our database looks like the following:

The state of our database after creation. It is totally fine to think of database tables as Excel/Open-Office spreadsheets.

If you happen to run the same program again you'll get:

Traceback (most recent call last):
  File "builder.py", line 5, in <module>
    c.execute('''CREATE TABLE test_table (user text,favorite_number int);''')
sqlite3.OperationalError: table test_table already exists

And this is because when the creation script db_create.py runs the second time, it clashes with the fact that a table already exists in that database called test_table. You cannot have identically named tables in the same database.

If you're wondering where the syntax/grammar of the SQL command we used came from, it is basic SQL, which has numerous tutorials online like this one. Let's go and do more stuff with SQL in our database.

2.1) Inserting

OK this database is pretty useless until we have some actual information in it. Let's add some stuff to the database then! Make another Python script called db_insert.py or something similar. Copy-paste the lines of code below into that file, save it, and run it.

import sqlite3
example_db = "example.db"
conn = sqlite3.connect(example_db)
c = conn.cursor()
c.execute('''INSERT into test_table VALUES ('joe',5);''')
conn.commit()
conn.close()

When it runs you'll see nothing happen probably...and that's good. In looking at the code, you should see a very similar structure to the first one, except the SQL command we run is the INSERT operation. In particular:

INSERT into test_table VALUES ('joe',5);

which says: "Insert into test_table, the values (in order with the table structure) of 'joe', then 5". This will result in the following new state to our database:

The state of our database after a single INSERT.

2.2) Selecting

It would now be great to see what is in the database, and/or have the ability to extract information from it. For this we will need the SELECT operation. So, let's create a python script called db_lookup.py. Copy-paste the code below into that file, save it, and run it.

import sqlite3
example_db = "example.db"
conn = sqlite3.connect(example_db)
c = conn.cursor()
things = c.execute('''SELECT * FROM test_table;''').fetchall()
for row in things:
    print(row)
conn.commit()
conn.close()

When you run it you should see:

('joe', 5)

And this should make sense, since what we've done is run the SQL command:

SELECT * FROM test_table;

which says: "Select everything from test_table." (The * symbol is the wildcard, meaning 'everything') By default in SQLite, execution of a SELECT query will return a SQLite iterator, and you can use either the fetchone() or fetchall() methods to get the first returned value from the query or all returned values in a Python list, respectively, allowing us the ability to manipulate them as needed. Alternatively, we can use the iterator directly in a for loop, just like we do for the range() operator—if you just need to iterate over all the elements!

If you just want to see the favorite number of the users in your database, instead of using * you could specify the columns you want to see such as:

SELECT favorite_number FROM test_table;

2.3) More Inserting

When we inserted data into our database above, we had a pre-made (hard-coded) Python string that we used for our SQL command. In particular we did c.execute('''INSERT into test_table VALUES ('joe',5);''') which runs the SQL command INSERT into test_table VALUES ('joe',5);. What if we'd instead had some values we generated programatically that we wanted to be inserted into the database? For example, if we had a Python variable user and another variable favorite_of_user. How can we incorporate them into the SQL string we need to generate?

Well as we know in Python we could create a string from other datatypes by casting and some concatenation action. For example we could do this (the old-school way):

sql_command = '''INSERT into test_table VALUES ('%s',%d);''' %(user,favorite_of_user)
c.execute(sql_command)

or if we do it the newer way in Python way like this:

sql_command = '''INSERT into test_table VALUES ('{}',{});'''.format(user,favorite_of_user)
c.execute(sql_command)

While not apparent or a threat when running these databases on a local machine, this approach to building SQL commands can be a huge security flaw when our database is living on a server and the Python variables we're using to construct our SQL command are coming from user-specified values from GET or POST values. If a visitor to your site finds a way to manipulate the user variable to itself be a SQL command they can cause bad things to happen such as damage or destruction of your database. This is bad and is known as a SQL-Injection Attack and is the technical angle behind the comic below. Now you can say, "I get it."

Sanitizing our inputs will protect our database from malicious SQL-injection attacks. From xkcd.

In order to protect ourselves against this, inputs coming from "the outside" (anything you do not have complete control over such as user-specified inputs) should be sanitized and this can be done using the sqlite3 functionality shown below where insertion points of outside values are demarcated by question marks, and a following tuple provides the values to be inserted (in order). The SQLite Python library checks these values to make sure nothing SQL-like is in them, thus preventing injection:

c.execute('''INSERT into test_table VALUES (?,?);''',(user,favorite_of_user))

You can use SQL command syntax for all SQL queries we need to generate (creation of tables, selection of rows, etc...)

The trailing tuple above is a tuple, so if you have only one value to insert, you still need to make sure your tuple is of the form (value,) (yes that is a trailing comma there), otherwise you'll get an error. Please pay attention for this! I've wasted hours on this before and it is the least satisfying of bugs to figure out.

2.4) Moving On

OK, I'm getting restless. Let's insert some more things into our database. Make a fourth Python script called, lotsa_inserts.py, copy-paste the code below into it, and run it. What this script does is create 100,000 random usernames and favorite numbers and inserts them one-after-the-other into the database:

import sqlite3
import random
import string
example_db = "example.db"
conn = sqlite3.connect(example_db)
c = conn.cursor()
for x in range(100000):
    number = random.randint(0,1000000) # generate random number from 0 to 1000000
    user = ''.join(random.choice(string.ascii_letters + string.digits) for i in range(10)) # make random user name
    c.execute('''INSERT into test_table VALUES (?,?);''',(user,number))
conn.commit()
conn.close()

If you now go back and rerun db_lookup.py, you'll get waaay more things that print because there are now waaay more entries in this database.

The state of our database after inserting 100,000 fictitious friends and their favorite numbers.

What if we wanted to return the user/favorite number pair with the largest favorite number in the database? One solution would be to build on the script we had, and get the entire database in a Python list, and then write some Python script for us to return the max. This can be ok for small databases, but as your database grows, having Python sort 500,000 entries or something can start to get pretty slow, and this is where more refined SELECTs and sortings can be done on the database side3

The first thing we can do is modify our SQL query with a sort command, ORDER BY. In order to return the entire database in ascending order based on the value of favorite_number you'd run the following command:

SELECT * FROM test_table ORDER BY favorite_number ASC;

If you modify your db_lookup.py to instead have that command instead of the one we originally had in place, when you run it, you'll now see the entries in the database fly by in ascending order. If we want in descending order we could do this SQL command (try it to watch the values fly by in descending order)

SELECT * FROM test_table ORDER BY favorite_number DESC;

So far we've been running a query, generating a Python list from that query and then printing them all. If we wanted to print just the top entry, we could instead of a for loop, just do print(things[0]), but even then this can be slow because we're generating a really large Python list for no real reason. If we instead use the fetchone() method on an ordered response, we can get just a single value, and this can have significant speedup benefits. Consider the code below. If you run it on your currently existing database you should see a speedup of approximately three-fold between the two approaches. Understand why.

import sqlite3
import time
example_db = "example.db"
conn = sqlite3.connect(example_db)
c = conn.cursor()

start = time.time()
things = c.execute('''SELECT * FROM test_table ORDER BY favorite_number DESC;''').fetchall()
print(things[0])
print(time.time()-start)

start = time.time()
thing = c.execute('''SELECT * FROM test_table ORDER BY favorite_number DESC;''').fetchone()
print(thing)
print(time.time()-start)

conn.commit()
conn.close()

You can also refine what is returned (and gain efficiency benefits) within the SQL query itself. For example the following SQL command is identical to the other two above, but limits what is done using SQL rather than the sqlite3 interface (and will give a speedup of about 20% or so on my machine with our current database as compared to our fast one above).

SELECT * FROM test_table ORDER BY favorite_number DESC LIMIT 1;

More details on the syntax of ORDER BY can be found here.

2.5) WHERE

We can add further or alternative sorting mechanisms using the WHERE keyword. For example, if we wanted a list of users that had favorite numbers only between 1132 and 1185 (to identify weird people since only weird people have favorite numbers in that range, amirite) we could do the following SQL command (try this out with the db_lookup.py script. You should see a much smaller portion of your database print out)

SELECT * FROM test_table WHERE favorite_number BETWEEN 1132 AND 1185;

This command says, select all full rows in test_table where favorite_number is between 1132 and 1185.

We can nest our specifications as needed as well. The operation above will return a Python list of the entries that meet that condition, but what if we want that in order? We could do this by appending the ORDER BY operator from earlier, like below:

SELECT * FROM test_table WHERE favorite_number BETWEEN 1132 AND 1185 ORDER BY favorite_number DESC;

More details on syntax of WHERE can be found here.

2.6) Timestamps

Another really useful type which we can use with our databases are timestamps. SQLite sort of has undocumented support for timestamps, but when we use SQLite in conjunction with Python and its datetime library, we're able to remember and pull elements from arrays based on times (possibly of when that row was inserted or when that row was modified). We'll do that using a new type of entry, the timestamp. Consider the following code which creates a new database time_example.db, with a single table in it called dated_table that has three entries in it: a text field called "user", an int field called "favorite_number", and a timestamp field called "timing".

import sqlite3
import datetime
example_db = "time_example.db" # just come up with name of database
conn = sqlite3.connect(example_db)  # connect to that database (will create if it doesn't already exist)
c = conn.cursor()  # make cursor into database (allows us to execute commands)
c.execute('''CREATE TABLE dated_table (user text,favorite_number int, timing timestamp);''') # run a CREATE TABLE command
conn.commit() # commit commands
conn.close() # close connection to database

Try Now:

What is the SQL query in the above code snippet?

We'll use this new database like the old one, but instead here, we'll time-stamp entries as they are made using the datetime library, which is documented here.

We'd now like to populate that database with a few (5) ficticious users and their favorite numbers. We'll space when we enter these into the database by two seconds.

import sqlite3
import datetime
import string
import random
import time
example_db = "time_example.db" # name of database from above
conn = sqlite3.connect(example_db)  # connect to that database
c = conn.cursor()  # make cursor into database (allows us to execute commands)
for x in range(5):
    number = random.randint(0,1000000) # generate random number from 0 to 1000000
    user = ''.join(random.choice(string.ascii_letters + string.digits) for i in range(10)) # make random user name
    c.execute('''INSERT into dated_table VALUES (?,?,?);''',(user,number,datetime.datetime.now())) #with time
    print("inserting {}".format(user)) #notificatin message
    time.sleep(2) #pause for 2 seconds before next insert
conn.commit() # commit commands
conn.close() # close connection to database

Try Now:

What is the SQL query in the above code snippet?

Just like the database before, we could then go and access and pull things out as needed:

import sqlite3
import datetime
example_db = "time_example.db" # name of database from above
conn = sqlite3.connect(example_db)  # connect to that database
c = conn.cursor()  # make cursor into database (allows us to execute commands)
things = c.execute('''SELECT * FROM dated_table;''').fetchall()
for x in things:
    print(x)
conn.commit() # commit commands
conn.close() # close connection to database

But where this gets cool is we can start to sort off of timestamps. Below we create a datetime object that expresses "15 minutes ago" and then use that in combination with our SELECT query to get items from the last 15 minutes.

import sqlite3
import datetime
example_db = "time_example.db" # name of database from above
conn = sqlite3.connect(example_db)  # connect to that database
c = conn.cursor()  # make cursor into database (allows us to execute commands)
fifteen_minutes_ago = datetime.datetime.now()- datetime.timedelta(minutes = 15) # create time for fifteen minutes ago!
things = c.execute('''SELECT * FROM dated_table WHERE timing > ? ORDER BY timing DESC;''',(fifteen_minutes_ago,)).fetchall()
for x in things:
    print(x)
conn.commit() # commit commands
conn.close() # close connection to database

Try Now:

What is the SQL query in the above code snippet?

Further documentation on the Python datetime library can be found here.

Checkoff 1:
Discuss SQLite and SQL with a staff member.

2.7) On the Server

A benefit of SQLite is that its databases are separate files4, but this will allow us to create databases locally and then move them onto the server for use (and even pull them back down for local analysis!). So for example, let's say you'd like your favorite number timestamp database to live on the server. You can transfer time_example.db onto the server at a location such as lab05a/ using sftp (with reference to your home directory on the server) and then use the __HOME__ keyword to reference its absolute location for calling it. For example the script below when placed into your directory along with an appropriately placed database will allow us to access that database from an HTTP GET (or POST). In the particular example below, when we visit the script in the web browser, it will return all entries from the last 15 minutes, from the database, in order.

NOTE your server login password is

A Python Error Occurred:

Error on line 1 of Python tag (line 356 of source):
    print('%s' % (cs_user_logos[kerberos],))

NameError: name 'kerberos' is not defined

import sqlite3
import datetime
visits_db = '__HOME__/lab05a/time_example.db'

def request_handler(request):
    conn = sqlite3.connect(visits_db)  #connect to that database (will create if it doesn't already exist)
    c = conn.cursor()  #make cursor into database (allows us to execute commands)
    fifteen_minutes_ago = datetime.datetime.now()- datetime.timedelta(minutes = 15) #create time for fifteen minutes ago!
    things = c.execute('''SELECT * FROM dated_table WHERE timing > ? ORDER BY timing DESC;''',(fifteen_minutes_ago,)).fetchall()
    outs = ""
    for x in things:
        outs+=str(x)+"\n"
    conn.commit() #commit commands
    conn.close() #close connection to database
    return outs

You may get a permission denied error when trying to access your database. Assuming you have placed it in the correct location, this is most likely a permission error. To fix while in sftp, run chmod 666 database_name where "database_name" is the database of interest here.

You can also create databases on the server automatically. Consider the code below. In it, the request_handler tries to create the database if it doesn't exist, and if it does, it adds a new entry to the database and returns what is currently in it. This can be accomplished with a try-except structure like below. If you place this code in your home directory on the server and then visit it in the browser, the first time you get it you'll receive a "database constructed" message, and every time after that you'll receive an ever-growing list of entries! This is awesome...we now have a means of remembering stuff.

import sqlite3
import datetime
visits_db = '__HOME__/time_example2.db'

def request_handler(request):
    conn = sqlite3.connect(visits_db)  # connect to that database (will create if it doesn't already exist)
    c = conn.cursor()  # make cursor into database (allows us to execute commands)
    outs = ""
    try:
        c.execute('''CREATE TABLE dated_table (user text,favorite_number int, timing timestamp);''') # run a CREATE TABLE command
        outs = "database constructed"
    except:
        fifteen_minutes_ago = datetime.datetime.now()- datetime.timedelta(minutes = 15) # create time for fifteen minutes ago!
        c.execute('''INSERT into dated_table VALUES (?,?,?);''', ('joe','5',datetime.datetime.now()))
        things = c.execute('''SELECT * FROM dated_table WHERE timing > ? ORDER BY timing ASC;''',(fifteen_minutes_ago,)).fetchall()
        outs = "Things:\n"
        for x in things:
            outs+=str(x)+"\n"
    conn.commit() # commit commands
    conn.close() # close connection to database
    return outs

It is important to realize that the code is no longer stateless. Our input to it, a GET request with no arguments is always the same every time we visit it in the browser or refresh the page, but the response it provides is different each time (since it is using the database to store past information)

3) Deliverable

What we want for today is a two-part system. First we want you to update your server-side script from Lab 04B so that it accepts incoming POSTs of current GPS coordinates (in addition to GETs) from your team's labkits, and based on that POST creates a database entry of:

  • the user name
  • the location (in lat and lon)
  • the location on MIT's campus
  • the timestamp of when the position posting happened.

In response to the POST, the script should return, the list of users that have reported their location in the same campus-spot within the last 2 minutes. This system should be designed to work with a number of clients like shown below:

A central script that will report the users that have been in your current location in the last 30 seconds.

You will mostly be working on the server for this assignment. Work together with your partner to implement the above server-side system, and modify your individual labkits' code to POST to your shared server-side code. (Pick one person's URL, and all post to that URL.)

You'll notice that we have not given you code to go onto your ESP32. That's because debugging things like POSTs and GETs can be miserable on the ESP32, as I'm sure many of you have encountered. There are tools that God5 made to make testing and debugging GET and POST requests far easier.

If you haven't already, you will become very annoyed with debugging web request handling if you do not have a reliable means of generating GET and POST requests on the fly. Two different pieces of software we'd recommend:

  • Hurl.it: Bare-bones request generator. Make POST and GET requests as needed and analyze their responses. This has the lowest energy-barrier-to-entry.
  • Postman: A more professional stand-alone application, allows saving and more detailed creation and analysis of requests (you need to sign up, but it is free...in terms of money, anyways, I think you pay for it with your soul a bit). I like Postman a lot. If identifying street signs is not your thing, then this might be a piece of software to invest some time in.

Pick one of these two programs (or another if you know of it) and use it as your primary POST-generating tool for writing this script.

This is probably the first time you've dealt with receiving a POST on the server. Start with a simple script that reports back the body of the request dictionary so that you can get an idea of what a POST produces in that dictionary. For example the following server script is a great way to start to just see what you're dealing with.

def request_handler(request):
    return str(request)

Checkoff 2:
Show your working team code to a staff member. It must exhibit all new POST functionality while also demonstrating the original GET functionality of Lab 04B.

To test your working system with your labkits, we have included some working files you can drop onto your device. Specifically there are two files:

  • Copy the TinyGPSPlus library into your Documents/Arduino/libraries folder like you've done for other libraries and make sure to restart Arduino completely.
  • Run the position_poster.ino file on your labkit. You will need to update this code to POST an appropriately-formatted json or form-encoded body to your Python script. The code will POST to the URL of your script (upon a button push) the following information: username, latitude and longitude. You must make sure to specify the proper URL otherwise it will be POSTing into the ether. The code then displays what is returned to it on the OLED.

Checkoff 3:
Show your complete working system, as a team. We want to see people appearing and disappearing over time.


 
Footnotes

1 There are numerous pros and cons to different databases. Some people will go to war for MongoDB or something else, and we're not passing judgement here. (click to return to text)

2 SQLite is pronounced either "S-Q-L-lite" or "sequelite", whichever you prefer. (click to return to text)

3 the reason performing these sorts on the database side can be faster is that they'll be performed in the native C that the database is written in rather than using the bulkier higher-level structures of Python, which, while they can make our handling of data easier, do add significant bulk to calculations and operations. (click to return to text)

4 this can also be a problem as well... (click to return to text)

5 Javascript developers (click to return to text)