Saturday, October 6, 2012

Getting to know the Relatives: SQLite Databases

I was contacted by a colleague who needed some help analyzing a SQLite database.  It was the myspace.messaging.database#database located in the "\Users\\appdata\local\google\chrome\userdata\default\plugin data\google gears\messaging.myspace.com\http_80\" folder.  I didn't and still don't know a whole lot about this file, but it appears to contain myspace email messages.

The Challenges of SQLite

Let's face it: SQLite is everywhere.  Understanding it is essential to good examinations, and a big part of that understanding come from learning SQL statements. There are many good online sources for learning SQL, and one of my favorites is w3schools.com.

But, for digital forensics practitioners, there is another challenge beyond understanding SQL commands--understanding the construction and relationships of the tables.  SQLite is a relational database, and the tables are meant to be related to one another to produce a result no possible or impractical from a single table.  Knowing how the table was intended to be used can be very difficult... after all, a SQLite database is more akin to a file cabinet, not a secretary who uses the file cabinet.

For example, the secretary can place company bank records in a file called "Financial Records" or she can put them in a file called "Artichokes".  It really doesn't matter, because she knows what goes in the file.  Someone coming along behind her won't have much trouble finding the bank records in the Financial Records file, but might overlook them them entirely in the Artichokes file.  The point is, without the secretary, it might be very hard to understand the filing system.

SQLite databases can be a lot like that.  You can see the structure, or the schema as it is called, very easily.  But what is not so easily understood is how the structure is intended to be used.  That mystery is usually locked up in the application that utilizes the database, but it is not explained in the database itself.

Getting a Clue

To be sure, there can be hints about how tables in a database interrelate.  Table and field names often speak volumes.  A database called "AddressBook.db" with two tables called "Names" and "Addresses" that have a field in common called "SubjectID" isn't too hard to fathom.  If we are lucky enough to be able to run the application that uses the database and test our inferences based on the applications output, our confidence grows (and our understanding, if supported by the outcome, would now be considered reliable).

My favorite hints by far are SQL 'view' statements.  These are virtual tables that draw their data from other tables in the database (or an attached database).  By studying a view statement, you get insight from the database creator how the database was intended to be used... at least in one capacity.  Think of a view as a macro: it saves the database user the trouble of repeated typing a frequently used query.  And, if the query is frequently used, then you have a good sense of how the database was intended to be used.

What if There are No Clues?
What about circumstances in which there are no clues in the database to help us understand its use.  Well, if there are really no clues, then the only safe answer is we look at the data flat, that is to say, we look at the tables individually and we don't relate them in any way.  But, there are often less obvious clues than can reveal an underlying relationship... which brings me to the point of this article.

Latent Rows

Latent fingerprint examiners know the term "latent" means hidden or invisible.  Latent fingerprints must be revealed to be seen by some external method, such as fingerprint powder.  SQLite tables have a latent field, so to speak.  And, we can reveal it to help us form relations in a SQLite database.

Consider the myspace.messaging.database#database I mentioned in the open paragraph. It has the following schema:

CREATE VIRTUAL TABLE AuthorData USING fts2(AuthorDisplayName, AuthorUserName); 
CREATE TABLE AuthorData_content(c0AuthorDisplayName, c1AuthorUserName); 
CREATE TABLE AuthorData_segdir(  level integer,  idx integer,  start_block integer,  leaves_end_block integer,  end_block integer,  root blob,  primary key(level, idx)); 
CREATE TABLE AuthorData_segments(block blob); 
CREATE TABLE AuthorMetaData (AuthorId INTEGER PRIMARY KEY, AuthorImageUrl TEXT); 
CREATE VIRTUAL TABLE MessageData USING fts2(Subject, Body); 
CREATE TABLE MessageData_content(c0Subject, c1Body); 
CREATE TABLE MessageData_segdir(  level integer,  idx integer,  start_block integer,  leaves_end_block integer,  end_block integer,  root blob,  primary key(level, idx)); 
CREATE TABLE MessageData_segments(block blob); 
CREATE TABLE MessageMetaData (MessageId INTEGER PRIMARY KEY, RecipientId INTEGER, AuthorId INTEGER, Folder INTEGER, Status INTEGER, CreatedDate INTEGER); 
CREATE TABLE UserSettings (UserId INTEGER PRIMARY KEY, MachineId TEXT, Enabled INTEGER, TimeStamp INTEGER, LastSyncTimeStamp INTEGER, FirstRunIndexPass INTEGER, FirstRunIndexTargetCount INTEGER, OldestMessageId INTEGER, LastServerTotalCount INTEGER); 
CREATE INDEX AuthorIdIndex ON MessageMetaData (AuthorId, RecipientId); 
CREATE INDEX StatusIndex ON MessageMetaData (Status, CreatedDate);

Now look more closely at two tables of interest,MessageMetaData and MessageData_content:

CREATE TABLE MessageMetaData (MessageId INTEGER PRIMARY KEY, RecipientId INTEGER, AuthorId INTEGER, Folder INTEGER, Status INTEGER, CreatedDate INTEGER);
CREATE TABLE MessageData_content(c0Subject, c1Body)

It would seem from the table names that MessageMetaData contains information about the messages, and MessageData_content contains the messages themselves.  But, they don't share any fields that allow the two tables to be related. In other words, which rows of the metadata table correspond to which row of the content table?  Do they even correspond at all?

Let's look at our first hint or correspondence:

$ sqlite3 myspace.messaging.database#database.db 'select count(*) from MessageMetaData;'
1358 
$ sqlite3 myspace.messaging.database#database.db 'select count(*) from MessageData_content;'
1358

Both tables have the same number of records.  Hmm, a clue?  Quite likely, especially upon study of the table content and the remaining tables contents.  In fact conducting a similar study, we find another set of table correspondence: AuthorMetaData and AuthorData_content also have an equal number of records (172, to be exact) but no obvious, interrelated fields.

Unless you've studied SQLite construction in any depth, you probably don't know that it creates a 'rowid' field for every table to act as a primary key.  If a table is created with a defined primary key, that primary key is just a alias to the builtin rowid (with one exception outside the scope of this discussion).  But the rowid is not represented in the table or database schema, which is probably why you didn't know about it (at least, I didn't until I bought a SQLite book).

Knowing about the rowid, i can now check to see if the two tables have matching rowid fields:

$ sqlite3 myspace.messaging.database#database.db 'select count(*) from MessageMetaData m, MessageData_content c where m.rowid = c.rowid'
1358 

We don't have to trust the count function, take a look for yourself:

$ sqlite3 myspace.messaging.database#database.db 'select m.rowid, c.rowid from MessageMetaData m, MessageData_content c where m.rowid = c.rowid'
...
81407357|81407357
81416917|81416917
81504605|81504605
81505714|81505714
81530947|81530947
81569294|81569294

Well, now this is even more interesting.  We not only have two tables with the same number of rows, but we have two tables with fields in relation, i.e., rowid!  

Understand that rowid is simply an autoincrementing, unique, 64-bit integer unless specifically declared otherwise by insert and update commands.  But is this just a coincidence?  Let's consider: we have non-sequential rowids throughout both tables.  That might be explained by dropped rows from the tables.  But two tables, each with 1358 rows, and each row having a matching rowid in the other table?  That is more than coincidence--it's programatic.  The application populating the tables is assigning the rowids.

The Proof is in the Pudding

My assertion is that the myspace.messaging.database#database.db is assigning the rowids as it populates the related tables and links the rows by matching rowid.  Let me demonstrate how rowid can be assigned:

sqlite> create table numbers(digit integer);
sqlite> insert into numbers (digit) values(1);
sqlite> insert into numbers (digit) values(2);
sqlite> insert into numbers (digit) values(3);
sqlite> select rowid, digit from numbers;
1|1
2|2
3|3
4|3
sqlite> insert into numbers (rowid, digit) values (1000, 4);
sqlite> select rowid, digit from numbers;
1|1
2|2
3|3
4|3
1000|4

I created at table called "numbers" with one field called "digit."  I then inserted three rows in the table with the values 1, 2, and 3 respectively.    If you've been following along, you now know that every SQLite table also has a rowid field, even if not expressly created in the table by the user.  The first select statemnt shows the autogenerated rowid and along with the digits I inserted.

The final insert statement is different.  Here I assign the rowid, rather than let it be automatically populated by the SQLite engine.  And, as you an see in the final select statement, I succeed in setting an non-sequential rowid.

Putting it All Together

I've demonstrated a "hidden" way that tables in SQLite databases can be related.  It takes some knowledge in SQLite structure and the SQL query language to unveil this data, however.  If you are in the habit of relying on SQLite browsers and looking at tables without relating them, then you are really missing out on a wealth of data.

Again, let me illustrate using the myspace.messaging.database#database.  Lets look at one row in each of the tables I mentioned previously:

$ sqlite3 -header myspace.messaging.database#database.db 'select * from MessageMetaData limit 1;'
MessageId|RecipientId|AuthorId|Folder|Status|CreatedDate
1289081|544962655|41265701|0|2|1280870820000 

$ sqlite3 -header myspace.messaging.database#database.db 'select * from MessageData_content limit 1;'
c0Subject|c1Body
Hi|Hey, what's up? 

$ sqlite3 -header myspace.messaging.database#database.db 'select * from AuthorMetaData limit 1;'
AuthorId|AuthorImageUrl
-1930729470|http://some_url/img/some_image.png 

$ sqlite3 -header myspace.messaging.database#database.db 'select * from AuthorData_content limit 1;'
c0AuthorDisplayName|c1AuthorUserName
A User|auser

The only hint of relationship, besides table names, is the AuthorID field in MessageMetaData and AuthorMetaData.  But there is still no obvious way to tie the metadata to the content we are most interested in.  Your favorite GUI browser maybe make the display prettier, but its just as impotent.

But, now that you have knowledge of the rowid, and have a link to a tutorial on SQLite statements, you're not too far from being able to do this:

sqlite3 -header myspace.messaging.database#database.db 'select messageid, datetime(createddate/1000, "unixepoch", "localtime") as Date, mm.AuthorID, c0AuthorDisplayName as "Author Display Name", c1AuthorUserName as "Author Username", c0subject as Subject, c1Body as Body from  MessageMetaData mm, MessageData_content mc, AuthorData_Content ac, AuthorMetaData am where mm.AuthorID = am.AuthorID and am.rowid = ac.rowid and mm.rowid = mc.rowid limit 2;'
MessageId|Date|AuthorId|Author Display Name|Author Username|Subject|Body1289081|2010-08-03 14:27:00|41265701|A User|auser|Hi|Hey, what's up?

I ask you, on which output would you rather examine and report?

Addendum

That last query is really not so scary.  It's just long because we're grabbing seven fields from four tables, and converting a date stamp.  But, in reality, it's very straight forward.

Let's take a look:

select
     messageid,
     datetime(createddate/1000, "unixepoch", "localtime") as Date,
     mm.AuthorID,
     c0AuthorDisplayName as "Author Display Name",
     c1AuthorUserName as "Author Username",
     c0subject as Subject,
     c1Body as Body 
from 
     MessageMetaData mm,
     MessageData_content mc,
     AuthorMetaData am,
    AuthorData_Content ac 
where
     mm.AuthorID = am.AuthorID
     and am.rowid = ac.rowid
     and mm.rowid = mc.rowid;

The select clause simply picks the fields we want to display.  The datetime function converts the unixepoch time, which is recorded in milliseconds, to local time.  The 'as' statements are naming the columns something more user friendly and are not required.

The from statement simply declares what tables to query for the fields we are trying to display.  Each table is followed by an alias I chose to make easier reference to field names common to more than one table.  For example, AuthorID is found in both the MessageMetaData and AuthorMetaData tables.  By giving MessageMetaData the alias of mm, I can now reference the MessageMetaData.AuthorID field as mm.AuthorID.

The where statement is a filter.  It keeps the tables 'aligned,' so to speak.  It ensures that only the correct author content and message content is returned for each row.  This post is a lot long in the tooth, so I won't go into detail describing how it works.  But, very succinctly, the MessageMetaData record is matched to a AuthorMetaData record by AuthorID.  The the AuthorMetaData record is matched to its corresponding AuthorData_Content record by rowid.  Finally, the MessageMetaData record is matched to its corresponding MessageData_content, also by rowid.