Saturday, May 11, 2013

SQLite: Hidden Data in Plain Sight

Important

The title of this post is not intended to imply there are some new clever data hiding techniques for sqlite, but to alert the reader that data in plain view is often going unanalyzed or under-analyzed.

My experience in data forensics has taught me two things about SQLite databases: they are both ubiquitous and poorly understood by examiners. I find that most examiners rely on "viewers," such as SQLite Database Browser or automated tools that parse the SQLite for them. Both of these approaches can be very flawed, however.

The first practice—using viewers—is not faulty on its face. SQLite Database Browser and similar applications are a quick way to visualize the content of a database. But most examiners are using the viewers to open tables in a "flat" view. This is akin to looking at the table data in a spreadsheet: you see the content but not in relation to the data from other tables in the database (or from other databases for that matter) as the database was designed to be used. Further, the data is often not meaningful in the form in which it is stored, i.e., timestamps are often in some form of epoch time and not a human readable format.

The second practice of relying on automated tools means you are counting on a programmer (who may not be a forensic examiner) to tell you what’s important in the database, and you are relying on the coder’s SQLite skills, which might be lacking. (What follows is not a rant against a product, but just an illustration of my point) For example Cellebrite Physical Analyzer parses the iPhone sms.db (sms/iMessage) into a neat, well formed report. But it doesn’t include the rowID in the output, which is an auto-incrementing integer applied to each sent or received message. From the rowID, you can tell if intervening messages have been deleted, but you won’t know that from the automated report from Cellebrite. And what about when your automated tool doesn’t parse a database you have discovered?

Demanding Answers (Learning to Query)

The key to understanding SQLite is to learn the structure query language (SQL). One of the best on-line resources I have found for this is w3schools.com. The lessons are brief, and in less that a dozen 5-10 minute sessions, you will have the basics of SQL under your belt. And with the basics, you can accomplish much.

The query language is designed to be human readable, which, by the by, makes it easier to remember. It consists of sentences composed of subjects and predicates. For example, to view all the contents of a table:

SQLite Command Line Program (command line mode)
$ sqlite3 some.db 'select * from some_table;'
1|some_data
2|some_more_data

Using the above SQL statement, our subject is "select *" which is translated "select all fields" (or columns, if you will), and the predicate, "from some_table" is fairly self-explanatory. The table "some_table" is located in the "some.db" SQLite database. To be brief, in a SQL query, we tell the database engine what we want (subject) followed by qualifiers (predicate).

Finding SQLite Databases for Analysis

So, how do we overcome the shortcomings of the common SQLite Database analysis techniques? Let’s take a recent examination I conducted as an example. I was looking for communications recorded in an iPhone iOS v.6.1.3 backup. I had restored the backup to its original file structure (DOMAIN/path) to facilitate the analysis. After examining the sms.db, I discovered a large block of deleted messages (by RowID analysis) for the time frame that was the subject of the investigation. The messages were not recoverable (more on recovering deleted SQLite records another time).

I decided to take a look at other communications applications that might have been overlooked when the user was deleting data. Unfortunately, I don’t know all the different communications applications available for Apple mobile devices and likely never will. I can get a list of the installed applications on the Device from the Info.plist in the backup directory, but in reality, that doesn’t really help me too much because, as I said, I don’t recognize many of them. I do know, however, that most communications applications store their data in SQLite databases. So, I search for those:

BASH
$ find unback/ -type f -exec file {} \; | grep SQLite
unback/HomeDomain/Library/Voicemail/voicemail.db: SQLite 3.x database
unback/HomeDomain/Library/SMS/sms.db: SQLite 3.x database
unback/HomeDomain/Library/Safari/Bookmarks.db: SQLite 3.x database, user version
 31
unback/AppDomain-com.cardify.tinder/Documents/Tinder.sqlite: SQLite 3.x database
...
Note

I restored the iTunes backup to a directory called unback consistent with the method used by the open libimobiledevice software library, which offers a device backup/unback utility as well as others useful for iDevice analysis.

You can see that I use the find command to look for files (-type f) in the "unback" directory and the execute the file command to determine the file type. I piped the results through the grep command, filtering for "SQLite". With the command, I get a list of SQLite files, true, but I still don’t know what there all are. Some are familiar and/or obvious, but some are not. Take the last item on the abbreviated list above: Tinder by Cardify. I had never heard of Tinder (as well as many other applications that appeared in the results).

Getting a Peek Inside

It would be more informative to list the databases, and then get a look at the tables contained in each one. While table names don’t necessarily tell you the content, they can be informative while in data "triage" mode. So, how do we modify our find command to show us the databases as well as their tables?

BASH (incorporating SQLite Command Line Program)
$ find unback/ -type f | while read i; do file $i | grep -q SQLite; \
[ $? = 0 ] && (echo $i; sqlite3 $i .tables; echo); done
unback/HomeDomain/Library/Voicemail/voicemail.db
_SqliteDatabaseProperties  voicemail

unback/HomeDomain/Library/Safari/Bookmarks.db
bookmark_title_words  folder_ancestors      sync_properties
bookmarks             generations

unback/HomeDomain/Library/SMS/sms.db
_SqliteDatabaseProperties  chat_message_join
attachment                 handle
chat                       message
chat_handle_join           message_attachment_join

unback/AppDomain-com.cardify.tinder/Documents/Tinder.sqlite
ZLIKE             ZPHOTO            ZUSER             Z_METADATA
ZMESSAGE          ZPROCESSEDPHOTO   Z_5SHAREDFRIENDS  Z_PRIMARYKEY
...

Ok, that’s much more helpful. But how does the command work? Like the initial command, find is used to locate files (directories are excluded). The results of the file command are piped to a while loop, which assigns each file name to the variable i. Similar to the first command, the file command displays the file type which is filtered for "SQLite" by grep. The "-q" option in grep is used to keep grep silent; it is the exit status that is of interest.

I want the exit status, or alternatively: the return status or exit code, to perform a test. All commands, scripts and functions return an exit status, and an exit code of "0" means success. The exit status is captured in the variable ?, and recalled, like all BASH variables, by prepending with a dollar sign: $?. In the command, I test the exit status of the last command which was grep. If the regular expression "SQLite" is matched in the file command output, the grep exits with "0". The test "[ $? = 0 ]" is shorthand notation for "if the last command’s exit code is 0", then do what follows: (echo $i; sqlite3 $i .tables; echo), i.e., print the file name, print the tables of the database, and then print a blank line (for readability). I discussed while loops in a previous post if you have more interest, or you could look here.

Making Sense of the Data

From the output thus far, I see that Tinder has a message table, as well as user and shared friends tables. Looks like it is a social networking application, and the data might be relevant to the investigation. So, how do I get see the contents of the tables? I could look at one table at a time:

SQLite Command Line Program
$ sqlite3 -header Tinder.sqlite 'select * from zmessage limit 5;'
Z_PK|Z_ENT|Z_OPT|ZINBOUND|ZUSER|ZCREATIONDATE|ZBODY
1|2|1|1|832|379798036.741|Hi!
2|2|1|1|1156|379797384.794|hey!
3|2|1|1|832|379798729.794|What's doin?
4|2|1|1|1318|379804817.728|Are you online often?
5|2|1|0|1318|379806963.685|Wouldn't you like to know?!
...

In the command above, I used the SQLite "-header" option to display the column titles, and I limited the output to five records to get a sense of the data. Dropping the limit syntax would result in the entire table and all record fields being printed to standard output (the screen).

Is there a way we could quickly look at a few records of every table to see what is of interest, if anything? You bet!

SQLite Command Line Program
$ for i in $(sqlite3 Tinder.sqlite .tables); do echo Table: $i; \
sqlite3 Tinder.sqlite -header "select * from $i limit 5;"; echo; done
...
Table: ZUSER
Z_PK|Z_ENT|Z_OPT|ZCOMMONFRIENDCOUNT|ZCOMMONLIKECOUNT|ZGENDER|ZHASIMAGE|
ZHASUNVIEWEDMESSAGES|ZISACTIVE|ZISMATCH|ZISRECOMMENDED|ZISUNSEENNEWMATC
H|ZSERVERMESSAGECOUNT|ZBIRTHDATE|ZCHATLASTVIEWED|ZDISTANCEINMILES|ZLAST
ACTIVITYDATE|ZMATCHEDDATE|ZPINGTIME|ZBIO|ZFACEBOOKID|ZMATCHID|ZNAME|ZUS
ERID|ZIMAGE
42|5|126|8|0|0|0||0|1|1||0|||76.4215774536133|379794350.747|379794350.7
47|379794187.955|I like pie.|604832678|50f2fc2fbe8d00b3d4f58c36|Gunter|
50d39d6024571b7803001639|
90|5|1|0|0||0||0|0|0||0|||0.0|||||#########||Gretta||
91|5|1|0|0||0||0|0|0||0|||0.0|||||#########||Hilde||
92|5|1|0|0||0||0|0|0||0|||0.0|||||#########||Agnes||
93|5|1|0|0||0||0|0|0||0|||0.0|||||#########||Johanna||

Table: ZMESSAGE
Z_PK|Z_ENT|Z_OPT|ZINBOUND|ZUSER|ZCREATIONDATE|ZBODY
1|2|1|1|832|379798036.741|Hi!
2|2|1|1|1156|379797384.794|hey!
3|2|1|1|832|379798729.794|What's doin?
4|2|1|1|1318|379804817.728|Are you online often?
5|2|1|0|1318|379806963.685|Wouldn't you like to know?!

Table: Z_5SHAREDFRIENDS
Z_5SHAREDFRIENDS|REFLEXIVE
42|90
42|91
42|92
42|93
42|94

Briefly, the primary difference in the last command from those executed earlier is the use of a for loop. The for loop takes the ouput of the SQLite .tables command,

The output above demonstrates the relational nature of SQLite databases. Looking at the ZMESSAGE table, we see the message content, but the user is an integer (ZUSER field). The integer appears correlate to the ZUSER table (Z_PK field). Just looking at the ZMESSAGE table, we see the conversation but we don’t know with whom it occurred.

SQLite lets us query the tables in relation to make more meaningful output.

SQLite Command Line Program
$ sqlite3 Tinder.sqlite 'select m.z_pk, zinbound, zuser, zname, \
zcreationdate, zbody from zmessage as m, zuser as u where \
m.zuser = u.z_pk limit 5;'
Z_PK|ZINBOUND|ZUSER|ZNAME|ZCREATIONDATE|ZBODY
1|1|832|Tobias|379798036.741|Hi!
2|1|1156|Siegfried|379797384.794|hey!
3|1|832|Tobias|379798729.794|What's doin?
4|1|1318|Theoduff|379804817.728|Are you online often?
5|0|1318|Theoduff|379806963.685|Wouldn't you like to know?!

In this command, I specified the fields I wanted returned as opposed to all fields. This is necessary whenever you are relating tables to one another. You may have noticed that in the predicate I queried both the ZMESSAGE and ZUSER tables. The "as" statements create aliases to the tables (zmessage = m, zuser = u) to keep the command more concise. In the select statement, I asked for the ZNAME field, which is located in the ZUSER table where ZUSER from the ZMESSAGE table matched Z_PK from the ZUSER table. In the select statement, I had to specify the Z_PK (m.z_pk) field from the ZMESSAGE table because both tables contain that field.

Two more fields don’t have much meaning in our result: ZINBOUND and ZCREATIONDATE. ZINBOUND is a flag, that with some context, lead me to understand that 0 = sent and 1 = received. ZCREATIONDATE appears from its value to be Mac Absolute Time and file system timestamps support this evaluation. The case expression can be used to interpret the flags. It is the equivalent to an if/then statement in scripting languages. The datetime function converts the unix epoch to a human-readable date. Because the values in the Tinder database are Mac Absolute Time, the timestamps have to first be converted to unix epoch by adding 978307200 seconds.

SQLite Command Line Program
$ sqlite3 -header Tinder.sqlite 'select m.z_pk, case zinbound when 0 \
then "sent" when 1 then "received" else "unknown" end as zinbound, \
zuser, zname, datetime(zcreationdate + 978307200, "unixepoch", \
"localtime") as zcreationdate, zbody from zmessage as m, zuser as u \
where m.zuser = u.z_pk limit 5;'
Z_PK|ZINBOUND|ZUSER|ZCREATIONDATE|ZBODY
1|received|832|Tobias|2013-01-13 11:27:16|Hi!
2|received|1156|Siegfried|2013-01-13 11:16:24|hey!
3|received|832|Tobias|2013-01-13 11:38:49|What's doin?
4|received|1318|Theoduff|2013-01-13 13:20:17|Are you online often?
5|sent|1318|Theoduff|2013-01-13 13:56:03|Wouldn't you like to know?!

Now we have meaningful data by relating two tables, interpreting flags (case expression), and converting timestamps (datetime function). You can find more information about the case expression here, and the datetime function here.

Summing Up

I covered a lot of ground in this post, from using find, file, grep, and while and for loops to basic and intermediate SQLite Command Line Program usage. I left a lot of explanation out of the discussion, and I barely scratched the surface of SQLite analysis. My goal was to: - highlight the fact that automated tool and viewer users are likely leaving a lot of data on the table (pardon the pun) - show how command line tools can be used to rapidly locate and evalute SQLite databases - demonstrate that learning SQLite queries will go a long way to filling the gap left wide open by automated tools - encourage you to learn more about SQLite and improve your investigative skills.

I hope to start delving into more specific SQLite analysis topic in future posts.

Happy Querying!


2 comments:

  1. Thanks for posting this! You had me at "...they are both ubiquitous and poorly understood by examiners", because in all honesty, that applies to so many data structures! However, you take the time to go into this topic in detail...great stuff, and thanks for sharing!

    As a side note, I've recently been involved in a couple of discussions regarding tools, in general. My feeling is that tools provide a layer of abstraction over the data, and if you don't understand the underlying data structures, what point is there to running two or more tools?

    Too often I hear analysts state that they download every tool they see because they want to have another tool to validate their findings. What's not being considered is, in part, exactly what you stated regarding the SQLite Database Browser...it's not that it's a bad tool, but it doesn't show some data that is clearly of significant value to the examiner.

    My point is, if you don't understand the underlying data structures, then how can you accurately evaluate the output of a tool, and then use another tool to validate that output?

    ReplyDelete
    Replies
    1. Thank you for extending my thoughts. I tried to balance practicality with rant, leaning toward practicality... this time.

      Delete

Time Perspective

Telling time in forensic computing can be complicated. User interfaces hide the complexity, usually displaying time stamps in a human reada...