I recently had the opportunity to help a colleague with an iPhone database that was not supported by his automated tools. The application was the TextMe application, and predictably, the texting app stored its chat in a SQLite database. What made the database interesting was the fact that there was no immediately obvious way to identify to whom a message was sent.
Let me illustrate: A quick scan of the database reveals some of the capabilities of the application: texting (ZMESSAGE table) and calling (ZCALL, ZVOICEMAIL)
SQLite Command Line Interface
$ sqlite3 TextMe2.sqlite .tables
ZATTACHMENT ZDISCUSSION ZMESSAGEHEIGHT Z_METADATA
ZCALL ZFAVORITE ZVOICEMAIL Z_PRIMARYKEY
ZCONTACT ZMESSAGE Z_3DISCUSSIONS
$
The subject of this investigation was the text messages, so I needed to see how the table was constructed.
SQLite Command Line Interface
$ sqlite3 TextMe2.sqlite ".schema zmessage"
CREATE TABLE ZMESSAGE ( Z_PK INTEGER PRIMARY KEY, Z_ENT INTEGER,
Z_OPT INTEGER, ZSTATUS INTEGER, ZCALL INTEGER, ZDISCUSSION INTEGER,
ZHEIGHT INTEGER, ZSENDER INTEGER, Z3_SENDER INTEGER, ZTIMESTAMP
TIMESTAMP, ZBODY VARCHAR, ZGUID VARCHAR, ZLOCATION VARCHAR, ZREMOTEID
VARCHAR );
...
$
The CREATE TABLE statement shows us there are 14 fields in the table, and a majority are integers: Z_PK, Z_ENT, Z_OPT, ZSTATUS, ZCALL, ZDISCUSSION, ZHEIGHT, ZSENDER, Z3_SENDER, ZTIMESTAMP, ZBODY, ZGUID, ZLOCATION, ZREMOTEID. But, don’t fall into the trap that the declared type of each column (e.g. INTEGER, TIMESTAMP, VARCHAR)actually constrain the data to those types, they don’t. Treat the type-name as informative only, but verify the data before forming any conclusions.
Inspecting the columns, we see some of obvious value in a forensic examination:
Others might be grabbing your attention, but I’m going to keep this discussion focused on these columns. Some sample data is in order:
SQLite Command Line Interface
$ sqlite3 -header TextMe2.sqlite "select z_pk, zstatus, zsender,
ztimestamp, zbody from zmessage limit 5;"
Z_PK|ZSTATUS|ZSENDER|ZTIMESTAMP|ZBODY
2|2|2|386362581|Hi!
3|4|10|386362603|hey, what are you doing?
4|3|2|386362630|I'm checking out this new app
5|3|2|386362634|It might be a challenge to decode
6|3|2|386362644|But I'll figure it out...
We see we have a couple of interpretation issues, here: the status is an integer that needs to be interpreted, and is the sender. The date is some form of epoch time, and my eyes tell me its likely Mac Absolute Time. I expect to find the interpretations in other tables in the database. But what jumps off the screen at me is that there is no obvious answer the following question: To whom is the sender sending the message? The time stamp gives us a sequence of messages, but how do we know that sender "2" is sending messages to sender "10"? Couldn’t sender "2" be sending his message to, say, sender "5" and in the midst, receives a message from sender "10"? Absolutely!
So, how to we rectify this issue? Well, I sort of mischievously left off the zdiscussion column in my query. I did this to steer the conversation and simulate what can happen when an investigator encounters a new database for the first time: overlook an important column. If we include the column, we see something interesting:
SQLite Command Line Interface
$ sqlite3 -header TextMe2.sqlite "select z_pk, zstatus, zdiscussion,
zsender, ztimestamp, zbody from zmessage limit 5;"
Z_PK|ZSTATUS|ZDISCUSSION|ZSENDER|ZTIMESTAMP|ZBODY
2|2|2|2|386362581|Hi!
3|4|2|10|386362603|hey, what are you doing?
4|3|2|2|386362630|I'm checking out this new app
5|3|2|2|386362634|It might be a challenge to decode
6|3|2|2|386362644|But I'll figure it out...
Now we see that the conversation is all part of the same discussion. And if we studied the whole database, we’d see example of where more than one conversation was occurring a the same time, and by sorting on the discussion field, we make sense of those conversations. But date stamp alone does not clue us in.
This might not seem like a big deal, but most messaging databases I have encountered have the remote party in the message record for both sent and received messages. This works well and leads to easy interpretation, e.g., "sent to Joe" and "received from Joe". But this database without properly understanding the discussion column, is the equivalent of "sent by Joe" and "sent by Jane", leading to the question "to whom?"
Rather than breakdown the rest of the analysis, I’m just going to share the query I used to complete the analysis:
select
m.z_pk as "Key",
datetime(ztimestamp + 978307200, 'unixepoch', 'localtime') as "Date",
case
when m.z_ent then (select z_name from z_primarykey natural join zmessage)
else "Unknown(" || m.z_ent || ")" end
as "Type",
case zstatus
when 2 then "Sent"
when 3 then "Delivered"
when 4 then "Received"
else "Unknown(" || zstatus || ")" end
as "Status",
zdiscussion as "DiscussionID",
zusername as "Contact",
zbody as "Message"
from zmessage as m, zcontact as c
on m.zsender = c.z_pk
order by discussionid asc, date asc;
By way of brief description:
-
the AS statements, such as that seen in m.z_pk as "Key", create aliases, effectively renaming the columns (or in the FROM statement, the tables) to make the output more informative.
-
The first CASE statement queries the z_primarykey to interpret the Z_ENT integer into its textual value.
-
The second CASE statement interprets the ZSTATUS flag into its English equivalent, which was not stored in the database, but determined by studying the application user-interface. This begs the question, then where does the application get the textual interpretation? Probably within its code.
-
The FROM statement queries two tables, zmessage and zcontact, to interpret the ZSENDER integer. But wait, you say, there is no ZSENDER in the select statement! (see next bullet point…)
-
The ON statement is the SQLite equivalent of an SQL INNER JOIN, which is an intersection of the two tables, i.e., it select rows from both tables where the columns match. In this case, the columns to be matched are ZSENDER from the ZMESSAGE table and Z_PK from the ZCONTACT table. The effect is that the SELECT statement knows which ZUSERNAME from ZCONTACT to return based on the ZSENDER value in the ZMESSAGE table.
-
The ORDER BY statement sorts the output first by the ZDISCUSSION column, then by the ZDATESTAMP, both in ascending order. Note that the column alias names are used.
I hope this gives some insight into the workings of TextMe2.sqlite databases and helps you in the future with the analysis of never-before-seen SQLite databases.