Thursday, October 4, 2012

Who's Texting? The iOS6 sms.db

It didn't take long from the release of iOS6 on 09/14/12 until I got my first iOS6 device to process: 18 days.  In this case, it was an iPhone4 with iOS6.  In so short a time, there are not too many commercial or open source tools out there prepared to analyze the new OS or its new data formats.

Fortunately for me, the phone was not locked.  To process the phone, in which the prime interest was text messages and contacts, I used the latest iTunes to create a backup of the phone.  Fortune was again on my side: the phone was not set for encrypted backups.

iTunes Backup

If you are not familiar with iTunes backups, then allow me to introduce them generally.  iTunes copies user data (media, databases, settings, applications, etc.) to a directory in the operating system.  The location of the directory varies by operating system and version.  Since this is primarily a discussion about the new iOS6 sms.db, I'll let you find the location yourself.  What's most important to know is this:
  1. the backups are flat:  all the files are dumped into one directory and are not preserved in their original tree structure.
  2. the files are renamed: the file names are sha1 values calculated from the domain, path, and filename of the original file.
The nature of hash digest is that they produce unique, fixed length values based on the data present in the data structure being analyzed.  This means that a 1 byte file and a 200gb data stream both produce a 40-byte sha1 hash value.  This also means that the data on which the hash value was calculated cannot be reverse engineered from the value alone.  Why did I just state all this?  Certainly not to ignite another hash controversy by anything I might just misstated, but instead to point out that we can't really know the original file name, path, and domain from the sha1 value file names found in the backup.  Sure, you'll find postings that tell you the sms.db is renamed "3d0d7e5fb2ce288813306e4d4636395e047a3d28", but you won't find a file name for every file in your backup.  And you won't know any metadata about any of the backup files from the hash value, either.

Now, Linux helps us here.  On the command line, the file command will tell us the file type of each of the backup files, and most Linux file managers will render thumbnails for known file types.  This means its quite easy to view media files and identify databases just by opening a file browser pointed at the backup directory.  Its good for low hanging fruit, but how do you differentiate between the databases, for example?  Keyword search on a table name, maybe?  Sure, that might narrow the field, but it's a less than perfect solution.

Identifying Files in the Backup

Keeping the discussion general, know one more thing about iTunes backups before we proceed with the sms.db discussion: there are database files included in the backup that identify the files by domain, path, and file name.  In fact, the file metadata such a MAC times, ownership, permissions, and file size are included in the databases.

Now, the databases have changed with each iOS evolution--at least since iOS3 when I started examining iTunes backups.  Well, more accurately, I should they changed with each evolution except iOS6.  iOS6 retains the format introduced in iOS5.  That was again fortunate, because it just so happens that I coded a tool, based on a python script posted on stackoverflow posted by user galloglass, to 'unback' the iTunes backup based on the information in the backup databases.  The iOS5-unback.py tool works just fine on the iOS6 backup.

The sms.db

Where the backup format did not change in iOS6, the sms.db changed markedly.  Compare, if you will, the database schema side by side with an iOS5 sms.db.


iOS6 iOS5
Number of tables: 10 8
Table Names: SqliteDatabaseProperties, message, sqlite_sequence, chat, attachment, handle, message_attachment_join, chat_handle_join, chat_message_join, sqlite_stat1 _SqliteDatabaseProperties, message, sqlite_sequence, msg_group, group_member, msg_pieces, madrid_attachment, madrid_chat
Number of triggers: 9 7
Trigger Names: set_message_has_attachments, update_message_roomname_cache_insert, delete_attachment_files, clean_orphaned_attachments, clean_orphaned_handles, clear_message_has_attachments, clean_orphaned_messages, update_message_roomname_cache_delete, clean_orphaned_handles2 insert_unread_message, mark_message_unread, mark_message_read, delete_message, insert_newest_message, delete_newest_message, delete_pieces
Number of indexes: 16 18
Index Names: sqlite_autoindex__SqliteDatabaseProperties_1, sqlite_autoindex_message_1, sqlite_autoindex_chat_1, sqlite_autoindex_attachment_1, sqlite_autoindex_handle_1, sqlite_autoindex_handle_2, sqlite_autoindex_message_attachment_join_1, sqlite_autoindex_chat_handle_join_1, sqlite_autoindex_chat_message_join_1, message_idx_is_read, message_idx_failed, message_idx_handle, chat_idx_identifier, chat_idx_room_name, message_idx_was_downgraded, chat_message_join_idx_message_id sqlite_autoindex__SqliteDatabaseProperties_1, madrid_attachment_message_index, madrid_attachment_guid_index, madrid_attachment_filename_index, madrid_chat_style_index, madrid_chat_state_index, madrid_chat_account_id_index, madrid_chat_chat_identifier_index, madrid_chat_service_name_index, madrid_chat_guid_index, madrid_chat_room_name_index, madrid_chat_account_login_index, message_group_index, message_flags_index, pieces_message_index, madrid_guid_index, madrid_roomname_service_index, madrid_handle_service_index
For anyone familiar with the iOS5 sms.db database structure, the most apparent change in the iOS6 sms.db is the lack of "madrid" tables.  In iOS5, the iMessage app was introduced, and iMessage texting and SMS texting were unified in the sms.db.  This is still true in iOS6, but it is handled quite differently, and by "differently," I mean "better."

Time Stamps

When parsing the iOS5 database, it was necessary to convert two different date formats: Unix epoch time and Mac Absolute Time.  The dates of SMS messages sent through the wireless carrier were recorded in unix epoch time in the date field of the message table.  iMessages, on the other hand, were recorded in Mac Absolute Time in the same table and field as the SMS message date!  

What's the difference between unix epoch and Mac Absolute Time?  Exactly 978307200 seconds.  Unix epoch starts on 01/01/1970, while Mac Absolute time starts on 01/01/2001.  SQLite queries that integrated the SMS and iMessage texts had to account for these differences.  Some examiners with automated tools for parsing the sms.db were likely blissfully ignorant of this issue, but it exited none-the-less.  But it presented challenges to those of use extracting data from the database with the SQLite command line program or a SQLite GUI browser.

The iOS6 sms.db simplifies the date issue.  All text message dates, whether SMS or iMessage are recorded in Mac Absolute time.  The datetime function in SQLite can be used to convert that time by adding 978307200 seconds to the date data.  This converts the time to unix epoch which is one of the 'modifiers' the datetime function was coded to handle (Mac Absolute Time is not a valid datetime modifier):
sqlite> select datetime(370884516 + 978307200, 'unixepoch');
datetime(370884516 + 978307200, 'unixepoch')
2012-10-02 22:28:36
While on the topic of time stamps, two additional time stamps are possible in each record: date_read and date_delivered.  The date_read field defaults to 0 until the message is opened in the iMessage application.  The date_sent is populated with a date when the message is sent through the iMessage service but not SMS.  This meas it is possible to differentiate between the time an iMessage was initiated by the device user and when it was actually sent.

Addresses 

Another significant change in the database is the address field.  In the iOS5 version, the "address" was the phone number of the remote contact in the text conversation.  The address field has been replaced with the handle_id in iOS6.  The handle_id corresponds to the ROWID in the handle table, which contains the phone number of the the contact in the id field.

Flags

I'll quickly mention a couple of other changes.  In the previous versions of the sms.db, fields such as flags and read were used to mark the type (sent, received, etc) and status (read, unread, etc) of the message.  New fields exist for these attributes including is_from_me, is_empty, is_delayed, is_auto_reply, is_prepared, is_read, is_system_message, is_sent.  The values 0 and 1 are used as "no" and "yes" respectively in interpreting these fields.

I mentioned earlier that the service used to transmit the text, SMS or iMessage, resulted in different time stamp type.  It also resulted in different fields being populated in the message table.  In iOS6, the messages share the same flags regardless of the service used to send them.  The service utilized is recorded in the the new service field of the message table.

Putting it All Together

I did not come anywhere close to describing all the data in the new sms.db table, nor how to related all the tables.  I only intended to alert investigators that there are differences in the new database that must be considered.

That said, I'd be remiss if I did not provide a sample query to produce a basic chat list.  The following query produces a list with several important fields (RowID, Date, Phone Number,Service|Type|Date Read/Sent|Text):

SELECT m.rowid as RowID, DATETIME(date + 978307200, 'unixepoch', 'localtime') as Date, h.id as "Phone Number", m.service as Service, CASE is_from_me WHEN 0 THEN "Received" WHEN 1 THEN "Sent" ELSE "Unknown" END as Type, CASE WHEN date_read > 0 THEN DATETIME(date_read + 978307200, 'unixepoch') WHEN date_delivered > 0 THEN DATETIME(date_delivered + 978307200, 'unixepoch') ELSE NULL END as "Date Read/Sent", text as Text FROM message m, handle h WHERE h.rowid = m.handle_id ORDER BY m.rowid ASC;


I'll make that a little easier to read:

SELECT 
  m.rowid as RowID, 
  DATETIME(date + 978307200, 'unixepoch', 'localtime') as Date, 
  h.id as "Phone Number", m.service as Service, 
  CASE is_from_me 
    WHEN 0 THEN "Received" 
    WHEN 1 THEN "Sent" 
    ELSE "Unknown" 
  END as Type, 
  CASE 
    WHEN date_read > 0 then DATETIME(date_read + 978307200, 'unixepoch')
    WHEN date_delivered > 0 THEN DATETIME(date_delivered + 978307200, 'unixepoch') 
    ELSE NULL END as "Date Read/Sent", 
  text as Text 
FROM message m, handle h 
WHERE h.rowid = m.handle_id 
ORDER BY m.rowid ASC;

Your results, when imported to a spreadsheet, should look like this:

ROWID Date Phone Number Service Type Date Read/Sent Text
2484 2012-10-02 08:28:36 11231234567 iMessage Sent 2012-10-02 08:28:39 Hey
2485 2012-10-02 08:45:17 11231234567 iMessage Sent 2012-10-02 08:46:11 Call me when you get this
2486 2012-10-02 08:46:06 13217654321 SMS Received 2012-10-02 08:47:21 Can I borrow some bucks?
2487 2012-10-02 08:47:10 1321765321 SMS Sent
No! I don't have any doe.