Thursday, October 4, 2012

Who's Texting? The iOS6 sms.db

It didn't take long from the release of iOS6 on 09/14/12 until I got my first iOS6 device to process: 18 days.  In this case, it was an iPhone4 with iOS6.  In so short a time, there are not too many commercial or open source tools out there prepared to analyze the new OS or its new data formats.

Fortunately for me, the phone was not locked.  To process the phone, in which the prime interest was text messages and contacts, I used the latest iTunes to create a backup of the phone.  Fortune was again on my side: the phone was not set for encrypted backups.

iTunes Backup

If you are not familiar with iTunes backups, then allow me to introduce them generally.  iTunes copies user data (media, databases, settings, applications, etc.) to a directory in the operating system.  The location of the directory varies by operating system and version.  Since this is primarily a discussion about the new iOS6 sms.db, I'll let you find the location yourself.  What's most important to know is this:
  1. the backups are flat:  all the files are dumped into one directory and are not preserved in their original tree structure.
  2. the files are renamed: the file names are sha1 values calculated from the domain, path, and filename of the original file.
The nature of hash digest is that they produce unique, fixed length values based on the data present in the data structure being analyzed.  This means that a 1 byte file and a 200gb data stream both produce a 40-byte sha1 hash value.  This also means that the data on which the hash value was calculated cannot be reverse engineered from the value alone.  Why did I just state all this?  Certainly not to ignite another hash controversy by anything I might just misstated, but instead to point out that we can't really know the original file name, path, and domain from the sha1 value file names found in the backup.  Sure, you'll find postings that tell you the sms.db is renamed "3d0d7e5fb2ce288813306e4d4636395e047a3d28", but you won't find a file name for every file in your backup.  And you won't know any metadata about any of the backup files from the hash value, either.

Now, Linux helps us here.  On the command line, the file command will tell us the file type of each of the backup files, and most Linux file managers will render thumbnails for known file types.  This means its quite easy to view media files and identify databases just by opening a file browser pointed at the backup directory.  Its good for low hanging fruit, but how do you differentiate between the databases, for example?  Keyword search on a table name, maybe?  Sure, that might narrow the field, but it's a less than perfect solution.

Identifying Files in the Backup

Keeping the discussion general, know one more thing about iTunes backups before we proceed with the sms.db discussion: there are database files included in the backup that identify the files by domain, path, and file name.  In fact, the file metadata such a MAC times, ownership, permissions, and file size are included in the databases.

Now, the databases have changed with each iOS evolution--at least since iOS3 when I started examining iTunes backups.  Well, more accurately, I should they changed with each evolution except iOS6.  iOS6 retains the format introduced in iOS5.  That was again fortunate, because it just so happens that I coded a tool, based on a python script posted on stackoverflow posted by user galloglass, to 'unback' the iTunes backup based on the information in the backup databases.  The iOS5-unback.py tool works just fine on the iOS6 backup.

The sms.db

Where the backup format did not change in iOS6, the sms.db changed markedly.  Compare, if you will, the database schema side by side with an iOS5 sms.db.


iOS6 iOS5
Number of tables: 10 8
Table Names: SqliteDatabaseProperties, message, sqlite_sequence, chat, attachment, handle, message_attachment_join, chat_handle_join, chat_message_join, sqlite_stat1 _SqliteDatabaseProperties, message, sqlite_sequence, msg_group, group_member, msg_pieces, madrid_attachment, madrid_chat
Number of triggers: 9 7
Trigger Names: set_message_has_attachments, update_message_roomname_cache_insert, delete_attachment_files, clean_orphaned_attachments, clean_orphaned_handles, clear_message_has_attachments, clean_orphaned_messages, update_message_roomname_cache_delete, clean_orphaned_handles2 insert_unread_message, mark_message_unread, mark_message_read, delete_message, insert_newest_message, delete_newest_message, delete_pieces
Number of indexes: 16 18
Index Names: sqlite_autoindex__SqliteDatabaseProperties_1, sqlite_autoindex_message_1, sqlite_autoindex_chat_1, sqlite_autoindex_attachment_1, sqlite_autoindex_handle_1, sqlite_autoindex_handle_2, sqlite_autoindex_message_attachment_join_1, sqlite_autoindex_chat_handle_join_1, sqlite_autoindex_chat_message_join_1, message_idx_is_read, message_idx_failed, message_idx_handle, chat_idx_identifier, chat_idx_room_name, message_idx_was_downgraded, chat_message_join_idx_message_id sqlite_autoindex__SqliteDatabaseProperties_1, madrid_attachment_message_index, madrid_attachment_guid_index, madrid_attachment_filename_index, madrid_chat_style_index, madrid_chat_state_index, madrid_chat_account_id_index, madrid_chat_chat_identifier_index, madrid_chat_service_name_index, madrid_chat_guid_index, madrid_chat_room_name_index, madrid_chat_account_login_index, message_group_index, message_flags_index, pieces_message_index, madrid_guid_index, madrid_roomname_service_index, madrid_handle_service_index
For anyone familiar with the iOS5 sms.db database structure, the most apparent change in the iOS6 sms.db is the lack of "madrid" tables.  In iOS5, the iMessage app was introduced, and iMessage texting and SMS texting were unified in the sms.db.  This is still true in iOS6, but it is handled quite differently, and by "differently," I mean "better."

Time Stamps

When parsing the iOS5 database, it was necessary to convert two different date formats: Unix epoch time and Mac Absolute Time.  The dates of SMS messages sent through the wireless carrier were recorded in unix epoch time in the date field of the message table.  iMessages, on the other hand, were recorded in Mac Absolute Time in the same table and field as the SMS message date!  

What's the difference between unix epoch and Mac Absolute Time?  Exactly 978307200 seconds.  Unix epoch starts on 01/01/1970, while Mac Absolute time starts on 01/01/2001.  SQLite queries that integrated the SMS and iMessage texts had to account for these differences.  Some examiners with automated tools for parsing the sms.db were likely blissfully ignorant of this issue, but it exited none-the-less.  But it presented challenges to those of use extracting data from the database with the SQLite command line program or a SQLite GUI browser.

The iOS6 sms.db simplifies the date issue.  All text message dates, whether SMS or iMessage are recorded in Mac Absolute time.  The datetime function in SQLite can be used to convert that time by adding 978307200 seconds to the date data.  This converts the time to unix epoch which is one of the 'modifiers' the datetime function was coded to handle (Mac Absolute Time is not a valid datetime modifier):
sqlite> select datetime(370884516 + 978307200, 'unixepoch');
datetime(370884516 + 978307200, 'unixepoch')
2012-10-02 22:28:36
While on the topic of time stamps, two additional time stamps are possible in each record: date_read and date_delivered.  The date_read field defaults to 0 until the message is opened in the iMessage application.  The date_sent is populated with a date when the message is sent through the iMessage service but not SMS.  This meas it is possible to differentiate between the time an iMessage was initiated by the device user and when it was actually sent.

Addresses 

Another significant change in the database is the address field.  In the iOS5 version, the "address" was the phone number of the remote contact in the text conversation.  The address field has been replaced with the handle_id in iOS6.  The handle_id corresponds to the ROWID in the handle table, which contains the phone number of the the contact in the id field.

Flags

I'll quickly mention a couple of other changes.  In the previous versions of the sms.db, fields such as flags and read were used to mark the type (sent, received, etc) and status (read, unread, etc) of the message.  New fields exist for these attributes including is_from_me, is_empty, is_delayed, is_auto_reply, is_prepared, is_read, is_system_message, is_sent.  The values 0 and 1 are used as "no" and "yes" respectively in interpreting these fields.

I mentioned earlier that the service used to transmit the text, SMS or iMessage, resulted in different time stamp type.  It also resulted in different fields being populated in the message table.  In iOS6, the messages share the same flags regardless of the service used to send them.  The service utilized is recorded in the the new service field of the message table.

Putting it All Together

I did not come anywhere close to describing all the data in the new sms.db table, nor how to related all the tables.  I only intended to alert investigators that there are differences in the new database that must be considered.

That said, I'd be remiss if I did not provide a sample query to produce a basic chat list.  The following query produces a list with several important fields (RowID, Date, Phone Number,Service|Type|Date Read/Sent|Text):

SELECT m.rowid as RowID, DATETIME(date + 978307200, 'unixepoch', 'localtime') as Date, h.id as "Phone Number", m.service as Service, CASE is_from_me WHEN 0 THEN "Received" WHEN 1 THEN "Sent" ELSE "Unknown" END as Type, CASE WHEN date_read > 0 THEN DATETIME(date_read + 978307200, 'unixepoch') WHEN date_delivered > 0 THEN DATETIME(date_delivered + 978307200, 'unixepoch') ELSE NULL END as "Date Read/Sent", text as Text FROM message m, handle h WHERE h.rowid = m.handle_id ORDER BY m.rowid ASC;


I'll make that a little easier to read:

SELECT 
  m.rowid as RowID, 
  DATETIME(date + 978307200, 'unixepoch', 'localtime') as Date, 
  h.id as "Phone Number", m.service as Service, 
  CASE is_from_me 
    WHEN 0 THEN "Received" 
    WHEN 1 THEN "Sent" 
    ELSE "Unknown" 
  END as Type, 
  CASE 
    WHEN date_read > 0 then DATETIME(date_read + 978307200, 'unixepoch')
    WHEN date_delivered > 0 THEN DATETIME(date_delivered + 978307200, 'unixepoch') 
    ELSE NULL END as "Date Read/Sent", 
  text as Text 
FROM message m, handle h 
WHERE h.rowid = m.handle_id 
ORDER BY m.rowid ASC;

Your results, when imported to a spreadsheet, should look like this:

ROWID Date Phone Number Service Type Date Read/Sent Text
2484 2012-10-02 08:28:36 11231234567 iMessage Sent 2012-10-02 08:28:39 Hey
2485 2012-10-02 08:45:17 11231234567 iMessage Sent 2012-10-02 08:46:11 Call me when you get this
2486 2012-10-02 08:46:06 13217654321 SMS Received 2012-10-02 08:47:21 Can I borrow some bucks?
2487 2012-10-02 08:47:10 1321765321 SMS Sent
No! I don't have any doe.


33 comments:

  1. Excellent query, is there a way that a layman could modify it to look for only deleted texts or add the deleted texts back in. I'm seeing big gaps in my rowid and assume those are deleted texts.

    ReplyDelete
    Replies
    1. In a word: No. There is no internal mechanism for recovering dropped records from a table. While it is true that the data from a dropped record can still be on the disk in space allocated by the database, recovering that data and restoring it to a table is no small feat and in many cases not possible at all.

      It was a good observation on your part of the gaps in the rowids. It is, as you considered, an indication of dropped records, and that is why I included it in the query. Too many automated tools ignore this field, but it could prove a very valuable fact in the analysis.

      Delete
  2. "create a backup of the phone"? is this forensically sound in your country?

    ReplyDelete
    Replies
    1. I am going to try to keep this top from spinning out of control by keeping the discussion conceptual:

      The term forensics basically mean that the analysis is being done for legal purposes. What is admitted into court is a matter for the court to decide, not a specific country, agency, or individual. Courts usually have rules governing the admissibility of evidence, and the courts of jurisdiction where you practice digital forensics are where your question is best answered.

      Specifically, in this case, we are using a tool, iTunes backup, that was coded by the manufacturer for the purpose of preserving data on a compatible device like the iPhone. I have found there is no difference in the sms.db file extracted through iTunes backup, libimobiledevie backup, or other extraction methods. Since the analysis in this case was specifically of the sms.db file contents, and not a full file system where I am would be using file system metadata to form conclusions, then I can make a reasonable argument that the data extraction technique in this case was proper.

      As always, the devil is in the details. Your mileage may vary.

      Delete
  3. I have always been able to export the db file to an sqllite browser in the past. I am unable to do this in ios6. Do you know of a viewer that will allow it to be exported?

    ReplyDelete
    Replies
    1. You'll have to be more clear. I don't know what you mean by 'always been able to export the db.' Methods will vary depending on operating system and available tools.

      If you are looking for a simple, cross-platform way to access your sms.db file, perform and iTunes backup and access the sms.db through the backup. Just be aware that iTunes renames the files to a SHA1 hash of the absolute path, i.e., SHA1('HomeDomain-Library/SMS/sms.db') = 3d0d7e5fb2ce288813306e4d4636395e047a3d28

      Delete
  4. When viewing the sms.db file with something like sqlite, im unsure what some of the records pertain to, for example, I know in chat, the account ID is the owner of the sms.db file you have open,and there is a unique id for each type of service_name. (eg, if sent via iMessage its one id, if sent by email its another id, if its a SMS its another id. etc, But I cant figure out what the guid is for in the message table? The handle ID could be the same for 20 messages, but each mesage is a different guid for that handle id. The reason im asking is because im trying to figure out in the sms.db file what the alphanumeric code is that proceeds the message. That way I can find out who sent/recieved the message. Do you know of anyway to do this? Something like Sqlite is fine for most things, but as you said, it doesent show deleted records. If you want to view the deleted records you have to just work your way manually through the sms.db file with a text editor and thats why im trying to find out how to use those codes to figure out the user they are connected to.

    ReplyDelete
    Replies
    1. I'm sorry, I've not studied the issue. If and when I do, I'll try to remember to append this article.

      Delete
  5. Excellent write-up, thank you! You saved me about 5 hours of my life.

    ReplyDelete
  6. This is very helpful - thanks!

    Any ideas how to access the attachments in the sms.db table? I've tried looking for filenames of the SHA1 of the filename column, and the filename column with everything before Library swapped out for HomeDomain- (to match the hash you described above) - no luck though!

    ReplyDelete
  7. I recently delete a conversation, I did some procedures that I found in the internet but without success. I have acces to the other's person iPhone, and she have the entire conversation in her iPhone, do you think is possible to copy the conversation from her iphone and "paste" in my sms.db?

    ReplyDelete
    Replies
    1. Basically, no. You could use a SQLite insert statement to add data to your database. Take a look at http://www.w3schools.com/sql/sql_insert.asp.

      Delete
  8. Hi!

    Please help me!

    I have iPhone4 with 5.0.1 iOS.
    I make manual backup from sms.db.
    After the updgrade to 6.1 iOS, and put back the backup to the original place, with correct permission.
    After restart iPhone then make new sms.db and rename my old backup to sms.db.incompatible.v21.sqlitedb.
    What is the problem, or how can i convert to the correct format.

    ReplyDelete
    Replies
    1. The iOS5 and iOS6 databases are very different, storing data differently and in different formats (sms date codes come to mind). Thus, you will not be able to open an iOS5 database with the iOS6 Message application.

      While conversion might be technically possible by exporting and converting the data to the proper format, and finally importing to the iOS6 sms.db, it would be difficult at best. I'm not an iPhone user, but I would expect the iOS itself to import and convert the data on upgrade from iOS5 to iOS6.

      Delete
  9. Hi, my problem is this:
    im view from predefinite data view sms.
    example: date +%s >> DATA

    DATE=`cat DATE`
    pippo="$DATE"

    sqlite3 /sms.db < '$pippo';
    !

    this my script view all sms, no show from my date.
    old ios 5 is easy.
    please help thanks...

    ReplyDelete
    Replies
    1. Sorry, I don't understand what you are trying to do.

      Delete
  10. Hello, John.

    Is it possible to convert sms.db (iOS5) to sms.db (iOS6)?

    ReplyDelete
    Replies
    1. I don't know of any utility to do that. It is always possible to export data from one database and then import into another. But unless you are skilled with SQLite and understand the data structures in both the iOS5 and iOS6 sms.db, I wouldn't advise it. Does not iOS6 import old sms messages on upgrade?

      Delete
  11. While a good number of the table fields in the sms DB are pretty self-evident, some (like, say "was_downgraded" in messages) are less than clear. While I've searched for an complete iOS6 schema breakdown (with only partial success), I was wondering if you had had better luck?

    And thanks for the writeup!

    ReplyDelete
  12. hi hi, thanks for the work u've done. i've tried it out and it worked despite one problem. Most of my sms are Chinese, so the all became "??" after i exported it use ur statements into a csv file. Is there any chance that i could fixed that?

    ReplyDelete
    Replies
    1. You haven't provided information about your environment, but I suspect you are on a Windows box. The Windows command line does not use a UTF-8 code page, but an ASCII one. You can check with the command "chcp" (Change Codepage) with no arguments. In a Windows 7 system I checked, the code page was 437. To change to UTF-8, issue the command "chcp 65001."

      Delete
    2. hi, thanks again for ur kind reply. i actually changed the active code page to 65001, but the output CSV was strange words like
      +852xxxxxxxx|鏄晩銆傚瀵﹀緢灏肩應鍟娿€傘€傘€備互寰屾垜鍊戣澶氭敞鎰?+85266890630|銆傘€傛垜浠婂ぉ鍚冧簡濂藉銆傘€備及璁¢┈涓婂氨涓嶇敤鏀硅。鏈嶄簡

      Delete
    3. thanks for ur kind reply, i did the "chcp 65001", but the output still messy. Not chinese at all.


      +85266890630|鏄晩銆傚瀵﹀緢灏肩應鍟娿€傘€傘€備互寰屾垜鍊戣澶氭敞

      Delete
    4. Then maybe you should specify the Chinese code page.

      Delete
    5. from the command line, run:

      CMD /u /c sqlite QUERY

      that should solve it.

      Delete
  13. Hi John,

    Great post, thank you.

    Do you know of a way to determine from the SMS db which device was used in order to send an IMessage?

    For example, if I use my Iphone to send an IMessage, the Account column would typically show my cell number. (If this is the selected account in my IMessage settings)

    Let say someone else managed to gain access to my Apple ID, linked his MAC to my ID, and now are sending IMessages on behalf of myself.

    These message will sync to my Iphone as having been sent by myself, but the Account column would show my E-mail address, not my Cell number.

    Do you know of an indication on the phone that will show if IMessages were sent from my Iphone or from another device linked with my Apple ID, and subsequently just synced to my Iphone's sent items?

    Thanks,

    ReplyDelete
    Replies
    1. I don't have any personal iDevices available to tests. Take a look at the 'service' and 'account_guid' fields of the message table to see if they give you your answer.

      Delete
  14. Excellent SQL command. Worked beautifully for my sms backups which are originally in the 3d0d7e5fb2ce288813306e4d4636395e047a3d28 file.

    ReplyDelete
  15. Masterful article.

    I am trying to learn more about the use of indexes in the new iOS6 format. For example, integrity and consistency of a DB is ok; opening records and reading table data works with no issue. However the iOS search feature (both within iMessages and Spotlight) does not return accurate results even though the contents can be seen manually. This seems like a corrupt index problem, no? Is there a way to externally rebuild from a sqlite cmd or purge and have iOS rebuild?

    Many thanks~

    ReplyDelete
    Replies
    1. The sqlite REINDEX command is probably what you are looking for. You can read about the command here: http://www.sqlite.org/lang_reindex.html

      Delete
  16. Hi thanks for this helpful post.

    I was wondering what does the other 2 files do:

    sms.db-shm
    sms.db-wal

    Could you help me.
    Thank you

    ReplyDelete
    Replies
    1. The -shm and -wal files are part of the SQLite data protection schema known as Write Ahead Logging (WAL). Basically, changes to the database are first written to the -wal file until they are "committed" to the main database file. The -shm file is a shared memory file which speeds operations. You can read about the interaction of these files with the main database at http://www.sqlite.org/wal.html.

      In short, the database is not complete without the -wal file. They function as a unit. You should export all the files--the main database and the -wal and -shm files--to one directory before analysis of the main database. Don't be alarmed if after opening and closing the main database that the -wal and -shm files are gone: the data has been committed to the main database. Functionally, the data in the main database is the same after the commit. But without the -wal file, you are missing data from the database.

      Delete

Time Perspective

Telling time in forensic computing can be complicated. User interfaces hide the complexity, usually displaying time stamps in a human reada...