Wednesday, September 18, 2013

iPhone: Recovering from Recovery

I was attempting to brute force an iPhone 4 passcode for data recovery. The phone was in poor condition and had undergone modifications: the home button had been replaced as well as the back cover, maybe more. I could not reliably get the phone into recovery mode, possibly the result of a faulty home button, so i used libimobiledevice’s ideviceenterrecovery command.

It worked wonderfully. Maybe too wonderfully. I eventually achieved DFU mode (the home button was probably the culprit in making this normally simple process quite difficult), executed my exploit, and obtained the passcode and data. My goal was to unlock the phone and pass it off to another investigator. But, when I rebooted the phone after DFU mode I found it was in recovery again!

I tried a variety of things, from trying to rest with FirmwareUmbrella (formerly TinyUmbrella) to disassembling the phone and disconnecting the battery, but nothing worked. Then a colleague (thanks, Perry) suggested iRecovery.

What is libirecovery?

libirecovery is a cross-platform library which implements communication to iBoot/iBSS found on Apple’s iOS devices via USB. A command-line utility is also provided.

https://github.com/libimobiledevice/libirecovery
— libirecovery

libirecovery can be compiled in Linux. I found I had to install the libreadline-dev package in my ubuntu install, but you may find you have to do more depending on the packages you already have installed. Building requires you to execute the autogen.sh followed by make and then make install. I had to also run ldconfig to register the library since this was not done automatically.

The command line utility is the irecovery tool. It is used as follows:

iRecovery - iDevice Recovery Utility
Usage: irecovery [args]
        -i <ecid>       Target specific device by its hexadecimal ECID
        -v              Start irecovery in verbose mode.
        -c <cmd>        Send command to client.
        -f <file>       Send file to client.
        -k [payload]    Send usb exploit to client.
        -h              Show this help.
        -r              Reset client.
        -s              Start interactive shell.
        -e <script>     Executes recovery shell script.

On first blush, it might seem that the solution to my problem was the command irecovery -r to reset the device. But that is not so. Instead, I needed to enter the shell, change and environment variable, and reboot.

iRecovery Shell
$ sudo irecovery -s
> setenv auto-boot true
> saveenv
> reboot
Important
Running the command as root was required or the program failed with a segmentation fault.

The device rebooted into the normal operating system and I was able to unlock it with the passcode I had recovered. If you find yourself in a recovery loop, I hope this post will help you, uh, recover from it!


Friday, September 13, 2013

Recovering Data from Deleted SQLite Records: Redux

Rising from the Ashes

I’ve received many, many inquiries about recovering deleted records from SQLite databases ever since I posted an article about my first attempt to recover deleted data. Well, the hypothesis of checking the difference between the original database and a vacuumed copy seemed sound at the time and did in fact yield dropped record data, but it also included data from allocated records. The main thing I learned was that I had much to learn about the SQLite database file format.

Since that time, I’ve run into more and more SQLite databases, and the issue of recovering dropped records has become paramount. I have learned how to do so, and I’ll share some of the secrets with you now. But first, you need to know a little about SQLite databases…

This article is not a treatise on the SQLite file format. The best resource for that is located at SQLite.org. I hope to put the salient points here so you can understand the complexity of the task of recovering dropped records from SQLite databases.

SQLite Main Database Header

The first 100 bytes of a SQLite database define and describe the database. The key value to record recovery is the page size, a 16-bit (two-bytes) big-endian integer at byte offset 16. SQLite databases are divided into pages, usually matching the underlying file system block size. Each page has a single use, and those containing the records that are the interest of forensic examiners is the table b-tree leaf page, which I’ll refer to as the TLP. The TLP is distinguished from other page types by its first byte, \x0d or integer 13.

Thus, we can find the TLPs with the knowledge of the database page size we obtain from the database header and check the first byte of each page for \x0d. In python, that might look like:

Python 3: Finding table b-tree leaf pages
from struct import unpack

with open('some.db', 'rb') as f:
    data = f.read()

pageSize = unpack('>h', data[16:18])[0]

pageList = []

for offset in range(0, len(data), pageSize):
    if data[offset] == 13;
        pageList.append(offset)
Note
The code above prints the offset of TLPs. Make sure you are using Python 3 if you want to try this for yourself.

Table B-Tree Leaf Pages

The TLPs hold the records, and consequently, the dropped (deleted) records data when they occur. Each page has an 8-byte header, broken down as follows:

Table 1. Table b-tree leaf page header
Offset Size Value

0

1

Page byte \x0d (int 13)

1

2

Byte offset to first freeblock

3

2

Number of cells

5

2

Offset to first cell

7

1

Number of freebytes

The header introduces some terms that need explaining. A freeblock is unallocated space in the page below one or more allocated records. It is created by the dropping of a record from the table. It has a four-byte header: the first two bytes are a 16-bit big-endian integer pointing to the next freeblock (zero means its the last freeblock), and the second two bytes are a 16-bit big-endian integer representing the size of the freeblock, including the header.

Cells are the structures that hold the records. The are made up of a payload length, key, and payload. The length and key, also known as the rowid, are variable length integers. What are those? I’m glad you asked:

Variable-length Integers

A variable-length integer or "varint" is a static Huffman encoding of 64-bit twos-complement integers that uses less space for small positive values. A varint is between 1 and 9 bytes in length. The varint consists of either zero or more byte which have the high-order bit set followed by a single byte with the high-order bit clear, or nine bytes, whichever is shorter. The lower seven bits of each of the first eight bytes and all 8 bits of the ninth byte are used to reconstruct the 64-bit twos-complement integer. Varints are big-endian: bits taken from the earlier byte of the varint are the more significant and bits taken from the later bytes.

http://www.sqlite.org/fileformat2.html
— SQLite.org

I won’t go into varints any further in this post, because I will not be discussing cell decoding in this post. Suffice it to say that with the payload length, we can define the payload, which itself is made up of a header and columns. The header is a list of varints, the first describing the header length, and the remainder decribing the column data and types. The page header contains number of cells and the offset to the first cell on the page.

The last value in the header, freebytes, describes the number of fragmented bytes on the page. Fragmented bytes are byte groupings of three or less that cannot be reallocated to a new cell (which takes a minimum of four bytes).

Immediately following the page header is a cell pointer array. It is made up of 16-bit big endian integers equal in length to the number of cells on the page. Thus, if there are 10 cells on the page, the array is 20 bytes long (10 2-btye groupings).

Page Unallocated Space

There are three types of unallocated space in a TLP. Freeblocks and freebytes we’ve discussed, and the third is the space between the end of the cell array and the first cell on the page referred to in the SQLite documentation as simply "unallocated". Freeblocks and unallocated can contain recoverable record data, while freebytes are too small for interpretation. Thus, knowing the first freeblock (defined in the page header), the length of the cell array (interpreted from the number of cells defined in the page header) and the offset to the first cell (yep, you guessed it, defined in the page header), we can recover all the unallocated space in the page for analysis.

Python 3: Finding table b-tree leaf page unallocated space
for offset in pageList:
    page = data[offset: offset + pageSize]

    pageHeader = unpack('>bhhhb', page[:8])
    pageByte, fbOffset, cellQty, cellOffset, freebytes = pageHeader

    # get unallocated
    start = 8 + cellQty * 2
    end = cellOffset-start
    unalloc = page[start:end]
    print(offset, unalloc, sep=',')

    # get freeblocks, if any
    if fbOffset > 0:
        while fbOffset != 0:
            start, size = unpack('>hh', page[fbOffset: fbOffset + 4])
            freeblock = page[fbOffset: fbOffset + size]
            print(offset, freeblock, sep = ',')
            fbOffset = start

With the lines from the two code boxes, we have coaxed the unallocated data from the "some.db" SQLite database. We have printed the offset of each unallocated block and the contents (in python bytes format) to stdout. With just a little manipulation, we can turn this into a script a reuseable program, and the content can be grepped for strings. At bare minimum, we now have a way to determine if there is deleted content in the database related to our investigation, e.g., we could grep the output of the Android mmssms.db for a phone number to see if there are deleted records. Searching against the whole database would not be valuable because we cannot separate the allocated from the unallocated content!

Now, this obviously does not reconstruct the records for us, but recovering the unallocated data is a good start. In future posts I will describe how to reconstruct allocated records with an eye towards reconstructing unallocated records.


Wednesday, June 19, 2013

SQLite on the Case

It is very common in SQLite databases for integers to represent a deeper meaning than their numeric value. We usually refer to this as a flag. Take the iOS call_history.db for example: the call table has a column literally called flags and the integers in that column represent the type of call.

Integer Interpretation

4

Incoming

5

Outgoing

8

Blocked

16

Facetime

Note More values are possible than those in the table above. The data above is accurate for iOS 6, but not complete. Other values integers are possible that represent a network error of some time, and while it is possible to determine whether the call was incoming, Facetime, etc. when the error occurred, this exceeds the scope of this post.

Just as common as integer flags is the utter lack of an explanation for them in the database itself. Take a look at the table schemas:

CREATE TABLE _SqliteDatabaseProperties (key TEXT, value TEXT, UNIQUE(key));
CREATE TABLE call (ROWID INTEGER PRIMARY KEY AUTOINCREMENT, address TEXT, date INTEGER, duration INTEGER, flags INTEGER, id INTEGER, name TEXT, country_code TEXT, network_code TEXT, read INTEGER, assisted INTEGER, face_time_data BLOB, originalAddress TEXT);
CREATE TABLE data (ROWID INTEGER PRIMARY KEY AUTOINCREMENT, pdp_ip INTEGER, bytes_rcvd REAL, bytes_sent REAL, bytes_last_rcvd REAL, bytes_last_sent REAL, bytes_lifetime_rcvd REAL, bytes_lifetime_sent REAL);

The tables are essentially stand alone—the data table is not joined with the call table to give meaning to the flags column integer. So, how do we determine the meaning? Why, by testing, of course. If you thought use another tool and see what it says then go outside and drag your knuckles on the concrete for a bit. It’s okay, I’ll wait….

The best resource for checking flag values is the device’s user interface: get a test phone, export the database, and compare the records shown through the device interface to the values in the table. Yes, you have to do some work, but that is as it should be. At a minimum, you need to check if your tool or manual query is corret, right? By the way, you might be wondering how the UI knows the interpretation: its in the application programming, but you don’t get to see that because its in binary form.

So, let’s go forward with the assumption we have checked the UI for this database or a same generation database and are confident of the interpretation. Can we craft a SQL statement that can perform the interpretation for us? You bet!

The Case Expression

The SQLite CASE expression is the IF/THEN of SQL queries. The basic form of the expression is as follows (credit: sqlite.org):

  • CASE x WHEN w1 THEN r1 WHEN w2 THEN r2 ELSE r3 END

  • CASE WHEN x=w1 THEN r1 WHEN x=w2 THEN r2 ELSE r3 END

Think of the x value in the expressions above as the column name. Rather than try to explain further with hypothetical values, lets turn back to the call_history.db flags. We could use the following excerpt in our query:

case flags
    when 4 then "Incoming"
    when 5 then "Outgoing"
    when 8 then "Blocked"
    when 16 then "Facetime"
    else "Dropped"
end as flags
Note We use the as flags alias at the end of the case statement above to name the column "flags." Without the alias, the column would take the name of the entire case statement.

The case statement above essentially reads: for the flags column, interpret 4 as Incoming, 5 as Outgoing, 8 as Blocked, 16 as Facetime, and anything else as Dropped. If we were interested in the producing output from the call_history.db that included the rowid, phone number, date, call duration, and call type, we could use the following query:

select
        rowid,
        address,
        datetime(date,'unixepoch','localtime') as date,
        duration,
        case flags
                when 4 then "Incoming"
                when 5 then "Outgoing"
                when 8 then "Blocked"
                when 16 then "Facetime"
                else "Dropped"
        end as flags
from call

Our output would look like the following:

ROWID address date duration flags

1948

8885551212

2013-02-30 19:46:41

15

Outgoing

1949

8881215555

2013-02-30 21:05:11

67

Incoming

1950

8881215555

2013-02-01 00:17:05

113

Incoming

1951

8885551212

2013-02-01 10:32:32

54

Outgoing

1952

8885551212

2013-02-01 10:37:25

34

Outgoing

But if you act now, there’s more…

It’s very nice that we can interpret these flags using a SQL statement. This means we don’t have to write a program, use an in line editor, or write a spreadsheet rule to do it. One stop shopping, just what every busy forensicator wants! But, I like to see the original value, too, just in case it turns out my interpretation was wrrr…, uh, less correct than I would like.

Enter SQL operators. Operators are syntax (usually symbols, but sometimes words) that cause the SQL engine to perform a specific task on the data such as concatenation or filtering (e.g., equals, greater than, less than). The double pipes || operator, which is the focus of this section, concatenates values. With it we can make it clear that the duration is in seconds, and we can append the append the flag integer to its English equivalent.

select
        rowid,
        address,
        datetime(date,'unixepoch','localtime') as date,
        duration || " sec" as duration,
        case flags
                when 4 then "Incoming"
                when 5 then "Outgoing"
                when 8 then "blocked"
                when 16 then "Facetime"
                else "Dropped"
        end || " (" || flags || ")" as flags
from call

Now our output could would look like the following:

ROWID address date duration flags

1948

8885551212

2013-02-30 19:46:41

15 sec

Outgoing (5)

1949

8881215555

2013-02-30 21:05:11

67 sec

Incoming (4)

1950

8881215555

2013-02-01 00:17:05

113 sec

Incoming (4)

1951

8885551212

2013-02-01 10:32:32

54 sec

Outgoing (5)

1952

8885551212

2013-02-01 10:37:25

34 sec

Outgoing (5)

Note It’s very true that we could have just labeled our duration column Duration(sec), but I wanted to illustrate another variation of the use concatenation in a query.

The second-to-last line in the query, end || " (" || flags || ")" as flags, adds the integer value of flags, encapsulated in parenthesis, to the end of the interpreted value. Thus, the complete case statement can be read _print the translation of the flag, followed by a space, open parenthesis, the flag integer, and finally a closed parenthesis.

I hope this gives you some ideas on how to produce output with more clarity and analytical value.


Monday, June 17, 2013

TextMe App: Lesson Learned from Unusual Tables

I recently had the opportunity to help a colleague with an iPhone database that was not supported by his automated tools. The application was the TextMe application, and predictably, the texting app stored its chat in a SQLite database. What made the database interesting was the fact that there was no immediately obvious way to identify to whom a message was sent.

Let me illustrate: A quick scan of the database reveals some of the capabilities of the application: texting (ZMESSAGE table) and calling (ZCALL, ZVOICEMAIL)

SQLite Command Line Interface
$ sqlite3 TextMe2.sqlite .tables
ZATTACHMENT     ZDISCUSSION     ZMESSAGEHEIGHT  Z_METADATA
ZCALL           ZFAVORITE       ZVOICEMAIL      Z_PRIMARYKEY
ZCONTACT        ZMESSAGE        Z_3DISCUSSIONS
$

The subject of this investigation was the text messages, so I needed to see how the table was constructed.

SQLite Command Line Interface
$ sqlite3 TextMe2.sqlite ".schema zmessage"
CREATE TABLE ZMESSAGE ( Z_PK INTEGER PRIMARY KEY, Z_ENT INTEGER,
Z_OPT INTEGER, ZSTATUS INTEGER, ZCALL INTEGER, ZDISCUSSION INTEGER,
ZHEIGHT INTEGER, ZSENDER INTEGER, Z3_SENDER INTEGER, ZTIMESTAMP
TIMESTAMP, ZBODY VARCHAR, ZGUID VARCHAR, ZLOCATION VARCHAR, ZREMOTEID
VARCHAR );
...
$

The CREATE TABLE statement shows us there are 14 fields in the table, and a majority are integers: Z_PK, Z_ENT, Z_OPT, ZSTATUS, ZCALL, ZDISCUSSION, ZHEIGHT, ZSENDER, Z3_SENDER, ZTIMESTAMP, ZBODY, ZGUID, ZLOCATION, ZREMOTEID. But, don’t fall into the trap that the declared type of each column (e.g. INTEGER, TIMESTAMP, VARCHAR)actually constrain the data to those types, they don’t. Treat the type-name as informative only, but verify the data before forming any conclusions.

Inspecting the columns, we see some of obvious value in a forensic examination:

  • Z_PK (which is the primary key, an auto incrementing integer who’s chief value is assisting us in identifying if messages have been deleted from the database)

  • ZSTATUS

  • ZSENDER

  • ZTIMESTAMP

  • ZBODY

Others might be grabbing your attention, but I’m going to keep this discussion focused on these columns. Some sample data is in order:

SQLite Command Line Interface
$ sqlite3 -header TextMe2.sqlite "select z_pk, zstatus, zsender,
ztimestamp, zbody from zmessage limit 5;"
Z_PK|ZSTATUS|ZSENDER|ZTIMESTAMP|ZBODY
2|2|2|386362581|Hi!
3|4|10|386362603|hey, what are you doing?
4|3|2|386362630|I'm checking out this new app
5|3|2|386362634|It might be a challenge to decode
6|3|2|386362644|But I'll figure it out...

We see we have a couple of interpretation issues, here: the status is an integer that needs to be interpreted, and is the sender. The date is some form of epoch time, and my eyes tell me its likely Mac Absolute Time. I expect to find the interpretations in other tables in the database. But what jumps off the screen at me is that there is no obvious answer the following question: To whom is the sender sending the message? The time stamp gives us a sequence of messages, but how do we know that sender "2" is sending messages to sender "10"? Couldn’t sender "2" be sending his message to, say, sender "5" and in the midst, receives a message from sender "10"? Absolutely!

So, how to we rectify this issue? Well, I sort of mischievously left off the zdiscussion column in my query. I did this to steer the conversation and simulate what can happen when an investigator encounters a new database for the first time: overlook an important column. If we include the column, we see something interesting:

SQLite Command Line Interface
$ sqlite3 -header TextMe2.sqlite "select z_pk, zstatus, zdiscussion,
zsender, ztimestamp, zbody from zmessage limit 5;"
Z_PK|ZSTATUS|ZDISCUSSION|ZSENDER|ZTIMESTAMP|ZBODY
2|2|2|2|386362581|Hi!
3|4|2|10|386362603|hey, what are you doing?
4|3|2|2|386362630|I'm checking out this new app
5|3|2|2|386362634|It might be a challenge to decode
6|3|2|2|386362644|But I'll figure it out...

Now we see that the conversation is all part of the same discussion. And if we studied the whole database, we’d see example of where more than one conversation was occurring a the same time, and by sorting on the discussion field, we make sense of those conversations. But date stamp alone does not clue us in.

This might not seem like a big deal, but most messaging databases I have encountered have the remote party in the message record for both sent and received messages. This works well and leads to easy interpretation, e.g., "sent to Joe" and "received from Joe". But this database without properly understanding the discussion column, is the equivalent of "sent by Joe" and "sent by Jane", leading to the question "to whom?"

Rather than breakdown the rest of the analysis, I’m just going to share the query I used to complete the analysis:

select
  m.z_pk as "Key",
  datetime(ztimestamp + 978307200, 'unixepoch', 'localtime') as "Date",
  case
    when m.z_ent then (select z_name from z_primarykey natural join zmessage)
    else "Unknown(" || m.z_ent || ")" end
    as "Type",
  case zstatus
    when 2 then "Sent"
    when 3 then "Delivered"
    when 4 then "Received"
    else "Unknown(" || zstatus || ")" end
    as "Status",
  zdiscussion as "DiscussionID",
  zusername as "Contact",
  zbody as "Message"
from zmessage as m, zcontact as c
on m.zsender = c.z_pk
order by discussionid asc, date asc;

By way of brief description:

  • the AS statements, such as that seen in m.z_pk as "Key", create aliases, effectively renaming the columns (or in the FROM statement, the tables) to make the output more informative.

  • The first CASE statement queries the z_primarykey to interpret the Z_ENT integer into its textual value.

  • The second CASE statement interprets the ZSTATUS flag into its English equivalent, which was not stored in the database, but determined by studying the application user-interface. This begs the question, then where does the application get the textual interpretation? Probably within its code.

  • The FROM statement queries two tables, zmessage and zcontact, to interpret the ZSENDER integer. But wait, you say, there is no ZSENDER in the select statement! (see next bullet point…)

  • The ON statement is the SQLite equivalent of an SQL INNER JOIN, which is an intersection of the two tables, i.e., it select rows from both tables where the columns match. In this case, the columns to be matched are ZSENDER from the ZMESSAGE table and Z_PK from the ZCONTACT table. The effect is that the SELECT statement knows which ZUSERNAME from ZCONTACT to return based on the ZSENDER value in the ZMESSAGE table.

  • The ORDER BY statement sorts the output first by the ZDISCUSSION column, then by the ZDATESTAMP, both in ascending order. Note that the column alias names are used.

I hope this gives some insight into the workings of TextMe2.sqlite databases and helps you in the future with the analysis of never-before-seen SQLite databases.