Pages

Saturday, January 26, 2013

Android Messaging: Is Android Getting Religious?

"Cleanliness is next to Godliness," it is often said. And if you believe that, then you might think the Android operating system is seeking after the divine when it comes to its messaging service. Why do I say that? Because in my quest for a thorough understanding of SQLite databases, I discovered that the mmssms.db, Android’s built-in messaging database, has the auto-vacuum option enabled! And in Full-mode at that!

SQLite Vacuum

In SQLite, Vacuum is an operation that rebuilds the entire database. Frequent updates, deletions and insertions can leave the database file fragmented. Vacuum reduces the size of fragmented databases by copying the active records to a temporary file and then overwriting the original database file. During this process, it uses the rollback journal or write-ahead log as it would for any database transaction.
SQLite has two auto-vacuum modes, full and incremental. The auto-vacuum mode can only be set when the database is created. The setting is stored in the database header (the first 100 bytes of the database file), at file offset 52. If the 32-bit, big-endian integer at offset is non-zero, it represents the address (page number) of the largest root b-tree page. For this discussion, the significance of the non-zero value is that database auto-vacuum is enabled.
The 32-bit, big-endian File offset 64 indicates the auto-vacuum mode. An non-zero value means the database is set for incremental vacuum mode, while a zero value means full mode.
Figure 1. SQLite Database Header
Note
Don’t be fooled by shortcutting your analysis by jumping straight to offset 64 to check the value. A zero value and offset 64 coupled with a zero value at offset 52 means auto-vacuum is not enabled!

SQLite Structure

Before we can really understand the SQLite vacuum operation, we have to first understand a little bit about how SQLite manages its data. SQLite organizes itself into pages. The page sizes usually match the underlying file system block size and can be determined definitively by the 16-bit, big-endian integer located at file offset 16. Each page has a single purpose and can be any one of the following types:
  • Lock-byte
  • Freelist
    • trunk
    • leaf
  • B-tree
    • table interior
    • table leaf
    • index interior
    • index leaf
  • Payload overflow
  • Pointer map
For this discussion, we need to know about about the freelist pages and B-Tree table pages.

Freelist Pages

When data is deleted, or dropped, from the SQLite database, the database file does not get smaller (absent a vacuum operation). The database notes the location of the free space and reuses it as needed. The freelist contains the addresses, by page number, of full pages no longer being used to store data. The number of freelist pages in the database is store in the database file header as a 32-bit, big-endian integer at file offset 36.
Freelist trunk pages store the addresses—by page number, not offset—to the next trunk page, if any, and to freelist leaf pages. The freelist leaf pages are the pages that once stored data, that is, they were once B-Tree pages.

B-Tree Pages

Table records and structures are stored in B-Tree pages. B-Tree pages have headers that describe the data in the page:
  • Byte offset to first freeblock (unallocated space between the records)
    • Free blocks are chained together, each one pointing to the next
  • Number of cells on the page
    • B-Tree table leaf pages contain cells with table data
  • Offset to the first cell on the page
  • Number of fragmented free bytes
    • May not exceed 60 bytes
A cell pointer array follows immediately after the B-Tree page header. The array is a list of offsets to the allocated cells on the page. Cells are self describing, using integers to describe things like the cell length, the unique record index number (ROWID), and the cell payload content (by means of a record header). Not all B-Tree pages contain the table data that is the usual subject of an examination, but those that do can be identified by the page header.
The take away here is that it is B-Tree pages that contain table data. B-Tree pages can contain both allocated and unallocated space, and become fragmented when one record is dropped from the midst of other records. All records may be deleted from a B-Tree table leaf page making it subject to becoming a Freelist page.
Note
A SQLite database may reorganize, or defragment a page so there are no freeblocks or byte fragments (groups of three or less bytes), packing all the allocated cells at the end of the page. This is an internal housekeeping function independent of the vacuum function.

A Tale of Two Modes

As I already stated, auto-vacuum comes in two flavors: Full and Incremental. So, what is the difference and how does it affect our examinations?

Auto-Vacuum: Full Mode

In full auto-vacuum mode, every transaction commit to the database causes the pages in the freelist to be moved to the end of the database, and the database is truncated to remove the pages. It is important to distinguish that only the freelist pages are removed, not the fragmented B-Tree pages. Also, Full auto-vacuum does not cause B-Tree page defragmentation to occur.

Auto-Vacuum: Incremental Mode

In Incremental auto-vacuum mode, vacuuming does not occur with every commit. Instead, the database programatically receives a command to remove N pages from the freelist. The pages are moved to the end of the database, and the database is truncated. The page references are removed from the free list. If there are fewer pages in the list than required by the command, all the freepages are moved and truncated.

So What’s the Big Deal?

I started this discussion by noting that I had discovered that the Android mmssms.db was set to full auto-vacuum mode. This means that every commit to the database could cause dropped records in freepages to be moved to the end of the database and dropped off a cliff. Tools designed to recover dropped records from logigal SQLite databases won’t recover the records because they are no longer part of the database! And logical file extraction tools won’t recover the deleted pages, either.
Think of it like this: A drug dealer is seen conducting a transaction and flees when approached by police. He momentarily escapes, and takes the opportunity to delete all his text messages should he be captured. Sure enough, the good guys find him. While he’s being pat down, one of his customers texts the internationally recognized "do you have any drugs?" abbreviation: "Wuz up?"
Wuz up? You just helped the drug dealer remove all dropped records from his messing database, that’s WUZ UP!

Saturday, January 19, 2013

Cracking Android Passwords: The Need for Speed


Impossibly Large Numbers Revisited

In October, 2012 I posted about a article about cracking Android passwords. I spoke primarily on the difficulty in cracking the passwords based on the sheer number of possibilities (a whopping 37,556,971,331,618,802,349,234,821,094,576!)
Don’t believe me? Let’s to a little rehashing: The key space (range of possible ASCII characters) for each position in the password is 94 (upper and lower case letters, digits, and extended characters), or hexadecimal range \x21-\x7F. The password can be a minimum length of 4 and a maximum of 16 characters long.
A little Python 3 math
>>> total = 0
>>> for i in range(4,17):
...     total = total + 94**i
...
>>> print(total)
37556971331618802349234821094576
>>> #python will even put in the commas!
>>> print('{:,}'.format(total))
37,556,971,331,618,802,349,234,821,094,576
And voilĂ , we have 37.6 trillion-quadrillion possibilities! (Just rolls off the tongue, doesn’t it?) I spoke then that while the CCL Forensics python script was a great tool, python was not the best choice for password cracking because its relatively slow for that purpose. I introduced hashcat, a cross-platform password recovery application, as a better way to do business.

Hashcat-lite: Harness Feline Speed

Hashcat is coded with performance in mind and can use multi-core CPUs and GPUs (Nvidia and AMD) to perform the calculations. The CPU version, hashcat is remarkably faster than the CCL python script, and the GPU verson,oclHashcat-plus leaves the CPU version in the dust!
Using hashcat for cracking Android passwords can be a bit confusing, however, and I hope to deobfuscate the process here.

Spicy Passwords

Android uses a salted password and stores the password as a hash in the /data/system/password.key file. Well, two hashes, actually. A SHA-1 hash is calculated followed by an MD5 hash, and the values are concatenated into a single 72 byte string.
The salt, in this case a randomly generated signed, 64-bit integer, randomizes the hash and makes dictionary and brute force attacks ineffective. The integer is stored in the settings.db located in the /data/data/com.android.providers.settings/databases directory. The integer is converted a hexadecimal string (8 bytes in length) and is appended to the password. The password + salt string is then hashed and stored.
The CrackStation website has an excellent treatise on salted password hashing if you are looking for a more in-depth explanation. The Android salted password formate is not the only salted password hashing method in practice.

Creating Test Data

We can use python to create some salted hashes after the manner of Android. This is useful for testing hashcat or other tools you might use. After all, if you don’t first test, a failed crack attempt leaves you wondering if the tool failed or if you failed to use the tool properly. To create an Android style password hash, we need a 4-16 character length ASCII character
First, let’s pick a password. We’ll keep it fairly short to allow it to be cracked in a reasonable amount of time: "secret". Keeping it lower case allows us to attack it with a 26 character key space—after all, we’re about cracking the password here, not generating secure passwords!
BASH
$ password="secret"
$
Next, we need to generate a random salt integer (we could just make something up here, but we’ll use python to randomly generate a salt to keep the exercise more realistic). The maximum size of a 64-bit integer is 9,223,372,036,854,775,807. It is signed, meaning it can be positive or negative. Yes, mathematicians, that’s the definition of an integer: a positive or negative whole number including zero. But knowing its signed is important for the hexadecimal conversion in Python or other programming languages. To keep the exercise simple, however, we’ll stay in bash and generate a random number (we’re fudging a bit in generating the random number, but it works for our purposes)/
BASH
$ password="secret"
$ salt=$(($RANDOM**4))
$ echo $salt
15606337825758241
$
Note
Extracting the salt from settings.db
Recall that in an Android device, the salt would be stored in the /data/data/com.android.providers.settings/databases/settings.db in the "secure" table. The table salt can be obtained as follows:
BASH
$ sqlite3 settings.db 'SELECT value FROM secure WHERE \
name = "lockscreen.password_salt";'
15606337825758241
$
On the BASH command line, we can convert the salt to a 8-byte hex string with the built-in function printf. The function formats and prints the a string, in this case we’ll be using the salt, according to a format string. Below, we tell print f to convert the string held in the variable $salt to hexadecimal, padding it with leading zeros if necessary until the output string is 16 characters long.
BASH
$ password="secret"
$ salt=$(($RANDOM**4))
$ echo $salt
15606337825758241
$
Now, we generate a hash by concatenating the password and hexadecimal salt into a string and hashing it. We’ll use the MD5 algorithm because it is faster to crack than SHA-1 (recall the password.key file contains both hashes). We pass the -n option to echo to prevent it from appending the output with a line feed as this would change our MD5 hash.
BASH
$ password="secret"
$ salt=$(($RANDOM**4))
$ echo $salt
15606337825758241
$ echo -n $password$salt | md5sum
b6b97079899c5f22d94f27027549cd7d  -
Now we have the salted MD5 hash of the password secret using the salt 15606337825758241!
Note
Extracting the MD5 from password.key
We have been generating a salted hash for testing. You’ll need to extract the MD5 from the password.key when working with real data. The following command makes short work of it.
BASH
$ tail -c32 password.key
b6b97079899c5f22d94f27027549cd7d
$

Using Hashcat

I’ll be demonstrating the Nvidia version of hashcat. You’ll want to check the help for your version of hashcat, but you’ll find the following demonstration informative.
The basic command for hashcat follows this form: --- hashcat [options] hash [mask] ---
The chief options we are interested in are the hash type (-m) and minimum/maximum password lengths (--pw-min/--pw-max). Reading the help (-h/--help) tells us that for salted MD5 passwords, we us the -m10 option. And since we are using the -m10 option, we need to append the salt to the hash using a colon (:) separator.
Our command and ouput look as follows:
BASH
$ ./cudaHashcat-lite64.bin -m10 --pw-min=4 --pw-max=16 \
b6b97079899c5f22d94f27027549cd7d:15606337825758241
cudaHashcat-lite v0.13 by atom starting...

Password lengths: 4 - 16
Watchdog: Temperature abort trigger set to 90c
Watchdog: Temperature retain trigger set to 80c
Device #1: GeForce 9500 GT, 1023MB, 1375Mhz, 4MCU


b6b97079899c5f22d94f27027549cd7d:15606337825758241:secret

Session.Name...: cudaHashcat-lite
Status.........: Cracked
Hash.Target....: b6b97079899c5f22d94f27027549cd7d:15606337825758241
Hash.Type......: md5($pass.$salt)
Time.Started...: Sat Jan 19 16:49:59 2013 (10 secs)
Time.Estimated.: Sat Jan 19 16:50:39 2013 (26 secs)
Plain.Mask.....: ?1?2?2?2?2?2
Plain.Text.....: ***yd3
Plain.Length...: 6
Progress.......: 1051066368/3748902912 (28.04%)
Speed.GPU.#1...:   102.8M/s
HWMon.GPU.#1...: -1% Util, 45c Temp, 100% Fan

Started: Sat Jan 19 16:49:59 2013
Stopped: Sat Jan 19 16:50:13 2013
$
Wait. Was that 14 seconds? Yes, it was!

Put on a Mask and Speed Your Results

Now, if the password gets very much longer, the exponential increase in the number of password possibilities gets quite large. Hashcat has one more trick up its sleeve (actually, there’s at least one more, but we’ll cover than another time). Hashcat makes use of masks that allow you to narrow the key space. Simply put, you can choose limited character sets to be used in the search, either from a predefined list, or lists your own creation.
The predefined character sets are:
  • ?l = abcdefghijklmnopqrstuvwxyz
  • ?u = ABCDEFGHIJKLMNOPQRSTUVWXYZ
  • ?d = 0123456789
  • ?s = !"#$%&'()*+,-./:;<⇒?@[\]^_`{|}~
  • ?a = ?l?u?d?s
  • ?h = 8 bit characters from 0xc0 - 0xff
  • ?D = 8 bit characters from German alphabet
  • ?F = 8 bit characters from French alphabet
  • ?R = 8 bit characters from Russian alphabet
To limit the password search to passwords containing only lowercase letters, for example, you would pass the command:
$ ./cudaHashcat-lite64.bin -m10 --pw-min=4 --pw-max=16 \
b6b97079899c5f22d94f27027549cd7d:15606337825758241 \
?l?l?l?l?l?l?l?l?l?l?l?l?l?l?l?l
I hope this gets you started using hashcat. It is a very effective tool, and it keeps on improving!

Thursday, January 17, 2013

Rotten Apples: Watch out for Worms!

Oh, Apple, you've done it to me again!...

With each iOS incarnation, key databases change structure.  This is no secret to anyone who examines data from iDevices.  The iOS4 sms.db differs greatly from the iOS5 sms.db, and both differ significantly from the new iOS6 sms.db.  This is expected, and no heartburn here at all.

But last month I was slapped in the face by Apple in an unexpected way: I found two different versions of the sms.db from the same version of iOS!  This is unexpected, and highlights why me must not take our tools for granted and assume our output in this case is accurate because of a tool test we conducted in a previous case.

The Quandry


The exhibits:

  • iPhone #1: Product type: iPhone 4,1; Product Version 5.1.1
  • iPhone #2: Product type: iPhone 4,1; Product Version 5.1.1

So, for all intents and purposes, I was dealing with the same phone and operating system.

Take a look at the sms.db message table schemas:
iPhone #1 
CREATE TABLE message (ROWID INTEGER PRIMARY KEY AUTOINCREMENT, address TEXT, date INTEGER, text TEXT, flags INTEGER, replace INTEGER, svc_center TEXT, group_id INTEGER, association_id INTEGER, height INTEGER, UIFlags INTEGER, version INTEGER, subject TEXT, country TEXT, headers BLOB, recipients BLOB, read INTEGER, madrid_attributedBody BLOB, madrid_handle TEXT, madrid_version INTEGER, madrid_guid TEXT, madrid_type INTEGER, madrid_roomname TEXT, madrid_service TEXT, madrid_account TEXT, madrid_flags INTEGER, madrid_attachmentInfo BLOB, madrid_url TEXT, madrid_error INTEGER, is_madrid INTEGER, madrid_date_read INTEGER, madrid_date_delivered INTEGER, madrid_account_guid TEXT);
iPhone #2
CREATE TABLE message (ROWID INTEGER PRIMARY KEY AUTOINCREMENT, address TEXT, date INTEGER, text TEXT, flags INTEGER, replace INTEGER, svc_center TEXT, group_id INTEGER, association_id INTEGER, height INTEGER, UIFlags INTEGER, version INTEGER, subject TEXT, country TEXT, headers BLOB, recipients BLOB, read INTEGER, madrid_attributedBody BLOB, madrid_handle TEXT, madrid_version INTEGER, madrid_guid TEXT, madrid_type INTEGER, madrid_roomname TEXT, madrid_service TEXT, madrid_account TEXT, madrid_account_guid TEXT, madrid_flags INTEGER, madrid_attachmentInfo BLOB, madrid_url TEXT, madrid_error INTEGER, is_madrid INTEGER, madrid_date_read INTEGER, madrid_date_delivered INTEGER);

Do you see it?  I didn't initially, because I tried to automate extracting the text messages from iPhone #1 with a python program I had previously authored.  When it failed, I was very confused because I had just used the program on iPhone #2 similar device days earlier.  And frankly, it didn't dawn on me to immediately check the schema while seeking the error source, which is the purpose of this post: saving you some of my pain.

Finding the Worms


If you didn't spot the issue, don't worry, I'll help:

iPhone #1 
(CREATE TABLE message (ROWID INTEGER PRIMARY KEY AUTOINCREMENT, address TEXT, date INTEGER, text TEXT, flags INTEGER, replace INTEGER, svc_center TEXT, group_id INTEGER, association_id INTEGER, height INTEGER, UIFlags INTEGER, version INTEGER, subject TEXT, country TEXT, headers BLOB, recipients BLOB, read INTEGER, madrid_attributedBody BLOB, madrid_handle TEXT, madrid_version INTEGER, madrid_guid TEXT, madrid_type INTEGER, madrid_roomname TEXT, madrid_service TEXT, madrid_account TEXT, madrid_flags INTEGER, madrid_attachmentInfo BLOB, madrid_url TEXT, madrid_error INTEGER, is_madrid INTEGER, madrid_date_read INTEGER, madrid_date_delivered INTEGER, madrid_account_guid TEXT);
iPhone #2
CREATE TABLE message (ROWID INTEGER PRIMARY KEY AUTOINCREMENT, address TEXT, date INTEGER, text TEXT, flags INTEGER, replace INTEGER, svc_center TEXT, group_id INTEGER, association_id INTEGER, height INTEGER, UIFlags INTEGER, version INTEGER, subject TEXT, country TEXT, headers BLOB, recipients BLOB, read INTEGER, madrid_attributedBody BLOB, madrid_handle TEXT, madrid_version INTEGER, madrid_guid TEXT, madrid_type INTEGER, madrid_roomname TEXT, madrid_service TEXT, madrid_account TEXT, madrid_account_guid TEXT, madrid_flags INTEGER, madrid_attachmentInfo BLOB, madrid_url TEXT, madrid_error INTEGER, is_madrid INTEGER, madrid_date_read INTEGER, madrid_date_delivered INTEGER);

Ok, you say, "I see the highlights, but they are the same content.  What's the big deal?"  I concur, they do say the same thing... but the "madrid_account_guid" is in a different position in the database, or to be more linguistically correct, the field order is different in the two databases!  Does it matter?  Well, yes, and no...

Consider:
'SELECT ROWID, address, text, madrid_date_read FROM message;'
This query would work equally well in either phone's message table because it calls the fields by name.  Any application, forensic or otherwise, making specific queries would continue to operate completely oblivious of the database differences.  But a more generic query, could lead to trouble.
'SELECT * FROM message;'
This query would output every field in the record, and any tool that tried to read data output positionally would get unexpected data in the last eight fields (in one case, anyway).  This is what happened in my program. "Well, stupid," you say, "don't code like that."  Again, I concur, and I fixed my program by changing the manner in which I queried the database... but it turns out I'm not the only one coding this way.

What I failed to mention was why I was processing these phones.  iPhone #1 was part of a shooting investigation.  I retrieved from a suspect vehicle and initially processed it by making an iTunes backup of the running device.  The second phone was brought to me after an up-to-date Cellebrite UFED failed to extract ANY message from the iMessage Service.  The 'madrid' fields relate to the iMessage service, and it is the madrid fields that are thrown out of order by the database schema change.  Seems that Cellebrite may have been thrown by the flag order in the same way I was.  At least I'm in good company!

This also has implications in SQLite record carving.  In the raw data, fields are laid down in the order of the schema.  If a template for carving fields from dropped SQLite records has the wrong schema (and why would you expect one iOS 5.1.1 sms.db to differ from another), then you are getting incorrect and unreliable data.

I have been working quite hard on recovering dropped records from SQLite pages and have been quite successful.  Stay tuned for what I've learned on this front...