Saturday, May 11, 2013

SQLite: Hidden Data in Plain Sight

Important

The title of this post is not intended to imply there are some new clever data hiding techniques for SQLite, but to alert the reader that data in plain view is often going unanalyzed or under-analyzed.

My experience in data forensics has taught me two things about SQLite databases: they are both ubiquitous and poorly understood by examiners. I find that most examiners rely on "viewers," such as SQLite Database Browser or automated tools that parse the SQLite for them. Both of these approaches can be very flawed, however.

The first practice, using viewers, is not faulty on its face. SQLite Database Browser and similar applications are a quick way to visualize the content of a database. But most examiners are using the viewers to open tables in a "flat" view. This is akin to looking at the table data in a spreadsheet: you see the content but not in relation to the data from other tables in the database (or from other databases for that matter) as the database was designed to be used. Further, the data is often not meaningful in the form in which it is stored, e.g., timestamps are often in some form of epoch time and not a human-readable format.

The second practice of relying on automated tools means you are counting on a programmer (who may not be a forensic examiner) to tell you what’s important in the database, and you are relying on the coder’s SQLite skills, which might be lacking. (What follows is not a rant against a product, but just an illustration of my point.) For example, Cellebrite Physical Analyzer parses the iPhone sms.db (sms/iMessage) into a neat, well-formed report. But it doesn’t include the rowID in the output, which is an auto-incrementing integer applied to each sent or received message. From the rowID, you can tell if intervening messages have been deleted, but you won’t know that from the automated report from Cellebrite. And what about when your automated tool doesn’t parse a database you have discovered?

Demanding Answers (Learning to Query)

The key to understanding SQLite is to learn the Structured Query Language (SQL). One of the best on-line resources I have found for this is w3schools.com. The lessons are brief, and in fewer than a dozen 5-10 minute sessions, you will have the basics of SQL under your belt. And with the basics, you can accomplish much.

The query language is designed to be human readable, which, by the by, makes it easier to remember. It consists of sentences composed of subjects and predicates. For example, to view all the contents of a table:

SQLite Command Line Program (command line mode)
$ sqlite3 some.db 'select * from some_table;'
1|some_data
2|some_more_data

Using the above SQL statement, our subject is "select *" which is translated "select all fields" (or columns, if you will), and the predicate, "from some_table" is fairly self-explanatory. The table "some_table" is located in the "some.db" SQLite database. To be brief, in a SQL query, we tell the database engine what we want (subject) followed by qualifiers (predicate).

Finding SQLite Databases for Analysis

So, how do we overcome the shortcomings of the common SQLite Database analysis techniques? Let’s take a recent examination I conducted as an example. I was looking for communications recorded in an iPhone iOS v.6.1.3 backup. I had restored the backup to its original file structure (DOMAIN/path) to facilitate the analysis. After examining the sms.db, I discovered a large block of deleted messages (by RowID analysis) for the time frame that was the subject of the investigation. The messages were not recoverable (more on recovering deleted SQLite records another time).
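The RowID analysis mentioned above can be sketched in a few lines of shell. This is only an illustration: the function name rowid_gaps is my own, and it assumes the row IDs arrive one per line in ascending order. A gap suggests deleted records only when the table hands out IDs in sequence, so treat the output as a lead rather than proof.

```shell
# Hypothetical helper: read ascending integers (one per line) on stdin
# and report every range of missing values between neighbours.
rowid_gaps() {
    awk 'NR > 1 && $1 > prev + 1 {
             printf "gap: %d-%d (%d missing)\n", prev + 1, $1 - 1, $1 - prev - 1
         }
         { prev = $1 }'
}

# Demonstration with made-up IDs; 3-4 and 7-8 are absent.
printf '1\n2\n5\n6\n9\n' | rowid_gaps
```

Against a real database, the input could come from a query such as sqlite3 sms.db 'select rowid from message order by rowid;' piped into rowid_gaps.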

I decided to take a look at other communications applications that might have been overlooked when the user was deleting data. Unfortunately, I don’t know all the different communications applications available for Apple mobile devices and likely never will. I can get a list of the installed applications on the Device from the Info.plist in the backup directory, but in reality, that doesn’t really help me too much because, as I said, I don’t recognize many of them. I do know, however, that most communications applications store their data in SQLite databases. So, I search for those:

BASH
$ find unback/ -type f -exec file {} \; | grep SQLite
unback/HomeDomain/Library/Voicemail/voicemail.db: SQLite 3.x database
unback/HomeDomain/Library/SMS/sms.db: SQLite 3.x database
unback/HomeDomain/Library/Safari/Bookmarks.db: SQLite 3.x database, user version 31
unback/AppDomain-com.cardify.tinder/Documents/Tinder.sqlite: SQLite 3.x database
...
Note

I restored the iTunes backup to a directory called unback consistent with the method used by the open libimobiledevice software library, which offers a device backup/unback utility as well as others useful for iDevice analysis.

You can see that I use the find command to look for files (-type f) in the "unback" directory and then execute the file command to determine the file type. I piped the results through the grep command, filtering for "SQLite". With the command, I get a list of SQLite files, true, but I still don’t know what they all are. Some are familiar and/or obvious, but some are not. Take the last item on the abbreviated list above: Tinder by Cardify. I had never heard of Tinder (as well as many other applications that appeared in the results).

Getting a Peek Inside

It would be more informative to list the databases, and then get a look at the tables contained in each one. While table names don’t necessarily tell you the content, they can be informative while in data "triage" mode. So, how do we modify our find command to show us the databases as well as their tables?

BASH (incorporating SQLite Command Line Program)
$ find unback/ -type f | while read i; do file $i | grep -q SQLite; \
[ $? = 0 ] && (echo $i; sqlite3 $i .tables; echo); done
unback/HomeDomain/Library/Voicemail/voicemail.db
_SqliteDatabaseProperties  voicemail

unback/HomeDomain/Library/Safari/Bookmarks.db
bookmark_title_words  folder_ancestors      sync_properties
bookmarks             generations

unback/HomeDomain/Library/SMS/sms.db
_SqliteDatabaseProperties  chat_message_join
attachment                 handle
chat                       message
chat_handle_join           message_attachment_join

unback/AppDomain-com.cardify.tinder/Documents/Tinder.sqlite
ZLIKE             ZPHOTO            ZUSER             Z_METADATA
ZMESSAGE          ZPROCESSEDPHOTO   Z_5SHAREDFRIENDS  Z_PRIMARYKEY
...

Ok, that’s much more helpful. But how does the command work? Like the initial command, find is used to locate files (directories are excluded). The results of the find command are piped to a while loop, which assigns each file name to the variable i. Similar to the first command, the file command displays the file type, which is filtered for "SQLite" by grep. The "-q" option in grep is used to keep grep silent; it is the exit status that is of interest.

I want the exit status (also called the return status or exit code) to perform a test. All commands, scripts and functions return an exit status, and an exit code of "0" means success. The exit status is captured in the variable ?, and recalled, like all BASH variables, by prepending a dollar sign: $?. In the command, I test the exit status of the last command, which was grep. If the regular expression "SQLite" is matched in the file command output, grep exits with "0". The test "[ $? = 0 ]" is shorthand notation for "if the last command’s exit code is 0", then do what follows: (echo $i; sqlite3 $i .tables; echo), i.e., print the file name, print the tables of the database, and then print a blank line (for readability). I discussed while loops in a previous post if you have more interest, or you could look here.
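The same exit-status idea can be written so that it also copes with file names containing spaces. The sketch below is a variant, not the command used above: the function names is_sqlite and list_sqlite_tables are hypothetical, and instead of the file command it checks for the 15-character text "SQLite format 3" that begins every SQLite 3 database file.

```shell
# Hypothetical helper: succeed (exit status 0) when the file begins
# with the SQLite 3 header text, fail (non-zero) otherwise.
is_sqlite() {
    [ "$(head -c 15 -- "$1" 2>/dev/null)" = "SQLite format 3" ]
}

# Hypothetical helper: print each SQLite database under a directory,
# followed by its table names (requires the sqlite3 program).
list_sqlite_tables() {
    find "$1" -type f -print0 | while IFS= read -r -d '' f; do
        if is_sqlite "$f"; then     # the "if" tests the exit status directly
            echo "$f"
            sqlite3 "$f" .tables
            echo
        fi
    done
}
```

Calling list_sqlite_tables unback/ would produce a listing in the same shape as the one above. Using "if" on the function replaces the explicit [ $? = 0 ] test, and quoting "$f" keeps paths with spaces intact.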

Making Sense of the Data

From the output thus far, I see that Tinder has a message table, as well as user and shared friends tables. Looks like it is a social networking application, and the data might be relevant to the investigation. So, how do I see the contents of the tables? I could look at one table at a time:

SQLite Command Line Program
$ sqlite3 -header Tinder.sqlite 'select * from zmessage limit 5;'
Z_PK|Z_ENT|Z_OPT|ZINBOUND|ZUSER|ZCREATIONDATE|ZBODY
1|2|1|1|832|379798036.741|Hi!
2|2|1|1|1156|379797384.794|hey!
3|2|1|1|832|379798729.794|What's doin?
4|2|1|1|1318|379804817.728|Are you online often?
5|2|1|0|1318|379806963.685|Wouldn't you like to know?!
...

In the command above, I used the SQLite "-header" option to display the column titles, and I limited the output to five records to get a sense of the data. Dropping the limit syntax would result in the entire table and all record fields being printed to standard output (the screen).

Is there a way we could quickly look at a few records of every table to see what is of interest, if anything? You bet!

SQLite Command Line Program
$ for i in $(sqlite3 Tinder.sqlite .tables); do echo Table: $i; \
sqlite3 Tinder.sqlite -header "select * from $i limit 5;"; echo; done
...
Table: ZUSER
Z_PK|Z_ENT|Z_OPT|ZCOMMONFRIENDCOUNT|ZCOMMONLIKECOUNT|ZGENDER|ZHASIMAGE|
ZHASUNVIEWEDMESSAGES|ZISACTIVE|ZISMATCH|ZISRECOMMENDED|ZISUNSEENNEWMATC
H|ZSERVERMESSAGECOUNT|ZBIRTHDATE|ZCHATLASTVIEWED|ZDISTANCEINMILES|ZLAST
ACTIVITYDATE|ZMATCHEDDATE|ZPINGTIME|ZBIO|ZFACEBOOKID|ZMATCHID|ZNAME|ZUS
ERID|ZIMAGE
42|5|126|8|0|0|0||0|1|1||0|||76.4215774536133|379794350.747|379794350.7
47|379794187.955|I like pie.|604832678|50f2fc2fbe8d00b3d4f58c36|Gunter|
50d39d6024571b7803001639|
90|5|1|0|0||0||0|0|0||0|||0.0|||||#########||Gretta||
91|5|1|0|0||0||0|0|0||0|||0.0|||||#########||Hilde||
92|5|1|0|0||0||0|0|0||0|||0.0|||||#########||Agnes||
93|5|1|0|0||0||0|0|0||0|||0.0|||||#########||Johanna||

Table: ZMESSAGE
Z_PK|Z_ENT|Z_OPT|ZINBOUND|ZUSER|ZCREATIONDATE|ZBODY
1|2|1|1|832|379798036.741|Hi!
2|2|1|1|1156|379797384.794|hey!
3|2|1|1|832|379798729.794|What's doin?
4|2|1|1|1318|379804817.728|Are you online often?
5|2|1|0|1318|379806963.685|Wouldn't you like to know?!

Table: Z_5SHAREDFRIENDS
Z_5SHAREDFRIENDS|REFLEXIVE
42|90
42|91
42|92
42|93
42|94

Briefly, the primary difference in the last command from those executed earlier is the use of a for loop. The for loop takes the output of the SQLite .tables command and assigns each table name in turn to the variable i, which is then used in the select statement.

The output above demonstrates the relational nature of SQLite databases. Looking at the ZMESSAGE table, we see the message content, but the user is an integer (ZUSER field). The integer appears to correlate to the ZUSER table (Z_PK field). Just looking at the ZMESSAGE table, we see the conversation but we don’t know with whom it occurred.

SQLite lets us query the tables in relation to make more meaningful output.

SQLite Command Line Program
$ sqlite3 -header Tinder.sqlite 'select m.z_pk, zinbound, zuser, zname,
zcreationdate, zbody from zmessage as m, zuser as u where
m.zuser = u.z_pk limit 5;'
Z_PK|ZINBOUND|ZUSER|ZNAME|ZCREATIONDATE|ZBODY
1|1|832|Tobias|379798036.741|Hi!
2|1|1156|Siegfried|379797384.794|hey!
3|1|832|Tobias|379798729.794|What's doin?
4|1|1318|Theoduff|379804817.728|Are you online often?
5|0|1318|Theoduff|379806963.685|Wouldn't you like to know?!

In this command, I specified the fields I wanted returned as opposed to all fields, which keeps the output focused when you are relating tables to one another. You may have noticed that in the predicate I queried both the ZMESSAGE and ZUSER tables. The "as" keywords create aliases for the tables (zmessage = m, zuser = u) to keep the command more concise. In the select statement, I asked for the ZNAME field, which is located in the ZUSER table, where ZUSER from the ZMESSAGE table matched Z_PK from the ZUSER table. In the select statement, I had to specify the Z_PK (m.z_pk) field from the ZMESSAGE table because both tables contain that field.

Two more fields don’t have much meaning in our result: ZINBOUND and ZCREATIONDATE. ZINBOUND is a flag which, with some context, led me to understand that 0 = sent and 1 = received. ZCREATIONDATE appears from its value to be Mac Absolute Time (seconds since 2001-01-01 00:00:00 UTC), and file system timestamps support this evaluation. The case expression can be used to interpret the flags. It is the equivalent of an if/then statement in scripting languages. The datetime function converts the unix epoch to a human-readable date. Because the values in the Tinder database are Mac Absolute Time, the timestamps have to first be converted to unix epoch by adding 978307200 seconds.
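The offset arithmetic is easy to check by hand in the shell. The function name mac_to_unix below is my own invention for illustration; it drops the fractional seconds because shell arithmetic is integer-only.

```shell
# Hypothetical helper: convert a Mac Absolute Time value to unix epoch
# seconds by adding the 978307200-second offset between the two epochs.
mac_to_unix() {
    local whole=${1%.*}              # strip fractional seconds, if any
    echo $(( whole + 978307200 ))
}

mac_to_unix 379798036.741            # prints 1358105236

# With GNU date (an assumption about your platform) the result can be
# rendered in UTC:
#   date -u -d "@$(mac_to_unix 379798036.741)" '+%Y-%m-%d %H:%M:%S'
```

The value 1358105236 corresponds to 2013-01-13 19:27:16 UTC, which is 11:27:16 in a UTC-8 local time zone.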

SQLite Command Line Program
$ sqlite3 -header Tinder.sqlite "select m.z_pk, case zinbound when 0
then 'sent' when 1 then 'received' else 'unknown' end as zinbound,
zuser, zname, datetime(zcreationdate + 978307200, 'unixepoch',
'localtime') as zcreationdate, zbody from zmessage as m, zuser as u
where m.zuser = u.z_pk limit 5;"
Z_PK|ZINBOUND|ZUSER|ZNAME|ZCREATIONDATE|ZBODY
1|received|832|Tobias|2013-01-13 11:27:16|Hi!
2|received|1156|Siegfried|2013-01-13 11:16:24|hey!
3|received|832|Tobias|2013-01-13 11:38:49|What's doin?
4|received|1318|Theoduff|2013-01-13 13:20:17|Are you online often?
5|sent|1318|Theoduff|2013-01-13 13:56:03|Wouldn't you like to know?!

Now we have meaningful data by relating two tables, interpreting flags (case expression), and converting timestamps (datetime function). You can find more information about the case expression here, and the datetime function here.

Summing Up

I covered a lot of ground in this post, from using find, file, grep, and while and for loops to basic and intermediate SQLite Command Line Program usage. I left a lot of explanation out of the discussion, and I barely scratched the surface of SQLite analysis. My goal was to:

  • highlight the fact that automated tool and viewer users are likely leaving a lot of data on the table (pardon the pun)

  • show how command line tools can be used to rapidly locate and evaluate SQLite databases

  • demonstrate that learning SQLite queries will go a long way to filling the gap left wide open by automated tools

  • encourage you to learn more about SQLite and improve your investigative skills.

I hope to start delving into more specific SQLite analysis topics in future posts.

Happy Querying!


Thursday, March 21, 2013

Xmount: When "Changing" the Evidence isn't so Bad

"Do no harm" is the modern translation of the Hippocratic Oath which is applied to physicians. But it has application in data forensics as well. It takes shape in the edicts that require write-blocking be used during the acquisition of data sources and analysis to be done on copies rather than original data. (I’ll leave the very valid discussion about triage through direct examination of data sources aside for another time. We’re talking general principles here.)

Can we ever change the evidence?

The short answer is, "No." We should never change the original data, period. But that doesn’t mean that we can’t render a copy of the data readable. After all, it is better to read the data with the programs intended to use the data… that way we know we are rendering the information as it was intended to be read. And if there is a way to repair a file or file system in a manner that doesn’t change the substantive content, should we not consider that option?

Enough vagaries. Let’s get down to brick and mortar to make this point. I’ve previously discussed using xmount to run the operating systems encapsulated in forensic images. In that situation, xmount uses a cache file to record and read changes to the file system that necessarily occur when the operating system and applications are running. Because the changes are written to the cache, the forensic image is unchanged.

Yesterday, I encountered another use for xmount when examining an image of an eMMC NAND chip from a Samsung Galaxy S3. I attempted to mount the 12GB userdata partition for analysis, but mounting failed. This may have happened to you in the past: you attempted to mount a file system the way you always do, but for unknown reasons, your command failed. What to do?

When Standard Procedure Fails

Let’s take my S3 image for example. My goal was to access the userdata partition and extract the files. The S3 is supposed to have ext4 partitions which I should be able to mount and examine in Linux.

BASH
$ mmls image
GUID Partition Table (EFI)
Offset Sector: 0
Units are in 512-byte sectors

     Slot    Start        End          Length       Description
00:  Meta    0000000000   0000000000   0000000001   Safety Table
01:  -----   0000000000   0000008191   0000008192   Unallocated
02:  Meta    0000000001   0000000001   0000000001   GPT Header
03:  Meta    0000000002   0000000033   0000000032   Partition Table
04:  00      0000008192   0000131071   0000122880   modem
05:  01      0000131072   0000131327   0000000256   sbl1
06:  02      0000131328   0000131839   0000000512   sbl2
07:  03      0000131840   0000132863   0000001024   sbl3
08:  04      0000132864   0000136959   0000004096   aboot
09:  05      0000136960   0000137983   0000001024   rpm
10:  06      0000137984   0000158463   0000020480   boot
11:  07      0000158464   0000159487   0000001024   tz
12:  08      0000159488   0000160511   0000001024   pad
13:  09      0000160512   0000180991   0000020480   param
14:  10      0000180992   0000208895   0000027904   efs
15:  11      0000208896   0000215039   0000006144   modemst1
16:  12      0000215040   0000221183   0000006144   modemst2
17:  13      0000221184   0003293183   0003072000   system
18:  14      0003293184   0028958719   0025665536   userdata
19:  15      0028958720   0028975103   0000016384   persist
20:  16      0028975104   0030695423   0001720320   cache
21:  17      0030695424   0030715903   0000020480   recovery
22:  18      0030715904   0030736383   0000020480   fota
23:  19      0030736384   0030748671   0000012288   backup
24:  20      0030748672   0030754815   0000006144   fsg
25:  21      0030754816   0030754831   0000000016   ssd
26:  22      0030754832   0030765071   0000010240   grow
27:  -----   0030765072   0030777343   0000012272   Unallocated

$ sudo mount -o ro,loop,offset=$((3293184*512)) image /mnt
mount: wrong fs type, bad option, bad superblock on /dev/loop0,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so
$
Note

I’ve created a link named image that points to the original raw device image to keep commands simple.

So, what happened here? I used the Sleuthkit mmls tool to read the partition table. I located the partition of interest - userdata - and tried to mount it read-only. I did not specify a file system type but instead let mount auto-magically determine it. I used the mount options of ro (read-only), loop (to create loopback device), and provided the offset to the partition. Since the offset required by mount is in bytes, I used shell math to translate the sector offset provided in the mmls output to bytes. But, in the end, mount did not appear to recognize the partition.
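For reference, the shell math behind the offset works out as follows. The function name sector_to_bytes is hypothetical, and the 512-byte default is taken from the "Units are in 512-byte sectors" line of the mmls output.

```shell
# Hypothetical helper: convert a sector count to bytes.
# The second argument is the sector size and defaults to 512.
sector_to_bytes() {
    echo $(( $1 * ${2:-512} ))
}

sector_to_bytes 3293184      # start of userdata: prints 1686110208
sector_to_bytes 25665536     # length of userdata: prints 13140754432
```

The second figure, 13140754432 bytes, is roughly 12.2 GiB, which agrees with the 12GB size quoted for the userdata partition.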

What do we do in such a situation? We could use the mount -v verbose flag to try to determine what’s wrong.

BASH
$ sudo mount -v -o ro,loop,offset=$((3293184*512)) image /mnt
mount: enabling autoclear loopdev flag
mount: going to use the loop device /dev/loop0
mount: you didn't specify a filesystem type for /dev/loop0
       I will try type ext4
mount: wrong fs type, bad option, bad superblock on /dev/loop0,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so
$

In this case, verbose output is not much help other than showing that mount attempted to use ext4 as the file system type. And, though ext4 is what I expected for the partition, too, maybe it is wrong. Short of a hex editor, how can we check a partition type in a disk image?

The file command is a well-known tool for providing file types by reading the file magic (the file’s hexadecimal signature). But did you know it will tell you partition types as well?

BASH
$ img_cat image -s 3293184 | file -
/dev/stdin: Linux rev 1.0 ext4 filesystem data,
UUID=57f8f4bc-abf4-655f-bf67-946fc0f9f25b (needs journal recovery)
(extents) (large files)
$

The img_cat tool is another member of the Sleuthkit tool chest. It exports data from an image to stdout in blocks. Here we provide the starting sector offset to the userdata partition and pipe it to the file command. The hyphen following the file command tells file to read from stdin rather than from a named file.

What did we learn here? Well, though there is plenty of information, two values are of particular interest. First, the file system is in fact formatted ext4. Second, it appears that the file system journal is damaged ("needs journal recovery"). There is the smoking gun for our mounting problem.

So, we need a way to fix the journal so we can mount the partition, but we must not alter the original image. We have a few options here:

  • make a copy of the image

  • make a copy of just the partition

Initially, making copies doesn’t sound too bad. After all, we’re only talking about 16GB for the image or 12GB for the partition. But what if this were a 250GB image, or larger? That sounds less palatable. Further, either action could consume a lot of resources in the form of drive space, processing power and time.

Important Note

Reader Carlos (with credit to Hal Pomerantz) correctly points out that the dirty journal issue can be avoided in the mount command by passing the noload option. In fact, according to the mount man page, noload is a good option to invoke whenever mounting ext3/4 read-only to ensure no writes occur with dirty file systems.

In our example, the command would be: sudo mount -o ro,noload,loop,offset=$((3293184*512)) image /mnt

This fact does not invalidate the rest of this discussion. Your file system may need more significant repairs, such as repairing a partition table, and these repairs can still be effected with the technique that follows. As always, pick the path that meets the needs of your investigation.

Xmount to the Rescue

What if there was a way to fix the journal without taking the time and resources mentioned above? Enter xmount. Just as xmount can create a cache file to capture changes in an OS running from an image, it can capture changes when repairing a file system. We can quickly mount and repair the file system and leave the original image none the worse for wear. And thanks to the FUSE file system driver on which xmount is built, we only consume new disk space for the xmount cache, rather than for a full copy of the disk image or partition.

BASH
$ md5sum image
15a9134d72a5590cf6589e9c9702a5ba  image
$

We start with an MD5 baseline of the image file. We’ll use this to determine if xmount allows any changes to the image.

BASH
$ sudo xmount --in dd  --out dd -o ro,allow_other \
--cache image.cache image /mnt
$ ls /mnt
image.dd  image.info
$ mount
...
xmount on /mnt type fuse.xmount (rw,nosuid,nodev,allow_other)
$

We use xmount to create a virtual image from our original image file. The --in option specifies the format of the original image. The input format can be Expert Witness Format (ewf), Advanced Forensic Format (aff) or raw (dd). The --out option specifies the format of the virtual image and can be raw (dd) or any one of the following virtual machine formats: vdi, vhd, or vmdk. The fuse -o allow_other option gives access to the virtual file system to all users (not just root). The final option --cache specifies the file to use for disk caching (image.cache). Then, much like the mount command, we specify the input file (image) and the mount point (/mnt).

The result of our xmount command is a virtual disk file being created in the /mnt folder that is accessible to normal users. Forget the allow_other option and you’ll have issues with listing the /mnt directory as a normal user. The name of the virtual disk image is the input file name appended with the format type. Thus, "image" became "image.dd". In the /mnt folder is also an .info file with image information.

The virtual image doesn’t consume real disk space. It is mounted read-write because we are going to fix the journal, but don’t fret, by passing the --cache image.cache option to xmount, we told xmount to capture the changes in the image.cache file. The image.cache file does not need to previously exist; xmount will create it for us.

The virtual disk image can be accessed just like the original image.

BASH
$ mmls /mnt/image.dd
GUID Partition Table (EFI)
Offset Sector: 0
Units are in 512-byte sectors

     Slot    Start        End          Length       Description
00:  Meta    0000000000   0000000000   0000000001   Safety Table
01:  -----   0000000000   0000008191   0000008192   Unallocated
02:  Meta    0000000001   0000000001   0000000001   GPT Header
03:  Meta    0000000002   0000000033   0000000032   Partition Table
04:  00      0000008192   0000131071   0000122880   modem
05:  01      0000131072   0000131327   0000000256   sbl1
06:  02      0000131328   0000131839   0000000512   sbl2
07:  03      0000131840   0000132863   0000001024   sbl3
08:  04      0000132864   0000136959   0000004096   aboot
09:  05      0000136960   0000137983   0000001024   rpm
10:  06      0000137984   0000158463   0000020480   boot
11:  07      0000158464   0000159487   0000001024   tz
12:  08      0000159488   0000160511   0000001024   pad
13:  09      0000160512   0000180991   0000020480   param
14:  10      0000180992   0000208895   0000027904   efs
15:  11      0000208896   0000215039   0000006144   modemst1
16:  12      0000215040   0000221183   0000006144   modemst2
17:  13      0000221184   0003293183   0003072000   system
18:  14      0003293184   0028958719   0025665536   userdata
19:  15      0028958720   0028975103   0000016384   persist
20:  16      0028975104   0030695423   0001720320   cache
21:  17      0030695424   0030715903   0000020480   recovery
22:  18      0030715904   0030736383   0000020480   fota
23:  19      0030736384   0030748671   0000012288   backup
24:  20      0030748672   0030754815   0000006144   fsg
25:  21      0030754816   0030754831   0000000016   ssd
26:  22      0030754832   0030765071   0000010240   grow
27:  -----   0030765072   0030777343   0000012272   Unallocated
$

We can automatically repair the journal by mounting the userdata partition read-write. Once it’s repaired and mounted, we can remount as read-only for analysis.

BASH
$ mkdir mnt
$ sudo mount -o loop,offset=$((3293184*512)) /mnt/image.dd mnt
$ mount
...
xmount on /mnt type fuse.xmount (rw,nosuid,nodev,allow_other)
$ sudo mount -o remount,ro mnt
$ mount
...
/mnt/image.dd on /home/user/mnt type ext4 (ro)
$ ls -S mnt/
data
dalvik-cache
smart_stay.dmc
anr
app
app-asec
app-private
audio
backup
BackupPlus
bluetooth
bms
clipboard
dontpanic
drm
fota
fota_test
local
log
lost+found
media
misc
property
...
$

First, I created a new directory in the current working directory called "mnt". Don’t confuse this with the /mnt directory where the virtual disk image is located. Like before, we used the mount command to create a loopback device and address the partition by offset. This time, we did not set the read-only flag, and we specified a new directory for the partition since we were using /mnt to host the virtual disk. This time, we succeeded in mounting the partition, and then we immediately remounted read-only to avoid making further changes.

Wrapping Up

In summary, we tried to mount a partition in our original disk image, but it failed. We determined the partition had a damaged journal, so we created a virtual disk image to effect repairs. Then we mounted the repaired partition and listed the root directory. But did we do no harm?

BASH
$ md5sum image /mnt/image.dd
15a9134d72a5590cf6589e9c9702a5ba  image
2e16cbbeefc9e33bc754b47d2f8a4da0  /mnt/image.dd
$

The original image hash remains unchanged. The xmounted image shows it’s been changed. So xmount has done its job, protecting the original image while allowing us to repair the partition in the virtual image!

Oh, and what about that cache file? How much real space did we use when we created and repaired the virtual image?

BASH
$ ls -lh image.cache
-rw-r--r-- 1 root root 40M Mar 21 15:51 image.cache
$

Yep, a whole 40MB was used to repair the journal and mount a 12GB partition. That’s not too shabby, and you didn’t wait for a long copy operation or halve your storage capacity!

Xmount Cache Caveats

A quick sidebar on the xmount cache file. The cache can be reused, meaning that the changes from the last session are brought forward to the next session. In plain terms, if we unmount the userdata partition and then the virtual image, but later remount the image while pointing to the cache file we previously created with the --cache option, the file system will remain repaired. If we want to start afresh, we would use the overwrite cache option, or --owcache. Finally, we don’t need to specify a cache at all if changes are not necessary, and, in fact, this is the manner in which I usually employ xmount.


Sunday, February 24, 2013

Cashing in on the Google Chrome Cache

The Google Chrome cache represents a challenge to forensic investigators. If the extent of your examination has been to open the cache folder and view the files in a file browser, you are likely missing a lot of content.

For starters, files stored in the cache are renamed from their original names on the web-server. Next, text elements (like HTML, JSON, etc.) are zlib compressed. Finally, files smaller than 16384 bytes (16k) are stored in block files, which are container files that hold many smaller files. The meta-data about the cache files are stored in these container files, too, and it’s all mapped by a binary index file.

So, while it’s easy enough to point a file browser or image viewer at the cache directory and see some recognizable data structures, making sense of all that’s there can be more challenging. In the remainder of this discussion, I’ll attempt to give you more insight into the Google Chrome cache. This should be of interest to disk and mobile forensicators alike, as the structure is the same whether you are examining a desktop computer or a mobile device such as an Android phone or tablet.

Cache Structure

All the files in the Google Chrome cache are stored in a single folder called cache. The cache consists of at least five files: an index file and four data files known as block files. As I stated above, downloaded files are stored in one of the block files or directly in the cache directory, and the index keeps track of the transaction and storage location.

A cache can consist of only the five mentioned files (named index, data_0, data_1, data_2, and data_3) if every object in the cache is smaller than 16k. Larger files are stored outside the block files. Go ahead, check your cache if you don’t believe me… I’ll wait. In case you don’t have one handy, here’s a truncated file listing from a recent Android exam I conducted, sorted by size.

Output of ls -lSr
-rw-r--r-- 1 user user   16519 Feb 20 13:06 f_00008a
-rw-r--r-- 1 user user   16566 Feb 20 13:06 f_0000cf
-rw-r--r-- 1 user user   16604 Feb 20 13:06 f_0000cc
-rw-r--r-- 1 user user   16944 Feb 20 13:06 f_0000cd
...
-rw-r--r-- 1 user user   70659 Feb 20 13:06 f_0000f7
-rw-r--r-- 1 user user   73804 Feb 20 13:06 f_00008f
-rw-r--r-- 1 user user   74434 Feb 20 13:06 f_0000df
...
-rw-r--r-- 1 user user   81920 Feb 20 13:06 data_0
...
-rw-r--r-- 1 user user  262512 Feb 20 13:06 index
-rw-r--r-- 1 user user 1581056 Feb 20 13:06 data_1
-rw-r--r-- 1 user user 2105344 Feb 20 13:06 data_2
-rw-r--r-- 1 user user 4202496 Feb 20 13:06 data_3
Warning
When one or more of the five base files get corrupted or deleted, the entire set gets recreated. I experimented by deleting a data block file and restarting Chrome. On browser restart, the entire cache was deleted and new base files were created.

You might be wondering about the four data blocks. How are they distinguished? The answer is: size; not size of the block-files themselves, but size of the internal data blocks. Each block file is defined to hold data in blocks, much like a file system. And like different sized file systems can be defined with different sized blocks, so the cache data block-files are defined with different block sizes, and data can be allocated to no more than four blocks at a time before being considered too large for that block-file.

Table 1. Default Data Block-file sizes
File     Block SZ   Max Data SZ
data_0   36b        rankings
data_1   256b       1k
data_2   1k         4k
data_3   4k         16k

Note
When a data block-file reaches maximum capacity (each file is only allowed to hold a defined number of objects) a new data block-file is created and pointed to in the previous data block-file header.
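The sizes in Table 1 can be turned into a rough lookup. This is a sketch under assumptions: the function name cache_location is hypothetical, the thresholds are simply the "Max Data SZ" values from the table, how Chrome treats an object sitting exactly on a boundary is not something I have verified, and additional block-files created when one fills up are ignored.

```shell
# Hypothetical helper: given an object size in bytes, name the place
# Table 1 suggests it would be stored. Boundary handling is assumed.
cache_location() {
    local size=$1
    if   [ "$size" -le 1024 ];  then echo "data_1 (256b blocks)"
    elif [ "$size" -le 4096 ];  then echo "data_2 (1k blocks)"
    elif [ "$size" -le 16384 ]; then echo "data_3 (4k blocks)"
    else                             echo "separate f_ file"
    fi
}

cache_location 500       # prints data_1 (256b blocks)
cache_location 70659     # prints separate f_ file
```

The second call uses the size of f_0000f7 from the listing above, which is indeed stored as a separate file.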

Cache addresses

All cached objects have an address. The address is a 32-bit integer that describes where the data is stored. Meta-data about the object is stored, too, and includes:

  • HTTP Headers

  • Request Data

  • Entry name (Key)

  • Other auxiliary information (e.g., rankings)

Examples of cache addresses:
0x00000000: not initialized
0x8000002A: external file f_00002a
0xA0010003: block-file number 1 (data_1), initial block number 3, 1 block of length.
Important
The addresses above are ordered as they read, but on disk you will find them in little-endian format, e.g, the external file appears on disk as 0x2A000080
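Reversing the byte order is simple string slicing in BASH. The function name le_to_addr is hypothetical; it expects the four on-disk bytes as eight hex digits, exactly as a hex viewer would show them.

```shell
# Hypothetical helper: take four bytes as they appear on disk
# (8 hex digits, little-endian) and print the address as it reads.
le_to_addr() {
    local h=$1
    printf '0x%s%s%s%s\n' "${h:6:2}" "${h:4:2}" "${h:2:2}" "${h:0:2}"
}

le_to_addr 2A000080      # prints 0x8000002A
le_to_addr 030001A0      # prints 0xA0010003
```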

Cache addresses are interpreted at the bit level. That means we have to convert the 32-bit integer into bits and evaluate them to understand the address. The first 4 bits are the header, which consists of the initialized bit followed by three file type bits.

Table 2. File Types
BINARY   INTEGER   INTERPRETATION
000      0         separate file on disk
001      1         rankings block-file
010      2         256b block-file
011      3         1k block-file
100      4         4k block-file

The remaining 28-bits are interpreted according to the file type:

Table 3. Separate File
Init   File Type   File #
1      000         1111111111111111111111111111

Table 4. Block File
Init   File Type   Reserved   Contiguous Blocks   Block File   Block #
1      001         00         11                  00000000     1111111111111111

Let’s take a look at the last two cache addresses above:

External File Address

0x8000002A, interpreted as a 32-bit integer, is 2147483690. In binary, it is 10000000000000000000000000101010. We interpret it as follows:

Table 5. Binary Interpretation
          Init   File Type   File #
Binary    1      000         0000000000000000000000101010
Integer   1      0           42 (0x2A)

Block File Address

0xA0010003, interpreted as a 32-bit integer, is 2684420099. In binary, it is 10100000000000010000000000000011. We interpret it as follows:

Table 6. Binary Interpretation
          Init   File Type   Reserved   Contiguous Blocks   Block File   Block #
Binary    1      010         00         00                  00000001     0000000000000011
Integer   1      2           0          0                   1            3

Note
The odd man out here is Contiguous blocks, which weighs in at zero but appears to be interpreted as 1 block as based on Google documentation.
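The two worked examples can be reproduced with shifts and masks in BASH arithmetic. This is a minimal sketch, not a full parser: the function name decode_cache_addr is hypothetical, the bit positions follow Tables 3 and 4, and the contiguous-blocks field is reported as its value plus one, on the assumption noted above that zero means one block.

```shell
# Hypothetical helper: decode a cache address given as it reads
# (e.g. 0xA0010003), using the bit layout of Tables 3 and 4.
decode_cache_addr() {
    local addr=$(( $1 ))
    local init=$((  (addr >> 31) & 0x1 ))    # bit 31: initialized flag
    local ftype=$(( (addr >> 28) & 0x7 ))    # bits 28-30: file type
    if [ "$init" -eq 0 ]; then
        echo "not initialized"
    elif [ "$ftype" -eq 0 ]; then
        # Separate file: the low 28 bits are the file number.
        printf 'external file f_%06x\n' $(( addr & 0x0FFFFFFF ))
    else
        local contig=$(( (addr >> 24) & 0x3 ))   # bits 24-25 (assumed count - 1)
        local file=$((   (addr >> 16) & 0xFF ))  # bits 16-23: block-file number
        local block=$((   addr & 0xFFFF ))       # bits 0-15: first block
        printf 'block file data_%d, type %d, start block %d, %d block(s)\n' \
            "$file" "$ftype" "$block" $(( contig + 1 ))
    fi
}

decode_cache_addr 0x8000002A   # prints external file f_00002a
decode_cache_addr 0xA0010003   # prints block file data_1, type 2, start block 3, 1 block(s)
```

Feeding the function an address read from disk requires putting the bytes back in reading order first, since the on-disk form is little-endian.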

There is a lot more to discuss here, but this is a good start. You now know that the cache index holds a map of cached browser data, and that the data_# files can contain cached web objects you may have been missing. I’ll cover more on following the cache map and extracting the file content of the data-blocks in a future article (or two)!