Sunday, February 24, 2013

Cashing in on the Google Chrome Cache

The Google Chrome cache represents a challenge to forensic investigators. If the extent of your examination has been to open the cache folder and view the files in a file browser, you are likely missing a lot of content.
For starters, files stored in the cache are renamed from their original names on the web server. Next, text elements (like HTML, JSON, etc.) are zlib compressed. Finally, files smaller than 16384 bytes (16k) are stored in block files, which are container files that hold many smaller files. The metadata about the cached files is stored in these container files, too, and it's all mapped by a binary index file.
So, while it's easy enough to point a file browser or image viewer at the cache directory and see some recognizable data structures, making sense of all that’s there can be more challenging. In the remainder of this discussion, I’ll attempt to give you more insight into the Google Chrome cache. This should be of interest to disk and mobile forensicators alike, as the structure is the same whether you are examining a desktop computer or a mobile device such as an Android phone or tablet.

Cache Structure

All the files in the Google Chrome cache are stored in a single folder called cache. The cache consists of at least five files: an index file and four data files known as block files. As I stated above, downloaded files are stored in one of the block files or directly in the cache directory, and the index keeps track of the transaction and storage location.
A cache can consist of only the five mentioned files (named index, data_0, data_1, data_2, and data_3) if every cached object is smaller than 16k. Larger files are stored outside the block files. Go ahead, check your cache if you don’t believe me… I’ll wait. In case you don’t have one handy, here’s a truncated file listing from a recent Android exam I conducted, sorted by size.
Output of ls -lSr
-rw-r--r-- 1 user user   16519 Feb 20 13:06 f_00008a
-rw-r--r-- 1 user user   16566 Feb 20 13:06 f_0000cf
-rw-r--r-- 1 user user   16604 Feb 20 13:06 f_0000cc
-rw-r--r-- 1 user user   16944 Feb 20 13:06 f_0000cd
...
-rw-r--r-- 1 user user   70659 Feb 20 13:06 f_0000f7
-rw-r--r-- 1 user user   73804 Feb 20 13:06 f_00008f
-rw-r--r-- 1 user user   74434 Feb 20 13:06 f_0000df
...
-rw-r--r-- 1 user user   81920 Feb 20 13:06 data_0
...
-rw-r--r-- 1 user user  262512 Feb 20 13:06 index
-rw-r--r-- 1 user user 1581056 Feb 20 13:06 data_1
-rw-r--r-- 1 user user 2105344 Feb 20 13:06 data_2
-rw-r--r-- 1 user user 4202496 Feb 20 13:06 data_3
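A listing like the one above can be triaged by file name alone. The following is a minimal sketch, assuming the names follow the patterns visible in the listing (index, data_N, and f_ followed by six hex digits); the classify function and its labels are hypothetical helpers for illustration, not part of any Chrome tooling.

```python
import re

# Name patterns inferred from the listing above (an assumption).
BLOCK_FILE = re.compile(r"^data_\d+$")
EXTERNAL_FILE = re.compile(r"^f_[0-9a-f]{6}$")


def classify(name):
    """Label a cache file name as index, block-file, external, or unknown."""
    if name == "index":
        return "index"
    if BLOCK_FILE.match(name):
        return "block-file"
    if EXTERNAL_FILE.match(name):
        return "external"
    return "unknown"


if __name__ == "__main__":
    for name in ["f_00008a", "data_0", "index", "data_3", "f_0000df"]:
        print(name, classify(name))
```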
Warning
When one or more of the five base files get corrupted or deleted, the entire set gets recreated. I experimented by deleting a data block file and restarting Chrome. On browser restart, the entire cache was deleted and new base files were created.
You might be wondering about the four data block-files. How are they distinguished? The answer is size: not the size of the block-files themselves, but the size of their internal data blocks. Each block-file is defined to hold data in blocks, much like a file system. And just as file systems can be defined with different block sizes, the cache block-files are defined with different block sizes. Data can be allocated no more than four blocks at a time before it is considered too large for that block-file.
Table 1. Default Data Block-file sizes
File    Block Size  Max Data Size
data_0  36b         rankings
data_1  256b        1k
data_2  1k          4k
data_3  4k          16k
Note
When a data block-file reaches maximum capacity (each file is only allowed to hold a defined number of objects), a new data block-file is created and pointed to in the header of the previous data block-file.
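The block sizes and the four-block limit in Table 1 can be turned into a small lookup. This is only a sketch under stated assumptions: the storage_for function is a hypothetical helper, data_0 is left out because it holds rankings rather than payload data, and treating an object of exactly 16384 bytes as still fitting in data_3 is an assumption about the boundary.

```python
# Block sizes from Table 1; data_0 (rankings) is intentionally omitted.
BLOCK_FILES = [("data_1", 256), ("data_2", 1024), ("data_3", 4096)]
MAX_BLOCKS = 4  # at most four blocks per object, per the text above


def storage_for(size):
    """Return (location, blocks_needed) for an object of `size` bytes."""
    for name, block_size in BLOCK_FILES:
        if size <= block_size * MAX_BLOCKS:
            # Ceiling division; even an empty object occupies one block here.
            blocks = max(1, -(-size // block_size))
            return name, blocks
    # Too large for any block-file: stored as a separate f_###### file.
    return "external", 0


if __name__ == "__main__":
    for size in (300, 1025, 5000, 16519):
        print(size, storage_for(size))
```

The last size, 16519, is the smallest external file in the listing earlier, and it falls just past the 16k ceiling of data_3.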

Cache addresses

All cached objects have an address. The address is a 32-bit integer that describes where the data is stored. Meta-data about the object is stored, too, and includes:
  • HTTP Headers
  • Request Data
  • Entry name (Key)
  • Other auxiliary information (e.g., rankings)
Examples of cache addresses:
0x00000000: not initialized
0x8000002A: external file f_00002a
0xA0010003: block-file number 1 (data_1), initial block number 3, length of 1 block
Important
The addresses above are written in the order they are read, but on disk you will find them in little-endian format, e.g., the external file address appears on disk as 0x2A000080.
Cache addresses are interpreted at the bit level. That means we have to convert the 32-bit integer into bits and evaluate them to understand the address. The first 4 bits are the header, which consists of the initialized bit followed by three file type bits.
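To illustrate the byte order, here is a minimal sketch that reads the four on-disk bytes of the external file example with Python's struct module. The variable names are illustrative only.

```python
import struct

# The four bytes as they appear on disk for the external file example.
raw = bytes.fromhex("2A000080")

# "<I" means little-endian, unsigned 32-bit integer.
(address,) = struct.unpack("<I", raw)

print(hex(address))  # 0x8000002a
```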
Table 2. File Types
Binary  Integer  Interpretation
000     0        separate file on disk
001     1        rankings block-file
010     2        256b block-file
011     3        1k block-file
100     4        4k block-file
The remaining 28 bits are interpreted according to the file type:
Table 3. Separate File
Init  File Type  File #
1     000        1111111111111111111111111111
Table 4. Block File
Init  File Type  Reserved  Contiguous Blocks  Block File  Block #
1     001        00        11                 00000000    1111111111111111
Let's take a look at the last two cache addresses above:

External File Address

0x8000002A, interpreted as a 32-bit integer, is 2147483690. In binary, it is 10000000000000000000000000101010. We interpret it as follows:
Table 5. Binary Interpretation
         Init  File Type  File #
Binary   1     000        0000000000000000000000101010
Integer  1     0          42 (0x2A)

Block File Address

0xA0010003, interpreted as a 32-bit integer, is 2684420099. In binary, it is 10100000000000010000000000000011. We interpret it as follows:
Table 6. Binary Interpretation
         Init  File Type  Reserved  Contiguous Blocks  Block File  Block #
Binary   1     010        00        00                 00000001    0000000000000011
Integer  1     2          0         0                  1           3
Note
The odd man out here is Contiguous Blocks, which weighs in at zero but, based on Google documentation, appears to be interpreted as 1 block.
There is a lot more to discuss here, but this is a good start. You now know that the cache index holds a map of cached browser data, and that the data_# files can contain cached web objects you may have been missing. I’ll cover more on following the cache map and extracting the file content of the data blocks in a future article (or two)!
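The two worked examples can be reproduced with a few shifts and masks. What follows is a sketch, not Chrome's own code: decode_address is a hypothetical helper, the f_ name with six lowercase hex digits is an assumption based on the file names in the listing earlier, and adding 1 to the Contiguous Blocks field follows the interpretation described in the note above.

```python
def decode_address(addr):
    """Split a 32-bit cache address into the fields shown in Tables 3 and 4."""
    if not (addr >> 31) & 0x1:
        return {"initialized": False}

    file_type = (addr >> 28) & 0x7
    if file_type == 0:
        # Separate file: the low 28 bits are the file number.
        file_number = addr & 0x0FFFFFFF
        return {
            "initialized": True,
            "file_type": file_type,
            "file_number": file_number,
            # Assumed naming: f_ plus six hex digits, as seen in the listing.
            "file_name": "f_%06x" % file_number,
        }

    return {
        "initialized": True,
        "file_type": file_type,
        "reserved": (addr >> 26) & 0x3,
        # Stored value 0 appears to mean 1 block, so add 1 (assumption).
        "contiguous_blocks": ((addr >> 24) & 0x3) + 1,
        "block_file": (addr >> 16) & 0xFF,
        "block_number": addr & 0xFFFF,
    }


if __name__ == "__main__":
    print(decode_address(0x00000000))
    print(decode_address(0x8000002A))
    print(decode_address(0xA0010003))
```

For 0xA0010003 this yields file type 2, block file 1, block number 3, and a length of 1 block, matching Table 6.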

Time Perspective

Telling time in forensic computing can be complicated. User interfaces hide the complexity, usually displaying time stamps in a human reada...