The Google Chrome cache represents a challenge to forensic investigators.  If the extent of your examination has been to open the cache folder and view the files in a file browser, you are likely missing a lot of content.
For starters, files stored in the cache are renamed from their original names on the web server.  Next, text elements (like HTML, JSON, etc.) are zlib compressed.  Finally, files smaller than 16384 bytes (16k) are stored in block files, which are container files that hold many smaller files.  The meta-data about the cached files is stored in these container files, too, and it's all mapped by a binary index file.
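As a rough illustration of the compression point, here is a minimal Python sketch for inflating a text body once it has been carved out of the cache. It assumes the body is a single zlib or gzip stream (the gzip case is my assumption, not something stated above), and the function name `inflate_cached_body` is made up for this example.

```python
import gzip
import zlib


def inflate_cached_body(raw: bytes) -> bytes:
    """Decompress a cached body, or return it unchanged if it is not compressed."""
    try:
        # MAX_WBITS | 32 asks zlib to auto-detect a zlib or gzip header.
        return zlib.decompress(raw, zlib.MAX_WBITS | 32)
    except zlib.error:
        # Not a zlib/gzip stream (e.g. an image or plain text): leave it alone.
        return raw


if __name__ == "__main__":
    html = b"<html><body>hello</body></html>"
    print(inflate_cached_body(zlib.compress(html)))
    print(inflate_cached_body(gzip.compress(html)))
```

Returning the input untouched on failure is a deliberate choice here, so the same helper can be run over every extracted object without first checking its type.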
So, while it's easy enough to point a file browser or image viewer at the cache directory and see some recognizable data structures, making sense of all that’s there can be more challenging.  In the remainder of this discussion, I’ll attempt to give you more insight into the Google Chrome cache.  This should be of interest to disk and mobile forensicators alike, as the structure is the same whether you are examining a desktop computer or a mobile device such as an Android phone or tablet.
Cache Structure
All the files in the Google Chrome cache are stored in a single folder called cache.  The cache consists of at least five files: an index file and four data files known as block files.  As I stated above, downloaded files are stored in one of the block files or directly in the cache directory, and the index keeps track of the transaction and storage location.
A cache can consist of only the five mentioned files (named index, data_0, data_1, data_2, and data_3) if all the cached objects are smaller than 16k.  Larger files are stored outside the block files.  Go ahead, check your cache if you don’t believe me… I’ll wait.  In case you don’t have one handy, here’s a truncated file listing from a recent Android exam I conducted, sorted by size.
Output of ls -lSr:

    -rw-r--r-- 1 user user 16519 Feb 20 13:06 f_00008a
    -rw-r--r-- 1 user user 16566 Feb 20 13:06 f_0000cf
    -rw-r--r-- 1 user user 16604 Feb 20 13:06 f_0000cc
    -rw-r--r-- 1 user user 16944 Feb 20 13:06 f_0000cd
    ...
    -rw-r--r-- 1 user user 70659 Feb 20 13:06 f_0000f7
    -rw-r--r-- 1 user user 73804 Feb 20 13:06 f_00008f
    -rw-r--r-- 1 user user 74434 Feb 20 13:06 f_0000df
    ...
    -rw-r--r-- 1 user user 81920 Feb 20 13:06 data_0
    ...
    -rw-r--r-- 1 user user 262512 Feb 20 13:06 index
    -rw-r--r-- 1 user user 1581056 Feb 20 13:06 data_1
    -rw-r--r-- 1 user user 2105344 Feb 20 13:06 data_2
    -rw-r--r-- 1 user user 4202496 Feb 20 13:06 data_3
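If you want to triage a cache directory programmatically, a sketch like the following separates the five base files from the externally stored objects. The `f_` plus six hex digits pattern is inferred from the listing above rather than from any specification, and `classify_cache_files` is a hypothetical helper name.

```python
import re

# The five base files named in the article.
BASE_FILES = {"index", "data_0", "data_1", "data_2", "data_3"}
# Assumption: external files are named f_ followed by six lowercase hex digits.
EXTERNAL_RE = re.compile(r"^f_[0-9a-f]{6}$")


def classify_cache_files(names):
    """Split cache directory entries into base files, external files, and others."""
    groups = {"base": [], "external": [], "other": []}
    for name in sorted(names):
        if name in BASE_FILES:
            groups["base"].append(name)
        elif EXTERNAL_RE.match(name):
            groups["external"].append(name)
        else:
            groups["other"].append(name)
    return groups
```

In practice you might call it as `classify_cache_files(os.listdir(cache_dir))`. Anything that lands in "other" deserves a manual look; additional block files beyond data_3 would show up there with this sketch.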
Warning: When one or more of the five base files get corrupted or deleted, the entire set gets recreated. I experimented by deleting a data block file and restarting Chrome. On browser restart, the entire cache was deleted and new base files were created.
You might be wondering about the four data block-files.  How are they distinguished?  The answer is size: not the size of the block-files themselves, but the size of their internal data blocks.  Each block-file is defined to hold data in fixed-size blocks, much like a file system.  And just as different file systems can be defined with different block sizes, so the cache block-files are defined with different block sizes.  Data can be allocated no more than four blocks at a time; anything larger is considered too big for that block-file.
| File | Block size | Max data size |
|---|---|---|
| data_0 | 36 bytes | rankings |
| data_1 | 256 bytes | 1k |
| data_2 | 1k | 4k |
| data_3 | 4k | 16k |

Note: When a data block-file reaches maximum capacity (each file is only allowed to hold a defined number of objects), a new data block-file is created and pointed to in the previous data block-file header.
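To make the table concrete, here is a small sketch that predicts where an object of a given size should land. It assumes Chrome picks the smallest block-file whose four-block limit fits the object, which is my reading of the table; exactly how sizes that fall right on a boundary are handled is not covered here, so treat the `<=` comparison as an assumption. The name `placement_for_size` is made up.

```python
import math

# (file name, block size in bytes), taken from the table above.
BLOCK_FILES = [("data_1", 256), ("data_2", 1024), ("data_3", 4096)]
MAX_BLOCKS = 4  # an object may occupy at most four blocks


def placement_for_size(size: int):
    """Return (block-file name, blocks needed), or ("external", 0) if too large."""
    if size <= 0:
        raise ValueError("size must be positive")
    for name, block_size in BLOCK_FILES:
        # Assumption: first (smallest) block-file that fits wins.
        if size <= block_size * MAX_BLOCKS:
            return name, math.ceil(size / block_size)
    return "external", 0


if __name__ == "__main__":
    for size in (300, 2000, 5000, 16519):
        print(size, placement_for_size(size))
```

The last size, 16519, is the smallest external file in the listing above, and it comes back as "external" as expected.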
Cache addresses
All cached objects have an address.  The address is a 32-bit integer that describes where the data is stored.  Meta-data about the object is stored, too, and includes:
- HTTP Headers
- Request Data
- Entry name (Key)
- Other auxiliary information (e.g., rankings)
Examples of cache addresses:
- 0x00000000: not initialized
- 0x8000002A: external file f_00002a
- 0xA0010003: block-file number 1 (data_1), initial block number 3, 1 block of length
Important: The addresses above are written most significant byte first, but on disk you will find them in little-endian format, e.g., the external file address appears on disk as 0x2A000080 (bytes 2A 00 00 80).
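Reading an address out of raw bytes therefore needs a little-endian unpack. A minimal sketch using Python's standard `struct` module follows; `read_cache_address` is a hypothetical helper, and where in the index or block-file the four bytes come from is left to you.

```python
import struct


def read_cache_address(raw: bytes, offset: int = 0) -> int:
    """Read one 32-bit little-endian cache address from a byte buffer."""
    (address,) = struct.unpack_from("<I", raw, offset)
    return address


if __name__ == "__main__":
    on_disk = bytes.fromhex("2A000080")  # bytes as they appear on disk
    print(hex(read_cache_address(on_disk)))  # 0x8000002a
```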
Cache addresses are interpreted at the bit level.  That means we have to convert the 32-bit integer into bits and evaluate them to understand the address.  The first 4 bits are the header, which consists of the initialized bit followed by three file type bits.  The file type values are:
| Binary | Integer | Interpretation |
|---|---|---|
| 000 | 0 | separate file on disk |
| 001 | 1 | rankings block-file |
| 010 | 2 | 256b block-file |
| 011 | 3 | 1k block-file |
| 100 | 4 | 4k block-file |

The remaining 28 bits are interpreted according to the file type:
External file:

| Init | File Type | File # |
|---|---|---|
| 1 | 000 | 1111111111111111111111111111 |

Block file:

| Init | File Type | Reserved | Contiguous Blocks | Block File | Block # |
|---|---|---|---|---|---|
| 1 | 001 | 00 | 11 | 00000000 | 1111111111111111 |

Let's take a look at the last two cache addresses above:
External File Address
0x8000002A, interpreted as a 32-bit integer, is 2147483690.  In binary, it is 10000000000000000000000000101010.  We interpret it as follows:
| | Init | File Type | File # |
|---|---|---|---|
| Binary | 1 | 000 | 0000000000000000000000101010 |
| Integer | 1 | 0 | 42 (0x2A) |

Block File Address
0xA0010003, interpreted as a 32-bit integer, is 2684420099.  In binary, it is 10100000000000010000000000000011.  We interpret it as follows:
| | Init | File Type | Reserved | Contiguous Blocks | Block File | Block # |
|---|---|---|---|---|---|---|
| Binary | 1 | 010 | 00 | 00 | 00000001 | 0000000000000011 |
| Integer | 1 | 2 | 0 | 0 | 1 | 3 |

Note: The odd man out here is Contiguous Blocks, which weighs in at zero but, based on Google's documentation, appears to be interpreted as 1 block.
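Both worked examples can be checked with a few shifts and masks. The sketch below follows the bit layouts described above; the function name and the dictionary keys are my own, the `f_` file name format is inferred from the earlier directory listing, and the "stored value plus one" block count is the interpretation from the note, not something I have verified against Chrome's source.

```python
FILE_TYPES = {
    0: "separate file on disk",
    1: "rankings block-file",
    2: "256b block-file",
    3: "1k block-file",
    4: "4k block-file",
}


def decode_cache_address(address: int) -> dict:
    """Split a 32-bit cache address into the fields described in the article."""
    initialized = (address >> 31) & 0x1   # bit 31
    file_type = (address >> 28) & 0x7     # bits 30-28
    fields = {
        "initialized": bool(initialized),
        "file_type": file_type,
        "description": FILE_TYPES.get(file_type, "unknown"),
    }
    if not initialized:
        return fields
    if file_type == 0:
        file_number = address & 0x0FFFFFFF  # low 28 bits
        fields["file_number"] = file_number
        # Assumption: name is f_ plus six hex digits, as seen in the listing.
        fields["file_name"] = f"f_{file_number:06x}"
    else:
        contiguous = (address >> 24) & 0x3  # bits 25-24 (bits 27-26 are reserved)
        fields["contiguous_raw"] = contiguous
        # Assumption from the note above: the stored value is (blocks - 1).
        fields["block_count"] = contiguous + 1
        fields["block_file"] = (address >> 16) & 0xFF  # bits 23-16
        fields["start_block"] = address & 0xFFFF       # bits 15-0
    return fields


if __name__ == "__main__":
    for addr in (0x00000000, 0x8000002A, 0xA0010003):
        print(f"0x{addr:08X}", decode_cache_address(addr))
```

Keeping both the raw contiguous value and the derived block count in the result makes it easy to revisit the plus-one interpretation later without re-parsing anything.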
There is a lot more to discuss here, but this is a good start.  You now know that the cache index holds a map of cached browser data, and that the data_# files can contain cached web objects you may have been missing.  I’ll cover more on following the cache map and extracting the file content of the data blocks in a future article (or two)!
 
 
 