Saturday, January 26, 2013

Android Messaging: Is Android Getting Religious?

"Cleanliness is next to Godliness," it is often said. And if you believe that, then you might think the Android operating system is seeking after the divine when it comes to its messaging service. Why do I say that? Because in my quest for a thorough understanding of SQLite databases, I discovered that the mmssms.db, Android’s built-in messaging database, has the auto-vacuum option enabled! And in Full-mode at that!

SQLite Vacuum

In SQLite, Vacuum is an operation that rebuilds the entire database. Frequent updates, deletions and insertions can leave the database file fragmented. Vacuum reduces the size of fragmented databases by copying the active records to a temporary file and then overwriting the original database file. During this process, it uses the rollback journal or write-ahead log as it would for any database transaction.
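To see the effect of a manual VACUUM, the sketch below builds a throwaway database with auto-vacuum explicitly turned off. It fills the database, deletes a contiguous block of rows, and compares page counts and file size before and after the rebuild. It uses only Python's standard `sqlite3` module. The table, row counts, and the function name `vacuum_demo` are hypothetical choices for illustration.

```python
import os
import sqlite3
import tempfile

def vacuum_demo(rows=2000):
    """Return (page_count, freelist_count, file_size) before and after VACUUM."""
    path = os.path.join(tempfile.mkdtemp(), "vacuum_demo.db")
    con = sqlite3.connect(path)
    con.execute("PRAGMA auto_vacuum = NONE")  # keep freed pages on the freelist
    con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, body TEXT)")
    con.executemany("INSERT INTO t (body) VALUES (?)", [("x" * 200,)] * rows)
    con.commit()
    # Drop roughly 90% of the rows; emptied pages go to the freelist.
    con.execute("DELETE FROM t WHERE id > ?", (rows // 10,))
    con.commit()

    def snapshot():
        return (con.execute("PRAGMA page_count").fetchone()[0],
                con.execute("PRAGMA freelist_count").fetchone()[0],
                os.path.getsize(path))

    before = snapshot()
    con.execute("VACUUM")  # rebuild the whole file from the live records
    after = snapshot()
    con.close()
    return before, after

print(vacuum_demo())
```

Before the VACUUM, the file keeps its size and carries a non-empty freelist. Afterward, the freelist should be empty and the file smaller.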
SQLite has two auto-vacuum modes: full and incremental. Auto-vacuum must be turned on while the database is still new, before any tables are created. Enabling it on an existing database requires running a full VACUUM. Once it is on, the mode can be switched between full and incremental. The setting is recorded in the database header (the first 100 bytes of the database file) at file offset 52. If the 32-bit, big-endian integer at offset 52 is non-zero, it holds the address (page number) of the largest root B-tree page. For this discussion, the significance of a non-zero value is simply that auto-vacuum is enabled.
The 32-bit, big-endian integer at file offset 64 indicates the auto-vacuum mode. A non-zero value means the database is set for incremental vacuum mode, while a zero value means full mode.
Figure 1. SQLite Database Header
Note
Don’t shortcut your analysis by jumping straight to offset 64 to check the value. A zero value at offset 64 coupled with a zero value at offset 52 means auto-vacuum is not enabled at all!
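The header check described above can be scripted. The following is a minimal sketch that reads the raw 100-byte header and applies the rule from the note: look at offset 52 first, and consult offset 64 only if offset 52 is non-zero. The function names `auto_vacuum_mode` and `make_db` are hypothetical. `make_db` is only a helper that builds small test files with Python's `sqlite3` module.

```python
import os
import sqlite3
import struct
import tempfile

SQLITE_MAGIC = b"SQLite format 3\x00"

def auto_vacuum_mode(path):
    """Classify a database file as 'none', 'full' or 'incremental' from its header."""
    with open(path, "rb") as f:
        header = f.read(100)
    if len(header) < 100 or not header.startswith(SQLITE_MAGIC):
        raise ValueError("not a SQLite 3 database file")
    largest_root = struct.unpack(">I", header[52:56])[0]      # file offset 52
    incremental_flag = struct.unpack(">I", header[64:68])[0]  # file offset 64
    if largest_root == 0:
        # Auto-vacuum is off; offset 64 alone tells us nothing.
        return "none"
    return "incremental" if incremental_flag else "full"

def make_db(mode):
    """Hypothetical helper: create a tiny database in the given auto-vacuum mode."""
    path = os.path.join(tempfile.mkdtemp(), f"{mode.lower()}.db")
    con = sqlite3.connect(path)
    con.execute(f"PRAGMA auto_vacuum = {mode}")  # must come before CREATE TABLE
    con.execute("CREATE TABLE t (x)")
    con.commit()
    con.close()
    return path

for mode in ("NONE", "FULL", "INCREMENTAL"):
    print(mode, auto_vacuum_mode(make_db(mode)))
```

Run against an evidence file, only the `auto_vacuum_mode` function is needed. It opens the file read-only and never goes through the SQLite library.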

SQLite Structure

Before we can really understand the SQLite vacuum operation, we have to first understand a little bit about how SQLite manages its data. A SQLite database file is organized into fixed-size pages. The page size often matches the underlying file system block size, but it can be determined definitively from the 16-bit, big-endian integer at file offset 16 (the special value 1 means 65,536 bytes). Each page has a single purpose and can be any one of the following types:
  • Lock-byte
  • Freelist
    • trunk
    • leaf
  • B-tree
    • table interior
    • table leaf
    • index interior
    • index leaf
  • Payload overflow
  • Pointer map
For this discussion, we need to know about the freelist pages and B-Tree table pages.
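As a rough first pass over a database file, the sketch below reads the page size from offset 16 and labels each page by its B-tree type flag. The flag values 0x02, 0x05, 0x0A and 0x0D come from the SQLite file format documentation. This is a heuristic with stated limits. Freelist, overflow, pointer-map and lock-byte pages carry no type flag. A freelist leaf page may also still hold the stale flag of the B-tree page it used to be. The function names are hypothetical.

```python
import struct

# Type flags for b-tree pages, per the SQLite file format documentation.
BTREE_TYPES = {
    0x02: "index interior",
    0x05: "table interior",
    0x0A: "index leaf",
    0x0D: "table leaf",
}

def page_size(header: bytes) -> int:
    """Decode the 16-bit big-endian page size at file offset 16."""
    size = struct.unpack(">H", header[16:18])[0]
    return 65536 if size == 1 else size  # the value 1 encodes 65536

def classify_pages(path):
    """Yield (page_number, label) for every page, keyed on the type flag byte.

    Heuristic only: pages without a b-tree header are reported as 'other',
    and a 'table leaf' label does not prove the page is still live.
    """
    with open(path, "rb") as f:
        data = f.read()
    size = page_size(data[:100])
    for index in range(len(data) // size):
        # Page 1 holds the 100-byte database header before its b-tree header.
        flag_at = index * size + (100 if index == 0 else 0)
        yield index + 1, BTREE_TYPES.get(data[flag_at], "other")
```

Page numbers start at 1, so page N begins at byte offset (N - 1) × page size. That arithmetic recurs throughout SQLite analysis.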

Freelist Pages

When data is deleted, or dropped, from a SQLite database, the database file does not get smaller (absent a vacuum operation). The database notes the location of the free space and reuses it as needed. The freelist contains the addresses, by page number, of whole pages no longer being used to store data. The number of freelist pages in the database is stored in the database file header as a 32-bit, big-endian integer at file offset 36.
Freelist trunk pages store the addresses (by page number, not offset) of the next trunk page, if any, and of freelist leaf pages. The freelist leaf pages are the pages that once stored data; that is, they were once B-Tree pages.
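The trunk and leaf structure can be walked directly from the file. This sketch relies on a detail from the SQLite file format documentation. Header offset 32 holds the page number of the first trunk page, and offset 36 holds the total freelist count, trunk pages included. Each trunk page begins with the next trunk page number and a count of leaf entries, followed by that many 4-byte leaf page numbers. The function name `walk_freelist` is hypothetical. The cross-check against offset 36 is a sanity test, not part of SQLite itself.

```python
import struct

def walk_freelist(path):
    """Follow the freelist chain and return (trunk_pages, leaf_pages)."""
    with open(path, "rb") as f:
        data = f.read()
    size = struct.unpack(">H", data[16:18])[0]
    size = 65536 if size == 1 else size
    # Offset 32: first trunk page (0 = empty); offset 36: total freelist pages.
    trunk, total = struct.unpack(">II", data[32:40])
    trunks, leaves = [], []
    while trunk != 0:
        if trunk in trunks:
            raise ValueError(f"freelist loop at page {trunk}")
        base = (trunk - 1) * size  # page numbers start at 1
        next_trunk, count = struct.unpack(">II", data[base:base + 8])
        leaves.extend(struct.unpack(f">{count}I", data[base + 8:base + 8 + 4 * count]))
        trunks.append(trunk)
        trunk = next_trunk
    if len(trunks) + len(leaves) != total:
        raise ValueError("freelist chain does not match the header count")
    return trunks, leaves
```

In a database without auto-vacuum, the leaf page numbers this returns are places where dropped records may still be found.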

B-Tree Pages

Table records and structures are stored in B-Tree pages. B-Tree pages have headers that describe the data in the page:
  • Page type flag (0x0D identifies a table leaf page)
  • Byte offset to first freeblock (unallocated space between the records)
    • Free blocks are chained together, each one pointing to the next
  • Number of cells on the page
    • B-Tree table leaf pages contain cells with table data
  • Offset to the first cell on the page
  • Number of fragmented free bytes
    • May not exceed 60 bytes
A cell pointer array follows immediately after the B-Tree page header. The array is a list of 2-byte offsets to the allocated cells on the page. Cells are self-describing, using variable-length integers (varints) to describe things like the payload length, the unique record identifier (ROWID), and the payload content (by means of a record header). Not all B-Tree pages contain the table data that is the usual subject of an examination, but those that do can be identified by the page type flag in the page header.
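Decoding the header fields listed above is mechanical. The sketch below assumes the page's bytes have already been sliced out of the file. It follows the SQLite file format layout: an 8-byte header on leaf pages, or 12 bytes on interior pages, which add a right-most child pointer. The header is followed by the array of 2-byte cell pointers. On page 1, the B-tree header starts after the 100-byte database header. The function name is hypothetical.

```python
import struct

def parse_btree_header(page: bytes, is_page_one: bool = False) -> dict:
    """Decode a b-tree page header and its cell pointer array."""
    pos = 100 if is_page_one else 0  # page 1 starts with the file header
    flag = page[pos]
    if flag not in (0x02, 0x05, 0x0A, 0x0D):
        raise ValueError(f"not a b-tree page (flag {flag:#04x})")
    # Bytes 1-7: first freeblock, cell count, content start, fragmented bytes.
    first_freeblock, cells, content_start, fragmented = struct.unpack(
        ">HHHB", page[pos + 1:pos + 8])
    header_len = 12 if flag in (0x02, 0x05) else 8  # interior pages are longer
    start = pos + header_len
    pointers = list(struct.unpack(f">{cells}H", page[start:start + 2 * cells]))
    return {
        "type": flag,
        "first_freeblock": first_freeblock,
        "cell_count": cells,
        "content_start": content_start or 65536,  # 0 is read as 65536
        "fragmented_bytes": fragmented,
        "cell_pointers": pointers,
    }
```

Each entry in `cell_pointers` is an offset from the start of the page to a live cell. The space between the end of the pointer array and `content_start` is unallocated.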
The takeaway here is that B-Tree pages contain the table data. B-Tree pages can contain both allocated and unallocated space, and they become fragmented when a record is dropped from the midst of other records. If all the records on a B-Tree table leaf page are deleted, the page can be released to the freelist.
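When a record is dropped from the middle of a page, its space usually joins the freeblock chain. The sketch below follows that chain on a single page. Per the SQLite file format, each freeblock begins with a 2-byte offset to the next freeblock (0 ends the chain), followed by a 2-byte size that includes those 4 header bytes. Only those first 4 bytes are overwritten. Unless the `secure_delete` pragma is enabled, the rest of a freeblock may still hold bytes from the deleted record. The function name is hypothetical.

```python
import struct

def freeblocks(page: bytes, is_page_one: bool = False):
    """Return a list of (offset, size) for each freeblock on a b-tree page."""
    pos = 100 if is_page_one else 0
    offset = struct.unpack(">H", page[pos + 1:pos + 3])[0]  # first freeblock
    blocks, seen = [], set()
    while offset:
        if offset in seen or offset + 4 > len(page):
            raise ValueError("corrupt freeblock chain")
        seen.add(offset)
        next_offset, size = struct.unpack(">HH", page[offset:offset + 4])
        blocks.append((offset, size))
        offset = next_offset
    return blocks
```

An examiner would carve `page[offset + 4:offset + size]` from each block as a candidate fragment of a deleted record.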
Note
A SQLite database may reorganize, or defragment, a page so that there are no freeblocks or fragments (groups of three or fewer free bytes), packing all the allocated cells at the end of the page. This is an internal housekeeping function independent of the vacuum operation.

A Tale of Two Modes

As I already stated, auto-vacuum comes in two flavors: Full and Incremental. So, what is the difference and how does it affect our examinations?

Auto-Vacuum: Full Mode

In full auto-vacuum mode, every transaction commit causes the pages in the freelist to be moved to the end of the database file, and the file is then truncated to remove them. Note that only freelist pages are removed, not fragmented B-Tree pages. Full auto-vacuum also does not defragment B-Tree pages.
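The difference a full-mode commit makes can be shown side by side. The sketch below runs the same bulk delete on a database created with auto-vacuum off and on another created in full mode. It then reports the freelist and page counts after the commit. The `sms` table is a hypothetical stand-in, not the actual mmssms.db schema. The function name is also hypothetical.

```python
import os
import sqlite3
import tempfile

def after_bulk_delete(mode, rows=2000):
    """Return (freelist_count, page_count) after deleting ~90% of the rows."""
    path = os.path.join(tempfile.mkdtemp(), f"{mode.lower()}.db")
    con = sqlite3.connect(path)
    con.execute(f"PRAGMA auto_vacuum = {mode}")  # set before any table exists
    con.execute("CREATE TABLE sms (id INTEGER PRIMARY KEY, body TEXT)")
    con.executemany("INSERT INTO sms (body) VALUES (?)", [("x" * 200,)] * rows)
    con.commit()
    con.execute("DELETE FROM sms WHERE id > ?", (rows // 10,))
    con.commit()  # in FULL mode, freelist pages are truncated away here
    result = (con.execute("PRAGMA freelist_count").fetchone()[0],
              con.execute("PRAGMA page_count").fetchone()[0])
    con.close()
    return result

print("NONE:", after_bulk_delete("NONE"))
print("FULL:", after_bulk_delete("FULL"))
```

With auto-vacuum off, the emptied pages sit on the freelist where a recovery tool can find them. In full mode, they are gone from the file as soon as the delete commits.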

Auto-Vacuum: Incremental Mode

In incremental auto-vacuum mode, vacuuming does not occur with every commit. Instead, the application must explicitly issue a command (PRAGMA incremental_vacuum(N)) to remove N pages from the freelist. The pages are moved to the end of the database, the database is truncated, and the page references are removed from the freelist. If there are fewer pages in the freelist than the command requests, or if N is omitted or less than 1, all the freelist pages are removed.
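The incremental mode can be exercised the same way. In this sketch, a bulk delete is committed in incremental mode, which leaves the freed pages on the freelist. Then `PRAGMA incremental_vacuum(n)` is issued to reclaim them. In Python's `sqlite3` module, the pragma's result rows are fetched so that the statement runs to completion. The `sms` table and the function name are hypothetical.

```python
import os
import sqlite3
import tempfile

def incremental_demo(n, rows=2000):
    """Return freelist counts (before, after) around PRAGMA incremental_vacuum(n)."""
    path = os.path.join(tempfile.mkdtemp(), "incremental.db")
    con = sqlite3.connect(path)
    con.execute("PRAGMA auto_vacuum = INCREMENTAL")  # set before any table exists
    con.execute("CREATE TABLE sms (id INTEGER PRIMARY KEY, body TEXT)")
    con.executemany("INSERT INTO sms (body) VALUES (?)", [("x" * 200,)] * rows)
    con.commit()
    con.execute("DELETE FROM sms WHERE id > ?", (rows // 10,))
    con.commit()  # unlike FULL mode, nothing is truncated at this commit
    before = con.execute("PRAGMA freelist_count").fetchone()[0]
    # n <= 0 asks SQLite to clear the entire freelist.
    con.execute(f"PRAGMA incremental_vacuum({int(n)})").fetchall()
    after = con.execute("PRAGMA freelist_count").fetchone()[0]
    con.close()
    return before, after

print(incremental_demo(5))
print(incremental_demo(0))
```

Until an application issues that pragma, dropped-record pages in an incremental-mode database stay on the freelist, much as they would with auto-vacuum off.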

So What’s the Big Deal?

I started this discussion by noting that I had discovered that the Android mmssms.db was set to full auto-vacuum mode. This means that every commit to the database could cause dropped records in freelist pages to be moved to the end of the database and dropped off a cliff. Tools designed to recover dropped records from SQLite database files won’t recover those records because they are no longer part of the database! And logical file extraction tools won’t recover the truncated pages, either.
Think of it like this: A drug dealer is seen conducting a transaction and flees when approached by police. He momentarily escapes, and takes the opportunity to delete all his text messages should he be captured. Sure enough, the good guys find him. While he’s being patted down, one of his customers texts the internationally recognized "do you have any drugs?" abbreviation: "Wuz up?"
Wuz up? You just helped the drug dealer remove all dropped records from his messaging database. That’s WUZ UP!