Every registered Android mobile device has an associated Google account, and Google accounts usually mean Gmail. For investigators interested in the Gmail content stored on Android devices, that content can be found in the /data/com.google.android.gm/databases directory, in a database named in the following format:
mailstore.[GoogleAccount]@gmail.com.db
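Because the account name is embedded in the filename, a device with several Google accounts will have several mailstore databases. The sketch below is my own illustration, assuming you have copied the databases directory out of the device image to a local folder; find_mailstores and MAILSTORE_RE are hypothetical names, and the pattern simply encodes the naming format above (it deliberately skips -journal and similar companion files).

```python
import re
from pathlib import Path

# Hypothetical pattern built from the naming format mailstore.[GoogleAccount]@gmail.com.db
MAILSTORE_RE = re.compile(r"^mailstore\.(?P<account>.+@gmail\.com)\.db$")


def find_mailstores(databases_dir):
    """Return (account, path) pairs for each mailstore database in a local copy
    of the Gmail app's databases directory (assumed already extracted)."""
    results = []
    for path in sorted(Path(databases_dir).iterdir()):
        match = MAILSTORE_RE.match(path.name)
        # Companion files such as "...db-journal" fail the anchored match.
        if match and path.is_file():
            results.append((match.group("account"), path))
    return results
```

For example, find_mailstores("extracted/databases") would list one entry per Gmail account whose mailstore database is present in that folder.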
The database contains 23 tables (at least at the time of this writing), the most interesting of which is messages.
The messages table has 41 fields (or columns). To obtain the basic email content (say, for keyword searching), an investigator would likely want to export, at the very least, the sender’s and receiver’s addresses, the date sent or received, the subject line, and the message body. There is plenty more to be gleaned from the database, but the needs of your investigation will dictate what to export.
Caution: Automated tools do not provide the full wealth of data to be found in the mailstore database. It is always a good idea to become familiar with the database schema to learn its full potential for your investigation.
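One quick way to get familiar with the schema is to list the tables and the columns of the messages table. This is a minimal sketch, not part of any tool: describe_schema is a hypothetical helper, and it opens the database read-only through a SQLite URI so the working copy is not modified. The table and column counts it reports on your evidence may differ from the ones I observed.

```python
import sqlite3
from pathlib import Path


def describe_schema(db_path, table="messages"):
    """Return (tables, columns) for a SQLite database opened read-only.

    tables: names of all tables, taken from sqlite_master
    columns: (name, declared type) pairs for `table`, from PRAGMA table_info
    """
    # as_uri() percent-encodes characters such as '@' found in mailstore filenames.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        # PRAGMA arguments cannot be bound as parameters, so pass trusted names only.
        columns = [(row[1], row[2]) for row in conn.execute(
            f"PRAGMA table_info({table})")]
    finally:
        conn.close()
    return tables, columns
```

Printing len(tables) and the column list is an easy way to check whether the database you are examining matches the layout described here.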
The Big Squeeze
If you have experience searching SQLite databases, you might be thinking, "Why go to the trouble of exporting messages from the database? SQLite strings are usually UTF-8, so I can just search the database with regular expressions or plain keywords." Well, there is a catch when it comes to email content in the Gmail mailstore database: zlib compression.
Only short message bodies are written to the body field in the messages table as plain text strings: in a recent exam, the longest message I found in this field was 98 bytes. Longer message bodies are compressed using the zlib algorithm and stored in the bodyCompressed field. Although compression is available for SQLite through extensions, the core library has no built-in function to decompress the contents of a field. Instead, such data is stored as a blob, and it is up to the database user to decompress it.
Note: The SQLite blob type is something of a catch-all for any type of data. Data is stored exactly as it was input.
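To see why a plain keyword search falls short, the sketch below builds a throwaway in-memory table that mimics only the body and bodyCompressed columns. The messages are synthetic, not real Gmail data, and compressed_search_demo is a hypothetical name. A LIKE search finds the keyword in the plain-text body but not inside the compressed blob, while decompressing in Python exposes it.

```python
import sqlite3
import zlib


def compressed_search_demo(keyword="warehouse"):
    """Compare a raw LIKE search with a search after zlib decompression."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE messages "
                 "(_id INTEGER PRIMARY KEY, body TEXT, bodyCompressed BLOB)")
    short_text = "See you at the warehouse."
    long_text = "Bring the keys to the warehouse tonight. " * 5
    conn.execute("INSERT INTO messages (body) VALUES (?)", (short_text,))
    conn.execute("INSERT INTO messages (bodyCompressed) VALUES (?)",
                 (zlib.compress(long_text.encode("utf-8")),))

    # typeof() reports the storage class SQLite actually used for each value.
    storage = conn.execute("SELECT typeof(body), typeof(bodyCompressed) "
                           "FROM messages ORDER BY _id").fetchall()

    # A plain LIKE search only finds the uncompressed message.
    pattern = f"%{keyword}%"
    raw_hits = [row[0] for row in conn.execute(
        "SELECT _id FROM messages WHERE body LIKE ? "
        "OR CAST(bodyCompressed AS TEXT) LIKE ? ORDER BY _id",
        (pattern, pattern))]

    # Decompressing in Python exposes the text of the long message.
    decompressed_hits = [
        row_id for row_id, blob in conn.execute(
            "SELECT _id, bodyCompressed FROM messages ORDER BY _id")
        if blob is not None
        and keyword in zlib.decompress(blob).decode("utf-8")
    ]
    conn.close()
    return storage, raw_hits, decompressed_hits
```

The same blind spot applies to grepping the raw database file: the compressed bytes simply do not contain the keyword as text.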
Extracting Messages
Python is a good option for exporting messages from the Gmail mailstore database. It can both open and query databases, and it can decompress the long message bodies.
import sqlite3
import zlib

# open and query the database
conn = sqlite3.connect('messagestore.db')  # database name abbreviated
c = conn.cursor()
c.execute("select _id, fromAddress, "
          "datetime(dateSentMS/1000, 'unixepoch', 'localtime'), "
          "datetime(dateReceivedMS/1000, 'unixepoch', 'localtime'), "
          "case when body is not null then body else bodyCompressed end "
          "from messages")
rows = c.fetchall()

# iterate through the rows and decompress the long messages
for row in rows:
    id, _from, sent, recv, body = row
    try:
        # long bodies are zlib-compressed blobs (bytes); decode the result to text
        body = zlib.decompress(body).decode('utf-8', errors='replace')
    except (zlib.error, TypeError):
        # short bodies are already text (str), so decompression fails; keep as is
        pass
    print('{}|{}|{}|{}|{}'.format(id, _from, sent, recv, body))

conn.close()
Note: The final line can be adapted to your own needs, e.g., to write the content to a new file or database, or to search the content with Python regular expressions.
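As one example of adapting that final step, the sketch below runs a regular expression over the bodies and returns a short snippet around each hit. It is a hypothetical helper, assuming the rows are (id, _from, sent, recv, body) tuples like those unpacked in the script above, with body already decompressed and decoded to a string.

```python
import re


def search_bodies(rows, pattern, flags=re.IGNORECASE, context=20):
    """Yield (id, sender, snippet) for each row whose body matches pattern."""
    regex = re.compile(pattern, flags)
    for msg_id, sender, _sent, _recv, body in rows:
        if not body:
            continue  # skip empty or missing bodies
        match = regex.search(body)
        if match:
            # keep a little text on either side of the hit for review
            start = max(match.start() - context, 0)
            yield msg_id, sender, body[start:match.end() + context]
```

Collecting the rows into a list instead of printing them, then calling list(search_bodies(rows, r"\d{3}-\d{4}")), would return the messages containing something shaped like a phone number.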
Some Explanation
The example above is just that: an example. It is intended, like all my posts, to remind me how to process the data and to demonstrate how just a few lines of Python can be leveraged to extract data. The script could have been shorter, but at the cost of clarity. That said, some explanation is still in order:
The SQLite query in the c.execute method might need some dissecting for you to understand what I did.
select _id, fromAddress, datetime(dateSentMS/1000, 'unixepoch', 'localtime'), datetime(dateReceivedMS/1000, 'unixepoch', 'localtime'), case when body is not null then body else bodyCompressed end from messages
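The dateSentMS and dateReceivedMS fields are recorded in milliseconds since January 1, 1970 (the Unix epoch). Dividing by 1000 converts them to the seconds that the 'unixepoch' modifier expects. I let SQLite do the date conversion for me rather than doing it in Python, and I converted from Unix epoch to local time. Keep in mind that SQLite's 'localtime' modifier uses the time zone of the computer running the query, not that of the device. The case statement pulls a little trickery to select either the body field or the bodyCompressed field. Basically, the row’s body field is checked to see if it is populated. If so, it is returned. If not, the contents of the bodyCompressed field are returned.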
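If you would rather report UTC, or do the conversion in Python, the sketch below shows both routes side by side. The function names are my own and hypothetical; sqlite_utc simply runs SQLite's datetime() without the 'localtime' modifier, and because ?/1000 is integer division there, the milliseconds are truncated, whereas ms_to_utc keeps them.

```python
import sqlite3
from datetime import datetime, timezone


def ms_to_utc(ms):
    """Convert milliseconds since the Unix epoch to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def sqlite_utc(ms):
    """Let SQLite do the same conversion, leaving the result in UTC."""
    conn = sqlite3.connect(":memory:")
    try:
        # integer / integer is integer division in SQLite, so milliseconds are dropped
        return conn.execute("SELECT datetime(? / 1000, 'unixepoch')",
                            (ms,)).fetchone()[0]
    finally:
        conn.close()
```

For example, ms_to_utc(1400000000000) gives 2014-05-13 16:53:20+00:00 and sqlite_utc(1400000000000) gives '2014-05-13 16:53:20'.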
In the body decompression section of the script, the contents of each row are assigned to the variables id, _from, sent, recv, and body. The try/except clause attempts to decompress the body. If it fails, as it will on the short message bodies, it just uses the contents of the body variable as is. Finally, the row is printed in a pipe-delimited fashion.
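The same try/except idea can be packaged as a small helper that always returns text, whichever column the body came from. This is a hypothetical variant, not the script's exact logic: it checks the Python type first (short bodies arrive as str, blobs as bytes, empty fields as None), catches only zlib.error rather than every exception, and falls back to decoding the raw bytes if a blob turns out not to be a zlib stream.

```python
import zlib


def decode_body(value, encoding="utf-8"):
    """Return a message body as str, whether it came from body or bodyCompressed."""
    if value is None:
        return ""
    if isinstance(value, str):
        # Short bodies from the plain-text body field arrive as str.
        return value
    try:
        # Long bodies from bodyCompressed arrive as bytes (SQLite blob).
        value = zlib.decompress(value)
    except zlib.error:
        # Not a zlib stream; fall back to the raw bytes.
        pass
    return value.decode(encoding, errors="replace")
```

Catching zlib.error specifically means a genuine bug elsewhere in the loop still raises instead of being silently swallowed.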