Wednesday, January 7, 2015

Getting Attached: Apple Messaging Attachments

I sometimes get questions about showing attachments in Apple iDevice messaging databases. The questions, however, seem to come at a time when I don’t have any databases on hand to study the issue. Well, this week I stumbled on the chats.db during an exam of a MacBook Air. The database contains iMessage and SMS messages, and can be configured to sync with the user’s other iDevices (iPhone, for example) through iCloud. So, I took a look at the database an determined a way to match the attachments with the messages.

The chats.db is found in the users directory in the Library/Messages folder.

Location of chats.db
Library/Messages/
Library/Messages/Attachments
Library/Messages/chat.db
Library/Messages/chat.db-shm
Library/Messages/chat.db-wal

As you can see, message attachments are located in the Attachments sub-folder. But how are they referenced in the chats.db, and how are they matched to the correct message? The database schema gives us the clues we need.

The chats.db table schema
CREATE TABLE _SqliteDatabaseProperties
    (key TEXT,
    value TEXT,
    UNIQUE(key));

CREATE TABLE chat
    (ROWID INTEGER PRIMARY KEY AUTOINCREMENT,
    guid TEXT UNIQUE NOT NULL,
    style INTEGER,
    state INTEGER,
    account_id TEXT,
    properties BLOB,
    chat_identifier TEXT,
    service_name TEXT,
    room_name TEXT,
    account_login TEXT,
    is_archived INTEGER DEFAULT 0,
    last_addressed_handle TEXT,
    display_name TEXT,
    group_id TEXT);

CREATE TABLE attachment
    (ROWID INTEGER PRIMARY KEY AUTOINCREMENT,
    guid TEXT UNIQUE NOT NULL,
    created_date INTEGER DEFAULT 0,
    start_date INTEGER DEFAULT 0,
    filename TEXT,
    uti TEXT,
    mime_type TEXT,
    transfer_state INTEGER DEFAULT 0,
    is_outgoing INTEGER DEFAULT 0,
    user_info BLOB,
    transfer_name TEXT,
    total_bytes INTEGER DEFAULT 0);

CREATE TABLE handle
    ( ROWID INTEGER PRIMARY KEY AUTOINCREMENT UNIQUE,
    id TEXT NOT NULL,
    country TEXT,
    service TEXT NOT NULL,
    uncanonicalized_id TEXT,
    UNIQUE (id,
    service) );

CREATE TABLE chat_handle_join
    ( chat_id INTEGER REFERENCES chat (ROWID) ON DELETE CASCADE,
    handle_id INTEGER REFERENCES handle (ROWID) ON DELETE CASCADE,
    UNIQUE(chat_id,
    handle_id));

CREATE TABLE message
    (ROWID INTEGER PRIMARY KEY AUTOINCREMENT,
    guid TEXT UNIQUE NOT NULL,
    text TEXT,
    replace INTEGER DEFAULT 0,
    service_center TEXT,
    handle_id INTEGER DEFAULT 0,
    subject TEXT,
    country TEXT,
    attributedBody BLOB,
    version INTEGER DEFAULT 0,
    type INTEGER DEFAULT 0,
    service TEXT,
    account TEXT,
    account_guid TEXT,
    error INTEGER DEFAULT 0,
    date INTEGER,
    date_read INTEGER,
    date_delivered INTEGER,
    is_delivered INTEGER DEFAULT 0,
    is_finished INTEGER DEFAULT 0,
    is_emote INTEGER DEFAULT 0,
    is_from_me INTEGER DEFAULT 0,
    is_empty INTEGER DEFAULT 0,
    is_delayed INTEGER DEFAULT 0,
    is_auto_reply INTEGER DEFAULT 0,
    is_prepared INTEGER DEFAULT 0,
    is_read INTEGER DEFAULT 0,
    is_system_message INTEGER DEFAULT 0,
    is_sent INTEGER DEFAULT 0,
    has_dd_results INTEGER DEFAULT 0,
    is_service_message INTEGER DEFAULT 0,
    is_forward INTEGER DEFAULT 0,
    was_downgraded INTEGER DEFAULT 0,
    is_archive INTEGER DEFAULT 0,
    cache_has_attachments INTEGER DEFAULT 0,
    cache_roomnames TEXT,
    was_data_detected INTEGER DEFAULT 0,
    was_deduplicated INTEGER DEFAULT 0,
    is_audio_message INTEGER DEFAULT 0,
    is_played INTEGER DEFAULT 0,
    date_played INTEGER,
    item_type INTEGER DEFAULT 0,
    other_handle INTEGER DEFAULT -1,
    group_title TEXT,
    group_action_type INTEGER DEFAULT 0,
    share_status INTEGER,
    share_direction INTEGER,
    is_expirable INTEGER DEFAULT 0,
    expire_state INTEGER DEFAULT 0,
    message_action_type INTEGER DEFAULT 0,
    message_source INTEGER DEFAULT 0);

CREATE TABLE chat_message_join
    ( chat_id INTEGER REFERENCES chat (ROWID) ON DELETE CASCADE,
    message_id INTEGER REFERENCES message (ROWID) ON DELETE CASCADE,
    PRIMARY KEY (chat_id,
    message_id));

CREATE TABLE message_attachment_join
    ( message_id INTEGER REFERENCES message (ROWID) ON DELETE CASCADE,
    attachment_id INTEGER REFERENCES attachment (ROWID) ON DELETE CASCADE,
    UNIQUE(message_id,
    attachment_id));

I’ll provide a summary of the database as I understand it. Messages are predictably stored in the message table. The message table lacks a reference to attachments, other than the fact that one exists: + cache_has_attachments INTEGER DEFAULT 0+. The default setting is zero, meaning no attachements are stored. A value of 1 indicates there is an attachment in the Attachments sub-folder. One other issue we find when examining the message table is that there is a reference to the remote party in the conversation (handle_id INTEGER DEFAULT 0), but not the extact address—email, account identifier, or phone number—that an investigator would desire. That information is stored in the handle table. It is up to us to figure out how to relate the tables together.

Can’t we all just get along?

The difficulty in examining SQLite databases is determining how they are intended to relate information. There is seldom anything in the database itself that explains its intended use. It can similar to stumbling upon raw building materials and trying to figure out what is being built. Sometimes it’s easy, other times, not so much. But with the chats.db database, three table schema entries give us a clue as to the database design.

SQLite table join hints
CREATE TABLE chat_handle_join
    ( chat_id INTEGER REFERENCES chat (ROWID) ON DELETE CASCADE,
    handle_id INTEGER REFERENCES handle (ROWID) ON DELETE CASCADE,
    UNIQUE(chat_id,
    handle_id));

CREATE TABLE chat_message_join
    ( chat_id INTEGER REFERENCES chat (ROWID) ON DELETE CASCADE,
    message_id INTEGER REFERENCES message (ROWID) ON DELETE CASCADE,
    PRIMARY KEY (chat_id,
    message_id));

CREATE TABLE message_attachment_join
    ( message_id INTEGER REFERENCES message (ROWID) ON DELETE CASCADE,
    attachment_id INTEGER REFERENCES attachment (ROWID) ON DELETE CASCADE,
    UNIQUE(message_id,
    attachment_id));

INFO: The message_attachment_join table shows us that the message_id column in the table refers to the message table rowid column. Likewise, the attachment_id refers to the attachment table rowid. Thus, the message_attachment_join table is used to match attachments to messages.

Each of the table names above ends in the word join. As used, the word join is just part of a table name, but it hints at a SQLite table operation called a JOIN. A join combines two tables into one, and in SQLite there are two basic joins: INNER and OUTER. Inner joins, which come three variations, result in a combined table that includes only rows matching the join criteria. That is, the combined table only includes records with rows from each table that have one or more matching column values. While these are the default type of JOIN in SQLite, we are interested in results that show all messages, not just those with attachments.

OUTER joins, by contrast, do not require the records from each table to have a matching column. This means we can have combined table that shows all message rows, and if properly joined to the attachment table, rows containing messages with attachments will show attachment details. Further, if we join the handle table to the message table, we have everything we might want for an investigation.

I will be using a LEFT OUTER JOIN, which is shortened in syntax to LEFT JOIN. The basic syntax is "SELECT column(s) FROM left_table LEFT JOIN right_table ON left_table.columnName = right_table.columnName". A LEFT JOIN returns all rows of the left_table regardless of matching rows in the right table. Where rows match in the right table, they are joined to the matching left table row.

Tip
It is easier to understand and troubleshoot SQL queries by reading them backwards: Predicate, then subject. For example, reading the query in the paragraph above as "FROM left_table LEFT JOIN right_table ON left_table.columnName = right_table.columnName SELECT column(s)" can lend clarity to the output.

Applying a LEFT JOIN to the chats table, we can create a "Super table" combining the message, attachment, and handle tables.

SELECT *
FROM message AS m
LEFT JOIN message_attachment_join AS maj ON message_id = m.rowid
LEFT JOIN attachment AS a ON a.rowid = maj.attachment_id
LEFT JOIN handle AS h ON h.rowid = m.handle_id
Tip
The "expr1 AS expr2" statement sets expr2 as an alias for expr1, saving keystrokes and making the lines easier to read. Thus message_attachement_join.attachment_id becomes maj.attachment_id.

Entirely accurate, but probably containing more information than we need, the above query results in the following columns:

Table 1. Columns

ROWID

guid

text

replace

service_center

handle_id

subject

country

attributedBody

version

type

service

account

account_guid

error

date

date_read

date_delivered

is_delivered

is_finished

is_emote

is_from_me

is_empty

is_delayed

is_auto_reply

is_prepared

is_read

is_system_message

is_sent

has_dd_results

is_service_message

is_forward

was_downgraded

is_archive

cache_has_attachments

cache_roomnames

was_data_detected

was_deduplicated

is_audio_message

is_played

date_played

item_type

other_handle

group_title

group_action_type

share_status

share_direction

is_expirable

expire_state

message_action_type

message_source

message_id

attachment_id

ROWID

guid

created_date

start_date

filename

uti

mime_type

transfer_state

is_outgoing

user_info

transfer_name

total_bytes

ROWID

id

country

service

uncanonicalized_id

Note
If you look carefully at the schema at the top of this article, and the column listing above, you will notice that the columns are those of all four tables combined and in the order they are referenced.

We can refine the output by identifying specific columns we wish to display from each row. We can use the DATETIME function to convert the Mac Absolute Time in the date column to local time (by first converting to Unix epoch by adding a few more than 978 million seconds) and interpret the is_from_me column from integer to text using a CASE statement.

SELECT
    m.rowid,
    DATETIME(date +978307200, 'unixepoch', 'localtime') AS date,
    id AS address,
    m.service,
    CASE is_from_me
        WHEN 0 THEN "Received"
        WHEN 1 THEN "Sent"
        ELSE is_from_me
    END AS type,
    text,
    CASE cache_has_attachments
        WHEN 0 THEN Null
        WHEN 1 THEN filename
    END AS attachment
FROM message AS m
LEFT JOIN message_attachment_join AS maj ON message_id = m.rowid
LEFT JOIN attachment AS a ON a.rowid = maj.attachment_id
LEFT JOIN handle AS h ON h.rowid = m.handle_id

With this query, we end up with an easy to read output containing interpreted values with the following columns:

Table 2. Columns

ROWID

date

address

service

type

text

attachment

Tip
Why include message table ROWID? Row id’s a generated automatically for each message added to the database. A break in sequence will show a record has been deleted. Since it is possible to recover deleted records from SQLite databases, it is a convenient way to alert the investigator more analysis is required. Further, in the case of attachments multiple attachments, there will be one row for each attachment in a message. A repeating ROWID indicates two or more attachments are present for the message.

I hope this discussion of SQLite JOIN operations as they relate to the Apple iOS chats.db will help you in your examination of SQLite databases.


Thursday, July 10, 2014

Identifying Android Device Owners

I work in a college town.  That means lots of unsecured electronics.  Lots of unsecured electronics means lots of thefts and 'misplaced'--"I'm not as think as you drunk I am!"--devices.

I've seen a trend in recovered stolen devices over the past few years: the bad guys are rapidly restoring devices to factory settings to prevent them from being tracked by the owner or law enforcement.  That leaves me with a problem, though: how do I determine the owner of a device that has been restored?   Allocated data that could show ownership is deleted upon a system restore.  Since, I've discussed other devices in the past, today I'll focus on Androids.

Dispossessed Androids

I've had uneven success with Androids in the past.  This may be due in part to the fact that I've not always know what to look for.  But I received two more such devices this week and decided to apply myself, once again, to the problem of identifying the owners.  Since I became an Android owner myself over the past 18 mos, I had a device with known data with which to experiment.

Android Recovery

Nearly all data that contains identifying information is stored in the 'data' partition.  When a device is restored or 'wiped' through the Android recovery system, personal data is removed.  This process is usually quite fast, which leads me to believe that 'wiping' user data is a simple delete in most cases.   There are custom recoveries where this might not be true, but a study of unallocated data in a wiped device reveals a rich data field. 

In Unix-like systems, physical storage devices are attached to the operating system through special files (drivers) called device nodes.  These nodes provide raw access to devices and their partitions.  Thus, if a device node, also referred to as a block device, is addressed, all content is accessible, allocated and unallocated alike.  Block devices can be thought of and addressed by software tools as files.  To access block devices, however, one must have root access to the operating system.  I will not be discussing the various ways to achieve root access to an Android device in this article, however.  I will continue on the assumption that the device has been rooted.

Tinkering under the hood

Access to a running Android device is done through the Android Debug Bridge (adb).  In a stock recovery or Android operating system, adb provides shell user access to the file system.  The shell user has limited access to the device and commands, but the root user has full access.  Root access, when not immediately granted through the adb shell command, is obtained by the su command.
shell@device:/ $
shell@device:/ $ su
root@device:/ # 
Block device files are found in the /dev/block directory. The file representing the entire NAND flash is the /dev/block/mmcblk0 file. Partitions are represented as /dev/block/mmcblk0p1, /dev/block/mmcblk0p2, etc. A paritial directory listing in my device, for example, is:
/dev/block/mmcblk0
/dev/block/mmcblk0p1
/dev/block/mmcblk0p10
/dev/block/mmcblk0p11
/dev/block/mmcblk0p12
/dev/block/mmcblk0p13
/dev/block/mmcblk0p14
/dev/block/mmcblk0p15
/dev/block/mmcblk0p16
/dev/block/mmcblk0p17
/dev/block/mmcblk0p18
/dev/block/mmcblk0p19
We could address the entire memory storage device through mmcblk0, but it would be more efficient to address just the data partition.  But which of these is the data partition?  There are several ways to figure this out, and while not all of the following methods will work on every device, at least one should.
  1. If the data partition is mounted, such as would occur in a rooted and running operating system, simply issue the mount command:

    # mount | grep /data
    /dev/block/mmcblk0p25 on /data type ext4 (ro,relatime,barrier=1,data=ordered)

  2. Check the contents of the /etc/fstab file:

    # cat /etc/fstab
    /dev/block/mmcblk0p24 /system ext4 rw
    /dev/block/mmcblk0p25 /data ext4 rw
    /dev/block/mmcblk0p26 /cache ext4 rw
    /dev/block/mmcblk1p1 /sdcard vfat rw
    /dev/block/mmcblk0p28 /emmc vfat rw
    /dev/block/mmcblk1p2 /sd-ext  rw
    /dev/block/mmcblk0p21 /efs ext4 rw

  3. Look for the 'by-name' directory somewhere in the /dev/block/platform subtree:

    # ls /dev/block/platform/msm_sdcc.1/by-name/ -l | grep data  
    lrwxrwxrwx root root 2014-06-24 03:10 data -> /dev/block/mmcblk0p25

    Note that the 'by-name' data file is actually link to the /dev/block/mmcblk0p25.

Getting to the Point

Ok, we know how to identify and address the data partition, but for what do we search?  After some experimentation with my own device, it appears that a very profitable target are application license files. The com.application.vending domain contains application licensing information.  On my device, I found 16 binary files in the /data/data/com.android.vending/cache/main/ directory that appear to be application licenses from applications downloaded from the Google Play store.  While I could not find specific information about these files, a reading of Android developer page for licensing applications suggests this files purpose.  Importantly all contained my username in the form of:
account="androiduser@gmail.com"
Crafting a search of the data partition of a restored device with this knowledge is fairly simple:
# strings mmcblk0p25 | egrep -o 'account="?.{1,25}@gmail.com"?'
Note: the strings and egrep commands are available through busybox which can be temporarily installed to the /dev/ folder (a temporary file system in RAM) if not already present in your environment using the adb push busybox /dev/ command.
Output of the search can be sorted and counted using a sort | uniq pipeline for clean results.
# strings -td mmcblk0.raw | \egrep -o 'account="?.{1,25}@gmail.com"?' | \sort | uniq -c | sort -n
1 account=user1@gmail.com
13 account=user2@gmail.com
970 account="user2@gmail.com"
2161 account="user1@gmail.com"
From the output, we can see there have been two user accounts.  Did they both exist on the system at the same time.  Has the device changed hands?  We don't know, but we have two email addresses for contacting people who might know!


Tuesday, May 6, 2014

Searching for Searches

In a recent examination of smart phone content, it became necessary to know the personal interests of the device's owner.  You can browse internet and app history, but that can be extensive to review every URLs to every clicked link and served page.  To get directly to the point, I decided to search for his browser/app search query history.  I was hoping to craft a regular expression (or several) that would assist in giving me a good idea of the person's interests.

I studied some top search engine results and reviewed some browser history and crafted the following GNU extended regular expression:

[?&](k|p|q|query)=[a-zA-Z0-9+_%-]+

This search, run against strings output of files, found search queries for Google, Yahoo!, Bing, Ask, Aol, Faceboot, YouTube, Vimeo and some x-rated sites as well as app content such as Twitter.  Search results appear (depending on what you feed and how you configure GNU grep) similar to:

https://www.google.com/search?q=you+found+me
http://m.youtube.com/results?q=some%20video%20i%20like
https://m.facebook.com/search/?query=that%20guy%20

An added benefit to this expression is that it also hits on additional page results, Google images page refreshes, etc.   With little command line wiz-bangery, it's even possible to sort and count the results to get a histogram of searches:

strings History.plist | egrep -o '[?&](k|p|q|query)=[a-zA-Z0-9+_%-]+' | sed 's/.*=//' | sort | uniq -c | sort -nr

I'll explain the command above:
  1. strings History.plist # extract ascii strings from the iPhone Safari History.plist
  2. egrep -o '[?&](k|p|q|query)=[a-zA-Z0-9+_%-]+' # grep for the regular expression described above
  3. sed 's/.*=//' # strip off the query tag at the front of the user typed query
  4. sort # sort the results alphabetically
  5. uniq -c  # count the matching lines
  6. sort -nr # reverse sort, placing the most frequent query terms first.
Results of the command look similar to the following:

   21    I+search+for+this+a+lot
   11    this%20one%20a%20little%less
    2    why+would+anyone+read+linux+sleuthing
    1    testing%20one%20two

The expression could be run against all logical files in a device and against unallocated space, if applicable.  I only demonstrate it using the History.plist because it's easy illustrate.

I post this short article both because I want to remember this regular expression (the whole reason for my blog in the first place) and to solicit favorite search box/engine regular expressions you might have. Please share them in a comment if you get a chance.  Happy searching!

Tuesday, April 22, 2014

Finding Serial Numbers on Locked iPhones

Apple iDevices have their serial number engraved on the back, right? So why the article? Because it's not true of newer devices like the iPhone 5, 5s, and 5c. Also, original cases can be replaced and serial numbers obliterated through unprotected use or deliberate act. Now I have your attention again, I hope.

Getting the Message

I've written in the past about the libimobiledevice library and it's utilities.  One, which is quite handy for gathering device information is ideviceinfo.  It provides information such as the device description (color), device class (iPhone, iPod, iPad), device name, etc.  When the device is unlocked, you can retrieve the serial number, as well.  Basically, you retrieve the contents of the Info.plist.

But ideviceinfo is not so informative with a locked device.  In fact, it won't show you any output unless you use the -s simple option.  While you can obtain some information, such as the description, class, name, UDID (unique identifier), and Mac address, you can't display the serial number.  But never fear, there is a way...

Linux has a system log that tracks systems events, included the plugging and unplugging of devices. The system log can be dumped the the terminal with the dmesg command.  Run by itself, you dump the entire log and it's quite a lot of information to sift through, though in truth what you want will be found at or near the end of the log.  You can shorten the output to the content you need with

$ dmesg syslog

But an even niftier trick is to set up your system to display the log as it is created and watch the output:

$ tail -f /var/log/syslog

This will display the last 10 lines of the system log and the "follow" it until you cancel with ctrl-c. Now you can hotplug your iDevice and watch the data that the system log records about the device. Unfortunately, you will see that it displays the device UDID and not the serial number in the "SerialNumber" field for a locked iDevice.

Recovering the Serial Number

The serial number is recoverable in Recovery Mode, however.  Pressing and holding the hardware power button brings up the software power off slide button.  Power off the device, and then replug it into your Linux box while holding the hardware home button.  The device will boot into recovery mode.  Now check your syslog with either of the two methods discussed above.  Two serial numbers are displayed in the syslog after the product (iPhone, etc) and manufacturer (Apple) are listed.  The first is the UDID, but the second includes several key:value pairs, one of which is the device serial number (key SRNM).  

When you are done collecting the device data revealed in the syslog, reboot it, if required, by pressing and holding the power button approximately 10 seconds until the recovery screen goes blank.  The device will then reboot into the operating system, probably feeling very ashamed of itself for revealing its secrets so readily.

Wednesday, February 19, 2014

Identifying Owners of Locked Android Devices

Locked Devices are not Always Secure

I was handed a device I’ve never seen before: A Verizon Ellipsis 7" tablet. The device was suspected to be stolen, but it was password locked with no sd card or sim card installed. USB debugging and mass storage mode were disabled, too, checked by plugging the device into a computer while the device was booted into the normal operating system. What to do now?

I’ve learned through much hands-on experience to put a device through a few checks before I give up hope. Is there a bootloader mode? How about recovery? I’ve been surprised to find full access to devices in recovery mode, left wide open by the phone’s distributor. More often I find limited access, and sometimes none.

With a little online research—the forensic community owes a debt of gratitude the the modder community—I found that the way to put the Ellipsis into recovery mode: Press and hold between the up and down volume button while powering the device (pressing up and down at the same time did not work). I plugged the device into my PC again, ran adb devices and observed that the Ellipsis was running the adb daemon in recovery mode! I dropped into the ADB shell and determined I was the shell user, which meant limited privileges.

Getting the lay of the land in Android
$ adb shell
shell@android:/ $ printenv
_=/system/bin/printenv
LD_LIBRARY_PATH=/vendor/lib:/system/lib
HOSTNAME=android
TERM=vt100
PATH=/sbin:/vendor/bin:/system/sbin:/system/bin:/system/xbin
LOOP_MOUNTPOINT=/mnt/obb
ANDROID_DATA=/data
ANDROID_ROOT=/system
SHELL=/system/bin/sh
MKSH=/system/bin/sh
USER=shell
ANDROID_PROPERTY_WORKSPACE=8,49664
EXTERNAL_STORAGE=/storage/sdcard0
RANDOM=17656
SECONDARY_STORAGE=/storage/sdcard1
HOME=/data
ANDROID_BOOTLOGO=1
PS1=$(precmd)$USER@$HOSTNAME:${PWD:-?} $
shell@android:/ $

The printenv command reveals some other interesting details about the device. For example, I know where the user data is mounted (HOME=/data), where the operating system files are located (ANDROID_ROOT=/system), and where the sdcards are mounted (EXTERNAL_STORAGE=/storage/sdcard0, SECONDARY_STORAGE=/storage/sdcard1). I know the system path, i.e., the location of executable files that can be called from anywhere in the system. I can also see what partitions are mounted:

Mountpoints of Verizon Ellipsis in Recovery Mode
shell@android:/ $ mount
rootfs / rootfs ro,relatime 0 0
tmpfs /dev tmpfs rw,nosuid,relatime,mode=755 0 0
devpts /dev/pts devpts rw,relatime,mode=600 0 0
proc /proc proc rw,relatime 0 0
sysfs /sys sysfs rw,relatime 0 0
none /acct cgroup rw,relatime,cpuacct 0 0
tmpfs /mnt/obb tmpfs rw,relatime,mode=755,gid=1000 0 0
emmc@android /system ext4 ro,noatime,noauto_da_alloc,commit=1,data=ordered 0 0
emmc@usrdata /data ext4 rw,nosuid,nodev,noatime,nodelalloc,noauto_da_alloc,commit=1,data=ordered 0 0
/emmc@cache /cache ext4 rw,nosuid,nodev,noatime,discard,noauto_da_alloc,data=ordered 0 0
/emmc@protect_f /protect_f ext4 rw,nosuid,nodev,noatime,nodelalloc,noauto_da_alloc,commit=1,data=ordered 0 0
/emmc@protect_s /protect_s ext4 rw,nosuid,nodev,noatime,nodelalloc,noauto_da_alloc,commit=1,data=ordered 0 0
/emmc@fat /storage/sdcard0 vfat rw,dirsync,nosuid,nodev,noexec,relatime,uid=1000,gid=1015,fmask=0702,dmask=0702,allow_utime=0020,codepage=cp437,iocharset=iso8859-1,shortname=mixed,utf8,errors=remount-ro 0 0
shell@android:/ $

I see that the /data partition is mounted read/write, but upon exploration, I’ll see there is little I can see or retrieve from there because the shell user does not have sufficient rights. But where can I look to find information about the owner, then? Take a close look at that last entry:

Internal SDCard Mount Point
/emmc@fat /storage/sdcard0 vfat rw,dirsync,nosuid,nodev,noexec,relatime,uid=1000,gid=1015,fmask=0702,dmask=0702,allow_utime=0020,codepage=cp437,iocharset=iso8859-1,shortname=mixed,utf8,errors=remount-ro 0 0
shell@android:/ $ ls -dl storage/sdcard0
d---rwxr-x system   sdcard_rw          1969-12-31 16:00 sdcard0

Members of the sdcard_rw group have read/write/execute privileges in the /storage/sdcard0 directory, and other users can read and execute there. A little more exploration of the root directory, we see that /sdcard is a link to /storage/sdcard0, so we can shortcut our typing a bit.

What remains is figure out who owns this device from the data I can read in the /sdcard mount point. One thing all Androids have in common is that the users register them with Google and create associate the device with a gmail account. I performed a simple search:

Finding the Owner of a Device from SDCard Data
shell@android:/ $ ls -R sdcard/ | grep "\@gmail.com"
...
sdcard//Android/data/com.google.android.apps.books/files/accounts/somebody@gmail.com/volumes/######/res2:
sdcard//Android/data/com.google.android.apps.books/files/accounts/somebody@gmail.com/volumes/######/segments:
...
Note
The email address and path has been altered above to protect privacy. It is offered as an example of what can be expected from such a search.

I found over 230 instances of that email address (modified above for privacy) in the file paths alone, without looking inside any files at all. In fact, I found two accounts. I was able to contact those persons and determine the device was in fact stolen. There are certainly other ways to find user information, and I did in fact find that some of the apps that stored user namest hat corroborated the gmail accounts I found in the file paths.

I’ve known investigators to hear a device description of "Locked with no USB debugging" and declare, "There is nothing that can be done." I hope this quick post demonstrates otherwise. While it is true that some devices are buttoned up pretty tight, I find that the vast majority provide at least some access. Maybe now you’ll be inspired to look a little more closely, too.


Monday, February 17, 2014

Making Sense of the Senseless

SQLite to the Rescue

One of the tasks I’m asked to perform is to geolocate mobile phone calls from Call Detail Reports (CDR). These usually arrive from a carrier as spread sheets: one with details of calls to and from a particular number, and one or more cell tower listings. I’ve tried a variety of ways to process these over time such as BASH scripting and python coding. But by far the easiest and most flexible way to process these records is by importing them into a SQLite database.

The long term difficulty in processing CDRs is that they change over time. It seems that every time I get new records to process, the format has changed which breaks previous code. It takes much more effort to recode a script than it does to write a SQL query on the fly, and I’m certainly no SQL guru. SQLite has enough built in functions to handle nearly any problem you might encounter.

Lets take some Sprint records I recently processed as an example. I was tasked with plotting the locations of the voice calls on a map. There were over 3,600 records for a 12 day period with text message details mixed with voice call details. Only the call details contained references to the tower records, however. Call records included five digit integers that represented the first and last cell towers the mobile phone used during the communication. Text messages contained only zeros in these columns.

The challenge was to retrieve the call records for mapping, ignoring the text messages that did not contain cell tower details. SQLite seemed the easiest way to accomplish this in light of the follow up requirement of looking up each cell tower integer in any one of four associated tower record spread sheets.

Creating the Database

The first step was to create a SQLite database. Fortunately, creating a database is a simple, straight forward process. I performed the work using in the SQLite command line program. However, GUI tools like the excellent SQLite Manager can accomplish the same thing and I recommend them if you are new to SQLite as they can be good teachers.

To create a new database, I simply provided the new database name when I opened the command line program. I called my database cdr.sqlite.

Creating a SQLite database
$ sqlite3 cdr.sqlite
SQLite version 3.7.17 2013-05-20 00:56:22
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite>

Next, I needed a table to hold the call records. I made this something easy to type, so I called it simply cdr. The columns I named for the columns found in the spreadsheet sent by Sprint.

SQLite Create Table Statement for the Call Sprint Detail Report
sqlite> CREATE TABLE "cdr"
   ...> (
   ...> "Calling_NBR" TEXT,
   ...> "Called_NBR" TEXT,
   ...> "Dialed_Digits" TEXT,
   ...> "Type" TEXT,
   ...> "Start_Date" TEXT,
   ...> "End_Date" TEXT,
   ...> "Duration" INTEGER,
   ...> "NEID" INTEGER,
   ...> "Repoll" TEXT,
   ...> "First_Cell" INTEGER,
   ...> "Last_Cell" INTEGER,
   ...> );
sqlite>
Note
In the SQLite command line program, all SQL queries must end in a semi-colon or the interpreter assumes you are adding lines to the statement. If you forget to put the semi-colon at the end of your command, you can enter it on the next line.

To import the CDR data from the spreadsheet, simply export the data without the header row, in text file with comma separated values (CSV). In this case, I called the file "call_records.csv". I had to tell SQLite how the data was delimited (SQLite uses pipes by default), and import the CSV file into the "cdr" table.

Importing the CDR data into the cdr table
sqlite> .separator ","
sqlite> .import call_records.csv cdr
Note
The "dot" commands are special SQLite functions (use .help to view them all) and do not require a semi-colon.

To import the cell tower details, I followed the same process: I created a table I called "towers" using the column headers from the spreadsheet as the column names of the table. Then I exported the cell tower data from each spreadseet to a CSV file and imported the CSVs into the tower table. While I won’t repeat the full process, I will display the table layout (schema) below.

Towers table schema
sqlite> .schema towers
CREATE TABLE "towers" (
    "Cell" INTEGER,
    "Cascade" TEXT,
    "Switch" TEXT,
    "NEID" INTEGER,
    "Repoll" INTEGER,
    "Site_Name" TEXT,
    "Address1" TEXT,
    "Address2" TEXT,
    "City" TEXT,
    "County" TEXT,
    "State" TEXT,
    "Zip" TEXT,
    "Latitude" TEXT,
    "Longitude" TEXT,
    "BTS_Manufacturer" TEXT,
    "Sector" TEXT,
    "Azimuth" TEXT,
    "CDR_Status" TEXT
);
sqlite>

Looking Up Records

Now it was time to lookup the call records in the tower tables to find that latitude and longitude of the tower used to initiate the call an place the mobile device in time and space. I expected it to be straight forward: take the 5-digit tower number from the first_call field of the cdr table, match it to the cell field in the towers table, and return the map coordinates. Easy peasy, right? The SQL equivalent of show me the latitude and longitude of the tower where the CDR first_cell integer matches the Tower cell integer.

First attempt to match call tower numbers to tower coordinates
sqlite> select latitude, longitude from towers, cdr where first_cell = cell;
sqlite> # Ruh roh, raggy, no matches!
sqlite> select first_cell from cdr where first_cell != 0 limit 5;
40385
10962
10962
20962
30392
sqlite> select cell, latitude, longitude from towers;
<redacted>
385|34.046944|-118.448056
385|34.046944|-118.448056
385|34.046944|-118.448056
392|34.063806|-118.30366
392|34.063806|-118.30366
392|34.063806|-118.30366
962|37.657222|-122.094653
962|37.657222|-122.094653
962|37.657222|-122.094653
385|37.838333|-122.298611
385|37.838333|-122.298611
392|37.693|-122.0939
392|37.693|-122.0939
392|37.693|-122.0939
385|37.403633|-121.89436
385|37.403633|-121.89436
385|37.403633|-121.89436
<redacted>
sqlite>

Whoa, the cdr first_cell and towers cell values do not jibe! And, as we can see, there is more than one entry in the tower table for each cell designator. Take cell 385 for example: there are three distinct groupings of tower 385 with three different map coordinates for each group. It turns out that cell towers are grouped by the switch they are part of, recorded in the CDR and tower records as the NEID. The appropriate cell tower record can be further reconciled by the sector number, or side of the tower from which the call originated. The first_cell value is actually a concatenation of the sector and the tower number. How did I figure all this out? The answer came from reading the documentation (RTFM) that came with the records and some analysis of the spreadsheets.

I’ll demonstrate below the values that make a tower record unique and that must be considered when matching call records to tower details. I’ll focus on tower 385

Values that make the tower records unique.
sqlite> .headers on
sqlite> .mode columns
sqlite> select cell, sector, neid, azimuth, latitude, longitude from towers
   ...> where cell = 385;
Cell        Sector      NEID        Azimuth     Latitude    Longitude
----------  ----------  ----------  ----------  ----------  -----------
385         1           65          60          34.046944   -118.448056
385         2           65          200         34.046944   -118.448056
385         3           65          290         34.046944   -118.448056
385         1           512         55          37.838333   -122.298611
385         2           512         155         37.838333   -122.298611
385         1           95          0           37.403633   -121.89436
385         2           95          110         37.403633   -121.89436
385         3           95          190         37.403633   -121.89436
sqlite>

Now it is easy to see that the three different groupings of tower 385 are a result of that tower designator being used in three different switches, or NEIDs. Further, each tower can be resolved to a sector, which corresponds to a unique asimuth or direction the cell tower antenna points.

SQLite Substrings

The remaining problem in querying this data is the configuration of the first_cell value in the call details. Recall that it is the sector concatentated to the tower number. I needed a way to take the first digit from the integer and assign it to a sector value, and use the remaining four digits as the tower designator. Fortunately, SQLite has a built-in substring function to make this easy.

substr(X,Y,Z), substr(X,Y)

The substr(X,Y,Z) function returns a substring of input string X that begins with the Y-th character and which is Z characters long. If Z is omitted then substr(X,Y) returns all characters through the end of the string X beginning with the Y-th. The left-most character of X is number 1. If Y is negative then the first character of the substring is found by counting from the right rather than the left. If Z is negative then the abs(Z) characters preceding the Y-th character are returned. If X is a string then characters indices refer to actual UTF-8 characters. If X is a BLOB then the indices refer to bytes.

http://www.sqlite.org/lang_corefunc.html
— SQLite

From the SQLite documentation, we see that the substr function takes 2-3 arguments and returns a substring of the of the input string based according to those documents. To return the sector, I needed to take the first digit from the first_cell string in this manner: substr(first_cell, 1, 1). To return the tower identification, I needed to skip the first digit and return the rest of the string thusly: substr(first_cell, 2).

Note
I did not need to specify the third argument in the second substr() expression because I wanted the entirety of the string past the first digit.

Finally, I needed to include the NEID from the call detail records to ensure I’ve looked up the correct tower. Putting it all together, we can see how to created the values we need from the call records to find the matching tower details. I’ve added a second query to demonstrate, using the ltrim() function to strip the leading zeros from the cell column.

Using the SQLite substr() function
sqlite> select substr(first_cell, 1, 1) as sector,
   ...> substr(first_cell, 2) as cell, neid
   ...> from cdr where first_cell != 0 limit 5;
sector      cell        NEID
----------  ----------  ----------
4           0385        95
1           0962        169
1           0962        169
2           0962        169
3           0392        512
sqlite> select substr(first_cell, 1, 1) as sector,
   ...> ltrim(substr(first_cell, 2), 0) as cell, neid
   ...> from cdr where first_cell != 0 limit 5;
sector      cell        NEID
----------  ----------  ----------
4           385         95
1           962         169
1           962         169
2           962         169
3           392         512
sqlite>

Putting it All Together

As usual, the explanation is more step intensive that the actual work. The whole process can be done in one query, but I wanted to break it down so that it would be easier to recognize the elements of the query. To make it more legible, I’ll write it across several lines.

Matching the CDR call record to the correct tower details
sqlite> select start_date as Date, calling_nbr as Number, latitude, longitude
   ...> from cdr, towers
   ...> where substr(first_cell, 1, 1) = towers.sector and
   ...> ltrim(substr(first_cell, 2), 0) = cell and
   ...> cdr.neid = towers.neid limit 5;
Date                 Number      Latitude    Longitude
-------------------  ----------  ----------  -----------
2012-12-10 07:36:39  ##########  37.657222   -122.094653
2012-12-10 08:24:21  ##########  37.657222   -122.094653
2012-12-10 08:26:09  ##########  37.657222   -122.094653
2012-12-10 09:59:40  ##########  37.693      -122.0939
2012-12-10 10:00:26  ##########  37.705128   -122.047417
sqlite>

This can be converted to a CSV file suitable for mapping through a website like gpsvisualizer.com[GPS Visualizer] or a program like GPSBabel. First, I change the output mode to CSV and I change the columns names to comply with the mapping software’s requirements, printing a sample to ensure I have the format I am seeking. Then I output the data to a file for import to the mapping program.

Exporting the data for mapping
sqlite> .mode csv
sqlite> select start_date as name, calling_nbr as desc, latitude, longitude
   ...> from cdr, towers
   ...> where substr(first_cell, 1, 1) = towers.sector and
   ...> ltrim(substr(first_cell, 2),0) = cell and
   ...> cdr.neid = towers.neid limit 5;
name,desc,Latitude,Longitude
"2012-12-10 07:36:39",##########,37.657222,-122.094653
"2012-12-10 08:24:21",##########,37.657222,-122.094653
"2012-12-10 08:26:09",##########,37.657222,-122.094653
"2012-12-10 09:59:40",##########,37.693,-122.0939
"2012-12-10 10:00:26",##########,37.705128,-122.047417
sqlite> .output call_map.csv
sqlite> select start_date as name, calling_nbr as desc, latitude, longitude
   ...> from cdr, towers
   ...> where substr(first_cell, 1, 1) = towers.sector and
   ...> ltrim(substr(first_cell, 2),0) = cell and
   ...> cdr.neid = towers.neid;
sqlite>

So, in its simplest form, you can see this is not necessarily a difficult process. It can be distilled into three basic steps:

  1. Review the records and determine the relationships between them

  2. Import the data into a SQLite database

  3. Query the database for the output needed

Though there is commercial mapping software available, the software I’ve seen either lacks flexibility to deal with differences in records or in output. Further they usually require configuration that can take as long or longer than importing the data into SQLite and writing the specific query you need for your investigation. If you have the software and a happy with it, use it. If you find it is lacking the flexibility you need, consider doing the work by hand. You’ll be better for it!

Note
In the interest of full disclosure, the results demonstrated here are over simplified. For the real casework, I interpreted the call direction with a case statement to select the called number or calling number as appropriate. The result was a Google Earth map with waypoints named for the date and time of the call. Clicking the waypoint showed call details, e.g., (To: ()-## for 37 secs, azimuth 270).

Time Perspective

Telling time in forensic computing can be complicated. User interfaces hide the complexity, usually displaying time stamps in a human reada...