Pages

Tuesday, September 29, 2015

Compression and Android Gmail

Every registered Android mobile device has an associated Google account. Google accounts usually mean Gmail. And, for investigators interested in the Gmail content stored on Androids, that content can be found in the /data/com.google.android.gm/databases directory in a database named in the following format:

mailstore.[GoogleAccount]@gmail.com.db

The database contains 23 tables (at least at the time of this writing), the most interesting of which is messages.

The messages table has 41 fields (or columns). To obtain the basic email content (say, for keyword searching), an investigator would likely want to export the sender’s and receiver’s addresses, the date sent or received, and the subject line, and the message body, at the very least. There is plenty more to be gleaned from the database, but your investigation will dictate the investigative needs.

Caution: Automated tools do not provide the full wealth of data to be found in the mailstore database. It is always a good idea to become familiar with the database schema to learn the full potential for your investigation.

The Big Squeeze

If you have experience searching SQLite databases, you might be thinking, "Why go to the trouble of exporting messages from the database? SQLite strings are usually UTF-8, so I can just search the database with regular expressions or plain keywords." Well, there is a catch when it comes to email content in the Gmail mailstore database: zlib compression.

Short length message bodies are written to the body field in the messages table as a plain text string. In a recent exam, the longest message I found in this field in a recent exam was 98 bytes, however. Longer message bodies are compressed using the zlib algorythm and stored in the bodyCompressed field. While SQLite supports compressed databases, it has no function to decompress fields within databases. Instead, it stores such data as a blob type, and it is up to the database user to decompress the data.

Note:
The SQLite blob type is sort of a catch-all for any type or data. Data is stored in the format in which it was input.

Extracting Messages

Python is a good option for exporting messages from the Gmail messagestore database. It can both open and query databases, and it can decompress the long message bodies.

Exporting Gmail Messages
import sqlite3
import zlib

# open and query the database
conn = sqlite3.connect('messagestore.db') # database name abbreviated
c = conn.cursor
c.execute("select _id, fromAddress, datetime(dateSentMS/1000,
    'unixepoch', 'localtime'), datetime(dateReceivedMS/1000,
    'unixepoch', 'localtime'), case when body not Null then body
    else bodycompressed end from messages ")
rows = c.fetchall()

# interate through the rows and decompress the long messages
for row in rows:
    id, _from, sent, recv, body = row
    try:
        body = zlib.decompress(body)
    except:
        pass
    print('{}|{}|{}|{}|{}'.format(id, _from, sent, recv, body))
Note:
The final line can be adapted to your own needs, i.e., writing the content to a new file or database, or use python regular expressions to search the content, etc.

Some Explaination

The example above is just that: an example. It is intended, like all my posts, to remind me how to process the data and demonstrate how just a few lines of python can be leveraged to extract data. The script could have been shorter, but it would have come at the cost of clarity. That said, there is still some explanation to be had:

The SQLite query in the c.execute method might need some dissecting for you to understand what I did.

select
    _id,
    fromAddress,
    datetime(dateSentMS/1000, 'unixepoch',   'localtime'),
    datetime(dateReceivedMS/1000, 'unixepoch', 'localtime'),
    case
        when body not Null
        then body
        else bodyCompressed
    end
from messages

The dateSentMS and dateReceivedMS fields are recorded in milliseconds since 1/1/1970 (Unix epoch). I let SQLite do the date converstion for me, rather and doing it python, and I converted from Unix epoch to local time. The case statement pulls a little trickery to select the body field or bodyCompressed field. Basically, the row’s body field is checked to see if it is populated. If so, it is returned. If not, the contents of the bodyCompressed field are returned.

In the body decompression section of the script, the contents of the each row are assigned to variables id, _from, sent, recv and body. The try/except clause attempts attempts to decompress the body. If it fails, as it will on the short message bodies, it just uses the contents of the body variable as is. Finally, the row is printed in a pipe-delimited fashion.


Tuesday, August 25, 2015

Android SDK on 64-bit Linux

I commonly use adb and fastboot to access Android devices.  Ubuntu has packages for those tools making installation easy:
$ sudo apt-get install android-tools-adb android-tools-fastboot
But, in recent months, I have encountered instances where the adb and fastboot binaries in the Ubuntu repository are out of date for the device with which I am attempting to connect.
$ apt-cache show android-tools-adb
Package: android-tools-adb
Priority: extra
Section: universe/devel
Installed-Size: 227
Maintainer: Ubuntu Developers
Original-Maintainer: Laszlo Boszormenyi (GCS)
Architecture: amd64
Source: android-tools
Version: 4.2.2+git20130218-3ubuntu41
Depends: libc6 (>= 2.15), libssl1.0.0 (>= 1.0.0), zlib1g (>= 1:1.1.4)
Filename: pool/universe/a/android-tools/android-tools-adb_4.2.2+git20130218-3ubuntu41_amd64.deb
...

$ adb version
Android Debug Bridge version 1.0.31

Android Software Development Kit


There is another way to install adb: the Android Software Development kit.  There is an issue, though.  The development kit software is 32-bit, and modern computer systems run 64-bit operating systems.  So, one needs a way to install 32-bit libraries in 64-bit Linux to support 32-bit applications.

The Android SDK can be found here. However, before you start downloading the Android Studio displayed prominently at the top of the page, consider that Studio is large collection of software for developing Android applications.  Forensically speaking, it is overkill, and fortunately, there is an alternative download that servers our purpose better: SDK Tools Only.

I prefer to install tools not found in the Ubuntu repository in the /opt directory.  After I download the SDK, I decompress the archive:
$ sudo tar xf Downloads/android-sdk_r24.3.4-linux.tgz -C /opt
The next step in installation is to execute the Android SDK Manager.  Note: You will need java installed (java-common package) to run the application.
$ /opt/android-sdk-linux/tools/android 
The only component needed to successfully run adb and fastboot are the Android SDK tools (already installed) and Android SDK Platform-tools.  Check the later and go through the installation process.  Exit the manager when you are done.

Installing 32-bit Support in 64-bit Ubuntu


The adb and fastboot executable binaries are located in the platform-tools folder of the SDK.  However, if you try to execute the tools, you will get and error stating the that the command is not found, even though you see the tools in the directory, with proper permissions for execution, too! 
$ cd /opt/android-sdk-linux/platform-tools
$ ./adb version
bash: ./adb: No such file or directory
$ ls -l adb
-rwxr-xr-x 1 root root 1221540 Aug 25 13:05 adb
Confusing?  You bet. The issue is that you are trying to execute a 32-bit application in a 64-bit-only operating system.  There is no 32-bit support installed in Ubuntu by default.  But we can solve that:
$ sudo dpkg --add-architecture i386
$ sudo apt-get update
The specific libraries to support the SDK are installed as follows:
$ sudo apt-get install libc6:i386 libncurses5:i386 libstdc++6:i386
$ ./adb version
Android Debug Bridge version 1.0.32
Revision 57224c5cff69-android
And, voila!  The latest SDK binary tools installed.

Easy Access


There are two simple ways to gain access to the SDK platform tools: link files or path modification.  Link files are placed in a location that is part of your system path already, and path modification places the platform tools directory in your path.  Which is best?  Welcome to freedom, friend, it's your choice.

Link files

Link files are found in all modern operating systems, so I won't bother explaining them.  First, you need to know what directories are in your path so you know where to put the links:
$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
Most packages you install place their executable binaries in /usr/bin and /usr/sbin, depending on the need for administrative privileges.  Most software built and installed by the user are placed in the /usr/local/bin and /usr/locals/sbin folders.  Since we are installing the software, and it does not require admin rights to execute, /usr/local/bin is a good choice.

To create link files in Linux (symlinks are preferred for this type of operation), do the following:
$ ln -s /opt/android-sdk-linux/platform-tools/adb /usr/local/bin
$ ln -s /opt/android-sdk-linux/platform-tools/fastboot /usr/local/bin
Note: you will have a lot more success with link files if you get in the habit of using absolute paths.  There is a place for relative paths, but this is not it.

Modifying the PATH

The PATH system variable maintains the list of directories searched by the system for commands.  When you, as a user, type a command without it's full path, the list of directories in the PATH variable are queried for the command.  If the command is not located, an error results.  To place the adb and fastboot binaries into the path, we need to add the folder containing them to the PATH variable.

It is possible to modify the path, both temporarily and permanently.  For example:
$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
$ which
$ PATH=/opt/android-sdk-linux/platform-tools

$ echo $PATH
/opt/android-sdk-linux/platform-tools$ which
Command 'which' is available in the following places
 * /bin/which
 * /usr/bin/which
The command could not be located because '/bin:/usr/bin' is not included in the PATH environment variable.
which: command not found
The problem about illustrates that we have to be careful setting the path.  By setting the path to /opt/android-sdk-linux/platform-tools, we removed the directories previously stored there.  The method used, however, is only a temporary change to the PATH variable.  The path can be reset by restarting the terminal window, or by resetting the PATH variable to its original contents:
$ PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
To make the change permanent, and to append to but not replace the contents of the PATH variable, we should add the following to our ~/.bashrc file (if your installation does not have the file, create it):
export PATH="$PATH:/opt/android-sdk-linux/platform-tools"
It is not necessary to have quotes in the statement above if there are no spaces in the path, but it is the safest way to execute the statement.  By placing this statement in .bashrc, the PATH variable will be expanded, have the platform-tools folder appended, and then the value will be reassigned to the PATH variable.  This will happen each time a new terminal session starts.



Monday, August 24, 2015

Riffbox and Windows 10

I decided to bite the bullet and try out Windows 10. I wanted to learn the new operating system and determine if I could run specific software/hardware combinations under the new Windows that I had been running in Windows 7, specifically Riffbox. I happy to report, that after some trial and error, I have determined a reliable way of running riffbox under Windows 10. This should work equally well under Windows 8, as the issue is the same.

Hey, What's Your Sign? 


Windows 8 and now 10 both required a driver be digitally signed by a vendor known to Microsoft of it will flat refuse to install the driver. Under Windows 7, you had the option to continue even if the driver was not digitally signed, but not so for Windows 8/10.  The only solution is to reboot the computer into "Disable device driver signing detection" mode.

Disabling Driver Signing


  1. Right-click the start menu, select Shut down or sign out.
  2. Click Restart while holding the Shift key.  The boot option will appear.
  3. Select Troubleshoot > Advanced Options > Startup Settings > Restart.
  4. In the Startup Settings menu, press F7 to disable driver signing.
The PC will now perform a one-time boot with driver signing disabled.  In this mode, it is possible to install the Riff Box software.  If the driver's do not automatically install, navigate to C:\Program Files (x86)\RIFF Box JTAG Manager\Drivers\64Bit directory and run the dpinst.exe program as Administrator.  Use Windows 7 compatibility mode if necessary.  You will be warned that the three drivers are not digitally signed, but you will have the option to install anyway.

After you install the drivers, plug in Riff Box and launch JTAG Manager.  

Common Issues


If you plug the riffbox into your Windows 10 system and the green light continually flashes, riffbox has not been properly recognized.  An easy fix for this is to right-click the Start menu icon and select 'Device Manager'.  Under 'Ports', you should find the 'RIFF BOX Control Port (COM #)' entry.

  1. Right click the entry and select Update Driver Software.
  2. Select Browse my computer for driver software.
  3. Select Let me pick from a list of device drivers on my computer.
  4. With the Show compatible hardware box checked, select RIFF BOX Control Port.
  5. Click the Next button to install the driver.  The Riff Box will now be recognized as a 'RIFF BOX Control Port.'
Note: Follow this procedure, too, if instead of the Riffbox entry, you see a generic 'USB Serial Device.'

Reboot Reincarnation


Rebooting reinstates the device driver signature requirement in Windows 8/10. Riff Box will not be recognized by JTAG Manager in normal boot mode because the unsigned drivers will not load.  Even if you follow the trouble shooting listed in the Common Issues section above, JTAG Manager will not detect Riff Box.  The only solution (to date) is to reboot into Disabled Signature Enforcement Mode.

That said, there are ways to force disabled signature enforcement mode on each boot, but do they work?

Permanently Disabling Driver Signature Enforcement


You will read that you can use the bcdedit command line tool to disable driver signing on each boot automatically.  The tool modifies the boot configuration data store which replaced the boot.ini file of windows 7 and earlier.  However, while this process will allow driver installation, it still does not allow the drivers to load when Riff Box is connected to the PC.  I include the instructions below to show you what I did in testing.

To disable driver signing mode on boot (and start in Test Mode):
  1. Right-click on the Start menu icon, select Command Prompt (admin).
  2. Run the command:
    bcdedit /set loadoptions DISABLE_INTEGRITY_CHECKS
  3. Run the command:
    bcdedit /set TESTSIGNING ON
  4. Reboot.

To reenable driver signing (and exit Test Mode):
  1. Right-click on the Start menu icon, select Command Prompt (admin).
  2. Run the command:
    bcdedit /deletevalue loadoption
  3. Run the command:
    bcdedit /set TESTSIGNING OFF
  4. Reboot
Another possible permanent solution would be to disable driver signature enforcement in the Group Policy Editor.  You will only have this tool if are running Windows Pro and higher, however.  I did not have Windows 10 pro at the time of this writing, so I have not tested the following procedure:
  1. Right-click the Start menu icon and select Run.
  2. Enter gpedit.msc and press Enter.
  3. In the editor, select User Configuration > Administrative Templates > System > Driver Installation.
  4. Double-click Code signing for device drivers.
  5. Click the Enable radio button and then select Ignore in the Options drop down.
  6. Apply the changes and reboot.
If you have success with this second method, please post a comment so others will know the solution.

Tuesday, February 24, 2015

URLs : U R Loaded with Information

In my early days of forensics, I considered URLs in web histories as nothing more than addresses to websites, and strictly speaking, that’s true. But URLs often contain form information supplied by the user and other artifacts that can be relevant to an investigation, too. Most of us in the business know this already, at least it concerns one commonly sought after ingot: the web search term.

Consider the following URL:

https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=linuxsleuthing

Most examiners would key in on the domain google.com and the end of the url, q=linuxsleuthing, and conclude this was a Google search for the term "linuxsleuthing", and they’d be right. But is there anything else to be gleaned from the URL? Just what do all those strings and punctuation mean, anyway?

What’s in a URL

Let’s use the URL above as our discussion focus. I’ll break down each element, and I’ll mention at least one value of the element to the forensic investigator (you may find others). Finally, I’ll identify and demonstrate a Python library to quickly dissect a URL into its constituent parts.

Protocol

https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=linuxsleuthing

The URL starts with the protocol, the "language" the browser must speak to communicate with the resource. In the Python urllib module that I will introduce later, the protocol is referred to as the "scheme".

Examples:

  • http: - Internet surfing

  • https: - Secure Internet surfing

  • ftp: - File transfer operations

  • file: - Local file operations

  • mailto: - Email operations

The forensics value of a protocol is that it clues you into the nature of the activity occurring at that moment with the web browser.

Domain

https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=linuxsleuthing

The domain can be thought of as the place "where the resource lives." Technically, it can consist of three parts: the top-level domain (TLD), second-level domain, and the host name (or subdomain). If you are more interested in those terms, I’ll leave it to you to research. Suffice it to say that we think of it as the "name" of the website, and with good reason. The names exist in this form because they can be easily memorized and recognized by humans. You may also encounter the domains evil twin in a URL, the Internet Protocol (IP) address, which domain names represent.

The Python urllib module referes to the domain as the "netloc" and identifies it by the leading "//", which is the proper introduction according to RFC 1808.

The forensic value of a domain is that you know where the resource defined in the remainder of the URL can be found or was located in the past.

Port

https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=linuxsleuthing

The port is not listed in this url, nor is it often included in URLs intended for human consumption. However, if you see something like www.google.com:80, the ":80" indicates communication is occurring across port 80. You’ll often see port numbers for URLs to video servers, but port numbers are by no means limited to such uses. The Python urllib module incorporates the port in the "netloc" attribute.

The chief forensic value of a port is that it can clue you into the type of activity occurring on the domain because many port numbers are well known and commonly used for specific tasks.

Path

https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=linuxsleuthing

In terms of a web server, the path indicates the path to the resource on the server. If the "file:" protocol is seen in the URL, then the path signifies the logical location of the file on the local machine. In fact, there will not be a domain, though the domain preamble is present, which is why you see three forward slashes for a file:

file:///path.

The Python urllib module also uses the name "path" to describe this hierarchal path on the server. Please understand that both hard paths and relative paths are possible. In addition, Python describes "params" for the last path element which are introduced by a semicolon. This should not be confused with the parameters I describe in the next section.

The principle forensic value of the path is the same as the over riding principle of real estate: location, location, location.

Parameters

https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=linuxsleuthing

Parameters are information passed to the web server by the browser. They are also referred to as "query strings". Parameters can include environment information, web form data, window size, and anything else the web site is coded to pass on. Parameter strings are indicated by a leading "?" followed by key:value pairs. Multiple parameters are separated by "&". Python calls parameters the "query."

Consider our sample URL. It can be seen to have four parameters:

  • sourceid=chrome-instant

  • ion=1

  • espv=2

  • ie=UTF-8

Parameters are really the meat and potatoes of URL analysis, in my opinion. It is here I find the most interesting details: the user name entered on the previous web page; in the case of mobile devices, the location of the device (lat/lon) when the Facebook post was made; the query on the search engine, etc.

Despite what I said in the preceding paragraph, note that query string is not present the case of our sample URL. The search was conducted through the Google Chrome browser address bar (sourceid=chrome-instant). Thus, it is not safe to assume that all search engine search terms or web form data are to be found in the URL parameters.

To throw a little more mud on the matter, consider that the entry point of the search and the browser make a difference in the URL:

Search for linuxsleuthing from the Ubuntu start page, FireFox
https://www.google.com/search?q=linuxsleuthing&ie=UTF-8&sa=Search&channel=fe&client=browser-ubuntu&hl=en&gws_rd=ssl

Here, we see the same search, but different parameters:

  • q=linuxsleuthing

  • ie=UTF-8

  • sa=Search

  • channel=fe

  • client=browser-ubuntu

  • hl=en

  • gws_rd=ssl

Caution
Parameters will mean different things to different sites. There is no "one-definition fits all" here, even if there be obvious commonality. It will take research and testing to know the particular meaning of any given parameter even though it may appear obvious on its face.

Anchor

https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=linuxsleuthing

The anchor links to some location within the web page document itself. If you’ve ever clicked a link and found yourself halfway down a page, then you understand the purpose of the anchor. Somewhere in the html code of that page is a bookmark of sorts to which that anchor points. Python calls the anchor a "fragment."

In the case of our sample URL, the anchor is the search term I entered in the address bar of the Google Chrome browser.

The forensics value of an anchor is that you know what the user saw or should have seen when at that site. It might demonstrate a user interest or that they had knowledge of a fact, depending on your particular circumstances, of course.

Making Short Work of URL Parsing

Python includes a library for manipulating URLs named, appropriately enough, urllib. The python library identifies the components of a URL a little more precisely than I described above, which was only intended as an introduction. By way of quick demonstration, we’ll let Python address our sample URL

iPython Interative Session, Demonstrating urllib
In [1]: import urllib

In [2]: result = urllib.parse.urlparse('https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=linuxsleuthing')

In [3]: print(result)
ParseResult(scheme='https', netloc='www.google.com', path='/webhp', params='', query='sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8', fragment='q=linuxsleuthing')

In [4]: result.query
Out[4]: 'sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8'

In [5]: result.query.split('&')
Out[5]: ['sourceid=chrome-instant', 'ion=1', 'espv=2', 'ie=UTF-8']

In [6]: result.fragment
Out[6]: 'q=linuxsleuthing'
Note
The Python urllib calls the parameters I discussed a query and the anchor a fragment.

If you have a little Python knowledge, then you can see how readily you could parse a large list of urls. If not, it is not much more difficult to parse a url using BASH.

Parsing URLs using BASH variable substitution

$ url="https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=linuxsleuthing"
$ anchor=${url##*\#}
$ parameters=${url##*\?}
$ parameters=${parameters//#$anchor/}
$ echo ${parameters//&/ }
sourceid=chrome-instant ion=1 espv=2 ie=UTF-8
$ echo $anchor
q=linuxsleuthing

Finding Parameters

If you want to narrow your search for URLs containing parameters and anchors, you need only grep your list for the "&" or "#" characters. If you are processing a history database such as the Google Chrome History SQLite database, you can export the relevant urls with the following query:

SQLite query for Google Chrome History
select * from urls where url like "%?%" or url like "%#%";

What’s All the Fuss?

So, why go to all this length to study a URL? I’ll give two simple illustrations:

In the first case, I had the computer of a person suspected of drug dealing. I found little relevant data on his computer doing basic analysis, including an analysis of search engine search terms. When I examined URL parameters, however, I found searches at website vendors that demonstrated the purchase of materials for growing marijuana.

In the second case, a stolen computer was recovered in close proximity to a suspect who claimed to have no knowledge of the device. The Google Chrome browser in the guest account was used since the date of the theft, so analysis was in order. URL parameters showed a login to the suspect’s Apple account 12 hours after the left. There was no useful data in the cache, only the URL history.

Finally, bear in mind that the URL history is the only artifact you may have of secure website activity. Browsers, by default, do not cache secure elements. Understanding the contents of a URL can clue you into activity for which may find no other artifacts.

It is good to know what’s in a URL!


Monday, February 23, 2015

Finding Felons with the Find Command

Digital devices are common place. Digital device examiners are not. How does the digital dutch boy prevent the digital device dam from breaking? By sticking his preview thumb into the leak.

The point of a forensic preview is to determine if the device you are examining has evidentiary value. If it does, the device goes into your normal work flow. If it does not, it gets set aside. The dam remains intact by relieving it of the pressure of non-evidentiary devices.

The point of this post is not to enter a discussion of the benefits and short comings of forensic previewing. I’m merely going to record a method I recently used to differentiate between the files created by the owner of a laptop computer and those generated by the thief who stole the computer. Hopefully, you see something useful here to adapt to your investigation.

The Plot

Police officers recovered a laptop from a home that they believed was stolen. One roommate said the device had arrived in the home a few days earlier, but did not know how it got there. The remaining members of the household claimed to know nothing about the computer at all.

I booted the device with a Linux boot disc designed for forensic examination. The disc allows storage devices to be examined without making changes. I was lucky enough to find a user account that had been established a few years earlier, and files in that account that allowed me to identify and contact the computer’s owner. The owner reported the device had been stolen from him two weeks earlier. The owner had password protected his account, but there was a guest account available for use.

Catching the Thief

I could have stopped there, but the job would have been only half-done. I knew who owned the computer, but I didn’t know who’d stolen it. Fingerprints were not an option, so I decided to look for data in the computer that might identify who had used the computer since it had been stolen. A quick look in the guest account showed me I was not going to be as lucky identifying the suspect as I had the victim: there were no user created documents.

What I need to do was to find the files modified by the suspect and inspect those files for identifying information. The suspect may not have purposely created files, but browsing the Internet, etc, creates cache and history files that point out a person as surely than a witness in a suspect lineup (that is to say, not with 100 percent certainty, but often reliable none-the-less).

File systems are very helpful in examinations of this nature: they keep dates and times that files are created, accessed and modified, just to name a few date attributes. Modern operating systems are very helpful, too, because they usually auto-sync the computer’s clock with NTP (Network Time Protocol) servers. Simply stated, modern operating systems keep accurate time automatically.

With this knowledge in mind, I was looking for guest account files (and, ultimately, all files) that were modified in in the past two weeks. Files modified outside that range were changed by the owner and of no interest. Fortunately, the find command provides a solution:

GNU Find command, example 1
# This command returns all files modified less than 14 days ago
$ find path/to/search -mtime -14 -daystart
Note
The -daystart option causes find to measure times from the start of the day rather than the last 24 hours.

The -mtime n option takes integer argument n. This is where a little explanation is in order. Had I passed the integer "14", I would have only returned files modified 14 days ago. Passing "-14" returns all files modified less than 14 days ago. Passing "+14" would cause find to return all files modified more that 14 days ago. It is possible to pass two -mtime options to create a narrow range, such as:

GNU Find command, example 2
# This command returns all files modified between 7 and 14 days ago
$ find path/to/search -mtime -14 -mtime +7

The command in the first example resulted in just over 1600 file names being returned. I saw that most of these were Google Chrome browser application data files. Both the "History" and "Login Data" SQLite databases contained data leading to the identity of the computer user since the date the laptop was stolen (a roommate) and the dates of the activity suggested the computer had been in that person’s possession since shortly after the theft.

Telling Time

The date command can really be your friend in figuring out dates and date ranges. It is easier to demonstrate than explain:

GNU Date command, example 1
$ date
Mon Feb 23 12:41:41 PST 2015
$ date -d 'now'
Mon Feb 23 12:41:50 PST 2015
Note
The two commands above do the same thing.
GNU Date command, example 1
$ date -d 'yesterday'
Sun Feb 22 12:43:42 PST 2015
$ date -d 'tomorrow'
Tue Feb 24 12:43:49 PST 2015
Note
The date command understands simple english. Used thusly, it calculates based on 24 hour periods, not from the start of the day.
GNU Date command, example 1
$ date -d '1 day ago'
Sun Feb 22 12:48:57 PST 2015
$ date -d '1 year ago'
Sun Feb 23 12:49:14 PST 2014
$ date -d 'next week'
Mon Mar  2 12:49:53 PST 2015

Note: The info date command will show you many, many more useful invocations of the date command.

Determining Elapsed Days

You may recall that the find command takes an integer for its date range options, but none of the date commands I illustrated above yielded and integer show the number of days elapsed or until that date. If there is an option for date to yield such information, I have not discovered it. However, a simple shell script can be created to allow us to use the "plain language" of the date command to help us determine the integers required by find.

count_days.sh
# This is a simple script that does not test user input for correctness
# usage: count_days.sh date1 date2

# collect dates from command line and covert to epoch
first_date=$(date -d "$1" +%s)
secnd_date=$(date -d "$2" +%s)

# calculate the difference between the dates, in seconds
difference=$((secnd_date - first_date))

# calculate and print the number of days (86400 seconds per day)
echo $((difference / 86400))
Note
This script can be made executable with chmod +x count_days.sh or simply executed by calling it with bash: bash count_days.sh

Now, we can figure out the number of days elapsed using the same plain language conventions accepted by the date command. Be sure to enclose each date in parenthesis if the date string is more than one word.

count_days.sh
# How many days have elapsed since January 10
$ bash count_days.sh "jan 10" "now"
44

# How many days elapsed between two dates
$ bash count_days.sh "nov 27 2013" "Aug 5 2014"
250

# How many days will elapse between yesterday and 3 weeks from now
$ bash count_days.sh "yesterday" "3 weeks"
22

You get the idea. And I hope I’ve given you some ideas on how to use the find and date commands to your advantage in a preview or other forensic examination.


Wednesday, January 7, 2015

Getting Attached: Apple Messaging Attachments

I sometimes get questions about showing attachments in Apple iDevice messaging databases. The questions, however, seem to come at a time when I don’t have any databases on hand to study the issue. Well, this week I stumbled on the chats.db during an exam of a MacBook Air. The database contains iMessage and SMS messages, and can be configured to sync with the user’s other iDevices (iPhone, for example) through iCloud. So, I took a look at the database an determined a way to match the attachments with the messages.

The chats.db is found in the users directory in the Library/Messages folder.

Location of chats.db
Library/Messages/
Library/Messages/Attachments
Library/Messages/chat.db
Library/Messages/chat.db-shm
Library/Messages/chat.db-wal

As you can see, message attachments are located in the Attachments sub-folder. But how are they referenced in the chats.db, and how are they matched to the correct message? The database schema gives us the clues we need.

The chats.db table schema
CREATE TABLE _SqliteDatabaseProperties
    (key TEXT,
    value TEXT,
    UNIQUE(key));

CREATE TABLE chat
    (ROWID INTEGER PRIMARY KEY AUTOINCREMENT,
    guid TEXT UNIQUE NOT NULL,
    style INTEGER,
    state INTEGER,
    account_id TEXT,
    properties BLOB,
    chat_identifier TEXT,
    service_name TEXT,
    room_name TEXT,
    account_login TEXT,
    is_archived INTEGER DEFAULT 0,
    last_addressed_handle TEXT,
    display_name TEXT,
    group_id TEXT);

CREATE TABLE attachment
    (ROWID INTEGER PRIMARY KEY AUTOINCREMENT,
    guid TEXT UNIQUE NOT NULL,
    created_date INTEGER DEFAULT 0,
    start_date INTEGER DEFAULT 0,
    filename TEXT,
    uti TEXT,
    mime_type TEXT,
    transfer_state INTEGER DEFAULT 0,
    is_outgoing INTEGER DEFAULT 0,
    user_info BLOB,
    transfer_name TEXT,
    total_bytes INTEGER DEFAULT 0);

CREATE TABLE handle
    ( ROWID INTEGER PRIMARY KEY AUTOINCREMENT UNIQUE,
    id TEXT NOT NULL,
    country TEXT,
    service TEXT NOT NULL,
    uncanonicalized_id TEXT,
    UNIQUE (id,
    service) );

CREATE TABLE chat_handle_join
    ( chat_id INTEGER REFERENCES chat (ROWID) ON DELETE CASCADE,
    handle_id INTEGER REFERENCES handle (ROWID) ON DELETE CASCADE,
    UNIQUE(chat_id,
    handle_id));

CREATE TABLE message
    (ROWID INTEGER PRIMARY KEY AUTOINCREMENT,
    guid TEXT UNIQUE NOT NULL,
    text TEXT,
    replace INTEGER DEFAULT 0,
    service_center TEXT,
    handle_id INTEGER DEFAULT 0,
    subject TEXT,
    country TEXT,
    attributedBody BLOB,
    version INTEGER DEFAULT 0,
    type INTEGER DEFAULT 0,
    service TEXT,
    account TEXT,
    account_guid TEXT,
    error INTEGER DEFAULT 0,
    date INTEGER,
    date_read INTEGER,
    date_delivered INTEGER,
    is_delivered INTEGER DEFAULT 0,
    is_finished INTEGER DEFAULT 0,
    is_emote INTEGER DEFAULT 0,
    is_from_me INTEGER DEFAULT 0,
    is_empty INTEGER DEFAULT 0,
    is_delayed INTEGER DEFAULT 0,
    is_auto_reply INTEGER DEFAULT 0,
    is_prepared INTEGER DEFAULT 0,
    is_read INTEGER DEFAULT 0,
    is_system_message INTEGER DEFAULT 0,
    is_sent INTEGER DEFAULT 0,
    has_dd_results INTEGER DEFAULT 0,
    is_service_message INTEGER DEFAULT 0,
    is_forward INTEGER DEFAULT 0,
    was_downgraded INTEGER DEFAULT 0,
    is_archive INTEGER DEFAULT 0,
    cache_has_attachments INTEGER DEFAULT 0,
    cache_roomnames TEXT,
    was_data_detected INTEGER DEFAULT 0,
    was_deduplicated INTEGER DEFAULT 0,
    is_audio_message INTEGER DEFAULT 0,
    is_played INTEGER DEFAULT 0,
    date_played INTEGER,
    item_type INTEGER DEFAULT 0,
    other_handle INTEGER DEFAULT -1,
    group_title TEXT,
    group_action_type INTEGER DEFAULT 0,
    share_status INTEGER,
    share_direction INTEGER,
    is_expirable INTEGER DEFAULT 0,
    expire_state INTEGER DEFAULT 0,
    message_action_type INTEGER DEFAULT 0,
    message_source INTEGER DEFAULT 0);

CREATE TABLE chat_message_join
    ( chat_id INTEGER REFERENCES chat (ROWID) ON DELETE CASCADE,
    message_id INTEGER REFERENCES message (ROWID) ON DELETE CASCADE,
    PRIMARY KEY (chat_id,
    message_id));

CREATE TABLE message_attachment_join
    ( message_id INTEGER REFERENCES message (ROWID) ON DELETE CASCADE,
    attachment_id INTEGER REFERENCES attachment (ROWID) ON DELETE CASCADE,
    UNIQUE(message_id,
    attachment_id));

I’ll provide a summary of the database as I understand it. Messages are predictably stored in the message table. The message table lacks a reference to attachments, other than the fact that one exists: + cache_has_attachments INTEGER DEFAULT 0+. The default setting is zero, meaning no attachements are stored. A value of 1 indicates there is an attachment in the Attachments sub-folder. One other issue we find when examining the message table is that there is a reference to the remote party in the conversation (handle_id INTEGER DEFAULT 0), but not the extact address—email, account identifier, or phone number—that an investigator would desire. That information is stored in the handle table. It is up to us to figure out how to relate the tables together.

Can’t we all just get along?

The difficulty in examining SQLite databases is determining how they are intended to relate information. There is seldom anything in the database itself that explains its intended use. It can similar to stumbling upon raw building materials and trying to figure out what is being built. Sometimes it’s easy, other times, not so much. But with the chats.db database, three table schema entries give us a clue as to the database design.

SQLite table join hints
CREATE TABLE chat_handle_join
    ( chat_id INTEGER REFERENCES chat (ROWID) ON DELETE CASCADE,
    handle_id INTEGER REFERENCES handle (ROWID) ON DELETE CASCADE,
    UNIQUE(chat_id,
    handle_id));

CREATE TABLE chat_message_join
    ( chat_id INTEGER REFERENCES chat (ROWID) ON DELETE CASCADE,
    message_id INTEGER REFERENCES message (ROWID) ON DELETE CASCADE,
    PRIMARY KEY (chat_id,
    message_id));

CREATE TABLE message_attachment_join
    ( message_id INTEGER REFERENCES message (ROWID) ON DELETE CASCADE,
    attachment_id INTEGER REFERENCES attachment (ROWID) ON DELETE CASCADE,
    UNIQUE(message_id,
    attachment_id));

INFO: The message_attachment_join table shows us that the message_id column in the table refers to the message table rowid column. Likewise, the attachment_id refers to the attachment table rowid. Thus, the message_attachment_join table is used to match attachments to messages.

Each of the table names above ends in the word join. As used, the word join is just part of a table name, but it hints at a SQLite table operation called a JOIN. A join combines two tables into one, and in SQLite there are two basic joins: INNER and OUTER. Inner joins, which come three variations, result in a combined table that includes only rows matching the join criteria. That is, the combined table only includes records with rows from each table that have one or more matching column values. While these are the default type of JOIN in SQLite, we are interested in results that show all messages, not just those with attachments.

OUTER joins, by contrast, do not require the records from each table to have a matching column. This means we can have combined table that shows all message rows, and if properly joined to the attachment table, rows containing messages with attachments will show attachment details. Further, if we join the handle table to the message table, we have everything we might want for an investigation.

I will be using a LEFT OUTER JOIN, which is shortened in syntax to LEFT JOIN. The basic syntax is "SELECT column(s) FROM left_table LEFT JOIN right_table ON left_table.columnName = right_table.columnName". A LEFT JOIN returns all rows of the left_table regardless of matching rows in the right table. Where rows match in the right table, they are joined to the matching left table row.

Tip
It is easier to understand and troubleshoot SQL queries by reading them backwards: Predicate, then subject. For example, reading the query in the paragraph above as "FROM left_table LEFT JOIN right_table ON left_table.columnName = right_table.columnName SELECT column(s)" can lend clarity to the output.

Applying a LEFT JOIN to the chats table, we can create a "Super table" combining the message, attachment, and handle tables.

SELECT *
FROM message AS m
LEFT JOIN message_attachment_join AS maj ON message_id = m.rowid
LEFT JOIN attachment AS a ON a.rowid = maj.attachment_id
LEFT JOIN handle AS h ON h.rowid = m.handle_id
Tip
The "expr1 AS expr2" statement sets expr2 as an alias for expr1, saving keystrokes and making the lines easier to read. Thus message_attachement_join.attachment_id becomes maj.attachment_id.

Entirely accurate, but probably containing more information than we need, the above query results in the following columns:

Table 1. Columns

ROWID

guid

text

replace

service_center

handle_id

subject

country

attributedBody

version

type

service

account

account_guid

error

date

date_read

date_delivered

is_delivered

is_finished

is_emote

is_from_me

is_empty

is_delayed

is_auto_reply

is_prepared

is_read

is_system_message

is_sent

has_dd_results

is_service_message

is_forward

was_downgraded

is_archive

cache_has_attachments

cache_roomnames

was_data_detected

was_deduplicated

is_audio_message

is_played

date_played

item_type

other_handle

group_title

group_action_type

share_status

share_direction

is_expirable

expire_state

message_action_type

message_source

message_id

attachment_id

ROWID

guid

created_date

start_date

filename

uti

mime_type

transfer_state

is_outgoing

user_info

transfer_name

total_bytes

ROWID

id

country

service

uncanonicalized_id

Note
If you look carefully at the schema at the top of this article, and the column listing above, you will notice that the columns are those of all four tables combined and in the order they are referenced.

We can refine the output by identifying specific columns we wish to display from each row. We can use the DATETIME function to convert the Mac Absolute Time in the date column to local time (by first converting to Unix epoch by adding a few more than 978 million seconds) and interpret the is_from_me column from integer to text using a CASE statement.

SELECT
    m.rowid,
    DATETIME(date +978307200, 'unixepoch', 'localtime') AS date,
    id AS address,
    m.service,
    CASE is_from_me
        WHEN 0 THEN "Received"
        WHEN 1 THEN "Sent"
        ELSE is_from_me
    END AS type,
    text,
    CASE cache_has_attachments
        WHEN 0 THEN Null
        WHEN 1 THEN filename
    END AS attachment
FROM message AS m
LEFT JOIN message_attachment_join AS maj ON message_id = m.rowid
LEFT JOIN attachment AS a ON a.rowid = maj.attachment_id
LEFT JOIN handle AS h ON h.rowid = m.handle_id

With this query, we end up with an easy to read output containing interpreted values with the following columns:

Table 2. Columns

ROWID

date

address

service

type

text

attachment

Tip
Why include message table ROWID? Row id’s a generated automatically for each message added to the database. A break in sequence will show a record has been deleted. Since it is possible to recover deleted records from SQLite databases, it is a convenient way to alert the investigator more analysis is required. Further, in the case of attachments multiple attachments, there will be one row for each attachment in a message. A repeating ROWID indicates two or more attachments are present for the message.

I hope this discussion of SQLite JOIN operations as they relate to the Apple iOS chats.db will help you in your examination of SQLite databases.