Thursday, July 21, 2011

BlackBerry Text Message Parsing, AKA, Why I use Linux for Forensics

A little detour from my usual posts to explain why I use Linux for forensics, though my upbringing was in Windows-based tools like EnCase.  A colleague contacted me today with a little issue: He had found a BlackBerry text messaging backup file in CSV format (EDIT: This was actually a BlackBerry Messenger save file) on an external memory card, but the date code for each message was perplexing.  He asked me if I could help interpret the code.  It looked like the following:
201010181287467321760
The full format of the CSV was "date, from(hexID), to(hexID), message."  It was obvious to my colleague and me that the first 8 digits of the date code were the date of the message in plain text, i.e., "20101018" or "2010-10-18."  My Unix roots made the remaining digits of the numeric string easy to identify: unixepoch in milliseconds, i.e., 1287467321760 milliseconds since 1970-01-01 00:00:00 UTC.

A quick verification with the date command, truncating the value to whole seconds (i.e., dividing by 1000):
$ date -d @1287467321
Mon Oct 18 22:48:41 PDT 2010

The converted unixepoch date matches the plain text date. We seem to have interpreted the code correctly, and now we know the local time of the message as well.
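The same check can be done with shell arithmetic instead of truncating by hand. This is a minimal sketch, assuming GNU date (for -d @SECONDS) and a shell with 64-bit arithmetic; the variable names are illustrative, and -u is used so the printed time does not depend on the machine's time zone:

```shell
#!/usr/bin/env bash
code=201010181287467321760     # sample date code from the file
plain=${code:0:8}              # first 8 digits: the plain-text date
millis=${code:8}               # everything after that: epoch milliseconds
seconds=$(( millis / 1000 ))   # integer division drops the milliseconds

echo "$plain $millis $seconds" # 20101018 1287467321760 1287467321
date -u -d "@$seconds"         # GNU date; prints the time in UTC
```

In UTC this sample falls on October 19 at 05:48:41, while the plain-text prefix reads 20101018, so for this message the prefix appears to record the local date rather than the UTC date.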

But of course, using the date command is not why I find Linux so valuable.  It is because of the ease with which I was able to convert the whole file, having discovered the meaning of the date code.  Remember the format of the file?  It was "date, from(hexID), to(hexID), message."

Consider three lines from the file as an example:
201010181287467321760,"6C31FB2C","0F315216",Hey
201010191287534544913,"6C31FB2C","0F315216",Hey, you there?
201010191287534602157,"0F315216","6C31FB2C",Yeah, let's meet.

A quick, simple while loop reads each line of the Messenger save file and converts the date code:
$ cat backup.csv | while read line; do date=${line%%,*}; remainder=${line#*,}; echo "$(date -d @${date:8:10}),$remainder"; done
Mon Oct 18 22:48:41 PDT 2010,"6C31FB2C","0F315216",Hey
Tue Oct 19 17:29:04 PDT 2010,"6C31FB2C","0F315216",Hey, you there?
Tue Oct 19 17:30:02 PDT 2010,"0F315216","6C31FB2C",Yeah, let's meet.
...

Let me break that down:
cat backup.csv | while read line  #read a line of the file, assign to variable 'line'
do
  date=${line%%,*}  #read everything up to the first comma, assign to variable 'date'
  remainder=${line#*,}  #read everything beyond the first comma, assign to variable 'remainder'
  echo "$(date -d @${date:8:10}),$remainder" #convert digits 9-18 to local time, print localtime and the remainder of the line to stdout
done
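For files that are less tidy than the sample, the loop can be hardened a little. The following is only a sketch, not the loop I actually ran: the function name convert_dates is hypothetical, it assumes GNU date, and it assumes that a valid date code is always exactly 21 digits (8 for the date, 13 for the milliseconds). Lines that do not start with such a code are passed through unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads date-coded CSV lines on stdin,
# writes the same lines with the date code converted to local time.
convert_dates() {
  local line code remainder
  # IFS= preserves leading/trailing whitespace; -r keeps backslashes literal.
  # The "|| [[ -n $line ]]" part handles a final line without a newline.
  while IFS= read -r line || [[ -n $line ]]; do
    code=${line%%,*}        # everything before the first comma
    remainder=${line#*,}    # everything after the first comma
    # Assumption: a date code is exactly 21 digits; pass other lines through.
    if [[ ! $code =~ ^[0-9]{21}$ ]]; then
      printf '%s\n' "$line"
      continue
    fi
    printf '%s,%s\n' "$(date -d "@${code:8:10}")" "$remainder"
  done
}

# Usage (redirecting the file avoids the extra cat process):
#   convert_dates < backup.csv
```

Setting TZ for the call, e.g. TZ=UTC convert_dates < backup.csv, makes the output independent of the examiner's workstation time zone.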

The key to this solution is something I didn't learn in my initial studies of Bash, but I make extensive use of it now: variable expansion. I used the expansions available in Bash 4 to assign portions of each line to variables that I could then operate on before printing the result. I won't discuss all available expansions here, but I will explain those I used:

Removing the longest match from the end:

Consider: Each line contained comma separated values. I really only needed to operate on the first value -- the date code. I prefer, as much as possible, to not call external tools, such as cut or awk, so as to not unnecessarily start external processes. Bash variable expansion makes this possible. The syntax ${var%%PATTERN} removes the longest match of PATTERN from the end of the value of 'var'. With the pattern ,* (a comma followed by anything), the longest match starts at the first comma, so everything from that comma onward is removed.

So, in line 3, date=${line%%,*} assigns to the 'date' variable all of the contents of the 'line' variable up to the first comma. Thus, in the case of the first line, date="201010181287467321760".
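The difference between the longest match (%%) and the shortest match (%) matters here because the message text can itself contain commas. A small sketch using the second sample line; the variable names longest and shortest are just for illustration:

```shell
#!/usr/bin/env bash
line='201010191287534544913,"6C31FB2C","0F315216",Hey, you there?'

longest=${line%%,*}   # strips from the FIRST comma to the end of the line
shortest=${line%,*}   # strips only from the LAST comma to the end of the line

echo "$longest"       # 201010191287534544913
echo "$shortest"      # 201010191287534544913,"6C31FB2C","0F315216",Hey
```

Only the %% form isolates the date code; the % form merely trims the tail of the message.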

Removing the shortest match from the beginning:

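With the date code isolated in the 'date' variable, we still need to print the rest of the line once we convert the date. The syntax ${var#PATTERN} removes the shortest match of PATTERN from the beginning of the value of 'var'. With the pattern *, (anything followed by a comma), the shortest match ends at the first comma, so only the date code and that comma are removed.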

So, in line 4, remainder=${line#*,} assigns to the 'remainder' variable all of the contents of the 'line' variable after the first comma. Thus, in the case of the first line, remainder='"6C31FB2C","0F315216",Hey'
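Again the shortest versus longest distinction is what keeps messages with embedded commas intact. The sketch below contrasts # with ## and then, as an illustration beyond what my loop needed, peels off the remaining fields the same way; the names from, to and message are hypothetical and simply mirror the CSV layout:

```shell
#!/usr/bin/env bash
line='201010191287534544913,"6C31FB2C","0F315216",Hey, you there?'

shortest=${line#*,}   # removes through the FIRST comma
longest=${line##*,}   # removes through the LAST comma

echo "$shortest"      # "6C31FB2C","0F315216",Hey, you there?
echo "$longest"       # " you there?" without the quotes (note the leading space)

# Peel off the remaining fields one at a time.
rest=$shortest
from=${rest%%,*}; rest=${rest#*,}
to=${rest%%,*};   message=${rest#*,}
echo "$from -> $to: $message"   # "6C31FB2C" -> "0F315216": Hey, you there?
```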

Returning a substring of a variable:

Finally, we need to isolate the unix epoch time from the plain text date in the date code now stored in the variable 'date'. We do this by indexing. The syntax ${var:OFFSET:LENGTH} will return a substring of the variable 'var' starting at OFFSET for the specified LENGTH. The first character in a variable is indexed at offset 0.

So, in line 5, ${date:8:10} returns 10 characters of the variable 'date' starting at the 9th character (remember, indexing starts at 0). Thus, we have now fed the unix epoch date string incorporated in the Messenger date code to the unix date command to be converted to local time in a human readable format.
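A short sketch of the substring syntax applied to the sample date code. The variable is named date_code here only to avoid confusion with the date command:

```shell
#!/usr/bin/env bash
date_code=201010181287467321760

echo "${date_code:0:8}"    # 20101018    plain-text date
echo "${date_code:8:10}"   # 1287467321  epoch seconds
echo "${date_code:18}"     # 760         leftover milliseconds (no LENGTH: to the end)
echo "${date_code: -3}"    # 760         negative offset counts from the end
```

The space before -3 is required; without it Bash reads ${var:-3} as the "use a default value" expansion instead.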

Line five is a compound command that indexes the 'date' variable, converts the result in a sub-process with the date command, and then echoes the converted date with the contents of the 'remainder' variable appended.
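If the sub-process for date is a concern on a large file, the conversion itself can stay inside the shell. This is a different technique from the loop above, not what I ran: it relies on the %(FORMAT)T specifier of the printf builtin, which requires Bash 4.2 or later. The function name convert_builtin is hypothetical, and the format string is my approximation of the default output of date.

```shell
#!/usr/bin/env bash
# Hypothetical variant of the loop that never starts an external process.
# Requires Bash 4.2+ for printf's %(FORMAT)T time specifier.
convert_builtin() {
  local line code remainder stamp
  while IFS= read -r line; do
    code=${line%%,*}
    remainder=${line#*,}
    # printf -v stores the formatted time in 'stamp' without a subshell.
    printf -v stamp '%(%a %b %e %H:%M:%S %Z %Y)T' "${code:8:10}"
    printf '%s,%s\n' "$stamp" "$remainder"
  done
}

# Usage:
#   convert_builtin < backup.csv
```

This sketch does no validation, so it assumes every line begins with a well-formed date code.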

Where to go from here:

If you have some Bash skills, but want to advance them, I recommend the book "Pro Bash Programming: Scripting the GNU/Linux Shell" by Chris F.A. Johnson from Apress.

3 comments:

  1. great post - the content in the csv file is not BB text messaging but rather BlackBerry Messenger (BBM) messages. BBM is RIM's proprietary IM client.
    The ability to save BBM chat only applies to BBM app version 5.x and higher. Device user must elect to save this to either device memory or memory card. By default saving BBM chat history is turned off.

    Shafik Punja

  2. Thank you Shafik! Your comment is one of the reasons I blog like this. I have no experience with BlackBerry devices or their files. I received the file and its description by email from a colleague with the goal of decoding the dates. My actual solution was a bit more involved, but I'm highlighting the basic decoding loop here.

    Thank you for providing more information about the file type and its purpose.

  3. Hey Slo - your most welcome. More about me: http://www.ericjhuber.com/2011/04/interview-with-shafik-punja.html

    I would like to see your actual solution if possible. I can be reached at shafghp@gmail.com; twitter is @qubytelogic.

    I am also seeking your permission to cite your post in my training content.

    Thanks for the great posts on this blog...keep it up..

    Shafik

