SQLite to the Rescue
One of the tasks I’m asked to perform is to geolocate mobile phone calls from Call Detail Reports (CDR). These usually arrive from a carrier as spread sheets: one with details of calls to and from a particular number, and one or more cell tower listings. I’ve tried a variety of ways to process these over time such as BASH scripting and python coding. But by far the easiest and most flexible way to process these records is by importing them into a SQLite database.
The long term difficulty in processing CDRs is that they change over time. It seems that every time I get new records to process, the format has changed which breaks previous code. It takes much more effort to recode a script than it does to write a SQL query on the fly, and I’m certainly no SQL guru. SQLite has enough built in functions to handle nearly any problem you might encounter.
Lets take some Sprint records I recently processed as an example. I was tasked with plotting the locations of the voice calls on a map. There were over 3,600 records for a 12 day period with text message details mixed with voice call details. Only the call details contained references to the tower records, however. Call records included five digit integers that represented the first and last cell towers the mobile phone used during the communication. Text messages contained only zeros in these columns.
The challenge was to retrieve the call records for mapping, ignoring the text messages that did not contain cell tower details. SQLite seemed the easiest way to accomplish this in light of the follow up requirement of looking up each cell tower integer in any one of four associated tower record spread sheets.
Creating the Database
The first step was to create a SQLite database. Fortunately, creating a database is a simple, straight forward process. I performed the work using in the SQLite command line program. However, GUI tools like the excellent SQLite Manager can accomplish the same thing and I recommend them if you are new to SQLite as they can be good teachers.
To create a new database, I simply provided the new database name when I opened the command line program. I called my database cdr.sqlite.
$ sqlite3 cdr.sqlite SQLite version 3.7.17 2013-05-20 00:56:22 Enter ".help" for instructions Enter SQL statements terminated with a ";" sqlite>
Next, I needed a table to hold the call records. I made this something easy to type, so I called it simply cdr. The columns I named for the columns found in the spreadsheet sent by Sprint.
sqlite> CREATE TABLE "cdr" ...> ( ...> "Calling_NBR" TEXT, ...> "Called_NBR" TEXT, ...> "Dialed_Digits" TEXT, ...> "Type" TEXT, ...> "Start_Date" TEXT, ...> "End_Date" TEXT, ...> "Duration" INTEGER, ...> "NEID" INTEGER, ...> "Repoll" TEXT, ...> "First_Cell" INTEGER, ...> "Last_Cell" INTEGER, ...> ); sqlite>
Note
|
In the SQLite command line program, all SQL queries must end in a semi-colon or the interpreter assumes you are adding lines to the statement. If you forget to put the semi-colon at the end of your command, you can enter it on the next line. |
To import the CDR data from the spreadsheet, simply export the data without the header row, in text file with comma separated values (CSV). In this case, I called the file "call_records.csv". I had to tell SQLite how the data was delimited (SQLite uses pipes by default), and import the CSV file into the "cdr" table.
sqlite> .separator "," sqlite> .import call_records.csv cdr
Note
|
The "dot" commands are special SQLite functions (use .help to view them all) and do not require a semi-colon. |
To import the cell tower details, I followed the same process: I created a table I called "towers" using the column headers from the spreadsheet as the column names of the table. Then I exported the cell tower data from each spreadseet to a CSV file and imported the CSVs into the tower table. While I won’t repeat the full process, I will display the table layout (schema) below.
sqlite> .schema towers CREATE TABLE "towers" ( "Cell" INTEGER, "Cascade" TEXT, "Switch" TEXT, "NEID" INTEGER, "Repoll" INTEGER, "Site_Name" TEXT, "Address1" TEXT, "Address2" TEXT, "City" TEXT, "County" TEXT, "State" TEXT, "Zip" TEXT, "Latitude" TEXT, "Longitude" TEXT, "BTS_Manufacturer" TEXT, "Sector" TEXT, "Azimuth" TEXT, "CDR_Status" TEXT ); sqlite>
Looking Up Records
Now it was time to lookup the call records in the tower tables to find that latitude and longitude of the tower used to initiate the call an place the mobile device in time and space. I expected it to be straight forward: take the 5-digit tower number from the first_call field of the cdr table, match it to the cell field in the towers table, and return the map coordinates. Easy peasy, right? The SQL equivalent of show me the latitude and longitude of the tower where the CDR first_cell integer matches the Tower cell integer.
sqlite> select latitude, longitude from towers, cdr where first_cell = cell; sqlite> # Ruh roh, raggy, no matches! sqlite> select first_cell from cdr where first_cell != 0 limit 5; 40385 10962 10962 20962 30392 sqlite> select cell, latitude, longitude from towers; <redacted> 385|34.046944|-118.448056 385|34.046944|-118.448056 385|34.046944|-118.448056 392|34.063806|-118.30366 392|34.063806|-118.30366 392|34.063806|-118.30366 962|37.657222|-122.094653 962|37.657222|-122.094653 962|37.657222|-122.094653 385|37.838333|-122.298611 385|37.838333|-122.298611 392|37.693|-122.0939 392|37.693|-122.0939 392|37.693|-122.0939 385|37.403633|-121.89436 385|37.403633|-121.89436 385|37.403633|-121.89436 <redacted> sqlite>
Whoa, the cdr first_cell and towers cell values do not jibe! And, as we can see, there is more than one entry in the tower table for each cell designator. Take cell 385 for example: there are three distinct groupings of tower 385 with three different map coordinates for each group. It turns out that cell towers are grouped by the switch they are part of, recorded in the CDR and tower records as the NEID. The appropriate cell tower record can be further reconciled by the sector number, or side of the tower from which the call originated. The first_cell value is actually a concatenation of the sector and the tower number. How did I figure all this out? The answer came from reading the documentation (RTFM) that came with the records and some analysis of the spreadsheets.
I’ll demonstrate below the values that make a tower record unique and that must be considered when matching call records to tower details. I’ll focus on tower 385
sqlite> .headers on sqlite> .mode columns sqlite> select cell, sector, neid, azimuth, latitude, longitude from towers ...> where cell = 385; Cell Sector NEID Azimuth Latitude Longitude ---------- ---------- ---------- ---------- ---------- ----------- 385 1 65 60 34.046944 -118.448056 385 2 65 200 34.046944 -118.448056 385 3 65 290 34.046944 -118.448056 385 1 512 55 37.838333 -122.298611 385 2 512 155 37.838333 -122.298611 385 1 95 0 37.403633 -121.89436 385 2 95 110 37.403633 -121.89436 385 3 95 190 37.403633 -121.89436 sqlite>
Now it is easy to see that the three different groupings of tower 385 are a result of that tower designator being used in three different switches, or NEIDs. Further, each tower can be resolved to a sector, which corresponds to a unique asimuth or direction the cell tower antenna points.
SQLite Substrings
The remaining problem in querying this data is the configuration of the first_cell value in the call details. Recall that it is the sector concatentated to the tower number. I needed a way to take the first digit from the integer and assign it to a sector value, and use the remaining four digits as the tower designator. Fortunately, SQLite has a built-in substring function to make this easy.
The substr(X,Y,Z) function returns a substring of input string X that begins with the Y-th character and which is Z characters long. If Z is omitted then substr(X,Y) returns all characters through the end of the string X beginning with the Y-th. The left-most character of X is number 1. If Y is negative then the first character of the substring is found by counting from the right rather than the left. If Z is negative then the abs(Z) characters preceding the Y-th character are returned. If X is a string then characters indices refer to actual UTF-8 characters. If X is a BLOB then the indices refer to bytes.
— SQLite
From the SQLite documentation, we see that the substr function takes 2-3 arguments and returns a substring of the of the input string based according to those documents. To return the sector, I needed to take the first digit from the first_cell string in this manner: substr(first_cell, 1, 1). To return the tower identification, I needed to skip the first digit and return the rest of the string thusly: substr(first_cell, 2).
Note
|
I did not need to specify the third argument in the second substr() expression because I wanted the entirety of the string past the first digit. |
Finally, I needed to include the NEID from the call detail records to ensure I’ve looked up the correct tower. Putting it all together, we can see how to created the values we need from the call records to find the matching tower details. I’ve added a second query to demonstrate, using the ltrim() function to strip the leading zeros from the cell column.
sqlite> select substr(first_cell, 1, 1) as sector, ...> substr(first_cell, 2) as cell, neid ...> from cdr where first_cell != 0 limit 5; sector cell NEID ---------- ---------- ---------- 4 0385 95 1 0962 169 1 0962 169 2 0962 169 3 0392 512 sqlite> select substr(first_cell, 1, 1) as sector, ...> ltrim(substr(first_cell, 2), 0) as cell, neid ...> from cdr where first_cell != 0 limit 5; sector cell NEID ---------- ---------- ---------- 4 385 95 1 962 169 1 962 169 2 962 169 3 392 512 sqlite>
Putting it All Together
As usual, the explanation is more step intensive that the actual work. The whole process can be done in one query, but I wanted to break it down so that it would be easier to recognize the elements of the query. To make it more legible, I’ll write it across several lines.
sqlite> select start_date as Date, calling_nbr as Number, latitude, longitude ...> from cdr, towers ...> where substr(first_cell, 1, 1) = towers.sector and ...> ltrim(substr(first_cell, 2), 0) = cell and ...> cdr.neid = towers.neid limit 5; Date Number Latitude Longitude ------------------- ---------- ---------- ----------- 2012-12-10 07:36:39 ########## 37.657222 -122.094653 2012-12-10 08:24:21 ########## 37.657222 -122.094653 2012-12-10 08:26:09 ########## 37.657222 -122.094653 2012-12-10 09:59:40 ########## 37.693 -122.0939 2012-12-10 10:00:26 ########## 37.705128 -122.047417 sqlite>
This can be converted to a CSV file suitable for mapping through a website like gpsvisualizer.com[GPS Visualizer] or a program like GPSBabel. First, I change the output mode to CSV and I change the columns names to comply with the mapping software’s requirements, printing a sample to ensure I have the format I am seeking. Then I output the data to a file for import to the mapping program.
sqlite> .mode csv sqlite> select start_date as name, calling_nbr as desc, latitude, longitude ...> from cdr, towers ...> where substr(first_cell, 1, 1) = towers.sector and ...> ltrim(substr(first_cell, 2),0) = cell and ...> cdr.neid = towers.neid limit 5; name,desc,Latitude,Longitude "2012-12-10 07:36:39",##########,37.657222,-122.094653 "2012-12-10 08:24:21",##########,37.657222,-122.094653 "2012-12-10 08:26:09",##########,37.657222,-122.094653 "2012-12-10 09:59:40",##########,37.693,-122.0939 "2012-12-10 10:00:26",##########,37.705128,-122.047417 sqlite> .output call_map.csv sqlite> select start_date as name, calling_nbr as desc, latitude, longitude ...> from cdr, towers ...> where substr(first_cell, 1, 1) = towers.sector and ...> ltrim(substr(first_cell, 2),0) = cell and ...> cdr.neid = towers.neid; sqlite>
So, in its simplest form, you can see this is not necessarily a difficult process. It can be distilled into three basic steps:
-
Review the records and determine the relationships between them
-
Import the data into a SQLite database
-
Query the database for the output needed
Though there is commercial mapping software available, the software I’ve seen either lacks flexibility to deal with differences in records or in output. Further they usually require configuration that can take as long or longer than importing the data into SQLite and writing the specific query you need for your investigation. If you have the software and a happy with it, use it. If you find it is lacking the flexibility you need, consider doing the work by hand. You’ll be better for it!
Note
|
In the interest of full disclosure, the results demonstrated here are over simplified. For the real casework, I interpreted the call direction with a case statement to select the called number or calling number as appropriate. The result was a Google Earth map with waypoints named for the date and time of the call. Clicking the waypoint showed call details, e.g., (To: ()-## for 37 secs, azimuth 270). |