After studying results from the top search engines and reviewing some browser history, I crafted the following GNU extended regular expression:
[?&](k|p|q|query)=[a-zA-Z0-9+_%-]+
This search, run against the strings output of files, found search queries for Google, Yahoo!, Bing, Ask, Aol, Facebook, YouTube, Vimeo and some x-rated sites, as well as app content such as Twitter. Depending on what you feed GNU grep and how you configure it, results look similar to:
https://www.google.com/search?q=you+found+me
http://m.youtube.com/results?q=some%20video%20i%20like
https://m.facebook.com/search/?query=that%20guy%20
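To see exactly which part of each URL the expression captures, here is a minimal sketch that feeds the three sample URLs above to GNU grep. It assumes grep is run with -E (extended syntax) and -o (print only the matched text); the variable names urls and matches are just for illustration.

```shell
# Sample URLs taken from the results above, one per line.
urls='https://www.google.com/search?q=you+found+me
http://m.youtube.com/results?q=some%20video%20i%20like
https://m.facebook.com/search/?query=that%20guy%20'

# -E enables extended regular expressions; -o prints only the matching part.
matches=$(printf '%s\n' "$urls" | grep -E -o '[?&](k|p|q|query)=[a-zA-Z0-9+_%-]+')

# Prints the query parameter of each URL, starting at the '?' or '&'.
printf '%s\n' "$matches"
```

Dropping -o prints the whole matching line instead, which is closer to the full-URL output shown above.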
An added benefit of this expression is that it also hits on additional page results, Google Images page refreshes, etc. With a little command-line whiz-bangery, it's even possible to sort and count the results to get a histogram of searches:
strings History.plist | egrep -o '[?&](k|p|q|query)=[a-zA-Z0-9+_%-]+' | sed 's/.*=//' | sort | uniq -c | sort -nr
I'll explain the command above:
- strings History.plist # extract printable ASCII strings from the iPhone Safari History.plist
- egrep -o '[?&](k|p|q|query)=[a-zA-Z0-9+_%-]+' # grep for the regular expression described above
- sed 's/.*=//' # strip everything up to and including the '=', leaving only the user-typed query
- sort # sort the results alphabetically so identical queries land on adjacent lines
- uniq -c # collapse adjacent duplicate lines, prefixing each with its count
- sort -nr # sort numerically in reverse, placing the most frequent queries first
Results of the command look similar to the following:
21 I+search+for+this+a+lot
11 this%20one%20a%20little%20less
2 why+would+anyone+read+linux+sleuthing
1 testing%20one%20two
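One wrinkle in output like this is that the same query can be counted separately when one site encodes spaces as + and another as %20. A minimal sketch of one way around that, assuming those are the only two encodings you care about, is to normalize both to a space before counting. The normalize_queries function is a hypothetical helper of mine, not part of the original pipeline, and it does not decode other percent-escapes.

```shell
# normalize_queries: hypothetical helper, not part of the original command.
# Reads query strings on stdin and rewrites '+' and '%20' as spaces so that
# the same search typed on different sites collapses into one line.
normalize_queries() {
    sed -e 's/+/ /g' -e 's/%20/ /g'
}

# Two encodings of the same query plus one unrelated query.
printf '%s\n' 'testing+one+two' 'testing%20one%20two' 'you+found+me' |
    normalize_queries | sort | uniq -c | sort -nr
```

In the full command, the helper would slot in right after the sed step and before the first sort.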
The expression could be run against all logical files in a device and against unallocated space, if applicable. I only demonstrate it using the History.plist because it's easy to illustrate.
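For the all-files case, here is a rough sketch under a couple of assumptions: the device's logical files have already been extracted into a directory, and GNU grep is doing the work. Instead of piping each file through strings, it leans on grep's own -r (recurse), -a (treat binary files as text) and -h (suppress file names) options, which is a substitution on my part rather than the method shown above. The search_tree name is hypothetical.

```shell
# search_tree: hypothetical helper. $1 is a directory of extracted files.
# -r recurses, -a treats binary files as text, -o prints only the match,
# -h suppresses file names, -E selects extended regular expressions.
search_tree() {
    grep -r -a -o -h -E '[?&](k|p|q|query)=[a-zA-Z0-9+_%-]+' "$1" |
        sed 's/.*=//' | sort | uniq -c | sort -nr
}

# Usage (the path is a placeholder for wherever the files were extracted):
# search_tree /path/to/extracted/files
```

Unallocated space exported as a single image file could be passed through the original strings pipeline instead, since there is nothing to recurse into.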
I post this short article both because I want to remember this regular expression (the whole reason for my blog in the first place) and to solicit favorite search box/engine regular expressions you might have. Please share them in a comment if you get a chance. Happy searching!