Today’s post is from Emma Barnes, our resident data and CRO queen over at Branded3. I strongly suggest following Emma on Twitter for awesome SEO, data, Pokémon and gaming related insights.
Hello keen keyword miners, my name is Emma and I usually review video games, but sometimes I dabble in SEO.
Today I am going to talk about keyword prospecting. The tools you will need are:
- Google Analytics Account
- Some Regular Expressions (RegEx)
- Creative Thinking
What will I learn?
How to find good keywords from Google Analytics without having to sift through loads of crap manually.
Don’t worry, this is in no way scary! I am not a programmer and even I could figure it out. A word of warning: I really like Pokémon. It is not, however, necessary to understand Pokémon to understand this post.
OK, first of all:
What are Regular Expressions?
I will be referring to these as ‘RegEx’ throughout this blog post. RegEx is a way of telling a programme to match strings of text. RegEx is not a programming language by itself, and it comes in many dialects depending on what framework you’re using it with. So essentially: it doesn’t really matter whether you know what RegEx is if you’re going to follow my tutorial. I will use whatever works in the ‘Matching RegExp’ option of an Analytics filter and teach you how to do the same.
Now, let’s start keyword mining, shall we?
The website I am going to use data from is JulianKay.com. Julian has kindly given me access to his Analytics account and helped me a lot with understanding how RegEx works. He also encouraged the drawings used throughout.
Step One:
Log into Analytics. Go to Traffic Sources: Search: Overview and set the secondary dimension as ‘keyword’. This will show you all the keywords anyone has used to land on your site from any search engine, for both organic and paid traffic. You can adjust this to just show your search engine of choice or just to show organic visits, however I think that all possible keywords are valuable and we’re going to sift the crap out anyway, so why the hell not. I advise you to choose whatever time frame is most relevant – for JulianKay.com, I’ll be using the last 12 months’ worth of data.
Starting Keyword Count: 4250
Pro Tip:
If you want to get the best possible results, make sure you turn that annoying slider all the way up to ‘higher precision’ – this way you get way more keywords. Unless it’ll kill your computer. If it’ll kill your computer, I can forgive you for setting it to ‘faster processing’.
Step Two:
Familiarise yourself with the advanced filtering in Analytics. We will be using this a lot.
Step Three:
Filter out your brand terms. If you’re looking for new keywords there’s no point in having to look at terms that you’ll probably always rank for.
You should do this by using an ‘exclude’ filter. To find your brand terms just have a look at what keywords people are using to find your website (specifically your website, rather than any website that offers the same things as your website). In this case, it’s terms that include the phrase ‘Julian’ and also the misspellings ‘Julain’ and ‘Julien’. What these all have in common is that they start with ‘Jul’, however if we simply filter on ‘Jul’, we also catch phrases that include the word ‘July’, so in this case the filter I used was a simple one:
Exclude Keyword Matching RegExp julian|julain|julien
Come and greet your new friend, Pipe symbol: |
This is the first piece of RegEx I will teach you. The pipe symbol (|) simply means ‘or’, so instead of doing three separate exclude filters, we simply have one filter. You can add more terms that you consider to be brand terms.
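If you would like to sanity-check a pattern before pasting it into Analytics, a rough Python sketch along these lines might help. The keyword list is made up for illustration, and Python's re module is only standing in for the Analytics filter box; the case-insensitive, match-anywhere behaviour is an assumption about how the filter treats keywords.

```python
import re

# Hypothetical keywords, made up for illustration.
keywords = [
    "julian kay blog",
    "julain kay f#",
    "julien kay",
    "photos taken in july",
    "black ink for writing",
]

# The pipe means 'or': a keyword counts as a brand term if it
# contains any one of the three spellings.
brand = re.compile(r"julian|julain|julien", re.IGNORECASE)

# An 'exclude' filter keeps only the keywords that do NOT match.
non_brand = [kw for kw in keywords if not brand.search(kw)]
print(non_brand)
```

Note that the ‘july’ keyword survives, which a plain filter on ‘Jul’ would not have allowed.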
What if my brand term is the same as a big keyword?
So, if you’re an exact match domain, this is a little different. Say you own ‘exact-match-keyword.com’ and the keywords you target are things like:
exact match keyword (which happens to be your brand name)
exact match keyword UK
cheap exact match keyword
where can I go to buy exact match keywords in West London?
If you ran the exclude filter for ‘exact match keyword’ and ‘exact-match-keyword’ you would lose a lot of valuable keyword data.
So, in this case I would recommend you only filter out phrases that include the website name ‘exact-match-keyword’ and the EXACT phrase ‘exact match keyword’, which would be done using the following:
Exclude Keyword Matching RegExp exact-match-keyword|^exact match keyword$
Please welcome Caret (^) and Dollar ($)
The Caret (^) symbol means “a string must start with the following”.
The Dollar ($) means “a string must close with the preceding”.
So combined, they mean “the string must exactly match what is between these”. This is a very useful combination. I also use it to find whether I’ve got traffic from specific keywords.
For example, Julian Kay gets a lot of traffic from queries that include ‘F#’ (he is a developer after all) but does he get any traffic from that exact term? Instead of scrolling through a ‘containing’ list which might be quite long (imagine if only one person had found his site through that term, nightmare), you can use ^F#$ to see if he got any visits from that term. Sorry Jules, you didn’t.
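As a hedged illustration of the two anchors, the sketch below runs the exact-match-domain filter and the ^F#$ check in Python. The keyword lists are hypothetical, and Python's re module with case-insensitive matching is an assumed stand-in for the Analytics filter.

```python
import re

# Hypothetical keywords, made up for illustration.
keywords = [
    "exact match keyword",
    "exact match keyword UK",
    "cheap exact match keyword",
    "exact-match-keyword.com login",
]

# Brand term = the domain name anywhere in the keyword,
# OR the whole keyword being exactly the brand phrase.
brand = re.compile(r"exact-match-keyword|^exact match keyword$", re.IGNORECASE)

# Exclude filter: keep what does not match.
kept = [kw for kw in keywords if not brand.search(kw)]
print(kept)

# Anchors also answer 'did anyone search for exactly this term?'
exact_fsharp = re.compile(r"^F#$", re.IGNORECASE)
hits = [kw for kw in ["f#", "f# tutorial", "learn f#"] if exact_fsharp.search(kw)]
print(hits)
```

The longer variations are kept because the anchored half of the pattern only fires when nothing comes before or after the phrase.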
So, we’ve now done our brand filtering
Keyword Count: 4250 → 4156
Step Four: Exclude very silly keywords
I’ve worked with a variety of clients and a lot of them get traffic from the term ‘2’. How do they manage it?! Keywords like this are very silly, so let’s blast them using this:
Exclude Keyword Matching RegExp ^.?$
More new friends: Full Stop (.) and Question Mark (?)!
Essentially the above means ‘exclude anything that is zero or one character long’.
The Full Stop (.) means ‘any single character’ and the Question Mark (?) means ‘one or zero of the previous’. We have to put these between our exact-match symbols, the Caret (^) and Dollar ($), otherwise the pattern would match any keyword at all rather than only the very short ones.
This can be extended to as many characters as you think to be a ‘stupidly short’ length via:
Exclude Keyword Matching RegExp ^.{1,x}$
Where x is the maximum character length you’d consider silly. The curly brackets mean ‘I want between 1 and x of the previous’, so this excludes any keyword that is between 1 and x characters long.
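Before picking a value for x it can be worth checking what you would lose. This is a minimal sketch with made-up keywords, using Python's re module as an assumed stand-in for the Analytics filter and x set to 3 purely as an example.

```python
import re

# Hypothetical keywords, made up for illustration.
keywords = ["2", "f#", "c++", "pokemon"]

# Zero or one character, and nothing else.
one_or_less = re.compile(r"^.?$")

# Between 1 and 3 characters, and nothing else (x = 3 here).
up_to_three = re.compile(r"^.{1,3}$")

# Exclude filters: keep what does not match.
after_first = [kw for kw in keywords if not one_or_less.search(kw)]
after_second = [kw for kw in keywords if not up_to_three.search(kw)]

print(after_first)
print(after_second)
```

With x set to 3 the short but meaningful terms ‘f#’ and ‘c++’ disappear as well, so a larger x is only worth it if your niche has no short keywords of its own.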
Keyword Count: 4250 → 4156 → 4155 (He got traffic from the keyword ‘1’ somehow)
Step Five:
Eliminating Crap that no-one really searches for without using the Google Keyword Tool for Adwords
There are keywords that people do search for, but that are no good for keyword prospecting because they contain special characters. For example, people often do searches that involve “phrase matching” and +inclusion, and these use special characters. But there’s not much point in me returning to Jules and telling him, hey, you should try to rank for “black ink for writing” including the quote marks. Maybe black ink for writing by itself might be okay though.
What you should really decide is:
Am I writing an inclusion or exclusion request? Or should I take my chances and download all the data and use Excel to filter it?
- Do you have very few inbound keywords? If so, consider skipping this step and using Excel to filter out crap like this.
- Do you expect your keywords to just come from the English language? If so, you should probably only allow Alphanumeric Characters
- Do you expect your keywords to include Unicode characters such as é (say you ranked well for a lot of French terms or the word Pokémon for example)? If this is the case, you should probably use an exclusion string.
To include only Alphanumeric characters in RegEx:
Include Keyword Matching RegExp ^[a-zA-Z0-9\s]*$
Wh-what!? What just happened?!
The above just tells it ‘allow any keyword as long as the characters within are between a and z (lowercase), A and Z (UPPERCASE), 0 and 9 (numerical) or a space’.
The Square brackets ([]) group that set together and the asterisk (*) means the string can be whatever length you want.
And \s means ‘whitespace character’ such as a space or a tab.
However, this might match Pokemon, but not Pokémon, as é is not within the group specified.
You can also add ‘safe’ characters to the list. Julian Kay gets a lot of searches that include ‘.exe’ and ‘f#’ terms, so I’d like to see those keywords. I’ve modified the string as such:
Include Keyword Matching RegExp ^[a-zA-Z0-9\s\.#]*$
And hooray, he gets data on F# and .exe type Keywords again!
But Emma, what is \.?
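Ah, the Backslash (\) character allows special characters (such as the Full Stop (.)) to escape their special meaning and take on a literal one. Outside square brackets, an unescaped . would match any character at all, which is the exact opposite of what I want! Inside square brackets most RegEx dialects already treat . literally, so the backslash there is a harmless safety measure.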
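To see the difference between the two inclusion strings, here is a small Python sketch. The keywords are hypothetical and Python's re module is an assumed stand-in for the Analytics filter.

```python
import re

# Hypothetical keywords, made up for illustration.
keywords = [
    "pokemon cards",
    "pokémon cards",
    "f# tutorial",
    "setup.exe error",
    "+ink +pen",
]

# Letters, digits and whitespace only.
plain = re.compile(r"^[a-zA-Z0-9\s]*$")

# The same set plus two 'safe' characters: a literal full stop and a hash.
with_safe = re.compile(r"^[a-zA-Z0-9\s\.#]*$")

# Include filters: keep what matches.
kept_plain = [kw for kw in keywords if plain.search(kw)]
kept_safe = [kw for kw in keywords if with_safe.search(kw)]

print(kept_plain)
print(kept_safe)
```

In both cases ‘pokémon cards’ is dropped, because é is not in the allowed set.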
To Exclude Special Characters
Exclude Keyword Matching RegExp .*[!#\$%&\(\)\*\+,-\./:;\?@\[\\\]\^_`\{\|\}~¢£¥¿¬½¼¡«»¦µ±°••²€„…†‡ˆ‰Š‹Œ‘’“”–—˜™š›œ¨©®¯³´¸¹¾.Ø÷ø”‘].*
This is a finite list, so any special characters that aren’t in it will slip through the net, but it will save your precious Pokémon related keywords or keywords with French characters in them.
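The exclusion approach can be sketched the same way. The character set below is a deliberately shortened, hypothetical subset of the full list, the keywords are made up, and Python's re module is an assumed stand-in for the Analytics filter.

```python
import re

# Hypothetical keywords, made up for illustration.
keywords = [
    "pokémon cards",
    "+ink +pen",
    "“black ink for writing”",
    "c# vs f#",
    "black ink for writing",
]

# Shortened subset of the special-character list: a keyword is
# rejected if it contains any one of these characters.
special = re.compile(r"[!#$%&()*+?@“”]")

# Exclude filter: keep what does not match.
kept = [kw for kw in keywords if not special.search(kw)]
print(kept)
```

The accented keyword survives this time, but anything containing # is thrown away, which is one reason a site with lots of F# traffic might prefer the inclusion string.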
For JulianKay.com we will be using the inclusion string rather than the exclusion string.
Keyword Total: 4250 → 4156 → 4155 → 3673
Step Six:
Segregate your long-tail from your head terms
I hope you know what kind of keywords you’re after here. If you are after keywords with a massive search volume, you’ll probably not want anything over a certain word count. If you’re doing this for content-generation purposes, you will probably want all the queries that come in OVER a certain word length.
Either way, I have the answer:
Include Keyword Matching RegExp ^(\w+\s*){x,y}$
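This allows you to find keywords that are between x and y words long. Very helpful for narrowing a search. One caveat: because \s* also accepts zero spaces, a single long word can be counted as several ‘words’, so the lower limit x is only reliable when it is 1.
Today’s new friend is \w – he matches a single ‘word character’ (a letter, a number or an underscore), so \w+ means one or more of them in a row: a word.
I want to look for meaty head terms (oh, the innuendo that was definitely intended) so I’ll be using ^(\w+\s*){1,3}$ as my RegEx of choice here. This of course can be adjusted to your needs as required.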
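Here is a hedged Python sketch of the word-count filter. The helper names word_range and strict_word_range are hypothetical, the keywords are made up, and the stricter pattern is a suggested variant rather than something from the post: it insists that every word is followed by whitespace or the end of the keyword, so one long word cannot be split into several.

```python
import re

def word_range(x, y):
    # The pattern from the post: x to y repetitions of
    # 'word characters followed by optional whitespace'.
    return re.compile(r"^(\w+\s*){%d,%d}$" % (x, y))

def strict_word_range(x, y):
    # Hypothetical stricter variant: each word must be followed by
    # whitespace or by the end of the keyword.
    return re.compile(r"^(\w+(\s+|$)){%d,%d}$" % (x, y))

# Hypothetical keywords, made up for illustration.
keywords = [
    "pokemon",
    "black ink",
    "all black pencils",
    "where can i buy black ink",
]

# Head terms: one to three words. Both patterns agree here.
head_terms = [kw for kw in keywords if word_range(1, 3).search(kw)]

# Two to three words: the original pattern still lets 'pokemon' in,
# because it can read 'pokemo' and 'n' as two 'words'.
loose = [kw for kw in keywords if word_range(2, 3).search(kw)]
strict = [kw for kw in keywords if strict_word_range(2, 3).search(kw)]

print(head_terms)
print(loose)
print(strict)
```

For the {1,3} head-term filter the two patterns behave the same; the difference only shows up when the lower limit is above 1, for example when hunting for long-tail queries.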
Keyword Total: 4250 → 4156 → 4155 → 3673 → 1160
Yes! We’ve cut down 3090 crap keywords!
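The shrinking keyword total can be reproduced on a small scale by chaining the filters in order. This is only a sketch: the keyword list is made up, apply_steps is a hypothetical helper, and Python's re module with case-insensitive matching is an assumed stand-in for stacking advanced filters in Analytics.

```python
import re

# (mode, pattern) pairs in the order used in the post,
# with the word-count limits set to 1 and 3.
steps = [
    ("exclude", r"julian|julain|julien"),
    ("exclude", r"^.?$"),
    ("include", r"^[a-zA-Z0-9\s\.#]*$"),
    ("include", r"^(\w+\s*){1,3}$"),
]

# Hypothetical keywords, made up for illustration.
keywords = [
    "julian kay blog",
    "1",
    "pokémon cards",
    "+ink +pen",
    "black ink",
    "all black pencils",
    "where can i buy black ink",
    "f# tutorial",
]

def apply_steps(keywords, steps):
    # Apply each filter in turn and record how many keywords remain.
    counts = [len(keywords)]
    for mode, pattern in steps:
        regex = re.compile(pattern, re.IGNORECASE)
        if mode == "include":
            keywords = [kw for kw in keywords if regex.search(kw)]
        else:
            keywords = [kw for kw in keywords if not regex.search(kw)]
        counts.append(len(keywords))
    return keywords, counts

remaining, counts = apply_steps(keywords, steps)
print(counts)
print(remaining)
```

One side effect worth knowing about: \w does not cover # or the full stop, so ‘f# tutorial’ falls out at the last step even though the inclusion string let it through.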
Optional Step - Filter for:
Include Visits Greater Than/Less Than X
There are two reasons to do this:
- To source keywords you didn’t even know you have (use a less than filter). If you get even one visit from it you can probably rank for it somewhere. Say, Julian, did you know you got one visit from all black pencils and the visitor looked at 3 pages? Maybe you should write a post about black pencils if you think it’s relevant.
- To find variations on keywords you knew you did have (use a more than filter). 60 visits from the term photographing Jupiter and 40 from stargazing live York? They look like they could be neat terms.
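The two uses of the visits filter could be sketched like this for an exported keyword list. The function name filter_by_visits and the thresholds are hypothetical; the first three visit counts come from the examples above and the last row is invented.

```python
# (keyword, visits) rows; the 'black ink' row is made up.
rows = [
    ("photographing jupiter", 60),
    ("stargazing live york", 40),
    ("all black pencils", 1),
    ("black ink", 7),
]

def filter_by_visits(rows, more_than=None, less_than=None):
    # Keep keywords whose visit count passes whichever limits are given.
    kept = []
    for keyword, visits in rows:
        if more_than is not None and not visits > more_than:
            continue
        if less_than is not None and not visits < less_than:
            continue
        kept.append(keyword)
    return kept

# Keywords you didn't know you had: very few visits.
surprises = filter_by_visits(rows, less_than=2)

# Keywords that already bring in traffic: worth finding variations on.
proven = filter_by_visits(rows, more_than=30)

print(surprises)
print(proven)
```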
Step Seven:
Export and enjoy
You’re done. I can’t help you with much more, you’re on your own from here, but it’s a little less scary than it was before, right?
I strongly recommend that you run your remaining keywords through the Google keyword tool to grab search volumes for all these cool keywords.
Some people I’d like to thank:
Julian Kay for helping me make the RegEx look prettier and letting me use his website as an example
Alan Ng for helping me when my RegEx doesn’t work
Joe Griffiths who kindly let me put up this blog post.
Resources I used and would recommend
http://www.regular-expressions.info/ – Really quick reference tables
http://services.google.com/analytics/breeze/en/v5/regex_ga_v15_ad1/ – This is the Google Analytics Conversion University Tutorial that got me into RegEx in the first place
Your favourite search engine – someone on a geeky forum will have an answer if your RegEx isn’t working.






