GOOGLE EXPLAINED !!
|
|
Forget what you think you know about ‘Search Engines’
|
|
|
First and foremost, most of what people think they know about search engines is outdated. The days of many
search engines vying for your attention are gone, and Google reigns supreme (no offence intended to
Yahoo and the others). The main reason for this, and the single biggest difference between Google and the
old search engines, is that the old engines were PAID DIRECTORIES whereas Google is a WEBCRAWLER.
|
|
|
THE OLD DAYS
|
|
The search engines that were so prolific in the 1990s were a vastly different animal from what we see today.
Websites were submitted to these directories and indexed according to their nominated keywords.
For a small fee (try $200+ USD), the usual six-month wait for entry could be cut to around two weeks.
These engines worked on the basis that if a searched-for keyword appeared either in a site’s nominated keywords
or in the body of its text, then that site was a result and would be listed along with the thousands of other
sites that also matched. Order was determined by crude measures such as the number of occurrences, the proximity
of occurrences to one another, and the font size of each occurrence. Needless to say, many industrious website
administrators used tactics such as filling a page with invisible text containing endlessly repeated keywords, or
cramming in as many related keywords as they could, to increase their ranking in the search results.
This effectively wasted a perfectly good idea, as pornography and scam websites took up all the top results.
That’s why Google did it differently.
|
|
|
WHAT'S A WEBCRAWLER ANYWAY ?
|
|
Instead of relying on a site being submitted (although you can still do this), Google actively browses the web itself.
It uses its spider, or robot, named Googlebot to fetch a webpage, find all the links on it and add them to the list
of pages for Googlebot to visit. It then passes a copy of the page to the Google indexer for storing and moves on
to the next page on the list. In this way it crawls through the web from page to page by following links, which is
how a webcrawler gets its name.
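The crawl loop described above can be sketched in a few lines. This is only an illustration of the visit, collect links, queue, repeat cycle, not Googlebot's actual code. The `crawl` function, the `fetch_links` callback and the toy `TOY_WEB` link map are all made up for the example, and no real pages are fetched.

```python
from collections import deque


def crawl(start_url, fetch_links, max_pages=100):
    """Visit a page, queue any links not seen before, then move to the next page."""
    frontier = deque([start_url])   # the list of pages still to visit
    seen = {start_url}              # pages already queued, so none is visited twice
    visited = []                    # pages handed on for indexing, in visit order
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)         # stand-in for "pass a copy to the indexer"
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# Hypothetical toy web: each page maps to the links found on it.
TOY_WEB = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": ["c.example/"],
    "c.example/": [],
}

order = crawl("a.example/", lambda url: TOY_WEB.get(url, []))
```

Note that `c.example/` is never submitted anywhere; it is found only because `b.example/` links to it.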
The Google indexer stores the full text of each page passed to it by Googlebot. This index is sorted alphabetically
by search term, with each index entry storing a list of the documents in which the term appears and the locations
within the text where it occurs. This data structure allows rapid access to the documents that contain a user’s query terms.
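A structure of this kind is usually called an inverted index. The following is a minimal sketch of the idea, assuming whitespace-separated words and two invented documents; `build_index` and `DOCS` are illustrative names and say nothing about how Google stores its data.

```python
from collections import defaultdict


def build_index(documents):
    """Map each term to the documents it appears in and the word positions there."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(position)
    return index


# Hypothetical documents standing in for crawled pages.
DOCS = {
    "page1": "cheap flights to sydney",
    "page2": "sydney hotels and cheap sydney tours",
}

index = build_index(DOCS)
terms_in_order = sorted(index)      # the alphabetical ordering by search term
sydney_entry = index["sydney"]      # {"page1": [3], "page2": [0, 4]}
```

Looking up a term is a single dictionary access, which is why this layout gives rapid access to the matching documents.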
The third part of Google is the Query Processor. When a searcher enters a word or phrase, the ‘engine’
requests all matching documents from the indexer. The Google Results Formatter then considers over a hundred factors
in determining which documents are most relevant to the query, including the popularity of the page, the position and
size of the search terms within the page, and the proximity of the search terms to one another on the page.
The system at the heart of this ordering is called PageRank, and it is the key to improving your
position on Google.
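As a rough sketch of the lookup step, the function below finds the documents that contain every query term and orders them by a single popularity score. The real engine weighs over a hundred factors; the `search` function, the hand-written `INDEX` and the `POPULARITY` numbers are assumptions made purely for illustration.

```python
def search(query, index, popularity):
    """Return the documents containing every query term, most popular first."""
    terms = query.lower().split()
    if not terms:
        return []
    # Start with the documents for the first term, then keep only those
    # that also contain each remaining term.
    matches = set(index.get(terms[0], {}))
    for term in terms[1:]:
        matches &= set(index.get(term, {}))
    # Higher popularity first; ties are broken alphabetically for a stable order.
    return sorted(matches, key=lambda doc: (-popularity.get(doc, 0.0), doc))


# Hypothetical index (term -> document -> word positions) and popularity scores.
INDEX = {
    "cheap": {"page1": [0], "page2": [3]},
    "sydney": {"page1": [3], "page2": [0, 4]},
    "hotels": {"page2": [1]},
}
POPULARITY = {"page1": 2.0, "page2": 1.0}

results = search("cheap sydney", INDEX, POPULARITY)
```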
|
|
|
PAGERANK
|
|
PageRank is a numeric value that represents how important a page is on the web. Google figures that when one page links
to another page, it is effectively casting a vote for the other page. The more votes that are cast for a page, the more
important the page must be. The importance of the page casting the vote also determines how much the vote itself is
worth, so Google calculates a page's importance from both the number and the weight of the votes cast for it.
Not all links are counted by Google. For instance, it filters out links from known link farms. Some links can cause
a site to be penalized by Google. Google rightly figures that webmasters cannot control which sites link to their sites,
but they can control which sites they link out to. For this reason, links into a site cannot harm the site, but links
out of a site can be harmful if they point to penalized sites. So be careful which sites you link to. If a site has
PageRank 0, it has usually been penalized, and it would be unwise to link to it.
PageRank (PR) for your site is calculated as follows:
PR(yoursite) = 0.15 + 0.85 ( PR(Site1) / TotalLinks(Site1) + PR(Site2) / TotalLinks(Site2) + …. + PR(SiteN) / TotalLinks(SiteN) )
In English…
The PR of each page linking to you is divided by the total number of links on that page. This is then added to the PR
received from all the other pages linking to your site. The 0.15 + 0.85 (...) wrapped around the sum applies a damping
factor: each page hands out a little less than its full PageRank (85% of it), in equal portions, to each of the sites
it links to.
This would imply that it is better to be the single link on a page with low PageRank than one of hundreds of links on
a high PageRank page. That is not the case, however, as the PageRank scale is believed to be logarithmic rather than
linear. A page of PR 2 is not one better than a page of PR 1 but several times more important (the factor is believed
to be somewhere between 5 and 6).
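Because every page's PR depends on the PR of the pages linking to it, the formula is normally applied repeatedly until the values settle. The sketch below does exactly that for a made-up three-page web. The `pagerank` function, the starting value of 1.0 per page and the fixed iteration count are assumptions for illustration, not a description of Google's implementation.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Repeatedly apply PR(p) = 0.15 + 0.85 * sum(PR(q) / TotalLinks(q)) over pages q linking to p."""
    pages = set(links) | {target for targets in links.values() for target in targets}
    pr = {page: 1.0 for page in pages}          # assumed starting value for every page
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Each linking page passes on its PR divided by its number of outgoing links.
            votes = sum(pr[src] / len(targets)
                        for src, targets in links.items() if page in targets)
            new_pr[page] = (1 - damping) + damping * votes
        pr = new_pr
    return pr


# Hypothetical web: A and B both link to C, and C links back to A.
ranks = pagerank({"A": ["C"], "B": ["C"], "C": ["A"]})
```

Page B receives no votes, so it settles at the minimum of 0.15, while C, with two inbound links, ends up highest. In this example every page has at least one outgoing link, so the three values add up to 3, the number of pages.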
|
|
|
INTERNAL LINKS
|
|
The maximum PageRank in a site equals the number of pages in the site times 1. The maximum is increased by inbound
links from other sites and decreased by outbound links to other sites. We are talking about the overall PageRank in the
site and not the PageRank of any individual page. Poor internal linkages can cause a site to fall short of its maximum
but no kind of internal link structure can cause a site to exceed it. The only way to increase the maximum is to add
more inbound links and/or increase the number of pages in the site.
Whilst it is thoroughly recommended to create and add new pages to increase a site's total PageRank so that it can be
channelled to specific pages, there are certain types of pages that should not be added. These are pages that are all
identical, or very nearly identical, and are known as cookie-cutters. Google considers them to be spam and they can
trigger an alarm that causes the pages, and possibly the entire site, to be penalized. Pages full of good content are a
must.
Now that a website has a set amount of maximum PageRank associated with it, how do we make this work for us?
|
|
|
CHANNELLING PAGERANK
|
|
Well, if a website has 100 pages, and thus a total PageRank of 100, the simplest case is that each page has PR 1. In
Google’s eyes, each page in the site is then equally important. That’s good, right?
Unfortunately, no. It is far preferable for the index or home page of the site to rate higher than all the
other content, as it is the front end and central nexus for all the information contained within the site. An
index page holding a much larger share of the site's PageRank will appear higher up in search results, and it will be
relevant to far more visitors than any individual page from the website could ever be.
As an example, a company selling 99 different items online might have a home page and 99 pages for the items. From the
company's point of view it is better to have the index page appear in the top 10 search results on Google than to have
all 100 pages appear in the 1,000+ region of the search results. Once visitors click the link to the home page, they
are one step away from what they wanted: one item out of the 99 available.
The strategy for maximising the PageRank of the index page, and thus its position in the search
results, is therefore to have every individual page of the website link back to the index page, with the index page
linking to each of those pages where physically possible. At the same time, links between sub-pages on the site should
be minimised in order to channel all of those pages' PageRank to the index page.
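The effect of the two link structures can be compared with a small experiment. The sketch below applies the PageRank formula given earlier to a hypothetical five-page site (a home page plus four item pages), once with every page linking to every other page and once with links running only between the home page and each item page. The page names and the compact `pagerank` helper are assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Iterate PR(p) = 0.15 + 0.85 * sum(PR(q) / TotalLinks(q)); every page must be a key of links."""
    pr = {page: 1.0 for page in links}
    for _ in range(iterations):
        pr = {
            page: (1 - damping) + damping * sum(
                pr[src] / len(out) for src, out in links.items() if page in out
            )
            for page in links
        }
    return pr


subpages = [f"item{i}" for i in range(1, 5)]
all_pages = ["home"] + subpages

# Every page links to every other page: PageRank is spread evenly.
fully_meshed = {p: [q for q in all_pages if q != p] for p in all_pages}
# Home links to each item page and each item page links only back to home.
hub_and_spoke = {"home": subpages, **{p: ["home"] for p in subpages}}

mesh = pagerank(fully_meshed)
hub = pagerank(hub_and_spoke)
```

In the fully meshed site every page settles at 1.0. In the hub-and-spoke site the home page settles at roughly 2.4 while each item page drops to about 0.66. The site's total stays at 5 in both cases, so the linking pattern only redistributes PageRank; it does not create any.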
|
|
|
OPTIMAL LINKING ADDRESS
|
|
To Google, www.domain.com, domain.com, www.domain.com/index.html and domain.com/index.html are different addresses
and, therefore, different pages. If the sites linking to yours use a variety of these addresses, Google will have
several low-ranked pages in its index instead of one high-ranked page.
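One way to audit the links pointing at your site is to reduce each address to a single preferred form and see which ones differ. The sketch below does this for the four variants above. The `canonical_address` function, the choice of the `www` form and the `http` scheme are assumptions for the example, and blindly adding `www.` would be wrong for other subdomains.

```python
from urllib.parse import urlsplit


def canonical_address(url, default_page="/index.html"):
    """Collapse the common variants of a home page address into one preferred form."""
    # Accept addresses written without a scheme, e.g. "domain.com".
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.netloc.lower()
    if not host.startswith("www."):
        host = "www." + host        # assumed preference: the www form
    path = parts.path
    if path in ("", default_page):
        path = "/"                  # treat the bare host and the index page alike
    return f"http://{host}{path}"


variants = [
    "www.domain.com",
    "domain.com",
    "www.domain.com/index.html",
    "domain.com/index.html",
]
canonical = {canonical_address(v) for v in variants}
```

All four variants collapse to the single address `http://www.domain.com/`, which is the kind of address you would then ask other sites to link to.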
For this reason, when Interactive Solutions completes a client’s website project, the client receives, amongst other
things, an Optimal Linking Address. This is the address used in the site's design to navigate to the main index page.
It is the same address you should ask other websites to use when linking to your site, so as to maximise the PageRank
votes for your site.
|
|
|
BE CAREFUL WHAT YOU SAY
|
|
Well, this should probably be “Be Careful What THEY Say”, because a lesser-known factor that Google rates highly in
determining a page’s worth is the link-text. Link-text is the actual text that you click on a site to navigate to
another site. This link-text becomes a property of the site linked to, that is, your site. If a searcher enters a
word that is contained within link-text associated with your site, this is a huge plus in the Google rank calculations.
Unfortunately, many website administrators are hard pressed to provide any link-text other than
www.sitename.com or a simple “here”. With a little persuasion, however, you might be able to get them to use
something more beneficial to your cause, such as the keywords you associate with your site.
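To make the idea concrete, the sketch below reads a fragment of HTML from a hypothetical linking site and records the link-text against the address each link points to. It uses Python's standard `html.parser` module. The `LinkTextCollector` class and the sample HTML are invented for the example, and how Google actually attributes link-text is not public.

```python
from collections import defaultdict
from html.parser import HTMLParser


class LinkTextCollector(HTMLParser):
    """Record the clickable text of every link against the address it points to."""

    def __init__(self):
        super().__init__()
        self.link_text = defaultdict(list)   # target address -> list of link-texts
        self._href = None                    # address of the link currently open
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._buffer = []

    def handle_data(self, data):
        if self._href is not None:           # only keep text inside a link
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.link_text[self._href].append("".join(self._buffer).strip())
            self._href = None


collector = LinkTextCollector()
collector.feed(
    '<p>Buy <a href="http://www.domain.com/">cheap garden tools</a> or click '
    '<a href="http://www.domain.com/">here</a>.</p>'
)
texts = collector.link_text["http://www.domain.com/"]
```

Of the two links, only the first gives the target page any useful keywords; "here" tells a search engine nothing about what the page contains.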
|
|
|
LIFE AFTER LINKS
|
|
So links are the be-all and end-all, you may be thinking? Well, although they do feature prominently and indeed play a
substantial role in assigning a page its worth, there are other factors. Whilst many of these factors are beyond the
control of a lowly web designer, the ones that can be influenced are reasonably easy to address.
The text on a webpage is rightfully a key indicator of a page's worth. If the content contains frequent
instances of a searched-for keyword, the likelihood of that page being what the searcher is after is very good.
Further, if the page also contains words related to the search terms (Google is a learning technology), the chance
that the site is a worthy result increases. But as stated before, the days when a website administrator could just load
a page with keywords are gone. The algorithms Google uses are far too smart for something like that to slip past them;
they are very effective in weeding out many kinds of tricks like this, and as a result Google does a great
job of filtering out spam sites and sites with worthless content.
|
|
|
FINALLY TO SEARCH ENGINE OPTIMISATION!
|
|
The part that everyone is interested in! How to beat all the other sites and get to number one!
Well, here's the truth. Apart from having good ‘copy’ and sound site architecture, Google classes anything else done
to improve a site’s ranking as ‘unethical’!
As far as can be determined from Google’s information pages, the actions discussed above are about the extent of what
it will allow in trying to increase a page's ranking. The above only involves developing the site in a format
that is most beneficial to Google’s parsing methods, and as such is about all it considers fair.
It is also the opinion of Interactive Solutions that this system is a fair and balanced one. Any stretching of the
rules will just lead back to the bad old days of overly aggressive Search Engine Optimisation companies charging
businesses thousands of dollars to be number one.
With each new update of Google and its algorithms, the rules become more stringent, and more SEO tricks get
filtered out. Initially this means that any businesses unlucky enough to have employed these tactics on
their website will have their PR set to 0, or even be removed from the Google index completely. After Google next
updates itself they will be allowed back in, but only if they now conform to the new rules.
To this end, Interactive Solutions has taken the stance that, seeing as we are using Google’s technology for free,
and that contravening its rules will result in unworthy sites being ranked higher than they should be, we should
abide by whatever rules Google deems necessary to keep the index fair.
If we always adhere to the guidelines Google has set forth, then at no time should any of our clients’ sites be
dropped or removed completely from the rankings. Obviously our clients do not want that, and neither do we.
Interactive Solutions will never engage in any activities to increase ranking other than those covered above.
Optimising copy and designing a sound site architecture are all that we will provide for clients.
|
|
|
Sources:
|
|
http://www.google.com/webmasters/seo.html
http://www.webworkshop.net/pagerank.html
http://www.webworkshop.net/seo_and_google.html