After reading my first blog article about writing your own site statistics, English-language users can already build the main part of their statistics module. But if your site's content is in Russian, Ukrainian, or any other Cyrillic-script language, that is not enough.
If you need to track Cyrillic queries, you will find that the main Russian search engine, Yandex, has many surprises in store. Let's cover them in detail. Every search engine that does not use Cyrillic itself (correct me if I'm wrong) passes each non-ASCII character of the query as percent-encoded (URL-encoded) bytes rather than as raw characters.
To decode that in ASP.NET, use:
word = HttpUtility.UrlDecode(word);
Put that into a working statistics system that handles only English, and you have almost localized it for Russian. Almost.
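As a minimal sketch of what that call does: with no encoding argument, HttpUtility.UrlDecode treats the percent-encoded bytes as UTF-8. The class and method names below are hypothetical and exist only for illustration.

```csharp
using System.Web;

// Hypothetical helper, not part of the original statistics module.
public static class QueryDecoding
{
    // Decodes a percent-encoded query value. Without an explicit encoding,
    // HttpUtility.UrlDecode interprets the decoded bytes as UTF-8 and
    // turns '+' into a space.
    public static string DecodeUtf8(string rawValue)
    {
        return HttpUtility.UrlDecode(rawValue);
    }
}
```

For example, QueryDecoding.DecodeUtf8("%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82") gives the Russian word "привет", because those bytes are its UTF-8 form.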
Check out the following URLs:
These are the same query on Google and on Yandex. They are quite different, aren't they?
The problem is that they use different encodings for query parameters. Google and Yahoo use UTF-8, which is the right decision for any search engine; Yandex uses the windows-1251 encoding (like all other Russian search engines).
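The mismatch can be illustrated with a small sketch that encodes one word in windows-1251 and then decodes it with the right and the wrong encoding. The class name is hypothetical. The RegisterProvider call is an assumption about the runtime: modern .NET needs it before legacy code pages can be used, while on the classic .NET Framework that line can be dropped.

```csharp
using System.Text;
using System.Web;

// Hypothetical demo class, for illustration only.
public static class EncodingMismatchDemo
{
    static EncodingMismatchDemo()
    {
        // Assumption: running on .NET Core / .NET 5+, where code pages such
        // as 1251 are available only after registering this provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Percent-encodes a word using the given code page.
    public static string Encode(string word, int codePage)
    {
        return HttpUtility.UrlEncode(word, Encoding.GetEncoding(codePage));
    }

    // Decodes a percent-encoded value using the given code page.
    public static string Decode(string raw, int codePage)
    {
        return HttpUtility.UrlDecode(raw, Encoding.GetEncoding(codePage));
    }
}
```

Decoding the windows-1251 form of a word with code page 1251 returns the original word; decoding the same bytes as UTF-8 (65001) does not, which is exactly the garbage a UTF-8-only statistics module would store for Yandex queries.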
To handle that in our statistics system (which will be offered as a service, by the way; currently it is used only for our own projects), I added a column to the search engine table that stores the encoding each engine uses.
It is an integer, since each encoding has its own well-known code page number:
- UTF-8 - 65001
- windows-1251 - 1251
- DOS (cp866) - 866
So for each search engine we use that code in the following way:
using System.Text;
…
word = HttpUtility.UrlDecode(word, Encoding.GetEncoding(encodingID));
where encodingID is the code page of the encoding used by the corresponding search engine.
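A rough sketch of that per-engine lookup follows. It uses an in-memory dictionary as a stand-in for the database table; the class name, the method names, and the UTF-8 fallback for unknown engines are all assumptions made for illustration. The RegisterProvider call is only needed on modern .NET.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Web;

// Hypothetical stand-in for the search engine table described above.
public static class SearchEngineEncodings
{
    // Domain fragment -> code page used for query parameters.
    private static readonly Dictionary<string, int> CodePages =
        new Dictionary<string, int>
        {
            { "google", 65001 }, // UTF-8
            { "yahoo", 65001 },  // UTF-8
            { "yandex", 1251 }   // windows-1251
        };

    static SearchEngineEncodings()
    {
        // Assumption: modern .NET, where legacy code pages must be registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Returns the code page for a referrer domain.
    public static int GetEncodingId(string domain)
    {
        foreach (KeyValuePair<string, int> entry in CodePages)
        {
            if (domain.IndexOf(entry.Key, StringComparison.OrdinalIgnoreCase) >= 0)
                return entry.Value;
        }
        return 65001; // assumption: fall back to UTF-8 for unknown engines
    }

    // Decodes a raw query value with the encoding of the given engine.
    public static string DecodeWord(string domain, string word)
    {
        return HttpUtility.UrlDecode(word, Encoding.GetEncoding(GetEncodingId(domain)));
    }
}
```

With this in place, SearchEngineEncodings.DecodeWord("www.yandex.ru", "%EF%F0%E8%E2%E5%F2") and SearchEngineEncodings.DecodeWord("www.google.com", "%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82") both yield the same word.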
After I did that, I thought everything was set. But Yandex was kind enough to prepare a surprise for us.
Everyone wants to be on the first page of the SERP, but unfortunately that is sometimes too hard. Still, some users do not find anything interesting on the first page and move on to the next ones (All For Promotion was once found through an AltaVista query on page 59).
When you do that in Yandex, you will see that the query changes its form. The text parameter is gone; it has become a substring of the qs parameter. Moreover, the qs value is encoded in KOI8-R (code page 20866). To handle that, our code changes to the following:
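// Normal case: decode the whole referrer with the engine's encoding, then extract the parameter.
word = GetQueryParam(HttpUtility.UrlDecode(Request.UrlReferrer.AbsoluteUri, Encoding.GetEncoding(encodingID)), wordParam, false);
...
// Yandex beyond the first results page: no standalone text parameter, so look inside qs (KOI8-R).
if (domain.IndexOf("yandex") >= 0 && GetQueryParam(Request.UrlReferrer.AbsoluteUri, wordParam) == null)
{
    word = GetQueryParam(HttpUtility.UrlDecode(Request.UrlReferrer.AbsoluteUri, Encoding.GetEncoding(20866)), wordParam, false);
    // Decode the extracted value once more with KOI8-R.
    word = HttpUtility.UrlDecode(word, Encoding.GetEncoding(20866));
}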
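And GetQueryParam must be changed accordingly:
public static string GetQueryParam(string url, string param)
{
    // Strict lookup: the parameter must be preceded by '?' or '&'.
    return GetQueryParam(url, param, true);
}

public static string GetQueryParam(string url, string param, bool strong)
{
    // Characters to skip besides the name: the leading '?' or '&' plus '='.
    int add = 2;
    int pos = url.IndexOf('?' + param + '=');
    if (pos < 0)
        pos = url.IndexOf('&' + param + '=');
    // In strict mode, give up unless the name starts a real parameter.
    if (pos < 0 && strong)
        return null;
    if (pos < 0)
    {
        // Loose mode: accept the name anywhere, e.g. nested inside another value.
        pos = url.IndexOf(param + '=');
        add = 1; // only '=' is left to skip
    }
    if (pos < 0)
        return null;
    // The value runs up to the next '&' or to the end of the URL.
    int end = url.IndexOf('&', pos + 1);
    string SearchWord;
    if (end < 0)
        SearchWord = url.Substring(pos + param.Length + add);
    else
        SearchWord = url.Substring(pos + param.Length + add, end - pos - add - param.Length);
    return SearchWord;
}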
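To see the strict and loose lookups and the KOI8-R fallback working together, here is a self-contained sketch. Everything in it is hypothetical: FindParam is a compact stand-in for GetQueryParam, the host yandex.example and the page parameter are made up, and BuildSyntheticReferrer only imitates the shape described above (text nested inside qs and percent-encoded twice), not a captured Yandex URL. The RegisterProvider call is only needed on modern .NET.

```csharp
using System;
using System.Text;
using System.Web;

// Hypothetical demo of the qs fallback; not the production module.
public static class YandexPagingDemo
{
    static YandexPagingDemo()
    {
        // Assumption: modern .NET, where legacy code pages must be registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Compact stand-in for GetQueryParam. Strict mode requires '?' or '&'
    // right before the name; loose mode accepts the name anywhere.
    public static string FindParam(string url, string param, bool strong)
    {
        int pos = url.IndexOf("?" + param + "=", StringComparison.Ordinal);
        if (pos < 0)
            pos = url.IndexOf("&" + param + "=", StringComparison.Ordinal);
        int skip = param.Length + 2; // separator + name + '='
        if (pos < 0)
        {
            if (strong)
                return null;
            pos = url.IndexOf(param + "=", StringComparison.Ordinal);
            if (pos < 0)
                return null;
            skip = param.Length + 1; // name + '=' only
        }
        int start = pos + skip;
        int end = url.IndexOf('&', start);
        return end < 0 ? url.Substring(start) : url.Substring(start, end - start);
    }

    // Builds a made-up referrer with text=... nested inside qs (KOI8-R).
    public static string BuildSyntheticReferrer(string word)
    {
        Encoding koi8 = Encoding.GetEncoding(20866);
        string inner = "text=" + HttpUtility.UrlEncode(word, koi8);
        return "http://yandex.example/search?page=2&qs=" + HttpUtility.UrlEncode(inner, koi8);
    }

    // Extracts the search word: plain text parameter first (windows-1251),
    // otherwise the copy nested inside qs (KOI8-R, decoded twice).
    public static string ExtractWord(string referrer)
    {
        string word = FindParam(referrer, "text", true);
        if (word != null)
            return HttpUtility.UrlDecode(word, Encoding.GetEncoding(1251));

        Encoding koi8 = Encoding.GetEncoding(20866);
        word = FindParam(HttpUtility.UrlDecode(referrer, koi8), "text", false);
        return word == null ? null : HttpUtility.UrlDecode(word, koi8);
    }
}
```

The strict lookup is what tells the two cases apart: it returns null for the synthetic second-page referrer, which triggers the KOI8-R branch, while a first-page referrer with a plain text parameter never reaches it.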
Other Russian search engines don't play such tricks, so there is no problem with them.
By this point you already have a module for your future statistics system.
All you need now is a database of all the search engines and the encodings they use, plus code that writes everything to the database.
If you have Microsoft SQL Server 2000, you may use our quick online database manager.
If you have any questions or suggestions, feel free to send me an email at ravemans@yandex.ru
Anton Raichuk aka RAVEman
Chief System Architect
Efex Consulting