Site Statistic. Yandex quest.
 
Author: RAVEman   Date Posted: 30 November 2006, 3:23 AM

After reading my first blog article about writing your own site statistics, English-language users can already build the main part of their statistics module. But if your site's content is in Russian, Ukrainian or any other language written in Cyrillic, that is not enough.

If you need to track Cyrillic queries, you will find that the main Russian search engine, Yandex, has many surprises in store. Let's cover them in detail. Every search engine (correct me if I'm wrong) passes the non-Latin characters of a query in the URL percent-encoded (URL-encoded): each byte of a character becomes a %XX sequence.

To decode that in ASP.NET, call: word = HttpUtility.UrlDecode(word); (this overload assumes UTF-8). Put that into a working statistics system that so far dealt only with English, and you almost have its localisation for Russian. Almost.

Compare the URLs of the same query on Google and on Yandex. They are quite different, aren't they?

The problem is that they use different encodings for query parameters. Google and Yahoo use UTF-8, which is the right decision for any search engine; Yandex uses windows-1251 (like all other Russian search engines).
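To see the difference concretely, here is a minimal sketch (not part of our module) that decodes the same word, "тест", percent-encoded once as UTF-8 and once as windows-1251. The class name, the sample strings and the Decode helper are made up for illustration. The RegisterProvider call is an assumption for running on .NET Core / .NET 5+, where legacy code pages are not available by default; on the classic .NET Framework it should not be needed.

```csharp
using System;
using System.Text;
using System.Web;

public static class EncodingDemo
{
    // The word "тест" percent-encoded by hand in two encodings (sample values).
    public const string Utf8Query = "%D1%82%D0%B5%D1%81%D1%82"; // UTF-8 bytes
    public const string Win1251Query = "%F2%E5%F1%F2";          // windows-1251 bytes

    // Hypothetical helper: percent-decode a value using the given code page.
    public static string Decode(string value, int codePage)
    {
        // Assumption: needed only on .NET Core / .NET 5+ to enable code pages
        // such as 1251; drop this line on the classic .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return HttpUtility.UrlDecode(value, Encoding.GetEncoding(codePage));
    }

    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine(Decode(Utf8Query, 65001));    // тест
        Console.WriteLine(Decode(Win1251Query, 1251));  // тест
        // Wrong encoding: the bytes are not valid UTF-8, so the result is unreadable.
        Console.WriteLine(Decode(Win1251Query, 65001));
    }
}
```

The last line is what a UTF-8-only statistics module would store for a Yandex visitor.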

To handle that in our statistics system (its services will be available soon, by the way; currently it is used only for our own projects), I added a column to the search engine table that stores the encoding each engine uses.
The column has an integer type, because each encoding has its own well-known code page number:

  • UTF-8 - 65001
  • windows-1251 - 1251
  • DOS (cp866) - 866

So for each search engine we use that code as follows:


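using System.Text;
using System.Web;   // HttpUtility
…
// Percent-decode the query word with the encoding of the referring search engine.
word = HttpUtility.UrlDecode(word, Encoding.GetEncoding(encodingID));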

where encodingID is the code page of the encoding used by the corresponding search engine.
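As a rough illustration of the lookup, the sketch below replaces the database table with an in-memory dictionary. The class, the method names, the host fragments and the UTF-8 fallback are all assumptions made for the example; only the code page numbers come from the list above. The RegisterProvider call is again an assumption for .NET Core / .NET 5+.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Web;

public static class SearchEngineEncodings
{
    // Hypothetical stand-in for the search engine table: host fragment -> code page.
    private static readonly Dictionary<string, int> CodePages = new Dictionary<string, int>
    {
        { "google", 65001 },  // UTF-8
        { "yahoo", 65001 },   // UTF-8
        { "yandex", 1251 },   // windows-1251
    };

    // Returns the code page for a referrer host; falls back to UTF-8 (an assumption).
    public static int GetCodePage(string host)
    {
        foreach (KeyValuePair<string, int> entry in CodePages)
        {
            if (host.IndexOf(entry.Key, StringComparison.OrdinalIgnoreCase) >= 0)
                return entry.Value;
        }
        return 65001;
    }

    // Percent-decodes a raw query word with the encoding of the referring engine.
    public static string DecodeWord(string host, string rawWord)
    {
        // Assumption: only needed on .NET Core / .NET 5+ for legacy code pages.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return HttpUtility.UrlDecode(rawWord, Encoding.GetEncoding(GetCodePage(host)));
    }

    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine(DecodeWord("www.yandex.ru", "%F2%E5%F1%F2"));               // тест
        Console.WriteLine(DecodeWord("www.google.com", "%D1%82%D0%B5%D1%81%D1%82")); // тест
    }
}
```

Keeping the code page next to the engine means a new engine only needs a new row, not new code.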

After I did that, I thought everything was set. But Yandex was kind enough to prepare a surprise for us.

Everyone wants to be on the first page of the SERP but, unfortunately, sometimes that is too hard. Still, there are users who don't find anything interesting on the first page, so they go on to the next one (All For Promotion was once found on page 59 of an AltaVista query).

When you do that in Yandex, you will see that the query changes its form. The text parameter is gone: it has become a substring of the qs parameter. Moreover, the value of qs is encoded in KOI8-R (code page 20866). To handle that, our code changes to the following:


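// Normal case: decode the whole referrer with the engine's encoding, then take the query word.
word = GetQueryParam(
    HttpUtility.UrlDecode(Request.UrlReferrer.AbsoluteUri, Encoding.GetEncoding(encodingID)),
    wordParam, false);
...
// Yandex, second page and further: no standalone query parameter, it sits inside qs in KOI8-R.
if (domain.IndexOf("yandex") >= 0 && GetQueryParam(Request.UrlReferrer.AbsoluteUri, wordParam) == null)
{
    word = GetQueryParam(HttpUtility.UrlDecode(Request.UrlReferrer.AbsoluteUri, Encoding.GetEncoding(20866)), wordParam, false);
    // Decode the extracted value once more with KOI8-R.
    word = HttpUtility.UrlDecode(word, Encoding.GetEncoding(20866));
}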

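And GetQueryParam has to be changed to support that. The new strong flag says whether the parameter must be a real query parameter (preceded by ? or &) or may also be found as a substring of another parameter's value: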

public static string GetQueryParam(string url, string param)
{
    // Strict lookup: param must be a real query parameter.
    return GetQueryParam(url, param, true);
}

public static string GetQueryParam(string url, string param, bool strong)
{
    // Characters to skip besides the name itself: the leading '?' or '&' and the '='.
    int add = 2;
    int pos = url.IndexOf('?' + param + '=');

    if (pos < 0)
        pos = url.IndexOf('&' + param + '=');

    // Strict mode: give up if param is not a query parameter of its own.
    if (pos < 0 && strong)
        return null;

    if (pos < 0)
    {
        // Relaxed mode: accept "param=" anywhere, e.g. inside the value of qs.
        pos = url.IndexOf(param + '=');
        add = 1; // only the '=' has to be skipped
    }
    if (pos < 0)
        return null;

    int start = pos + param.Length + add;
    int end = url.IndexOf('&', pos + 1);

    // The value runs up to the next '&' or to the end of the URL.
    return end < 0 ? url.Substring(start) : url.Substring(start, end - start);
}
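To show how the pieces fit together, here is a self-contained sketch of the whole flow for a first-page and a next-page Yandex referrer. Both sample URLs are made up: I am assuming the inner value of qs is percent-encoded twice, which is why it is decoded twice, and the real Yandex URLs may differ in detail. ExtractWord and the class name are hypothetical, GetQueryParam is a condensed copy of the listing above, and the RegisterProvider call is an assumption for .NET Core / .NET 5+.

```csharp
using System;
using System.Text;
using System.Web;

public static class YandexReferrerDemo
{
    // Condensed copy of GetQueryParam so that this sketch compiles on its own.
    public static string GetQueryParam(string url, string param, bool strong)
    {
        int add = 2;
        int pos = url.IndexOf("?" + param + "=", StringComparison.Ordinal);
        if (pos < 0) pos = url.IndexOf("&" + param + "=", StringComparison.Ordinal);
        if (pos < 0 && strong) return null;
        if (pos < 0)
        {
            pos = url.IndexOf(param + "=", StringComparison.Ordinal);
            add = 1;
        }
        if (pos < 0) return null;

        int start = pos + param.Length + add;
        int end = url.IndexOf('&', pos + 1);
        return end < 0 ? url.Substring(start) : url.Substring(start, end - start);
    }

    // Hypothetical wrapper around the two cases described in the article.
    public static string ExtractWord(string referrer, string wordParam, int codePage)
    {
        // Assumption: only needed on .NET Core / .NET 5+ for legacy code pages.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        if (GetQueryParam(referrer, wordParam, true) != null)
        {
            // First page: the parameter is present, decode with the engine's encoding.
            string decodedUrl = HttpUtility.UrlDecode(referrer, Encoding.GetEncoding(codePage));
            return GetQueryParam(decodedUrl, wordParam, false);
        }

        // Next pages: the parameter hides inside qs and is encoded in KOI8-R.
        Encoding koi8r = Encoding.GetEncoding(20866);
        string word = GetQueryParam(HttpUtility.UrlDecode(referrer, koi8r), wordParam, false);
        return word == null ? null : HttpUtility.UrlDecode(word, koi8r);
    }

    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;
        // Made-up referrers for the query "тест".
        string firstPage = "http://www.yandex.ru/yandsearch?text=%F2%E5%F1%F2&stype=www";
        string nextPage = "http://www.yandex.ru/yandsearch?p=1&qs=text%3D%25D4%25C5%25D3%25D4%26stype%3Dwww";
        Console.WriteLine(ExtractWord(firstPage, "text", 1251)); // тест
        Console.WriteLine(ExtractWord(nextPage, "text", 1251));  // тест
    }
}
```

Note that the relaxed lookup only runs after the strict one has failed, so an ordinary referrer is never matched by accident inside another parameter's value.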

Other Russian search engines don't play such tricks, so there is no problem with them. By this point you already have a module for your future statistics system. All you need now is a database of search engines and the encodings they use, plus the code that writes everything to the database. If you have Microsoft SQL Server 2000, you may use our quick online database manager.

If you have any questions or suggestions, feel free to email me at ravemans@yandex.ru

Anton Raichuk aka RAVEman
Chief System Architect
Efex Consulting


