Wikivoyage/Lounge/Archive/2013-10


Google PageRank issues

Tracked in Phabricator:
Bug 52688

Three of the Wikivoyage projects - en, ro, and pl - have a Google PageRank of 0, which means they will almost never show up in Google search results. This is probably because Google considers the sites to be mirrors of Wikitravel. Anyone have any idea for how to fix this problem? Kaldari (talk) 18:53, 23 August 2013 (UTC)

There has been significant discussion of this issue on voy:Wikivoyage:Search Expedition and the corresponding talk page. voy:User:JamesA indicated that having access to the Google Webmaster Tools would help significantly in tracking down the problem (voy:Wikivoyage talk:Search Expedition#Google Webmaster Tools) but I think James had indicated that the request to have access to those tools was denied by WMF - hopefully someone else can provide further details. -- Ryan • (talk) • 19:52, 23 August 2013 (UTC)
If I remember correctly Sumana from tech at the WMF was going to look into this. I will ping her. Doc James (talk · contribs · email) 23:48, 23 August 2013 (UTC)
Don't believe all the yarns that Google spins about "no-follow".
A significant self-help move we can make tomorrow is to stop signalling to their robots that we are only an inferior derivative work every time they spider us. We should immediately junk the two free hyperlinks of relevant anchor text we give those nice, friendly folk at InternetBrands on the vast majority of our article pages (and on their associated discussion pages). We should immediately replace this kind of hyperlinked text
"This article is derived from the article Dresden on wikitravel.org/en in its revision as of 08:30, 6 July 2012 (UTC)"
with this minimally compliant but legal attribution:
"This article is derived from the Dresden article on wikitravel.org/en in its revision as of 08:30, 6 July 2012 (UTC)."
There was (and is) absolutely nothing in Wikitravel's licensing regime that mandated hyperlinking when giving attribution! --W. Franke-mailtalk 20:08, 24 August 2013 (UTC)
And since we have moved over the entire history of edits, attribution is given that way as well. We probably do not need to mention WT at all. Doc James (talk · contribs · email) 23:55, 24 August 2013 (UTC)
I tend to agree, but I thought Legal was involved in the decision to put a notice at the bottom of each page. LtPowers (talk) 17:53, 25 August 2013 (UTC)
If that is indeed the case, then "Legal" need to URGENTLY re-visit their decision, paying particular attention to the significance (or not) of including actual working anchor text to produce all these SEO significant hyperlinks to our erstwhile antagonist. --W. Franke-mailtalk 18:55, 25 August 2013 (UTC)
If you download a PDF version ("Print/export": "Download as PDF" in the menu to the left), then Wikitravel isn't mentioned at all. Is this an error? If a clickable link is required, then I'm a bit troubled about voy:Wikivoyage:Offline reader Expedition as you can't include a clickable link in a paper copy of Wikivoyage. --Stefan2 (talk) 19:50, 25 August 2013 (UTC)
I also agree that losing the links would be an excellent starting point, though we will need some attribution text on the PDFs. --Nick talk 14:01, 26 August 2013 (UTC)
PageRank is not about a website; it is about a particular page. Which pages have a PageRank of 0? I just compared Tokyo/Roppongi: 2 for WV, and 4 for WT even though it is largely out-of-date and somewhat spammy. I agree the hyperlinks should be removed. Nicolas1981 (talk) 03:05, 25 August 2013 (UTC)
I have noticed there is a large delay in ranking. For example "new" WV page Jihlava (created in April) has no rank yet. The same for Nuclear tourism (April) and Lower Saxon Wadden Sea National Park (June). Good idea with removing the WT hyperlinks. --Danapit (talk) 09:33, 25 August 2013 (UTC)
For the record, this problem is also covered in bugzilla:52688. --AKlapper (WMF) (talk) 13:19, 9 September 2013 (UTC)

BUMP There are some technical tricks that our webmasters can do to mitigate our Google duplicate penalty (I'm certainly not going to even outline them here in plain text; if anyone is interested they can phone, and those are presumably the subject of the rather woolly bugzilla report), but there are also two things that we can do to help ourselves IMMEDIATELY and that I have seen no cogent opposition to:

1) Stop signalling to search engine robots that we are only an inferior derivative work every time they spider us. We should immediately junk the two free hyperlinks of relevant anchor text we give those nice, friendly folk at InternetBrands on the vast majority of our article pages (and on their associated discussion pages). We should IMMEDIATELY replace this kind of hyperlinked text

"This article is derived from the article Dresden on wikitravel.org/en in its revision as of 08:30, 6 July 2012 (UTC)"

with this minimally compliant but legal attribution:

"This article is derived from the Dresden article on wikitravel.org/en in its revision as of 08:30, 6 July 2012 (UTC)."

2) Delete the literally thousands of "outline" articles that consist of only an opening "lede paragraph" in the form "x is a city/region in y" and a skeleton of (empty) standard sections. (Oh, I forgot: thanks to the bot brigade, most of these now have a fine new banner and a link to a relevant Wikipedia article.) We are not a Gazetteer and getting rid of roughly 20% of our articles and starting them again (as and when there is someone to take an interest in developing them properly), without duplicate text and the WV attribution to completely non-useful content, would go a long way towards signalling that we are a different (rather than a derivative) site.

I'd do this myself tomorrow but after the better part of a decade of editing at Wikitravel and its better successor, I don't even have autopatroller status. --W. Franke-mailtalk 18:13, 9 September 2013 (UTC)

Your English WT editing history under this username dates to 2007, in which year you made a grand total of 51 edits. You then apparently disappeared until this time last year when the move to WV began. That's hardly "the better part of a decade". LtPowers (talk) 20:40, 9 September 2013 (UTC)
Even if it were true that I always logged on and never used an IP address to edit (which it isn't - almost all of the various Glasgow IP edits have been mine over the years, I'd bet, and I've made more than 1000 edits in the last 30 days and more than 1000 in the 30 days before that - none of them using a bot or tool like AWB), how precisely is this relevant to improving our search engine ranking and, consequently, our readership? Try and keep your club's personal vendetta within reasonable bounds. State clearly whether the two things I propose (and others have agreed with) are a "good idea" or a "bad idea" and why, please. --W. Franke-mailtalk 23:34, 9 September 2013 (UTC)
In it:voy we deleted all the skeleton articles long ago (roughly 1,000 of them). Now, apart from a few hidden exceptions that may still exist, we have only articles with a minimum of information, and creating skeleton articles is not allowed.
As I wrote at the last summit, regarding the very small (but not empty) articles, I'm rewriting them from scratch (and adding more information where I can) in order to recreate those articles while putting the credit in the edit summary of the change/creation instead of the "clickable footnote". For the longer ones, I've opened a bug to turn the link into plain text.
Now, regarding the ranking, I have the opposite problem: in some cases Google is also indexing pages that I haven't created yet but have only mentioned (with a red link) in an existing article. I've noticed that these are the cases where an article with that name exists in en:w but not in it:w. It would be good to develop those articles, but I can't do it all at once... :-( --Andyrom75 (talk) 09:33, 10 September 2013 (UTC)
I very much agree with both Frank's points. On the topic of the skeletal articles, we've had a discussion, which has not led to any action yet. I am not sure if we've reached any consensus there... --Danapit (talk) 15:45, 10 September 2013 (UTC)
Congratulations to the Italian Wikivoyage in achieving good organic search engine results and, presumably, consequently getting lots of eyeballs to view the efforts of your editors.
It seems you are proving how bad this duplicate penalty can be by not suffering from it because you have been pro-active and not dragged your feet on simple SEO housekeeping!
Now I know I'm going to get another brickbat in my direction, but I'm very disappointed that the only response to my proposal from LtPowers seems to be to try and shoot or discredit the messenger. Why the "movers and shakers" at the English Wikivoyage seem not to understand that the number one strategic task for the English Wikivoyage is to increase the readership continues to baffle me. We've done some great work - assisted by our German Wikivoyage colleagues - with introducing dynamic maps and folks are furiously sharpening up the prose of our articles and keeping them up to date, but all this is somewhat in vain if nobody can find us in the search engines. So, LtPowers and other English language Wikivoyagers, please state clearly whether the two things I propose (and others have agreed with) are a "good idea" or a "bad idea" and why, please. --W. Franke-mailtalk 00:25, 11 September 2013 (UTC)
Quite honestly you do not really need the "movers and shakers" to drive this change forward, and "they" don't necessarily have the power either (there are legal and technical issues here). Furthermore, there have been various fierce disagreements over dynamic maps, spelling, Wikipedia links, Wikidata and external links (yay washing dirty laundry!), so why group them as one homogeneous bunch rather than accept that everyone has their own viewpoints, which may or may not concur with each other? That does make for a lot of frustrating inertia, but it cannot be laid at the feet of a singular "club". This also points to a larger problem for the entire Wikivoyage community, as it is hard enough making decisions in-house at the English Wikivoyage, let alone across 15 different languages, and it would do well for the Thematic Organisation to kind of guide the way.
I think the above discussion shows that everyone is agreeable to removing the link in the credits, but understandably reluctant due to the legal aspects. It seems the primary way to move forward would be to directly contact legal, which has been done in bugzilla:52688#c4, so @Philippe (WMF): has to weigh in on the issue (who/where exactly?). Secondly, the code for mw:extension:CreditsSource must be changed, and Andyrom75 has kindly filed bugzilla:53942. The necessary fixes will then have to be merged in.
As to the skeleton articles, I thought there was already a standing agreement to delete them on sight. Tag them as speedy delete? -- torty3 (talk) 02:08, 11 September 2013 (UTC)
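One way to find candidates for the skeleton cleanup discussed above is to measure how much prose an article contains once section headings and templates are set aside. The following is only a rough sketch in Python: the names `count_prose_words` and `is_skeleton`, the 30-word threshold, and the regular expressions are illustrative assumptions, not an existing Wikivoyage tool or an agreed policy.

```python
import re

# A wikitext section heading line such as "==See==" or "===Eat===".
HEADING_RE = re.compile(r"^=+.*=+[ \t]*$", re.MULTILINE)
# An innermost template call such as "{{pagebanner}}" (no nested braces).
TEMPLATE_RE = re.compile(r"\{\{[^{}]*\}\}")


def count_prose_words(wikitext):
    """Count whitespace-separated words outside headings and templates."""
    text = wikitext
    # Strip templates repeatedly so that nested templates are removed too.
    while True:
        stripped = TEMPLATE_RE.sub("", text)
        if stripped == text:
            break
        text = stripped
    text = HEADING_RE.sub("", text)
    return len(text.split())


def is_skeleton(wikitext, max_prose_words=30):
    """Heuristic: an article with very little prose is treated as a skeleton.

    The threshold is an arbitrary assumption chosen for illustration.
    """
    return count_prose_words(wikitext) <= max_prose_words


stub = "{{pagebanner}}\n'''X''' is a city in [[Y]].\n\n==Get in==\n\n==See==\n\n==Do==\n"
print(is_skeleton(stub))
```

A heuristic like this would only produce a list for human review; whether a flagged article should be deleted, merged or expanded remains an editorial decision.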
It's not all doom and gloom and ferocious duplicate penalties by the search engines. Search for a string like "travel guide Nelson, England" and our new (WT never had one) English Wikivoyage article on that destination will probably pop up in number one position in the organic search results. (That new article doesn't tell the robots at the footer that it's a derivative work, of course).
As I wrote earlier, unless the "legal team" made a secret agreement to scatter hyperlinks to WT around our guides, I simply do not believe that there are any reasons in law why we cannot remove those hyperlinks today and still preserve legal attribution! --W. Franke-mailtalk 10:15, 11 September 2013 (UTC)
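The difference between the hyperlinked and the plain-text attribution debated in this thread can be illustrated with a small Python sketch. It is not the code of the CreditsSource extension (which is written in PHP); the function name `render_attribution`, its parameters, and the URL layout used for the links are assumptions made purely for illustration.

```python
from html import escape
from urllib.parse import quote


def render_attribution(title, site, revision_time, hyperlink=False):
    """Build the footer attribution sentence as an HTML fragment.

    With hyperlink=False the source is named in plain text only;
    with hyperlink=True the article title and the site become links.
    """
    safe_title = escape(title)
    safe_site = escape(site)
    safe_time = escape(revision_time)
    if not hyperlink:
        return (
            f"This article is derived from the {safe_title} article "
            f"on {safe_site} in its revision as of {safe_time}."
        )
    # Hypothetical URL layout: https://<site>/<Title_with_underscores>
    site_url = escape("https://" + site)
    article_url = escape("https://" + site + "/" + quote(title.replace(" ", "_")))
    return (
        f'This article is derived from the article <a href="{article_url}">{safe_title}</a> '
        f'on <a href="{site_url}">{safe_site}</a> in its revision as of {safe_time}.'
    )


print(render_attribution("Dresden", "wikitravel.org/en", "08:30, 6 July 2012 (UTC)"))
```

Both variants name the source article, the site and the revision; only the linked variant emits anchor elements that a crawler can follow.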
I repeat that there is a technical step involved - so it is not as easy as it appears and cannot be done immediately/today. Someone has to go and rewrite the CreditsSource.php, submit it, get it approved and finally merged into a MediaWiki update, say a week at minimum. And this extension affects all Wikivoyages or at least those with import history, hence a general consensus would be needed from en, de, it, ru, pt, es and others. With that much happy red tape, I hope you understand that any coder would probably prefer to have a straight answer from legal before doing anything, especially with trigger-happy notices like [1]. A direct answer would end all doubt, and a positive one would surely lead to community support.
"travemunde travel guide" and "mitzpe ramon travel guide" search terms look to be doing well, both existing pre-import, though the aim would be for "travemunde" and "mitzpe ramon" themselves to rank. -- torty3 (talk) 11:26, 11 September 2013 (UTC)
Thank you for taking the time, torty3, to explain what is causing the delay - it's much appreciated!
I know that you are well respected in the en.wv community and an ace coder. I also think you know how important it is that readers should actually be able to find us in Google. Can we nominate you as a plenipotentiary extraordinaire to liaise with other language versions and actually get these hyperlinks removed?
I think you know that 95% of searchers don't go past the first 3 results. I think you also know that the results for "travemunde travel guide" and "mitzpe ramon travel guide" search terms will vary according to your IP, time of day, google domain and server that you are using, search history and other variables. That said, although the WT articles always appeared higher up the search result page by more than 4 places when I tried some tests, they did do much better than most WV "legacy attributed and hyperlinked articles" and I think I know the reason. In both cases, they are currently wearing Star nomination templates which change the lede substantially as far as the Google spider is concerned. This gives the clue to another thing we should think about doing: try and drop the formulaic "x is a city in y" introductions in most ledes which almost always duplicate the WT intro. --W. Franke-mailtalk 12:05, 11 September 2013 (UTC)
No thank you, my plate is pretty full. I just wanted to point out what I see as the quickest way to expedite the process, and that it is quite a bit of work for high risk and reward, which also involves corralling someone (meaning not me) to fix the code in bugzilla:53942. I actually think the better search results come from the intensive amount of work that was put into the articles to differentiate them, rather than the lead paragraph alone, though rewriting the intros would be a good start. Shouldn't the star nom template be placed at the bottom then? -- torty3 (talk) 07:47, 12 September 2013 (UTC)
Although we must have an official legal answer, according to our Terms of Use we (correctly and fairly) need to give credit to the source of the inserted information, but they do not say explicitly how to do so. So my personal interpretation is that a permanent link in the edit summary of the change is OK, also because WT is not the only free information source: the most famous one, for example, is Wikipedia, and others are the "foreign" versions of Wikivoyage. When I translate an en:voy article I credit en:voy for that change (sometimes I've missed it... sorry :-P). It would be ridiculous (IMHO) to add to the article's footnote, through CreditsSource, the full list of all the free source sites. The history page exists for a reason, or not? :-) Here are some examples: voy:it:Isole Fær Øer for WT+it:w and voy:it:Ghana for it:w+en:voy. --Andyrom75 (talk) 12:51, 12 September 2013 (UTC)
I've just seen another interesting way to import and credit the article from WT. See the history of voy:fr:Praha. --Andyrom75 (talk) 14:44, 12 September 2013 (UTC)
Like everyone above, I'd be very keen to see the WT attribution changed if the suggested version is permissible. Plainly, we want to distance ourselves from that site as much as possible.
The prospect of deleting large numbers of articles (even if they are stubs) leaves me somewhat conflicted. I would much rather we encouraged people to improve these articles, rather than jump to deletion as a first option. However, if it can be proved that removing these stubs has a substantial impact on our search rankings, the benefits would probably outweigh the negatives.
As Frank suggested, I think moving away from the WT "X is a city in Y" style of introductions would be a good place to start diversifying our articles. Not only does it separate us from other repositories but I think it also adds more room for the written 'style' and 'flair' that is so important to Wikivoyage. --Nick talk 02:02, 13 September 2013 (UTC)
The exact ranking algorithm used by Google is unknown (at least to me :-P), but I can say two things for sure: 1) an empty page can't be ranked, and even if ranked it would land on the last result pages, not the first; 2) a huge number of empty pages has a negative impact on user perception. Would you buy a 1,000-page book in which hundreds of pages are blank?
Is there an impact on ranking? I don't know for sure. But several times in the past Google has penalized categories of sites that tried to exploit a "flaw" in its algorithm. For example: sites with a large amount of hidden text (same foreground color as the background, or a microscopic font size) put there just to increase the "hit rate". Or web farms (consisting of thousands... maybe more... of websites) that, for money, include huge numbers of links to a target site in order to boost its ranking. Sometimes they also manually penalize some spam sites (but I don't think that is the case here).
For sure it's better to add information to the existing stubs, but I tend to be realistic: no one will do it in the short term, so we've deleted them all. If we want an article back, it takes less than 1 minute to recreate it. So in my opinion the facts above are more than enough, but I understand that others may need further evidence before taking a decision. --Andyrom75 (talk) 13:36, 13 September 2013 (UTC)

Polite reminder to everyone: the legal team can't possibly follow every conversation going on on-wiki, though we try :) Please email legal@wikimedia.org if you want to flag our attention. The #1 reason legal hasn't weighed in here is because we didn't know about the discussion until today. On bugzilla, I personally follow bugzilla fairly religiously, but not everyone does, so unless you cc me directly bugzilla isn't a reliable way to get in touch with legal/LCA. -LVilla (WMF) (talk) 23:14, 17 September 2013 (UTC)

LVilla, thank you for your feedback, and great that you were able to find bugzilla:53942 anyway. Looking forward to hearing the legal opinion on this issue. --Danapit (talk) 05:55, 19 September 2013 (UTC)
LVilla, thanks a lot for jumping into this thread, and sorry for not involving you earlier. As highlighted by Danapit, we need a legal blessing for the change requested in bugzilla:53942, so please take a look at the considerations stated in the lines above; they may help you see our interpretation of the ToU.
Generally speaking, I would like to share my thoughts on our current Google ranking. WT is an older site, so it's normal that it has a better ranking. I've taken a quick look at the sites our visitors come from (HTTP_REFERER). Our primary sources are part of the big WMF family network, while WT's visitors come from almost everywhere. This is also because over all these years a lot of people have written many journalistic articles and/or forum posts about WT.
One idea to improve our visibility is to work outside WMF, not just inside, where we are already the #1 site. I suggest a "judo approach" :-D that consists of using the strength of your opponent. Search through Google for all the main forums (or articles where we can reply) where they discuss and link to WT. Add an honest post where we state the differences between the two sites and, most important, add a link! As an example, look at this one. This kind of activity should be done as an "interlingual expedition". In the long term this would also be highly beneficial for improving the ranking. --Andyrom75 (talk) 07:32, 20 September 2013 (UTC)
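The referrer comparison mentioned above (looking at HTTP_REFERER values to see where visitors come from) could be sketched roughly as follows in Python. The function names, the sample referrer strings and the domain suffix are invented for illustration; real figures would have to come from actual server logs.

```python
from collections import Counter
from urllib.parse import urlparse


def count_referrer_domains(referrers):
    """Tally referrer hostnames; empty or unparsable values count as "(none)"."""
    counts = Counter()
    for ref in referrers:
        try:
            host = urlparse(ref).hostname if ref else None
        except ValueError:
            host = None
        counts[host or "(none)"] += 1
    return counts


def share_from_suffixes(counts, suffixes):
    """Fraction of all visits whose referrer hostname ends with one of the suffixes."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    matched = sum(n for host, n in counts.items() if host.endswith(tuple(suffixes)))
    return matched / total


# Hypothetical sample data, not real log entries.
sample = [
    "http://en.wikipedia.org/wiki/Dresden",
    "http://en.wikipedia.org/wiki/Ghana",
    "http://forum.example.com/thread/42",
    "-",
    "",
]
counts = count_referrer_domains(sample)
print(counts)
print(share_from_suffixes(counts, [".wikipedia.org"]))
```

A high share for sister-project domains would be consistent with the observation that most visitors arrive from inside the WMF network rather than from outside sites.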

Hi, everyone. Under certain rules that apply to lawyers, we are ethically obligated to represent only the Foundation. Unfortunately, we cannot give legal advice to the community at large, particularly in cases like this one that may be complex and where we may not know all the facts. With that in mind, we'd urge you to carefully consider any changes you make, on this issue or any other - you should avoid placing yourself in situations beyond your individual tolerance for legal risk. I have written about the responsibility of contributors to the Wikimedia projects elsewhere, which you may wish to review.

As a more general matter, I understand that some are finding the situation a bit vexing. That said, I would urge the WV community to focus their efforts on improving the content of WV rather than spending time worrying about another site and who links to it. Other techniques suggested in this thread (such as killing stub articles that are identical) as well as elsewhere (such as ensuring that appropriate Wikipedia articles link to WV) seem to us to be constructive in the short run and more likely to succeed with Google's algorithms in the long run. I apologize that I cannot be of more help here. Geoffbrigham (talk) 16:31, 28 September 2013 (UTC)

Geoff, just to avoid misunderstanding: does this mean that we won't have a legal answer on the subject of this bugzilla request? --Andyrom75 (talk) 17:34, 30 September 2013 (UTC)
Correct, under legal ethics rules, I cannot give legal advice except to WMF. But, to be honest, from my personal point of view, I also feel that further focus on these sorts of details, rather than the overall quality of the product we are providing to readers, and the legitimate accessibility of that information, is not that productive. Geoffbrigham (talk) 16:49, 6 October 2013 (UTC)
The Great Wall of China is made of a huge number of bricks that are tiny in comparison with its size, so you shouldn't judge the single brick... Do you know any lawyer who has some spare time to help us? PS Because of my job I know how unwilling a lawyer is to take a legal position ;-) --Andyrom75 (talk) 07:47, 8 October 2013 (UTC)