by Alex Yumashev ·
Nov 29 2014
TL;DR: Moving a Blogger.com-hosted blog away from Blogger got our site slapped by Panda for huge duplicate content "issues" without us even knowing it.
Actually, I think the same will happen with any other 3rd-party blog hosting platform like Tumblr or WordPress.com.
...
Our site has apparently been partially penalized for "duplicate" content even though we did nothing wrong. This is a trap anyone can fall into - a 100% technical issue that you should watch out for.
First, how do you detect an algorithmic penalty? A common way is to look at the analytics graph and spot a traffic drop right next to an algorithm update.

But as you know, the latest versions of Panda can hit only a part of your website. This is way harder to spot on the pageviews graph, right?
If only a part of a website has been hit, the easiest way to tell whether a page has been penalized is to google its exact title (or a unique phrase from the page text). If you're not in the top 10 - you're probably penalized.
That is how I discovered that some of my blog posts had been slapped: googling for their exact titles showed nothing, even though I had a bunch of hugely popular posts that went viral on Hacker News, Slashdot, Reddit, TechCrunch etc.
Like, take this post for example. Googling for the post title brings up hundreds of pages LINKING TO THIS POST but not the post itself. Weird, right?
Something was clearly wrong. I discovered that nearly 70% of my posts could not be found by googling their titles, while 30% were doing OK. After digging into this, it turned out that the "penalized" posts were actually the posts I had moved from Blogger.com (aka "blogspot" - a blog hosting platform owned by Google and used by hundreds of thousands of blogs).
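If you have a lot of posts, building these exact-title searches by hand gets tedious. Here is a minimal Python sketch that turns a post title into a Google search URL you can open and eyeball yourself. The function name and the sample title are made up for illustration, and the script only builds URLs; it does not query Google or parse results.

```python
from urllib.parse import urlencode


def exact_title_query(title):
    """Return a Google search URL for the exact (quoted) post title."""
    # Wrapping the title in double quotes asks for an exact-phrase match.
    return "https://www.google.com/search?" + urlencode({"q": f'"{title}"'})


if __name__ == "__main__":
    # Hypothetical titles; replace with the titles from your own blog export.
    for post_title in ["My post", "Another post"]:
        print(exact_title_query(post_title))
```

Open each URL and check whether your own page shows up in the first ten results.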
Now here's what happened in our particular case, step by step.
1. I had a blog at "blog.jitbit.com", hosted by blogspot (aka "blogger.com"). The blog was doing OK. Like I said, I even had a bunch of very successful posts that went viral.
2. We decided to move the blog from a subdomain to a subdirectory. This is SEO 101. "If you have a blog, have it at domain.com/blog not blog.domain.com". This way your main domain gets all the authority signals and perks that your blog earns for you.
3. Blogger.com does not offer any 301 redirect functionality, so first I needed to move "blog.jitbit.com" to my own server - to a self-hosted blogging engine - and then set up a redirect from "blog.jitbit.com/*" to "jitbit.com/blog/*". So I exported all the posts from Blogger.com to an XML file and imported them into my server's database.
4. Finally, I edited the DNS and pointed "blog.jitbit.com" to our own server instead of pointing to Blogger.com's "ghs.google.com".
So far so good, right?
5. Blogspot detected the DNS change and spawned a duplicate blog at "blogname.blogspot.com"! I did not see that coming. I would've killed the blogspot blog had I known this... This is where the trick is. Bear with me, blogspot users: as soon as Blogspot detects your DNS change from #4, the 301 redirect from "YourBlogName.blogspot.com" to "blog.domain.com" stops working (this redirect is done by Blogger.com while you use a custom domain). And now you have 2 exact copies of your blog on the Internet: "YourBlogName.blogspot.com" hosted by Blogger and "blog.domain.com" hosted on your own server.
So in my particular case, I now have a blog at "jitbitsoftware.blogspot.com" that is actually a partial copy of my current blog.
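A quick way to catch this early is to look at what the old blogspot address answers with after the DNS change. Below is a rough Python sketch, not a definitive tool: it sends one HEAD request without following redirects and interprets the status code. The function names and result labels are my own, and I am assuming the blogspot address is reachable over HTTPS and that a plain 200 means the copy is being served.

```python
import http.client
from urllib.parse import urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def classify_response(status, location, custom_host):
    """Interpret what the old blogspot address returned for '/'."""
    if status in REDIRECT_CODES and location:
        if urlsplit(location).hostname == custom_host:
            return "redirects to custom domain"
        return "redirects elsewhere"
    if status == 200:
        # The blogspot address serves content itself: a possible duplicate.
        return "serves its own copy"
    if status in (404, 410):
        return "gone"
    return "unclear"


def check_blogspot(blogspot_host, custom_host):
    """Send one HEAD request; http.client does not follow redirects."""
    conn = http.client.HTTPSConnection(blogspot_host, timeout=10)
    try:
        conn.request("HEAD", "/")
        resp = conn.getresponse()
        return classify_response(resp.status, resp.getheader("Location"), custom_host)
    finally:
        conn.close()


if __name__ == "__main__":
    print(check_blogspot("jitbitsoftware.blogspot.com", "blog.jitbit.com"))
```

Anything other than a redirect to your custom domain (or "gone", once you have deleted the blog) deserves a closer look.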
"So what?" you might say, "who cares about some '.blogspot.com' blog with no incoming links and zero visitors?" Well...
Like I said, Blogger is owned by Google and Google treats blogspot blogs differently. First of all, Google indexes these blogs instantly, everyone knows that. It does not "crawl" these blogs, it just takes the data from blogger.com databases. That is one of the benefits of running a blog on blogspot, by the way. Even if a blog has ZERO incoming links - Google still knows about it and has it in the index (that is why there are so many blogspot blogs involved in crappy blackhat "crash and burn" schemes). Google even knows the "internal" data like the "Blog-ID", the "Author-ID", the Google account behind it... It even automatically adds this blog to your webmaster-tools account!
So if a duplicate blog comes up somewhere else on the internet - Google automatically assumes that the blogspot-one is the original (because it's been in the index longer).
As soon as you have exported the blog and changed the DNS - KILL the blogspot blog right away. Delete it. It was just stupid of me not to do this.
If it is already too late for that, the fix is hard and involves a lot of manual work, but it can be done. In short: you need to set up "rel=canonical" tags for every post on the "blogspot.com" blog, since blogspot does not allow 301 redirects.
The relevant spot in the blogspot template is the <b:include data='blog' name='all-head-content'/> line from the HEAD element; the conditional block is:

<b:if cond='data:blog.pageType != "item"'>
  <link href='https://NEW-ROOT-URL/' rel='canonical'/>
<b:else/>
  <b:if cond='data:blog.pageTitle == "[Blog Name]: [Post Title]"'><link href='https://NEW-POST-URL/' rel='canonical'/></b:if>
</b:if>

Replace [Blog Name] with your blog name (I suggest you change the name in the blogspot settings to some easy value, like "tmp"). Replace [Post Title] with the post title. And repeat the inner <b:if> line for every blog post you have. Yeah, I know, bummer. :(
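Writing one such line per post by hand is error-prone, so here is a small Python sketch that generates the whole block from a list of (post title, new URL) pairs. The function name and the example.com URLs are hypothetical. It XML-escapes titles and URLs, but I have not verified how Blogger's expression syntax treats a title that itself contains double quotes, so check those posts by hand.

```python
from html import escape


def canonical_rules(blog_name, posts, new_root_url):
    """Build the template fragment: one canonical rule per post.

    posts is a list of (post_title, new_post_url) pairs.
    """
    lines = [
        "<b:if cond='data:blog.pageType != \"item\"'>",
        f"  <link href='{escape(new_root_url)}' rel='canonical'/>",
        "<b:else/>",
    ]
    for title, url in posts:
        # Blogger's page title has the form "[Blog Name]: [Post Title]".
        # Caveat: titles containing double quotes are escaped for XML only.
        page_title = escape(f"{blog_name}: {title}")
        lines.append(
            f"  <b:if cond='data:blog.pageTitle == \"{page_title}\"'>"
            f"<link href='{escape(url)}' rel='canonical'/></b:if>"
        )
    lines.append("</b:if>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical data; in practice, take titles and URLs from your export.
    sample = [("Hello & bye", "https://example.com/blog/hello/")]
    print(canonical_rules("tmp", sample, "https://example.com/blog/"))
```

Paste the output into the template in place of the hand-written block, then spot-check a few posts by viewing their page source for the canonical link.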
Clearly, that's an indication that Panda can penalize websites partially. Moreover, the penalty does not really affect the rest of the website traffic. Which is good news overall, but makes it harder to spot the penalty in the first place.
Another takeaway is that Panda is still not a part of the main algorithm. You still have to wait for another Panda update (or "data-refresh") to recover from it. I've had my changes out there for weeks, explicitly submitted them to Google's index via the "Fetch as google bot" tool, and Google already knows where the "original" website is (easily checked by using "site:" and "info:" operators) - but still no recovery.