Deduplication by content
improved
We used to dedupe only by intent. That caught the same intent expressed in different words or in different places on the site. However, it didn't capture all duplication across search, browse and other category pages, because pages with different intents can still have the same content.
To see why, imagine you have 10 red Volkswagen Golf 7 GTI 2018 listings and no other 2018 Golfs. You could have pages about "golf 7 2018 for sale", "buy vw gti 2018", "used red golf 2018" or "golf gti 2018", and despite some small differences in the H1, title and URL, all of the non-template content on those pages would be identical. Google then has to do extra work to decide which page to rank.
Now we identify all pages that show the same listings, work out which page should be the canonical, and pass that over to you for redirection, just as we do for pages with the same intent.
For instance:
  • king jumpsuits in clothing in the Netherlands
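To make the idea of deduplicating by content concrete, here is a minimal Python sketch. It is not our actual implementation: the `Page` class, the `clicks` field, the `find_content_duplicates` function and the rule for choosing the canonical (most clicks, then URL order) are all assumptions made for illustration. The only part taken from the description above is the core idea: pages that show exactly the same set of listings are treated as duplicates of each other.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    url: str
    listing_ids: frozenset  # IDs of the listings shown on the page
    clicks: int = 0         # hypothetical signal used to pick the canonical


def find_content_duplicates(pages):
    """Map each duplicate page URL to the canonical URL it should redirect to."""
    # Group pages by the exact set of listings they display.
    groups = defaultdict(list)
    for page in pages:
        # Assumption: pages with no listings are ignored, otherwise every
        # empty page would be grouped with every other empty page.
        if page.listing_ids:
            groups[page.listing_ids].append(page)

    redirects = {}
    for group in groups.values():
        if len(group) < 2:
            continue  # unique content, nothing to dedupe
        # Assumed rule: most clicks wins, ties broken by URL order.
        ranked = sorted(group, key=lambda p: (-p.clicks, p.url))
        canonical = ranked[0]
        for duplicate in ranked[1:]:
            redirects[duplicate.url] = canonical.url
    return redirects


# The Golf example: four pages with different intents, the same 10 listings.
golf_listings = frozenset(range(1, 11))
pages = [
    Page("/golf-7-2018-for-sale", golf_listings, clicks=120),
    Page("/buy-vw-gti-2018", golf_listings, clicks=45),
    Page("/used-red-golf-2018", golf_listings, clicks=30),
    Page("/golf-gti-2018", golf_listings, clicks=80),
    Page("/vw-polo-2018", frozenset({11, 12}), clicks=10),
]
redirects = find_content_duplicates(pages)
```

With this sample data, the three lower-traffic Golf pages map to `/golf-7-2018-for-sale`, and the Polo page is left alone because no other page shares its listings.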