What Google actually penalized in August 2025

The August 2025 spam update targeted what Google calls "scaled content abuse": pages mass-generated with little differentiation between them. The update wasn't subtle. According to an ALM Corp analysis covering 150+ sites, 87% of those running mass-produced AI content saw negative ranking impacts.

The pattern Google went after is specific: near-duplicate templates where the only difference between pages is a swapped variable (a competitor name, a city, a product category). When two pages share more than 70% of their content, Google treats them as the same page. At scale, that's a spam signal.

The threshold to watch: pages with less than 30% unique content relative to sibling pages are at high risk of deindexation. This isn't speculation. It's the pattern visible across sites that got hit.
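You can approximate the sibling-overlap check yourself. This is a minimal sketch using Python's difflib; Google's actual duplicate detection is far more sophisticated, and the 70% figure is a heuristic, not a published cutoff.

```python
from difflib import SequenceMatcher

def content_overlap(page_a: str, page_b: str) -> float:
    """Return the fraction of text shared between two pages (0.0 to 1.0)."""
    return SequenceMatcher(None, page_a, page_b).ratio()

def is_near_duplicate(page_a: str, page_b: str, threshold: float = 0.70) -> bool:
    """Flag sibling pages whose shared content exceeds the threshold."""
    return content_overlap(page_a, page_b) > threshold

# Two pages that differ only in a swapped variable:
a = "Best CRM for dentists. Compare pricing, features, and support options."
b = "Best CRM for lawyers. Compare pricing, features, and support options."
print(is_near_duplicate(a, b))  # a near-identical template gets flagged
```

Run this pairwise across your programmatic pages and sort by overlap: the pages at the top of the list are the ones carrying the most risk.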

Let's be honest: programmatic SEO is in Google's crosshairs. The update specifically mentioned "scaled content" in its documentation. But that doesn't mean all programmatic pages are penalized. It means lazy programmatic pages are penalized.

The difference between spam PSEO and quality PSEO

The distinction isn't "programmatic vs. manual." It's "real data vs. invented content."

Spam PSEO looks like this: one template, variables swapped for each page. "Best CRM for [industry]" where the industry name is the only thing that changes. Content is 95% identical across pages. No factual data unique to each page. Entirely AI-generated without review.

Quality PSEO looks different: real pricing data scraped from competitor websites. Actual feature lists verified against source pages. Specific use cases backed by factual differences. Human review before anything goes live. Structured data that tells Google exactly what each page contains.

This isn't theoretical. Look at Zapier: 6.3M visits/month across 70K+ programmatic pages, all still healthy post-update. Or ClickUp, whose comparison pages have ranked consistently for years. These pages survive because every single one contains unique, verifiable data.

The core principle: if you can't point to a factual difference between two pages beyond the competitor name in the heading, Google can't either. And Google has been very clear about what happens next.

5 technical signals that protect programmatic pages

Here's what separates comparison pages that rank from pages that get flagged. These aren't opinions. They're the patterns we've observed across thousands of programmatic pages that survived recent updates.

  1. Real scraped data, not generated content

    The data on each page comes from the competitor's actual website: real pricing, real feature descriptions, real product details. Every fact is traceable to its source. This is fundamentally different from prompting an LLM with "write a comparison between X and Y" and publishing whatever comes back.

    When your pricing table says "$49/mo for the Pro plan," that number came from a scrape, not a hallucination. Google's systems can verify factual claims against the web. Real data passes that check. Made-up data doesn't.
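A minimal sketch of the idea: prices extracted from a competitor's pricing-page markup rather than generated. The HTML snippet, class names, and plan names here are hypothetical; in production the HTML would come from a fetch of the live page, and the selector would match that site's real markup.

```python
import re

# Hypothetical pricing-page markup. In production this string would be
# the fetched HTML of the competitor's live pricing page.
sample_html = """
<div class="plan"><h3>Pro</h3><span class="price">$49/mo</span></div>
<div class="plan"><h3>Team</h3><span class="price">$99/mo</span></div>
"""

def extract_prices(html: str) -> dict[str, str]:
    """Map each plan name to its listed price, so every number is traceable."""
    pattern = r'<h3>(.*?)</h3><span class="price">(.*?)</span>'
    return {plan: price for plan, price in re.findall(pattern, html)}

prices = extract_prices(sample_html)
print(prices)  # {'Pro': '$49/mo', 'Team': '$99/mo'}
```

The point isn't the parsing technique (a real scraper would use a proper HTML parser). It's that every figure in your comparison table has a source URL and a selector behind it.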

  2. Human review before publication

    No page goes live without a human looking at it. Low-confidence data gets flagged for manual review. The user corrects, enriches, and approves before deployment.

    This is the anti-hallucination guardrail and the anti-spam guardrail. A human in the loop means every page has been intentionally published, not blindly generated. Google's quality raters specifically look for evidence of editorial oversight.
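One way to sketch the review gate, assuming a hypothetical confidence score attached to each draft (e.g. how cleanly the scraper matched the source page):

```python
from dataclasses import dataclass

@dataclass
class PageDraft:
    slug: str
    data_confidence: float  # hypothetical scraper match confidence, 0.0 to 1.0
    approved: bool = False

def review_queue(drafts: list[PageDraft], min_confidence: float = 0.9) -> list[PageDraft]:
    """Drafts that must pass human review before deployment: anything with
    low-confidence data, or anything not yet explicitly approved."""
    return [d for d in drafts if d.data_confidence < min_confidence or not d.approved]

drafts = [
    PageDraft("notion-vs-coda", 0.97, approved=True),
    PageDraft("notion-vs-confluence", 0.62),  # ambiguous scrape: hold it
]
print([d.slug for d in review_queue(drafts)])  # ['notion-vs-confluence']
```

The threshold and field names are illustrative; the invariant that matters is that no path exists from generation to deployment that skips the queue.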

  3. Structural differentiation between pages

    Each comparison page has genuinely different data: different pricing tiers, different feature sets, different use cases, different strengths and weaknesses. The differentiation lives in the data, not in rewording the same paragraph five different ways.

    When you compare Notion vs. Coda and Notion vs. Confluence, the pages should look different because the competitors are different. If your tool only swaps the name in the H1, you have a template problem, not a content strategy.
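This is measurable: compare the underlying fact sets, not the rendered text. A sketch with illustrative field names and values (the prices below are placeholders, not real quotes):

```python
def shared_facts(page_a: dict, page_b: dict) -> float:
    """Fraction of (field, value) pairs two sibling pages have in common."""
    a, b = set(page_a.items()), set(page_b.items())
    return len(a & b) / max(len(a | b), 1)

# Hypothetical per-page data payloads for two comparison pages:
vs_coda = {"competitor": "Coda", "entry_price": "$12/mo", "free_plan": True, "api": "REST"}
vs_confluence = {"competitor": "Confluence", "entry_price": "$6/user/mo", "free_plan": True, "api": "REST"}

print(shared_facts(vs_coda, vs_confluence))
```

If this number approaches 1.0 for two sibling pages, only the name changed: that's the template problem, visible in the data before Google ever sees the HTML.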

  4. JSON-LD structured data

    Every page carries proper schema markup: FAQPage, Product, BreadcrumbList. This tells Google exactly what the page is, what data it contains, and how it relates to other pages on your site.

    Structured data doesn't just help with rich snippets. It signals structural quality. Spam pages almost never have proper schema because adding it requires understanding what the page actually contains. It's a quality signal by proxy.
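Generating the markup per page is mechanical once the data exists. A sketch of a FAQPage block (the question and answer are illustrative; FAQPage, Question, and Answer are real schema.org types):

```python
import json

def faq_jsonld(questions: list[tuple[str, str]]) -> str:
    """Emit a schema.org FAQPage block, ready to embed in a <script> tag."""
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": q,
                "acceptedAnswer": {"@type": "Answer", "text": a},
            }
            for q, a in questions
        ],
    }, indent=2)

block = faq_jsonld([
    ("Does ExampleCRM have a free plan?", "Yes, up to 3 seats."),  # illustrative
])
# Embed as: <script type="application/ld+json">{block}</script>
```

Because the markup is built from the same data dict as the page body, the schema can never drift out of sync with the visible content, which is exactly what Google's guidelines require.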

  5. GSC monitoring as a quality feedback loop

    If a page drops in ranking, you know immediately and can fix it. Monitoring isn't a nice-to-have feature. It's a quality mechanism. It means you treat every page as a living asset, not a fire-and-forget template.

    Spam sites don't monitor individual page performance because they don't care about individual page quality. Quality-first sites do. Google's systems reward sites that maintain and improve their content over time.
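A drop alert takes only a few lines. The position numbers here are hard-coded to keep the sketch self-contained; in practice they would come from the Search Console API's searchanalytics.query endpoint, aggregated by page.

```python
def dropped_pages(prev: dict[str, float], curr: dict[str, float],
                  max_slip: float = 5.0) -> list[str]:
    """Pages whose average position worsened by more than max_slip spots.
    Lower position numbers are better, so a drop is curr - prev > max_slip.
    A page missing from curr is treated as position 100 (effectively gone)."""
    return [url for url in prev
            if curr.get(url, 100.0) - prev[url] > max_slip]

last_week = {"/compare/notion-vs-coda": 4.2, "/compare/notion-vs-confluence": 7.1}
this_week = {"/compare/notion-vs-coda": 4.8, "/compare/notion-vs-confluence": 19.5}
print(dropped_pages(last_week, this_week))  # ['/compare/notion-vs-confluence']
```

The 5-position threshold is an assumption; tune it to your tolerance. What matters is that a drop routes the page back into the review workflow instead of going unnoticed.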

Selvio deploys comparison pages with built-in quality checks: real scraped data, human review workflow, JSON-LD schema, and GSC monitoring on every page.

Start free →

A real example: surviving a Google update

Theory is useful. Results are better. Here are two real projects running programmatic comparison pages at scale, both of which survived every 2025 update.

BrandSearch: 10,000+ programmatic pages deployed, generating 250,000 impressions/month. Every page is built on scraped competitor data, carries JSON-LD schema, and follows a logical internal linking structure. Traffic held steady through both the March and August 2025 updates.

DiscordGate: 2,000 pages driving 25,000 visitors/month and ranking for over 50,000 keywords. Same playbook: real data, proper schema, clean internal linking.

What do these projects have in common? None of their pages are templates with swapped variables. Every page has unique factual content derived from actual sources. They invested in data quality upfront, and it paid off when Google tightened its filters.

The pattern is clear: programmatic pages built on real data and proper infrastructure don't just survive Google updates. They benefit from them, because the competition gets wiped out.

How to audit your own programmatic pages

Whether you use Selvio or not, here's a checklist you can run against your existing programmatic pages right now. If you can't check every box, your pages are at risk.

  1. Every factual claim traces back to a real source (scraped or verified, not generated).

  2. A human reviewed and approved each page before it went live.

  3. Sibling pages differ in their underlying data, not just a swapped name in the H1.

  4. Each page carries valid JSON-LD schema (FAQPage, Product, BreadcrumbList as appropriate).

  5. Page-level rankings are monitored in Google Search Console, with alerts on drops.

If you're missing two or more of these, you're in the risk zone. The good news: these are all fixable. The bad news: every day you wait, Google's classifiers are getting better at finding thin programmatic content.

Programmatic SEO isn't dead. Spam PSEO is.

The difference comes down to data quality and monitoring infrastructure. Real competitor data, human review, structured schema, and active GSC monitoring. That's what separates pages that rank from pages that get deindexed.

Deploy your first 3 pages free →

No credit card required. Pages live on your domain.