Opened 10 years ago

Closed 10 years ago

#334 closed defect (fixed)

Washington Post scraper shouldn't include " - washingtonpost.com" in title

Reported by: dstillman
Owned by: simon
Priority: minor
Milestone:
Component: ingester
Version:
Keywords:
Cc:

Description


Change History (1)

comment:1 Changed 10 years ago by simon

  • Resolution set to fixed
  • Status changed from new to closed

(In [734]) closes #334, Washington Post scraper shouldn't include " - washingtonpost.com" in title
closes #313, Blacklist known ad sites from scraper detection
closes #306, some New York Times ads prevent page from being recognized
closes #308, attachment import bug
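For illustration, the title cleanup from #334 might look roughly like the following JavaScript. This is a minimal sketch, not the code committed in [734]. The constant WAPO_TITLE_SUFFIX and the function cleanWashingtonPostTitle are hypothetical names, and the sketch assumes the suffix appears exactly as written in the ticket title, at the very end of the page title.

```javascript
// Hypothetical constant: the suffix text is taken from the ticket title.
const WAPO_TITLE_SUFFIX = " - washingtonpost.com";

// Hypothetical helper (not from [734]): removes the site suffix only when
// it appears at the end of the title, leaving other titles unchanged.
function cleanWashingtonPostTitle(title) {
  // Trim first so stray whitespace around the scraped title
  // does not hide the suffix.
  const trimmed = title.trim();
  if (trimmed.endsWith(WAPO_TITLE_SUFFIX)) {
    return trimmed.slice(0, -WAPO_TITLE_SUFFIX.length).trim();
  }
  return trimmed;
}

// Example with a made-up headline
console.log(cleanWashingtonPostTitle("Senate Passes Budget - washingtonpost.com"));
// prints "Senate Passes Budget"
```

Matching only at the end of the string, rather than deleting every occurrence, avoids mangling a headline that happens to mention washingtonpost.com.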

Currently, the ad-site blacklist is located at the top of ingester/browser.js. At some point, we may want to move it to a database table.
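A blacklist kept as a constant at the top of a module, as described for #313, could be sketched as follows. This is an illustrative assumption, not the actual contents of ingester/browser.js: the names AD_SITE_BLACKLIST, isBlacklistedAdSite, and detectableUrls are hypothetical, and the hostnames are placeholders rather than the real entries.

```javascript
// Hypothetical module-level blacklist; entries are placeholders.
const AD_SITE_BLACKLIST = ["ads.example.com", "adserver.example.net"];

// Returns true if the URL's host is a blacklisted host or a subdomain of one.
function isBlacklistedAdSite(url) {
  let host;
  try {
    host = new URL(url).hostname;
  } catch (e) {
    return false; // Unparseable URLs are not treated as ad sites.
  }
  return AD_SITE_BLACKLIST.some(
    (entry) => host === entry || host.endsWith("." + entry)
  );
}

// Hypothetical filter applied to frame URLs before scraper detection,
// so ad frames cannot prevent the main page from being recognized.
function detectableUrls(frameUrls) {
  return frameUrls.filter((u) => !isBlacklistedAdSite(u));
}
```

Moving the list into a database table, as the note suggests, would mean replacing the constant with a lookup while leaving the matching logic unchanged.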
