Opened 10 years ago
Closed 10 years ago
#334 closed defect (fixed)
Washington Post scraper shouldn't include " - washingtonpost.com" in title
| Reported by: | dstillman | Owned by: | simon |
|---|---|---|---|
| Priority: | minor | Milestone: | |
| Component: | ingester | Version: | |
| Keywords: | | Cc: | |
Description
Change History (1)
comment:1 Changed 10 years ago by simon
- Resolution set to fixed
- Status changed from new to closed
(In [734]) closes #334, Washington Post scraper shouldn't include " - washingtonpost.com" in title
closes #313, Blacklist known ad sites from scraper detection
closes #306, some New York Times ads prevent page from being recognized
closes #308, attachment import bug
Currently, the ad site blacklist is located at the top of ingester/browser.js. At some point, we may want to switch this to a database table.
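For illustration only, here is a rough sketch of the two behaviours this changeset describes: dropping the " - washingtonpost.com" suffix from a scraped title, and skipping known ad hosts during scraper detection. This is not the code from [734]. The names `cleanTitle`, `isBlacklisted`, and `AD_HOST_BLACKLIST`, and the example hostnames, are all assumptions made up for this sketch; the real blacklist in ingester/browser.js may be structured quite differently.

```javascript
// Hypothetical pattern: a trailing " - washingtonpost.com" on a page title.
const SITE_SUFFIX = /\s+-\s+washingtonpost\.com\s*$/i;

// Hypothetical helper: return the title without the site suffix.
function cleanTitle(title) {
  return title.replace(SITE_SUFFIX, "").trim();
}

// Hypothetical blacklist; these hostnames are placeholders, not the real list.
const AD_HOST_BLACKLIST = ["ads.example.com", "adserver.example.net"];

// Hypothetical helper: true if the URL's host is a blacklisted host
// or a subdomain of one. Unparseable URLs are treated as not blacklisted.
function isBlacklisted(url) {
  let host;
  try {
    host = new URL(url).hostname.toLowerCase();
  } catch (e) {
    return false;
  }
  return AD_HOST_BLACKLIST.some(
    (blocked) => host === blocked || host.endsWith("." + blocked)
  );
}
```

Keeping the blacklist as a plain array mirrors the "top of the file" arrangement mentioned above; moving it to a database table would mostly mean replacing the array with a lookup, leaving `isBlacklisted` callers unchanged.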