Opened 10 years ago
Closed 10 years ago
#313 closed enhancement (fixed)
Blacklist known ad sites from scraper detection
| Reported by: | dstillman | Owned by: | simon |
|---|---|---|---|
| Priority: | minor | Milestone: | |
| Component: | ingester | Version: | |
| Keywords: | | Cc: | |
Description
I noticed in the debug output that Zotero was running detect code (COinS, etc.) on ad.doubleclick.net and its ilk. I don't know how resource-intensive this is relative to what it would take to run the regexes needed to blacklist the sites (which would probably want to happen through the same DB so that it was repository-based), but since these detection runs often happen multiple times per page, it might be wise to do something about it.
Change History (1)
comment:1 Changed 10 years ago by simon
- Resolution set to fixed
- Status changed from new to closed
(In [734]) closes #334, Washington Post scraper shouldn't include " - washingtonpost.com" in title
closes #313, Blacklist known ad sites from scraper detection
closes #306, some New York Times ads prevent page from being recognized
closes #308, attachment import bug
Currently, the ad site blacklist is located at the top of ingester/browser.js. At some point, we may want to switch this to a database table.
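For illustration only, a hard-coded regex blacklist of the kind described above might look roughly like the sketch below. It is not the code from [734]: the names `AD_SITE_BLACKLIST`, `isBlacklistedAdSite`, and `detectIfAllowed` are hypothetical, the single pattern is taken from the host mentioned in the description, and it assumes a runtime that provides the standard `URL` constructor.

```javascript
// Hypothetical sketch: the actual list at the top of ingester/browser.js
// is not shown in this ticket, so this pattern is an assumption.
const AD_SITE_BLACKLIST = [
  /(^|\.)doubleclick\.net$/i, // matches doubleclick.net and any subdomain
];

// Returns true if the URL's hostname matches a blacklisted ad site.
function isBlacklistedAdSite(url) {
  let host;
  try {
    host = new URL(url).hostname;
  } catch (e) {
    // Unparseable URL: do not treat it as an ad site.
    return false;
  }
  return AD_SITE_BLACKLIST.some((pattern) => pattern.test(host));
}

// Runs the supplied detection callback only for non-blacklisted documents.
// Returns true if detection was run, false if it was skipped.
function detectIfAllowed(url, runDetection) {
  if (isBlacklistedAdSite(url)) {
    return false;
  }
  runDetection(url);
  return true;
}
```

Checking the hostname before any detection code runs keeps the cost per ad frame to a few regex tests. Moving the patterns into a database table, as suggested above, would only change where `AD_SITE_BLACKLIST` is loaded from.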