Opened 10 years ago

Closed 10 years ago

#68 closed defect (fixed)

figure out way to have scrapers work for gated resources behind proxies

Reported by: dcohen
Owned by: simon
Priority: critical
Milestone: 1.0 Beta 1
Component: ingester
Version: 1.0
Keywords:
Cc:

Description

The URL for gated resources (e.g., GMU's) remains mutex.gmu.edu/login, so there's no way to know which resource is being used.

Change History (2)

comment:1 Changed 10 years ago by simon

(In [273]) Addresses #68, figure out way to have scrapers work for gated resources behind proxies. We can now access pages through an EZProxy. We need to know what alternatives to EZProxy exist in order to support them. Also fixes some spacing issues in browser.js.

comment:2 Changed 10 years ago by simon

  • Resolution set to fixed
  • Status changed from new to closed

(In [308]) Closes #68, figure out way to have scrapers work for gated resources behind proxies. Most institutions use EZProxy for their proxy needs (or a more transparent proxy, which we support natively). This implementation is significantly better than the old one, which refused to work after you'd already logged in once, and is also simpler, because it's stateless. It has to observe every HTTP request, but there's no noticeable speed hit. It still doesn't work when one gated site links to another gated site, but as far as I can tell, this only happens on the Gale Group site.
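
For illustration only, here is a minimal sketch of the kind of stateless mapping described above: given a proxied URL, recover the original resource URL so a scraper can be matched against it. This is not the code from [308]. The function name `unproxyUrl` is hypothetical, and the sketch assumes a proxy-by-hostname setup in which the original host is served as a prefix of the proxy host (e.g. `www.jstor.org.mutex.gmu.edu`). A proxy-by-port setup would instead need a host/port mapping learned from the observed HTTP requests.

```javascript
// Hypothetical helper, not taken from changeset [308].
// Assumes proxy-by-hostname: "www.example.org" is served as
// "www.example.org.<proxyHost>".
function unproxyUrl(proxiedUrl, proxyHost) {
  const url = new URL(proxiedUrl);
  const suffix = "." + proxyHost.toLowerCase();
  const host = url.hostname.toLowerCase();

  // The proxy's own pages (such as /login) and unrelated hosts do not
  // carry the suffix, so there is nothing to recover from them.
  if (!host.endsWith(suffix)) {
    return null;
  }

  // Strip the proxy suffix to get back the original host; path and
  // query string are left untouched.
  url.hostname = host.slice(0, -suffix.length);
  return url.toString();
}
```

Because the function depends only on its arguments, it keeps no record of earlier logins, which is the property that makes a stateless approach simpler. It returns `null` for the bare login page, which matches the problem in the description: that URL alone does not identify the resource.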
