Opened 9 years ago
Closed 8 years ago
#743 closed enhancement (fixed)
Support non-EZproxy proxies
| Reported by: | dstillman | Owned by: | simon |
|---|---|---|---|
| Priority: | major | Milestone: | 1.0.8 |
| Component: | ingester | Version: | 1.0 |
| Keywords: | | Cc: | stakats, mikowitz, erazlogo |
Description
We should offer more generalized proxy support. It seems many of the library proxy systems Zotero doesn't currently work with do use predictable URLs:
The first is some sort of (Juniper?) VPN system. I'm not sure what the second proxy server is.
The first one, at least, seems to be a common system that we should add automatic support for. We should also have some way of supporting unknown proxies with consistent URLs, even if it's as hacky as a hidden/advanced pref that takes a regex that we can provide to people on the forums if necessary. If we're going to support more complicated URLs like the VPN one, and not just modified domain names, the pref would need both a regex and a substitution string (that is, the first two parameters of the JS replace() method). And since people might use more than one library, we probably need to support multiple space-delimited sets in the pref.
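A minimal sketch of how such a pref could be read and applied. The pref format (space-delimited tokens taken in regex/substitution pairs), the function names, and the two example proxy hosts are all assumptions made for illustration; none of them come from Zotero.

```javascript
// Hypothetical pref format: space-delimited tokens read as pairs,
// "<regex> <substitution> <regex> <substitution> ...".
// Consequence of this format: a regex or substitution cannot contain a space.
function parseProxyPref(pref) {
  const tokens = pref.trim().split(/\s+/).filter(Boolean);
  const rules = [];
  for (let i = 0; i + 1 < tokens.length; i += 2) {
    rules.push({ pattern: new RegExp(tokens[i]), substitution: tokens[i + 1] });
  }
  return rules;
}

// Apply the first rule whose pattern matches, using the two-argument
// form of String.prototype.replace(); otherwise return the URL unchanged.
function unproxyUrl(url, rules) {
  for (const rule of rules) {
    if (rule.pattern.test(url)) {
      return url.replace(rule.pattern, rule.substitution);
    }
  }
  return url;
}

// Two made-up libraries: one modifies the host name, the other
// prepends a VPN-style gateway path.
const examplePref = String.raw`^(https?://)0-([^/]+)\.library\.example\.edu/ $1$2/ ^https://vpn\.example\.org/go/(https?)/([^/]+)/ $1://$2/`;

const exampleRules = parseProxyPref(examplePref);
console.log(unproxyUrl("https://0-www.jstor.org.library.example.edu/view/123", exampleRules));
console.log(unproxyUrl("https://vpn.example.org/go/http/www.jstor.org/view/123", exampleRules));
```

Because the rules are tried in order, a user with several libraries would simply list one pair per library.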
It may be worth trying to do something quick and inelegant for 1.0. Simon, I could try to work on it if you don't think you'll have time. We could wait until 1.5, but I think this is making Zotero fairly unusable for a pretty large contingent of people.
Change History (19)
comment:1 Changed 9 years ago by dstillman
comment:2 Changed 9 years ago by stakats
- Milestone changed from 1.0 RC 4 to 1.0.2
- Priority changed from major to critical
And in some cases, there is nothing (e.g. no "0-") prepended. For example:
We ought to roll this out ASAP, even if inelegantly. Basically, we need two strings, a prefix and a suffix. Users could manually enter them in the preferences dialog. If we don't want to mess with XUL for the moment, they can be hidden prefs.
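The prefix/suffix idea could look roughly like the sketch below. The function name and the example host are hypothetical, and the choice to treat the prefix as optional (since some proxied hosts have nothing prepended) is an assumption about how the two strings would be used.

```javascript
// Hypothetical helper: recover the original host from a proxied URL
// given a user-supplied prefix and suffix.
function stripProxyAffixes(url, prefix, suffix) {
  const parts = /^(https?:\/\/)([^\/]+)(.*)$/.exec(url);
  if (!parts) return url;
  let host = parts[2];
  // The suffix identifies the proxy; without it the URL is left alone.
  if (!suffix || !host.endsWith(suffix)) return url;
  host = host.slice(0, host.length - suffix.length);
  // The prefix is optional, because some proxied hosts carry none.
  if (prefix && host.startsWith(prefix)) host = host.slice(prefix.length);
  return parts[1] + host + parts[3];
}

console.log(stripProxyAffixes("http://0-www.jstor.org.library.example.edu/view/1", "0-", ".library.example.edu"));
```

Two strings are simpler to explain to users than a regex, at the cost of not covering proxies that rewrite the path rather than the host.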
comment:3 Changed 9 years ago by simon
For 1.0, I think we should just loosen the regexps. I already loosened the regexps on most of the translators to allow for domain suffixes a while ago. It looks like all that's necessary is to allow domain prefixes as well. Some of the newer translators may also need refinements to their detectCode, if it's not already sufficiently specific.
Sean, does that second link not work? Or is it just an example? The current JSTOR regexp should be matching it.
For 1.5, I'd love to have intelligent proxy support, so that the user can flip a switch and requests to journal sites are automatically routed through his/her library proxy when s/he is off campus. This solution would fix this bug and #604. It would also be useful for links to journals from blogs, etc. (a situation I run into relatively frequently), and would provide another incentive to use Zotero.
comment:4 Changed 9 years ago by simon
On second thought, it looks like there are a lot more translators than there were when I made those changes, and going through each individually might be a lot of work, and not particularly worthwhile if we plan on implementing something more sophisticated in 1.5. We could loosen only the major databases, implement the preference, or some combination of the two. What do you think?
comment:5 Changed 9 years ago by stakats
I think loosening the regexps is a fine idea until we can come up with something more sophisticated. What regexp do you propose to handle the prefix most gracefully? Michael can then just zip through scrapers.sql and update the resources most likely to be proxied (e.g. LexisNexis, JSTOR, MUSE, etc.) with better support for prefixes and suffixes.
comment:6 Changed 9 years ago by stakats
Would the following do the trick or is it going to introduce other problems? For example:
change
^https?://serials\.abc-clio\.com[^/]*/active/go/ABC-Clio-Serials_v4
to
^https?://[^/]*serials\.abc-clio\.com[^/]*/active/go/ABC-Clio-Serials_v4
Please advise.
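A quick way to check the proposed change is to run both patterns against a few URLs. The two regexps are the ones quoted above; the proxied host names are made up for the test.

```javascript
// The two patterns from this comment, compiled unchanged.
const abcClioOriginal = new RegExp(String.raw`^https?://serials\.abc-clio\.com[^/]*/active/go/ABC-Clio-Serials_v4`);
const abcClioLoosened = new RegExp(String.raw`^https?://[^/]*serials\.abc-clio\.com[^/]*/active/go/ABC-Clio-Serials_v4`);

// Hypothetical URLs: direct access, a suffix-only proxy, and a proxy
// that adds both a prefix and a suffix to the host.
const abcClioUrls = {
  direct: "http://serials.abc-clio.com/active/go/ABC-Clio-Serials_v4",
  suffixOnly: "http://serials.abc-clio.com.proxy.example.edu/active/go/ABC-Clio-Serials_v4",
  prefixAndSuffix: "http://0-serials.abc-clio.com.library.example.edu/active/go/ABC-Clio-Serials_v4",
};

for (const [label, url] of Object.entries(abcClioUrls)) {
  console.log(label, abcClioOriginal.test(url), abcClioLoosened.test(url));
}
```

Only the prefixed host distinguishes the two patterns; the other two URLs match either one.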
comment:7 Changed 9 years ago by simon
That should be fine. The odds that you'll introduce problems are low to begin with, since very few URLs besides ABC-CLIO will contain the domain "serials.abc-clio.com," although you should make sure that the detectCode actually does something with the DOM to make sure that it can scrape the page. (In this case, it does.)
comment:8 Changed 9 years ago by simon
Also, we should remove the caret from the beginning of the regexps in order to match URLs like:
https://www.myuu.nl/http://www.springerlink.com/content/q1j7651n41584r76/
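To illustrate why the caret matters here, the sketch below tests the URL above against an anchored and an unanchored pattern. The SpringerLink pattern itself is a hypothetical stand-in, not the translator's actual regexp.

```javascript
// Hypothetical SpringerLink target pattern, with and without the leading caret.
const springerAnchored = new RegExp(String.raw`^https?://[^/]*springerlink\.com[^/]*/content/`);
const springerUnanchored = new RegExp(String.raw`https?://[^/]*springerlink\.com[^/]*/content/`);

// This proxy embeds the whole target URL in the path of its own URL.
const pathStyleProxyUrl = "https://www.myuu.nl/http://www.springerlink.com/content/q1j7651n41584r76/";

// Anchored: the match must start at the proxy's own host, so it fails.
console.log(springerAnchored.test(pathStyleProxyUrl));
// Unanchored: the match can start at the embedded "http://", so it succeeds.
console.log(springerUnanchored.test(pathStyleProxyUrl));
```

Since `[^/]*` cannot cross a slash, loosening the host prefix alone does not help with this style of proxy.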
comment:9 Changed 9 years ago by erazlogo
This still doesn't seem to be fixed for 1.0.2. Concordia's URL is the same as Example 2, and I'm teaching Zotero in my research seminar this semester, so I need this to work for my students. Would you object if I edit some of the main translators to make them work? If there are no objections, how flexible should the new regexp be?
For example, original url:
https?://(?:www\.|ocrpdf-sandbox\.)jstor\.org[^/]*/(?:view|browse/[^/]+/[^/]+\?|search/|cgi-bin/jstor/viewitem)
Option 1:
https?://(?:0-www\.|www\.|ocrpdf-sandbox\.)jstor\.org[^/]*/(?:view|browse/[^/]+/[^/]+\?|search/|cgi-bin/jstor/viewitem)
Option 2 (as suggested in various comments above):
https?://[^/]*jstor\.org[^/]*/(?:view|browse/[^/]+/[^/]+\?|search/|cgi-bin/jstor/viewitem)
Thanks!
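The trade-off between the two options can be checked with a short script. The patterns are written here with `//` after the scheme and `[^/]` character classes, which is an assumption about what the options were meant to say; all test URLs are made up.

```javascript
// Option 1: enumerate the known host prefixes.
const jstorOption1 = new RegExp(String.raw`https?://(?:0-www\.|www\.|ocrpdf-sandbox\.)jstor\.org[^/]*/(?:view|browse/[^/]+/[^/]+\?|search/|cgi-bin/jstor/viewitem)`);
// Option 2: accept any host prefix.
const jstorOption2 = new RegExp(String.raw`https?://[^/]*jstor\.org[^/]*/(?:view|browse/[^/]+/[^/]+\?|search/|cgi-bin/jstor/viewitem)`);

// Hypothetical URLs: a "0-" proxy, a proxy with some other prefix,
// and an unrelated host that merely ends in "jstor.org".
const jstorUrls = {
  zeroPrefix: "http://0-www.jstor.org.library.example.edu/view/00029602",
  otherPrefix: "http://1-www.jstor.org.library.example.edu/view/00029602",
  lookalike: "http://www.notjstor.org/view/00029602",
};

for (const [label, url] of Object.entries(jstorUrls)) {
  console.log(label, jstorOption1.test(url), jstorOption2.test(url));
}
```

Option 2 covers prefixes nobody has reported yet, but it also accepts the lookalike host, so it relies on the translator's detectCode to confirm that the page can actually be scraped.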
comment:11 Changed 9 years ago by mikowitz
If no one minds/has already done so, I'm going to go ahead and start adding Elena and Simon's change ideas to some of our major databases as a test. If anyone's already doing/done this, let me know.
comment:12 Changed 9 years ago by erazlogo
- Cc erazlogo added
comment:13 Changed 9 years ago by erazlogo
michael -- that would be great, thanks! let me know if the more general regexp works--that would be preferable i think.
comment:14 Changed 9 years ago by mikowitz
comment:15 Changed 9 years ago by erazlogo
comment:16 Changed 9 years ago by dstillman
comment:17 Changed 9 years ago by dstillman
- Priority changed from critical to major
comment:18 Changed 9 years ago by simon
comment:19 Changed 8 years ago by simon
- Resolution set to fixed
- Status changed from new to closed
Looks like the proxy server from Example 2 might be common, too, since another user posted with an identically modified URL:
http://forums.zotero.org/discussion/1232/?Focus=4981#Comment_4981