Opened 9 years ago

Closed 7 years ago

#698 closed enhancement (fixed)

Migration away from VBA

Reported by: stakats Owned by: simon
Priority: major Milestone: 2.0 Final
Component: word integration Version: 1.5
Keywords: Cc:

Description

Microsoft has pushed back the release of Office 2008 to January, so we've got some breathing room here.

Change History (30)

comment:1 Changed 9 years ago by simon

I find it somewhat terrifying that we'll be able to support OOo for Windows, Linux, and OS X and Word for Windows with a single codebase, but we'll need a second one to support Word for Mac. Then again, I suppose it's necessary.

Is it even possible to do this before we have the new version of Word on hand? It would be nice if we could use a real API, but there has never been a publicly available Mac Office SDK, and whatever there is now probably still requires compiling CFM code. We can always resort to AppleScript, but that's just as icky.

comment:2 Changed 9 years ago by simon

I've now played with a beta version of Word 2008, and, from what I can tell, there's no way to make a toolbar button call anything external. This means that we probably need to launch another app to host a toolbar for formatting purposes, then use AppleEvents to deal with fields, etc.

Leopard's Scripting Bridge would make development in a language besides AppleScript pretty easy. The documentation there shows Objective-C, but there are also docs on using Scripting Bridge with Python and Ruby through a Cocoa bridge. It's possible the Cocoa bridges for Python and Ruby might some day be abandoned, but a Python module could theoretically be portable and operate in OOo.

The one disadvantage to using Scripting Bridge is that the new plug-in would require Leopard. It seems reasonable to me to assume that anyone who bothers to upgrade Word would also bother to upgrade their OS. Alternatively, Python has some AppleEvent support, but, from what I can see, it's not completely developed, and it certainly lacks the support of Scripting Bridge.

Is there any reason not to proceed with this approach? I probably won't have much time to work on it until winter break, but we should be able to get something out before Office 2008 is released. Pages support would be nice, too...

comment:3 Changed 9 years ago by stakats

If Leopard allows for significantly easier and more portable development (and it sounds like it does), I see no problem making it a requirement for Office 2008.

comment:4 Changed 9 years ago by dstillman

I don't have much of a problem with requiring Leopard for those reasons, but is it possible to use the script menu for pre-Leopard compatibility, even if it's gross?

http://www.mactech.com/vba-transition-guide/index-006.html

comment:5 Changed 9 years ago by simon

The idea is to use Scripting Bridge to, e.g., add Zotero fields to Word documents. So, to avoid using it, we'd have to code the plug-in in plain AppleScript or the older Python AppleEvent APIs, neither of which seems a particularly appealing option. AppleScript, while better than VBA, is still not a very nice language to code in, and the older Python APIs will probably be deprecated in favor of Scripting Bridge.

However, you bring up a good point: we could use the Word 2008 script menu instead of hosting our own toolbar. While not quite as usable as a Zotero toolbar, it doesn't require launching a separate application, which is a clear advantage. But this isn't something that we need to decide on now, because the code to call the scripts, however we implement it, should not be a very large portion of the project.

comment:6 Changed 9 years ago by dstillman

Yeah, I understand. I just meant that if pre-Leopard compatibility were a concern, AppleScript and the script menu would be an option, but, reading back, I see you already said that.

I agree that pre-Leopard Office 2008 upgrades will be rare, so Scripting Bridge seems fine.

comment:7 Changed 9 years ago by dstillman

Simon, have you looked at the citation support in Word 2008? How does it compare? And is it scriptable?

http://www.appleinsider.com/articles/07/11/14/road_to_mac_office_2008_word_08_vs_pages_3_0.html&page=3

comment:8 Changed 9 years ago by simon

It looks like it's basically the same as in Word 2007 for Windows, and, as in Word 2007, it is very limited. It can't format multiple sources by the same author in the same year properly, it only supports Chicago/MLA/APA/Turabian, and each style is a 10,000+ line XSLT file. Furthermore, at least in the beta I've used, it doesn't look to be scriptable at all.
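To make the first limitation concrete: author-date styles are expected to add letter suffixes (2007a, 2007b) when one author has several works from the same year. A minimal Python sketch of that disambiguation step follows; the (author, year, title) tuple format is an assumption for illustration, titles are assumed distinct within each author/year group, and the sketch sorts by title, whereas real styles may use other sort keys.

```python
from collections import defaultdict


def add_year_suffixes(items):
    """Return one author-date label per item, adding a/b/c suffixes
    when the same author has several works from the same year.

    `items` is a list of (author, year, title) tuples; this format is a
    hypothetical stand-in for real bibliographic records, and titles are
    assumed to be distinct within each author/year group.
    """
    groups = defaultdict(list)
    for author, year, title in items:
        groups[(author, year)].append(title)

    labels = {}
    for (author, year), titles in groups.items():
        if len(titles) == 1:
            # Only one work for this author/year: no suffix needed.
            labels[(author, year, titles[0])] = f"{author} {year}"
            continue
        # Sort by title so suffixes are assigned deterministically
        # (real styles may sort on other keys).
        for index, title in enumerate(sorted(titles)):
            labels[(author, year, title)] = f"{author} {year}{chr(ord('a') + index)}"

    # Return labels in the order the items were cited.
    return [labels[item] for item in items]


print(add_year_suffixes([
    ("Smith", 2007, "Beta"),
    ("Smith", 2007, "Alpha"),
    ("Jones", 2007, "Gamma"),
]))
# → ['Smith 2007b', 'Smith 2007a', 'Jones 2007']
```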

comment:9 Changed 9 years ago by bdarcus

But note: Word 2007 (at least, and I would hope 2008) has a citation API, and OOXML has a citation field specified in the spec. Is there not a way to use this infrastructure, but bypass the (XSLT) formatting engine?

I have the contact email for the MS project lead for the citation feature, and would be happy to contact her if that might be helpful (or give you her email address so you can do it on your own). Let me know.

Also, the hardware requirements for Leopard are fairly steep. It won't even install on my machines (aside from the new office machine I just got, which has it pre-installed). I would not assume people using Word 2008 are also using Leopard. It might be worth doing some research before making a final decision.

Finally, it would also be nice if there were some way to make it easier to integrate Zotero with other Mac word-processors. I've long thought citation processing on OS X ought to be a system service.

comment:10 follow-up: Changed 9 years ago by simon

As I stated above, at least in the beta I've played with, Office 2008 does not expose any kind of interface to the citation generation tool. If we plan to use Zotero as a citation store and CSL as a citation language, however, neither the frontend nor the backend are useful to us, so I don't think this possibility is really worth pursuing.

While there may be a small percentage of users who upgrade to Office 2008 but don't upgrade to Leopard, there aren't many alternatives to using Leopard's Scripting Bridge. py-appscript is really the only appealing one. Its syntax actually seems nicer than what Scripting Bridge would give us, it has better documentation, and development seems fairly active. By relying on a third-party module, however, we increase the chances that we'll have to migrate again in the near future. Sean, do you have any thoughts on this?

A Zotero service is a good idea, but, without fields or equivalents, functionality will be effectively limited to what one can already achieve through Quick Copy.

comment:11 in reply to: ↑ 10 Changed 9 years ago by bdarcus

Replying to simon:

> As I stated above, at least in the beta I've played with, Office 2008 does not expose any kind of interface to the citation generation tool.

Hmm ... so this discussion about the Word 2007 object model does not apply to Word 2008? I wonder if they pulled it along with VBA??

> If we plan to use Zotero as a citation store and CSL as a citation language, however, neither the frontend nor the backend are useful to us, so I don't think this possibility is really worth pursuing.

You are assuming the processing has to happen within Zotero, but I don't think that's necessarily the case. What if instead Zotero passed the raw data to embed in the OOXML file, and the formatting happened within Word (or OOo), bypassing the XSLT-based system?

Notwithstanding any perfectly legitimate "just need to get it working" constraints, I am just saying you might consider different alternatives as you look longer term. The current integration is already severely hampered by some faulty assumptions: for example, that multiple users won't collaborate on the same document, that a single user won't use different machines with different Zotero instances, etc. These aren't hypothetical concerns.

As far as I can see, the only way likely to really solve these problems is to embed the data in the document and to do the formatting within the editor (and perhaps along with the smart identifiers I've mentioned previously).

This is basically what John McCaskey was working on, though I know nothing about how far he got, or how well it worked.
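A minimal Python sketch of the "embed the data in the document" idea, assuming a field payload that carries each cited item's bibliographic data next to the citation itself. The `itemData` key and the helper names are hypothetical; the point is only that a collaborator's copy of the document could be reformatted without access to the original database.

```python
import json


def build_field_payload(citation_items, library):
    """Serialize a citation together with the raw data of each cited item.

    `citation_items` uses the {"itemID": ..., "locator": ...} shape of the
    ZOTERO_ITEM field codes discussed in this ticket; the "itemData" key is
    a hypothetical addition carrying the bibliographic fields themselves,
    so the document no longer depends on a single database.
    """
    embedded = {}
    for entry in citation_items:
        item_id = entry["itemID"]
        # JSON object keys must be strings, so store ids as text.
        embedded[str(item_id)] = library[item_id]
    return json.dumps({"citationItems": citation_items, "itemData": embedded})


def formatting_input(payload):
    """Pair each cited entry with its embedded data, with no database lookup."""
    decoded = json.loads(payload)
    item_data = decoded["itemData"]
    return [(entry, item_data[str(entry["itemID"])])
            for entry in decoded["citationItems"]]


library = {12345: {"title": "An Example Title", "issued": 2007}}
payload = build_field_payload([{"itemID": 12345, "locator": "22"}], library)
for entry, data in formatting_input(payload):
    print(entry["locator"], data["title"])  # prints: 22 An Example Title
```

Only cited items are copied into the payload, which keeps fields small at the cost of duplicating data when the same item is cited repeatedly.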

> A Zotero service is a good idea, but, without fields or equivalents, functionality will be effectively limited to what one can already achieve through Quick Copy.

I'm talking about something more generic than a "Zotero" service, since different bib projects (for example, BibDesk) are having discussions about integrating with different editors (Word, Pages, Mellel, Nisus, etc.). Perhaps a smart generic service can greatly simplify this?

I've been told (by someone at Apple, where I submitted an enhancement request for this BTW) that Cocoa can support something like fields. I don't remember the details, but it essentially involved attaching key/value data to chunks of text, IIRC.

comment:12 follow-up: Changed 9 years ago by simon

We have decided to go with appscript instead of Scripting Bridge for the time being, both for compatibility reasons, and because it has much better documentation, so the new 2008 plug-in will work with 10.3+.

> Hmm ... so this discussion about the Word 2007 object model does not apply to Word 2008? I wonder if they pulled it along with VBA??

Unlikely that they pulled it; more likely that it just never got implemented. It may turn out to be in the final release, but these objects aren't in the AS dictionary for the beta I used.

> You are assuming the processing has to happen within Zotero, but I don't think that's necessarily the case. What if instead Zotero passed the raw data to embed in the OOXML file, and the formatting happened within Word (or OOo), bypassing the XSLT-based system?

I don't know how we plan to format in Word while bypassing the XSLT-based system, given that the only interface we have with Word is AppleScript.

> Notwithstanding any perfectly legitimate "just need to get it working" constraints, I am just saying you might consider different alternatives as you look longer term. The current integration is already severely hampered by some faulty assumptions: for example, that multiple users won't collaborate on the same document, that a single user won't use different machines with different Zotero instances, etc. These aren't hypothetical concerns.

They also aren't design flaws so much as unimplemented features.

> As far as I can see, the only way likely to really solve these problems is to embed the data in the document and to do the formatting within the editor (and perhaps along with the smart identifiers I've mentioned previously).

> This is basically what John McCaskey was working on, though I know nothing about how far he got, or how well it worked.

What prevents us from embedding the data in the document and doing the formatting within Zotero?

> I'm talking about something more generic than a "Zotero" service, since different bib projects (for example, BibDesk) are having discussions about integrating with different editors (Word, Pages, Mellel, Nisus, etc.). Perhaps a smart generic service can greatly simplify this?

> I've been told (by someone at Apple, where I submitted an enhancement request for this BTW) that Cocoa can support something like fields. I don't remember the details, but it essentially involved attaching key/value data to chunks of text, IIRC.

I'm not sure about the details of this, but I'm writing the new plug-in with all word processor interaction modularized, so theoretically, extending the plug-in to work with Cocoa "fields" shouldn't be impossible.
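The modular design described here can be pictured as an abstract interface that the citation logic talks to, with one implementation per word processor (Word via appscript, OOo, perhaps Cocoa "fields" later). The Python sketch below is hypothetical: the class and method names are ours, not the plug-in's, and the in-memory backend stands in for a real word processor so the citation logic can be exercised without one.

```python
from abc import ABC, abstractmethod


class DocumentBackend(ABC):
    """Hypothetical interface isolating all word-processor calls."""

    @abstractmethod
    def fields(self):
        """Return (field_id, code) pairs for citation fields, in document order."""

    @abstractmethod
    def insert_field(self, code, text):
        """Insert a new field (at the cursor, in a real backend) and return its id."""

    @abstractmethod
    def set_field(self, field_id, code, text):
        """Replace the code and displayed text of an existing field."""


class InMemoryBackend(DocumentBackend):
    """Stand-in backend for testing; new fields are appended at the end."""

    def __init__(self):
        self._fields = []  # list of [field_id, code, text]
        self._next_id = 0

    def fields(self):
        return [(field_id, code) for field_id, code, _ in self._fields]

    def insert_field(self, code, text):
        field_id = self._next_id
        self._next_id += 1
        self._fields.append([field_id, code, text])
        return field_id

    def set_field(self, field_id, code, text):
        for entry in self._fields:
            if entry[0] == field_id:
                entry[1], entry[2] = code, text
                return
        raise KeyError(field_id)

    def text(self):
        """Concatenate the displayed text of all fields."""
        return " ".join(text for _, _, text in self._fields)


def refresh_citations(backend, render):
    """Re-render every field; depends only on the DocumentBackend interface."""
    for field_id, code in backend.fields():
        backend.set_field(field_id, code, render(code))
```

Because `refresh_citations` never touches a concrete word processor, supporting a new editor would mean writing one more `DocumentBackend` subclass rather than changing the citation logic.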

comment:13 in reply to: ↑ 12 Changed 9 years ago by bdarcus

Replying to simon:

> What prevents us from embedding the data in the document and doing the formatting within Zotero?

Nothing; just depends on your priorities.

If all applications continue to do formatting within the bibliographic application and the word-processor continues, in essence, to be a dumb container, it just raises the question of how you ensure interoperability among different applications. In the real world, I would expect people collaborating on documents might use different word-processors and bibliographic applications.

To some degree this gets solved with a) CSL, and b) freely available implementations in different languages that can be easily adapted to different contexts. But I always thought one key advantage of BibTeX's ecosystem was that there was a really strict separation of data from presentation processing. As a result, BibTeX GUI databases only ever deal with managing the data, and users always know their stuff will just work (within, of course, that rather limited world of TeX).

comment:14 Changed 9 years ago by bdarcus

Simon, you should take a look at this thread about serious performance issues with large documents. Perhaps those can be resolved along with the move away from VBA?

comment:15 Changed 9 years ago by stakats

Looking at this problem from the Windows side of things, is there no reasonable alternative to VBA? No way to use Python, for example? If there were some even remotely decent way to reuse the Python code on the Windows side of things, it would be great. On the other hand, I am hoping that the amount of development time spent on the plugin will in the long run decline dramatically once we iron out its remaining bugs.

comment:16 Changed 9 years ago by simon

Theoretically, it's possible to use COM from Python to interact with Word, but this would require that all users install Python, and I'm not sure how we'd launch the scripts from the Office toolbar. Once we get Mac Office working, I'll do some further investigation.

comment:17 Changed 9 years ago by stakats

As you've suggested, it seems like maintaining multiple versions of the Word plugin in different languages would be a complete nightmare. What do you think about just doing it in Python across the board? Dan S. pointed me to an interesting project that packages a subset of Python into an application (http://blog.magnetk.com/2008/03/26/high-leverage-development/). Perhaps we could do something like this for Windows (or indeed all of our platforms)? Requiring some version of Python on all machines is probably the only way we can get good equivalent functionality without a lot of duplication of effort, right?

comment:18 Changed 9 years ago by bdarcus

If you guys are rewriting the code, I'd like to suggest you do it with a view towards the future, which means a) developing a generic API or service, and b) supporting more robust (global, URI) identification and linking now. For example:

Let's say a citation is an ordered list of one-or-more references. A reference, then, has the following properties:

  • identifier (URI form; see below)
  • prefix
  • suffix
  • locators (a list of key values)
  • local_style (an itemized list of options)
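The data model proposed above maps naturally onto a pair of record types. A minimal Python sketch, assuming a citation simply holds an ordered, non-empty list of references; the URI in the example is a placeholder and the `local_style` option name is illustrative only:

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class Reference:
    identifier: str                    # URI form (the exact scheme is still open)
    prefix: str = ""                   # text before the reference, e.g. "see"
    suffix: str = ""                   # text after the reference
    locators: Dict[str, str] = field(default_factory=dict)  # e.g. {"page": "22"}
    local_style: List[str] = field(default_factory=list)    # per-reference options; names illustrative


@dataclass
class Citation:
    references: List[Reference]        # ordered, one or more

    def __post_init__(self):
        if not self.references:
            raise ValueError("a citation must contain at least one reference")

    def to_json_ready(self):
        """Plain dicts and lists suitable for json.dumps."""
        return asdict(self)


citation = Citation([
    Reference("http://example.org/items/12345", prefix="see",
              locators={"page": "22"}, local_style=["suppress-author"]),
])
print(citation.to_json_ready()["references"][0]["locators"])  # {'page': '22'}
```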

On using the current ids within the URI context, what about some way to encode the database id and record id as a URI? You then have that more generic, more robust, approach built in, but can build up the infrastructure over time. You could even include an "alternate_uri" or some such, which would allow it to work across multiple database instances (which Zotero does not now).
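One way to read this suggestion, sketched in Python under loud assumptions: the `http://example.org/db/` namespace is a placeholder, not an actual Zotero URI scheme, and `resolve` shows how an alternate URI could let the same reference resolve in a second database instance.

```python
from urllib.parse import quote, unquote

# Hypothetical namespace: example.org is a placeholder, not a real Zotero URL.
BASE = "http://example.org/db/"


def item_uri(database_id, record_id):
    """Encode a (database id, record id) pair as a URI."""
    # safe="" percent-encodes "/" so ids cannot break the path structure.
    return f"{BASE}{quote(str(database_id), safe='')}/items/{quote(str(record_id), safe='')}"


def parse_item_uri(uri):
    """Return (database_id, record_id) as strings, or None for foreign URIs."""
    if not uri.startswith(BASE):
        return None
    parts = uri[len(BASE):].split("/")
    if len(parts) != 3 or parts[1] != "items":
        return None
    return unquote(parts[0]), unquote(parts[2])


def resolve(uris, databases):
    """Look up an item via its primary URI, then any alternate URIs.

    `databases` maps database id -> {record id: item}, keyed by strings.
    Returns the first item found, or None.
    """
    for uri in uris:
        parsed = parse_item_uri(uri)
        if parsed is None:
            continue  # outside our namespace; a real system might dereference it
        database_id, record_id = parsed
        item = databases.get(database_id, {}).get(record_id)
        if item is not None:
            return item
    return None
```

For example, a reference whose primary URI points at an office database but whose alternate URI points at a home database would still resolve on the home machine via `resolve([primary, alternate], local_databases)`.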

comment:19 Changed 9 years ago by bdarcus

Just to be clear: by the generic API, I mean encouraging its implementation in word-processors like OOo and Word, so that there can be one way to write code for this.

comment:20 Changed 9 years ago by stakats

We're pretty close to such a system already in what Zotero embeds in Word fields, e.g.
{ ADDIN ZOTERO_ITEM {"citationItems":[{"itemID":12345,"locator":"22","position":3}]}}

Rather than prefix and suffix, Zotero instead uses "custom", which allows complete editing of the reference, and in my opinion it's a more useful solution even though it effectively flattens the citation's data (though the other parameters are still there in the field code and could revert the citation to its original form). Replacing itemID with something more like "http://zotero.org/bdarcus/12345" is certainly doable and would ultimately allow for greater portability of references.
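To illustrate that migration path, the Python sketch below parses a field code like the one shown above and swaps each local itemID for a URI. The "uri" key name and the `uri_for_id` callback are assumptions; a real migration might keep `itemID` alongside the URI for a transition period.

```python
import json

PREFIX = "ADDIN ZOTERO_ITEM "


def parse_field_code(code):
    """Return the JSON payload of a ZOTERO_ITEM field code as a dict.

    Accepts the bare code or the form with Word's field braces around it.
    """
    code = code.strip()
    if code.startswith("{") and code.endswith("}") and code[1:].lstrip().startswith("ADDIN"):
        # Drop Word's outer field braces; the JSON keeps its own.
        code = code[1:-1].strip()
    if not code.startswith(PREFIX):
        raise ValueError("not a ZOTERO_ITEM field code")
    return json.loads(code[len(PREFIX):])


def with_uris(code, uri_for_id):
    """Rewrite a field code so each cited item carries a URI instead of a local itemID."""
    data = parse_field_code(code)
    for item in data["citationItems"]:
        item["uri"] = uri_for_id(item.pop("itemID"))
    return PREFIX + json.dumps(data, separators=(",", ":"))


field_code = '{ ADDIN ZOTERO_ITEM {"citationItems":[{"itemID":12345,"locator":"22","position":3}]}}'
print(with_uris(field_code, lambda item_id: f"http://zotero.org/bdarcus/{item_id}"))
```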

Found another helpful post on Python development, though it makes the Windows case look a little scary. Perhaps in Office for Windows we could use VBA's support for calling Python? http://www.wooji-juice.com/blog/high-leverage-why-yes.html

comment:21 Changed 9 years ago by bdarcus

Sean: I'm asking you to identify an explicit strategy to get from point a to point b now rather than later. I'm not so sure that saying one can, at some point, use an optional URI in itemID is all that reassuring. I think you need a target ID that is ALWAYS a URI.

comment:22 Changed 8 years ago by dstillman

VBA Returns to Future Versions of Office for Mac

"...the team recognizes that VBA-language support is important to a select group of customers who rely on sharing macros across platforms. The Mac BU is always working to meet customers’ needs and already is hard at work on the next version of Office for Mac."

http://www.microsoft.com/presspass/press/2008/may08/05-13MacBU2008PR.mspx?rss_fdn=Press%20Releases

Thanks, Microsoft. That's really helpful.

comment:23 Changed 8 years ago by stakats

Trac is probably not the appropriate forum to express my profanity-laced reaction. You can use your imagination.

comment:24 Changed 8 years ago by stakats

So how does this announcement affect our strategy? It seems like we have a few options here:

1) Not support Office 2008 for the Mac. Continue with VBA and OOo Basic for other platforms, including Office 2004 for the Mac and whatever Mac version ultimately brings back VBA.

2) Port the Mac plugin to AppleScript. Continue with VBA and OOo Basic for other platforms.

3) Port the Mac plugin to Python. Continue with VBA and OOo Basic for other platforms. Python here provides some flexibility in case we want to move other plugins off of VBA / OOo Basic.

4) Port all plugins to Python. Would allow for a single code base but introduces the problem of delivering a Python interpreter to other platforms, especially Windows.

comment:25 Changed 8 years ago by dstillman

From the same press release: "The response has been amazing — since we launched in January, the velocity of sales for Office 2008 is nearly three times what we saw after the launch of Office 2004."

This probably speaks more to rising Mac sales than anything else, but it also means Option 1 is probably out.

Option 2 sounds painful, with no potential additional upside.

There is another option besides 3 and 4:

5) Use Python for Word 2008 on OS X and OpenOffice on all platforms, since OpenOffice already ships with Python, and use VBA on Windows Word, where it seems to work fairly well and be the path of least resistance. Older/newer Mac Offices could use whichever worked better.

We should also look at the VBA support in OO 3.0 and see if it addresses any of the issues we were running into there, since there would obviously be some extra development work to make the Python plugin work in OO. If the VBA support is much improved, that could support Option 3.

Obviously, this depends on what's left to be done in the plugins, how hard it would be to maintain two different versions, and how much more pleasant it would be to go all-Python, but bundling Python on Windows seems like a road we may not want to go down if we have a choice.

On a related note, I don't see any indication that they added the ability to launch AppleScripts from buttons in Office 2008 SP1, so we're probably still stuck with the awkward Scripts menu or a separate floating application to host the buttons.

comment:26 Changed 8 years ago by bdarcus

I wonder if another possibility to look out for is whether MS enhances integration of the .Net runtime (upon which IronPython can run) into Office?

Remember too that OOo 3.0 is getting an RDF API and associated field to park generated content, which should allow solving some current problems.

comment:27 Changed 8 years ago by simon

(In [3017]) references #698, Migration away from VBA

Adds a Python/py-appscript-based plug-in for Word 2008. To get this to work, you'll need to copy the Zotero directory (not its contents) to ~/Microsoft User Data/Word Script Menu Items and install py-appscript (sudo easy_install appscript)

Some caveats:

  • Requires Word be installed at /Applications/Microsoft Office 2008/Microsoft Word 2008.app (this is fixable, but I'm still determining the best way to solve it)
  • Still need to figure out what to do with items that have been deleted from the DB (right now, we just ignore them)
  • Sometimes, Python.app launches with the script, which seems to slow execution time

comment:28 Changed 8 years ago by stakats

Small correction: Zotero directory should be copied to ~/Documents/Microsoft User Data/Word Script Menu Items
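A minimal Python sketch of the manual install step, assuming only what the two comments above state: the Zotero directory itself (not its contents) goes into ~/Documents/Microsoft User Data/Word Script Menu Items. The function name and the choice to replace an existing copy are hypothetical, and installing py-appscript remains a separate step.

```python
import shutil
from pathlib import Path

# Location from the correction above, relative to the user's home directory.
SCRIPT_MENU = Path("Documents", "Microsoft User Data", "Word Script Menu Items")


def install_plugin(plugin_dir, home=None):
    """Copy the plug-in directory itself (not just its contents) into
    Word 2008's script menu folder and return the installed path."""
    source = Path(plugin_dir)
    home = Path(home) if home is not None else Path.home()
    target = home / SCRIPT_MENU / source.name   # e.g. .../Word Script Menu Items/Zotero
    if target.exists():
        shutil.rmtree(target)                   # replace an older copy
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copytree(source, target)
    return target
```

Calling `install_plugin("Zotero")` from the directory containing the plug-in would place it where the script menu expects it.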

comment:29 Changed 8 years ago by simon

(In [3350]) closes #892, OO Plugin ibid not working
closes #940, Overflow error with many references in OOo
references #698, Migration away from VBA

Installer for the MacWord plug-in (with appscript for 10.5 bundled), and extension for OOo. There still appear to be some bugs in OOo bookmarks, which I'm looking into.

comment:30 Changed 7 years ago by simon

  • Resolution set to fixed
  • Status changed from new to closed

(In [4947]) Integration megacommit, part 2: Zotero code

Closes #884, final period missing when a citation is first added in note styles
Closes #1298, issues with footnotes and citations in OOo
Closes #1069, Use async HTTP calls for integration requests
Closes #1027, User-customizable integration port number
Closes #698, Migration away from VBA
Closes #1085, Migrate VBA plug-in to new XML-based API
Closes #792, Auto-updating of OO plugins
