Opened 10 years ago
Closed 10 years ago
#272 closed defect (fixed)
problems with Library of Congress ingest
| Reported by: | dcohen | Owned by: | simon |
|---|---|---|---|
| Priority: | minor | Milestone: | 1.0 Beta 2 |
| Component: | ingester | Version: | 1.0 |
| Keywords: | | Cc: | |
Description
David Wilson writes:
"My first test from the Library of Congress has brought up some problems with
either the LOC data or the zotero data collection.
- Title not collected properly: the title shown is "Du " (from 'Dublin'??).
Also problems with accented characters in the name.
record
LC Control Number: 2006373974
Type of Material: Book (Print, Microform, Electronic, etc.)
Brief Description: Ó Cathasaigh, Tomás.
Táin bó Cúailnge and early Irish law / Tomás Ó Cathasaigh.
Dublin : Faculty of Celtic Studies, University College Dublin, 2005.
23 p. ; 21 cm.
ISBN: 1905254059
see zotero1.jpg attached
- This has a similar problem. Title is shown as "D ".
LC Control Number: 43044475
Type of Material: Book (Print, Microform, Electronic, etc.)
Brief Description: Táin bó Cúailnge. [from old catalog]
The Táin bó Cúailnge, from the Yellow book of Lecan,
Dublin, School of Irish learning [etc.] 1912.
vi p., 2 l., [3]-126 p. 25 cm."
I suspect this is a MARC record problem, but it's worth taking a look at.
Attachments (1)
Change History (3)
Changed 10 years ago by dcohen
comment:1 Changed 10 years ago by simon
- Status changed from new to assigned
No, this really is my bug, but it's going to be painful to deal with. I _think_ this is happening because, once a record contains characters that take more than one byte (8 bits) to encode, you can no longer rely on plain old string lengths when working with them. So I'm going to have to loop through the record and figure out how many bytes each character takes. I hope this doesn't slow things down too much.
comment:2 Changed 10 years ago by simon
- Resolution set to fixed
- Status changed from assigned to closed
weird characters from LOC ingest