Opened 10 years ago

Closed 10 years ago

#272 closed defect (fixed)

problems with Library of Congress ingest

Reported by: dcohen
Owned by: simon
Priority: minor
Milestone: 1.0 Beta 2
Component: ingester
Version: 1.0
Keywords:
Cc:

Description

David Wilson writes:

"My first test from the Library of Congress has brought up some problems with
either the LOC data or the zotero data collection.

  1. Title not collected properly: the title shown is "Du " (from 'Dublin'?).

There are also problems with accented characters in the name.
Record:

LC Control Number: 2006373974
Type of Material: Book (Print, Microform, Electronic, etc.)
Brief Description: Ó Cathasaigh, Tomás.

Táin bó Cúailnge and early Irish law / Tomás Ó Cathasaigh.
Dublin : Faculty of Celtic Studies, University College Dublin, 2005.
23 p. ; 21 cm.

ISBN: 1905254059

see zotero1.jpg attached

  2. This has a similar problem. Title is shown as "D ".

LC Control Number: 43044475
Type of Material: Book (Print, Microform, Electronic, etc.)
Brief Description: Táin bó Cúailnge. [from old catalog]

The Táin bó Cúailnge, from the Yellow book of Lecan,
Dublin, School of Irish learning [etc.] 1912.
vi p., 2 l., [3]-126 p. 25 cm."

I suspect this is a MARC record problem, but it's worth taking a look at.

Attachments (1)

zotero1.jpg (33.1 KB) - added by dcohen 10 years ago.
weird characters from LOC ingest


Change History (3)

Changed 10 years ago by dcohen

weird characters from LOC ingest

comment:1 Changed 10 years ago by simon

  • Status changed from new to assigned

no, this really is my bug, but it's going to be painful to deal with. i _think_ that this is happening because, when you have characters that use more than 8 bits, you can no longer use plain old string lengths to work with them. so, i'm going to have to loop through the record and figure out how many bytes each character takes. i hope this doesn't slow things down too much.
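A minimal sketch of the mismatch described in the comment above, assuming the record text is UTF-8 and that field positions are given in bytes. It is written in Python purely for illustration and is not the fix committed for this ticket; the names `char_index_for_byte_offset` and `extract_field` and the simplified three-field "record" are hypothetical and do not reflect the real MARC layout.

```python
def char_index_for_byte_offset(text: str, byte_offset: int) -> int:
    """Walk the string and count the UTF-8 bytes each character takes,
    returning the character index that corresponds to byte_offset."""
    consumed = 0
    index = 0
    while consumed < byte_offset and index < len(text):
        consumed += len(text[index].encode("utf-8"))
        index += 1
    if consumed != byte_offset:
        # Offset is past the end or lands inside a multi-byte character.
        raise ValueError("byte offset does not fall on a character boundary")
    return index


def extract_field(text: str, byte_start: int, byte_length: int) -> str:
    """Slice a field out of decoded text using byte-based start and length."""
    start = char_index_for_byte_offset(text, byte_start)
    end = char_index_for_byte_offset(text, byte_start + byte_length)
    return text[start:end]


# Illustrative data taken from the ticket; the concatenation is NOT real MARC.
name = "Ó Cathasaigh, Tomás."
title = "Táin bó Cúailnge and early Irish law"
place = "Dublin"
record = name + title + place

# Positions as a byte-oriented directory would record them.
title_start = len(name.encode("utf-8"))
title_length = len(title.encode("utf-8"))

# Treating byte positions as character positions slices the wrong span.
naive = record[title_start:title_start + title_length]
# Converting byte positions to character positions recovers the title.
fixed = extract_field(record, title_start, title_length)

print(repr(naive))
print(repr(fixed))
```

Because the accented characters each take more than one byte, the naive slice starts partway into the title and runs on into the following field, while the byte-aware version returns the title intact. The per-character loop costs time proportional to the record length on every lookup, which is the slowdown the comment worries about.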

comment:2 Changed 10 years ago by simon

  • Resolution set to fixed
  • Status changed from assigned to closed

(In [638]) closes #272, problems with Library of Congress ingest
