[pmwiki-users] bibliographies revisited

christian.ridderstrom at gmail.com
Sun Oct 1 11:13:18 CDT 2006


On Thu, 28 Sep 2006, John Rankin wrote:

> >I don't have time to follow up on this properly right now, so this is 
> >just a quick reply.

OK, I've finally had time to look at your web pages, read the thread 
properly and reply. Actually, I've already sent in a few replies (taken 
from an initial version of this post). It's still a bit long, so when 
possible we should try to split it into sub-threads for further 
discussion.

----

I would like to begin with an overall question to keep in mind regarding 
your suggested plan/solution:

	Does it scale?

The scaling I consider here concerns questions such as:

* Will it work well in practice when the bibliographic database
  (bib-db) grows to hundreds of entries?

* Can different authors maintain and use their own bib-db:s?

* Can an author maintain and use several separate bib-db:s?

Achieving scalability *is* difficult, so I'm not saying that we must 
require all of the above answers to be yes. We should however be clear on 
which questions are answered with a 'no, not initially', and which will be 
'no, never'.

I am especially worried that choosing a user interface and storage 
mechanism now will make it difficult to scale up the system later.


	Compatibility and interoperability with BibTeX?

In an ideal world, it would be possible to take a bunch of BibTeX files 
and import them into a wiki where they are then used and possibly modified. 
It would also be possible to go the other way and export (parts of) the 
bibliographic information from the wiki back to the same set of BibTeX files. 
Phrased differently, the wiki would allow an online storage mechanism for 
the bibliographic information that people could use collaboratively. We 
should consider questions such as:

* To what extent will it be possible to import .bib-files?
  Restrictions on citation keys?

* Will it be possible to go back and forth between .bib-files and the wiki?

* How do we maintain correspondence between .bib-files and bibliographic 
  information on the wiki?

----

And now to reply to your post:

> >> > 1. what is the downside to storing each citation in its own wiki
> >> >    page, where the unique reference id becomes the page name?
> >> 
> >> I've never done much in the way of BibTex or bibliographies, so my 
> >> comments should be discounted accordingly.  However, I know that 
> >> sometimes it's a pain if an author is limited to a one-to-one 
> >> correspondence between pages and data.
> >
> >My reaction is that I'd probably be very annoyed if I was forced to use 
> >a one-to-one mapping. My experience is that it's a PITA to do it that 
> >way.
> >
> >Using *virtual* pages on the other hand might be a good (parallel) 
> >solution.
> 
> I don't understand this comment, I'm afraid.

I simply mean that we could have a system where the bibliography database 
(bib.db) is stored in one wiki page (or .bib-file), but *pretend* that 
each citation entry resides on a separate page. Let's assume that the 
bib.db is actually stored in the page Site.BibDB. Also assume we have a 
virtual group called Bibliography. By going to the page 
	Bibliography.Ridderstrom2003-LLC
the system will extract the entry in the bibliography database that 
corresponds to Ridderstrom2003-LLC and generate the page on demand. Note 
that we could easily use a separate URI argument (e.g. ?bibstyle=unsrt) to 
specify a particular style for the citation.

	Note: The key I used for my thesis is actually 
		Ridderstrom:2003:LLC
	so I wouldn't be able to refer to my thesis using a page name.

In a similar fashion, going to a page that does not correspond to an entry 
in the Bibliography database could allow for a simple way to let the user 
add a new entry to the database without having to edit the database 
directly.
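To make the virtual-page idea concrete, here is a minimal Python sketch. It assumes the whole bib-db is available as BibTeX text (e.g. read from Site.BibDB) and that a request for Bibliography.<key> is answered by looking the key up in that text; a missing key returns None, which is where the wiki could offer an "add entry" form instead. The function names and the simplified parser are hypothetical, not existing PmWiki code, and a real version would live in PmWiki's PHP and apply a bibliography style rather than returning the raw entry.

```python
import re

# '@type{key,' header; the key may not contain commas, spaces, '=' or braces,
# so @string definitions (which have no 'key,' header) are skipped.
ENTRY_HEADER = re.compile(r'@(\w+)\s*\{\s*([^,\s={}]+)\s*,')

def parse_entries(text):
    """Map citation key -> (entry type, raw field text) for a BibTeX string.

    Simplified: braces inside quoted values are still counted, so a quoted
    value with unbalanced braces will confuse it.
    """
    entries = {}
    pos = 0
    while True:
        m = ENTRY_HEADER.search(text, pos)
        if m is None:
            break
        depth, i = 1, m.end()  # we are just inside the entry's opening '{'
        while i < len(text) and depth > 0:
            if text[i] == '{':
                depth += 1
            elif text[i] == '}':
                depth -= 1
            i += 1
        if depth:  # unterminated entry: stop parsing
            break
        entries[m.group(2)] = (m.group(1).lower(), text[m.end():i - 1].strip())
        pos = i
    return entries

def render_virtual_page(pagename, bibdb_text, group='Bibliography'):
    """Return page text for e.g. 'Bibliography.Ridderstrom2003-LLC', or None.

    None means "not a bibliography page" or "no such key"; in the latter
    case the wiki could offer a form for adding the entry instead.
    """
    page_group, _, key = pagename.partition('.')
    if page_group != group:
        return None
    entry = parse_entries(bibdb_text).get(key)
    if entry is None:
        return None
    entry_type, body = entry
    return f"@{entry_type}{{{key},\n  {body}\n}}"
```

The point of the design is that storage stays a single BibTeX-like page, while "one page, one citation" exists only in the user interface.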

So in essence I mean that we could use the virtual pages as a mechanism 
for the user interface. However, the restrictions on citation keys make 
me reluctant to use this interface. I would literally have to change 
hundreds of keys if I were to re-use my bibliographic database.
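As a rough way to gauge how many keys would need renaming, here is a Python sketch that flags citation keys that cannot be used directly as page names. It assumes, for illustration only, that a page name may contain just ASCII letters, digits and hyphens and must start with a letter or digit; PmWiki's actual page-name rules are configurable and may differ, and the function name is hypothetical.

```python
import re

# Assumption: page names allow only ASCII letters, digits and hyphens,
# starting with a letter or digit. PmWiki's real rules may differ.
PAGE_NAME = re.compile(r'[A-Za-z0-9][A-Za-z0-9-]*')

def keys_needing_rename(keys):
    """Return the citation keys that could not be used as page names as-is."""
    return [k for k in keys if PAGE_NAME.fullmatch(k) is None]

print(keys_needing_rename(['Ridderstrom2003-LLC', 'Ridderstrom:2003:LLC']))
# ['Ridderstrom:2003:LLC']
```

Any automatic renaming (say, replacing ':' with '-') would also need a stored mapping back to the original key, otherwise the round trip to the .bib-files breaks.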

> >The rest I'll hopefully have time to read and give comments on
> >tomorrow.
> 
> While commenting, perhaps give thought to how much of a PITA
> this would be. There appears to be a *lot* more development to
> support a multiple citations per page solution, as you lose 
> the ability to re-use a whole lot of existing code.

Hmm... are you sure about that? I suggest we discuss that separately - 
I've posted about it in a new thread.

> Why, exactly, would it be a PITA? Is removing this pain worth increasing 
> the budget by 10%, 20% ... 100% ... whatever it takes?

Primarily it is my gut feeling... and from the gut, quite a bit of extra 
work on the *backend* would be worth it. My instinct is that you wouldn't 
regret it. On the other hand, if you don't do it, I think it is likely 
that you will regret it.

Looking at it differently, my instinct is that you should keep the backend 
very similar to BibTeX files, but consider the paradigm of "one page, one 
citation" to be part of the user interface.

As an aside, if you are interested I could send you copies of the BibTeX 
files that I used during my PhD. They were useful for much more than just 
maintaining a database of bibliographic information. For instance, I used 
to add abstracts for each entry, as well as my private notes after reading 
a paper. That let me use the database to also remember what the paper was 
about. I also kept my information in different BibTeX files (4-5 files?). 
One of these files was a sort of "macro" file that defined things like the 
names of journals or conferences. This was useful for making sure that 
journal names etc. were written consistently in the bibliography. 
Something else I used to annotate in each entry was the affiliation(s) of 
the author(s). This was useful for seeing which work was being done by 
which groups.
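BibTeX itself supports this kind of macro file through @string definitions, and field values can be concatenated with '#'. A simplified Python sketch of expanding such macros, e.g. for journal or affiliation names, could look like this; the function names are hypothetical, and the regular expression deliberately ignores nested braces as well as '#' inside quoted strings.

```python
import re

# @string{name = {value}} or @string{name = "value"}; nested braces not handled.
STRING_DEF = re.compile(r'@string\s*\{\s*(\w+)\s*=\s*[{"](.*?)[}"]\s*\}', re.IGNORECASE)

def read_macros(text):
    """Collect @string macros from a BibTeX "macro file" (names lower-cased)."""
    return {name.lower(): value for name, value in STRING_DEF.findall(text)}

def expand(value, macros):
    """Expand a raw field value such as 'ijrr' or 'ijrr # " (special issue)"'.

    Parts joined by '#' are concatenated: quoted/braced parts are taken
    literally, bare words are looked up as macros (unknown words are kept).
    """
    out = []
    for part in (p.strip() for p in value.split('#')):
        if len(part) >= 2 and part[0] in '{"' and part[-1] in '}"':
            out.append(part[1:-1])
        else:
            out.append(macros.get(part.lower(), part))
    return ''.join(out)

macros = read_macros('@string{IJRR = {Int. J. of Robotics Research}}')
print(expand('ijrr # " (special issue)"', macros))
# Int. J. of Robotics Research (special issue)
```

Since each entry stores only the macro name, a misspelt journal or affiliation has to be fixed in one place only.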

Reading the above, I realize that having an abstract/summary is not an 
argument against storing a single citation per page. I also realize that 
(:pagelist:) could be used efficiently to list all entries that, for 
instance, correspond to a certain affiliated group. Note that in practice 
this almost requires you to use macros for the affiliation names - 
otherwise you get into all the problems of misspellings :-(

So the main reason I can think of right now is that sometimes you really 
want to be able to edit a lot of entries at once (in the same edit form). A 
very simple example is if I maintain a list of URIs in the bibliography 
database. In this case I think that I'd prefer keeping a whole bunch of 
URIs in a single page. If I had to open a different page for each URI, 
I'd probably get annoyed quickly. Or let's say I'm editing more complex 
entries, and I have to click between *lots* of different pages just to get 
consistent naming etc. (A typical case is making sure that an author's 
name is spelled correctly and consistently, perhaps also consistently 
using either the full first name or just an initial.)

I also think that it might get very annoying to have the change history of 
modifications to a bibliography database spread out over lots of different 
pages. Let's say you are editing the database (kept on many pages) in 
order to get consistent naming, and then you decide -oops- that it was 
wrong. You now have to revert the history of each and every one of those 
pages.

For the URIs, a similar thing might be that a domain name changes, and you 
now have to update all the URIs pointing to that domain. With all the URIs 
in a single page, the change is relatively easy - and nicely kept in the 
history. With a single URI per page, it'll get tedious.


The most important point is however the compatibility with BibTeX files, 
i.e. being able to go back and forth between the bibliographic information 
on the wiki and what you have as local BibTeX files. Please note that it 
is also important to be able to have a few different BibTeX files - I 
don't think it would be enough to get a correspondence to a single BibTeX 
file, and not just because you want to have a separate (BibTeX) file that 
defines macros for consistent naming.


I can see that the easy way right now is to store a single citation on a 
single wiki page. This would quickly let us create a wiki system with a 
single bibliography database for the entire wiki (and hence for all authors).

Assuming we have an import mechanism from a BibTeX file, I can also see 
how you could automatically populate the pages in the group Bibliography/. 
We do however have to think about the reverse process. If the citations in 
Bibliography/ come from two different BibTeX files, I'd want to be able 
to update the BibTeX files with the relevant citations from Bibliography/. 
This requires that each entry "knows" which BibTeX file it belongs to (or 
something like that). We could of course maintain a correspondence between 
one group and one BibTeX file, but then we lose some of the advantages of 
having Bibliography/ as a site-wide bibliographic database.
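One way to make each entry "know" its BibTeX file is simply to record the source file name on import and regroup by it on export. The Python sketch below assumes the .bib-files have already been parsed into (key, type, fields) tuples; the Entry class, its source field and both functions are hypothetical names for illustration, not an existing API.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Entry:
    key: str
    entry_type: str
    fields: dict
    source: str  # hypothetical: name of the .bib file the entry came from

def import_files(files):
    """files: {filename: [(key, type, fields), ...]} -> {key: Entry}."""
    db = {}
    for filename, records in files.items():
        for key, entry_type, fields in records:
            if key in db:
                raise ValueError(f"duplicate key {key!r} in {filename} and {db[key].source}")
            db[key] = Entry(key, entry_type, fields, filename)
    return db

def to_bibtex(e):
    """Serialize one entry; BibTeX accepts the trailing comma after the last field."""
    lines = [f"@{e.entry_type}{{{e.key},"]
    lines += [f"  {name} = {{{value}}}," for name, value in e.fields.items()]
    lines.append("}")
    return "\n".join(lines)

def export_files(db):
    """Regroup wiki entries by their source file -> {filename: BibTeX text}."""
    grouped = defaultdict(list)
    for e in db.values():
        grouped[e.source].append(to_bibtex(e))
    return {name: "\n\n".join(chunks) + "\n" for name, chunks in grouped.items()}
```

Raising on duplicate keys is just one possible policy; separate per-author or per-group databases could instead keep keys apart by source.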

That loss might of course be inevitable, as authors probably want their 
own bibliographic databases. I've got pages in a separate group on 
pmwiki.org, and I'd probably expect to be able to have a separate 
bibliographic database for those pages. Perhaps even more than one.

> Is multiple citations per page a "must have on day one" or
> a "could be added later" requirement?

I would say it is a "must be able to have later on, or it'll bite me" in 
the sense that I'll be restricted. My gut also says that being able to 
synchronize between a set of local BibTeX files, and a set of bibliography 
databases on the wiki is necessary (later on).

(As an aside, needing different bibliography groups might be a motivation 
for having a hierarchical group structure - then each group could have 
their own Bibliography/ sub-group if that is desired).

> In particular, check
> http://www.wikipublisher.org/wiki/index.php?n=Bibliographies.Plan
> 
> This attempts to define a practical work plan of small chunks, in a 
> logical build sequence. Feel encouraged to add to or resequence the 
> plan, but please don't remove any of the items. My feeling is that the 
> first 3 stages are about 2:1:2 in relative size.

Your plan is probably workable, but I'm worried the system will have some 
serious limitations. Those limitations could be OK if you are ready to 
consider the system a useful first prototype, i.e. something that tests 
the waters and gets you familiar with what users need. I would, however, 
be prepared to do a complete redesign of the whole thing after getting 
some practical experience with using it.

But it also very much depends on what you are trying to achieve. Our goals 
are probably different, and maybe my aims are set much too high after being 
used to the full power of BibTeX?

I think that the "safe" plan would be slightly different. For instance, 
I'd start with being able to import *and* export a *set* of .bib-files, 
since I expect that functionality could be difficult to add later on.

I am not completely convinced that the best way in the long run is to have 
one or more bibliography groups, but I agree that it is a good starting 
point - certainly good enough for a usable prototype system. The main 
advantage is that we get a lot of functionality for free. I am worried 
about restrictions on scalability from that system though...

Anyway, getting into implementation details before agreeing on the scope 
is probably a bad idea... I would however at least investigate the idea of 
storing the bibliographic databases in one or more .bib-files. It might be 
possible to re-use a lot of existing code for many of the tasks.

best regards
/Christian

-- 
Christian Ridderström, +46-8-768 39 44               http://www.md.kth.se/~chr

