[personal profile] danaeris

title author journal volume number month year pages abstract
^(.+)<NewLine1>(.+)<NewLine\d>(.+)[,>]\s*Vol\.\s*(\d+),\s*No\.\s*(\d+).+\((.+),\s*(\d\d\d\d)\),\s*p{1,2}\.([\d|\-|\s]+)\.{0,1}.+Abstract(.+)$

A lot of the above looks like the syntax used in grep to me.

Is this a particular language, or should I just go with grep syntax?

In case you're wondering, the above is a regular expression from the regexps file for cb2Bib, a program that tries to parse citations or article records, split them into their separate data fields, and then either output BibTeX or store the data in a database as individual fields that can be sorted usefully. It's invaluable for citation analysis, provided I can write a regular expression for the citation style used by the journal we're looking at: Harvard style.
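To make the field mapping concrete, here is a minimal Python sketch that applies the quoted pattern and pairs its nine capture groups, in order, with the nine field names listed before it. Two things in it are assumptions rather than documented cb2Bib behavior: that cb2Bib marks the line breaks of a pasted record as <NewLine1>, <NewLine2> and so on before matching (which is what the pattern appears to expect), and the sample record, which is invented to fit the pattern. The helper names tag_newlines and parse_citation are hypothetical.

```python
import re

# Field names, in the order listed in the cb2Bib entry.
FIELDS = ["title", "author", "journal", "volume", "number",
          "month", "year", "pages", "abstract"]

# The pattern from the regexps file, unchanged apart from being split across
# two string literals. Python's re accepts this Perl-style syntax.
PATTERN = re.compile(
    r"^(.+)<NewLine1>(.+)<NewLine\d>(.+)[,>]\s*Vol\.\s*(\d+),\s*No\.\s*(\d+)"
    r".+\((.+),\s*(\d\d\d\d)\),\s*p{1,2}\.([\d|\-|\s]+)\.{0,1}.+Abstract(.+)$"
)


def tag_newlines(text):
    """Join lines with <NewLine1>, <NewLine2>, ... markers.

    Assumption: this imitates how cb2Bib is taken to mark line breaks
    before matching; check the cb2Bib documentation for the exact rules.
    """
    lines = text.strip().splitlines()
    out = lines[0]
    for n, line in enumerate(lines[1:], start=1):
        out += f"<NewLine{n}>" + line
    return out


def parse_citation(text):
    """Return a dict of fields, or None if the pattern does not match."""
    m = PATTERN.match(tag_newlines(text))
    if m is None:
        return None
    # Trim the stray spaces some groups pick up (e.g. " 45-67" for pages).
    return {name: value.strip() for name, value in zip(FIELDS, m.groups())}


# An invented record in the shape the pattern expects.
SAMPLE = """On the Origin of Examples
Jane Doe
Journal of Examples, Vol. 12, No. 3 (Mar., 1999), pp. 45-67.
Abstract This paper studies examples."""

if __name__ == "__main__":
    for name, value in parse_citation(SAMPLE).items():
        print(f"{name:8} {value}")
```

For the sample, the journal comes out as "Journal of Examples", pages as "45-67" and month as "Mar." (the pattern keeps the abbreviation's period). A record that lacks the Vol./No./pp./Abstract markers simply fails to match, which is why each citation style needs its own entry.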

I haven't done any programming or shell type stuff in a long time. I think I can figure this out, but it ain't gonna be easy for me, given how long it's been.
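As a possible starting point for a Harvard-style pattern, here is a sketch in the same spirit. Harvard is a family of styles rather than one fixed format, so the layout assumed here (authors, year in parentheses, title in single quotes, journal, volume(issue), then a pp. range, all on one line with no abstract) is an assumption; the journal's own references may differ in punctuation, quoting or order, and the pattern would need adjusting to match. The field names reuse those from the cb2Bib entry; HARVARD, parse_harvard and the sample reference are hypothetical.

```python
import re

# Field order for the assumed Harvard layout (hypothetical).
HARVARD_FIELDS = ["author", "year", "title", "journal",
                  "volume", "number", "pages"]

# Assumed layout, one line:
#   Author(s) (Year) 'Title', Journal, Volume(Issue), pp. start-end.
HARVARD = re.compile(
    r"^(.+)\s+\((\d{4})\)\s+'(.+)',\s*(.+),\s*(\d+)\((\d+)\),"
    r"\s*pp?\.\s*([\d\s\-]+)\.?$"
)


def parse_harvard(line):
    """Return a dict of fields, or None if the line does not fit the layout."""
    m = HARVARD.match(line.strip())
    if m is None:
        return None
    return {name: value.strip()
            for name, value in zip(HARVARD_FIELDS, m.groups())}


# An invented reference in the assumed layout.
ref = ("Doe, J. and Roe, R. (1999) 'On the origin of examples', "
       "Journal of Examples, 12(3), pp. 45-67.")

if __name__ == "__main__":
    print(parse_harvard(ref))
```

Typographic quotes (‘ ’) or a missing issue number would make this pattern fail, so trying it against a handful of real references from the journal is a quick check before adapting it for cb2Bib.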

Date: 2007-10-18 01:01 pm (UTC)
From: [personal profile] nathanjw
Looks like a Perl regular expression. Vanilla grep doesn't have the \d syntax (digits), for example, but Perl does.
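To illustrate the difference, the year fragment of the pattern can be spelled with Perl's \d shorthand or with an explicit [0-9] class. Python's re module, whose syntax is Perl-derived, accepts both, and on this sample they capture the same groups. Plain grep -E would need the [0-9] form (or [[:digit:]]); GNU grep also has a -P option for Perl-compatible patterns, where it was built with that support. The variable names are only for illustration.

```python
import re

# Year fragment of the cb2Bib pattern, Perl style: \d means "any digit".
perl_style = re.compile(r"\((.+),\s*(\d\d\d\d)\)")

# Same fragment with the digit class spelled out as [0-9]{4}, which grep -E
# understands. \s is left as-is: GNU grep accepts it as an extension, while
# strictly POSIX tools would need [[:space:]].
ere_style = re.compile(r"\((.+),\s*([0-9]{4})\)")

sample = "No. 3 (Mar., 1999), pp. 45-67."

if __name__ == "__main__":
    print(perl_style.search(sample).groups())
    print(ere_style.search(sample).groups())
```

One caveat: in Python 3, \d in a str pattern also matches non-ASCII digits unless re.ASCII is set, so the two spellings are only identical on plain ASCII text.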
