Friday, September 08, 2006

Well, there's a shock. Who would have thought it?

And on a Friday afternoon too. In truth, I've no idea how the timetable for releasing a report like that is decided. It could just be a coincidence.

There's some interesting stuff in the full report (pdf) which'll hopefully make it into a post shortly.

Unfortunately, it appears that each page of the PDF is an individual image rather than text, so I can't use the Select Tool to copy the sections I want. Bah. Unless anyone can offer a workaround for that, it looks like there's going to be a fair old bit of "click, read, click back, type, click, read, click back, type..." going on. I'm cynical enough to wonder whether these reports are posted that way intentionally. Bah. This may take some time.

7 comments:

Davide Simonetti said...

Have you tried putting the pages into an OCR package and converting them to text? That often works, although you then have to correct a few mistakes afterwards. I'll give it a go with a page and let you know what happens.

CuriousHamster said...

Thanks Davide. I've actually just about done the stuff I need for the first post. (What a way to spend a Friday evening. Proper geekage!)

It does look like there's a wealth of stuff in there which deserves to be given a wider airing though.

Davide Simonetti said...

It works... well, sort of. It's a bit involved, but much better than typing the damned thing. What I did was copy the page to the clipboard (I just did the front page as a test), then in Photoshop select "New", set the resolution to 300 dpi and paste in the page. Then save it as a JPEG and run it through an OCR package, if you have one. It should send the result to your word processor (Microsoft Word in my case).

If you don't have an OCR package I can help. Let me know which pages you want because there are 151 of them. I'll need a way to get in touch with you other than comments.

Hope that's useful.

CuriousHamster said...

Thanks again. I'm afraid I don't have an OCR package, so I may take you up on the offer for a second post I might do at some point. I've finished the groundwork for the first post now, but I'm too tired to actually write the post to go with it tonight.

My email address is in the right sidebar under Blog Stuff. I keep meaning to put it somewhere more visible.

email[DOT]garry[AT]gmail[DOT]com

Thanks again. Much appreciated.

Davide Simonetti said...

Cheers, I'll email you.

NotSaussure said...

Sorry, Garry; I hadn't read the latest comments when I emailed you yesterday. As I explained (and I'm posting this in case anyone else needs to do this sort of conversion job), I eventually found ABBYY FineReader, which is available as a free, fully functional 15-day download. It reads the .pdf file as a batch and converts it to Word. The process took about 10 minutes once I had worked out how to use it (duh, use the Wizard!).

You should by now have received a zipped version. Many thanks for all your work in ploughing through the report and analysing it.

Niels said...

On a Friday. Just before the 11th of September. When no-one in their right mind would criticise the "war" on "terror", for fear of being labelled anything from grossly insensitive to flat-out evil.

Coincidence. Yes. Maybe.