Thursday, May 14, 2015

Uploading to The Winnower from Blogger: A real time tutorial

So I have a memory like a sieve except that it's profoundly less useful in the kitchen. And because I know from painful experience that anything I don't document never happened, I am going to help myself and you by creating, in real time, a tutorial on uploading blog posts from Blogger to The Winnower, should you be so inclined. Why do it? Getting a DOI is one reason.

Those of you who were smart enough to begin your blog's existence on Wordpress can use a fancy plugin for your API. Those of us who now have too much inertia on Blogger to relocate must do a little more work and use some intermediate steps, but it really isn't that hard. What's more, the intermediate steps offer an opportunity for proofreading and fine-tuning that you might like to do anyway. Let's do it!

Initial steps

First up, I assume you have created an account on The Winnower. Then select the SUBMIT option in the top menu. At that point the step-by-step instructions can be found in the link in the top-right corner of the page, Submit Your Paper:

You'll get a new window pop up; skip on down to the Word instructions:

Here you will want to download the Word template and also read through all the formatting guidelines before moving on:

So now let's shift to the template document just downloaded. Aha! This looks familiar! I couldn't for the life of me recall how I'd managed to have on my laptop the neatly formatted docx file of an earlier post I'd uploaded to The Winnower last December. The memories are beginning to flow. What a relief!

Okay, so now I need to get myself offline for a bit and copy-paste today's upload into the template. I don't recall hitting any major obstacles previously but if I do I shall be sure to report them here in a mo.

I'm back. The first thing I had to do was decide the source of the material I'm going to copy-paste. I keep PDF copies of all my posts so I began by trying that, but it's not ideal because of all the extraneous words in the menus, etc. when you print to PDF from a browser. That makes it hard to get a clean text selection over anything more than a paragraph. Thus, I elected to go back into the Blogger editor and open the post I'm uploading, as if I am going to edit that post. The Blogger editor is pretty simple although it has a habit of hiding irrelevant HTML if you've pasted text from formatted sources. In any event, the Blogger editor makes it convenient to separately copy-paste the title and post body. I edited the author and address information by hand.

The abstract is my first opportunity to revise what's in the blog post for The Winnower. The post I'm uploading had two prior incarnations: a preliminary draft with a call for input, followed by a first completed version. An introduction to the post resides only in the first version but it needs to be updated to reflect the evolution in subsequent posts. Okay, then. Snip, snip, splat, done. A new abstract is created!


Here's an important point. Notice what I did and didn't do with the Poldrack reference in my abstract. * The first post I uploaded to The Winnower contained a bazillion references. It was a literature review! I went back and forth with Josh about the citation recommendations in his guidelines. I was beta-testing the upload procedure so I tried to play along, except that the way I'd used simple hyperlinks (mostly to PubMed IDs) in the blog post meant that I would have many days' work to conform to the guidelines. Once the blog was uploaded and all the links worked I decided that it was good enough to proceed with the links alone. Check it out, see if you agree. I am assuming that PubMed and the NIH will be around a while so I'm counting on PMIDs to be resolvable indefinitely. Of course, I see the irony in not using a DOI instead of the PMID here given one of the goals of this upload, but I tend to use PMIDs a lot more than DOIs to this point. If in doubt, do both. Otherwise, I'm going to give my personal opinion that it is better to have got your post uploaded with PMIDs, links or whatever you've got than to procrastinate for want of conforming to some standard. Sure, DOIs and full citations for every reference would be lovely, but we're busy here! The Winnower papers work perfectly well with hyperlinks so provided your links have some survivability to them I think you're good to go. **

* I just noticed that links aren't given differential formatting automatically in the abstract of The Winnower articles, making them hard to see. But you can mouse-over them and they reveal themselves. There are four links in the abstract to my first uploaded post. See if you can find them. In today's test I shall add bold, italics and underline to the abstract template and see if I can emphasize the links...

** Links in the manuscript body (that is, anything outside of the abstract) format automatically to reveal themselves, so just insert links as usual in either the Blogger editor or in Word and you're good to go.
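
If you'd rather generate the "do both" links than type them out, here is a minimal Python sketch. It isn't part of The Winnower's instructions; the function names are made up for illustration, and the PMID and DOI in the usage lines are placeholders, not real references. It assumes the pubmed.ncbi.nlm.nih.gov and doi.org address patterns.

```python
def pubmed_url(pmid: str) -> str:
    """Build a PubMed link from a numeric PMID (hypothetical helper)."""
    pmid = pmid.strip()
    if not pmid.isdigit():
        raise ValueError(f"not a numeric PMID: {pmid!r}")
    return f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/"


def doi_url(doi: str) -> str:
    """Build a resolver link from a DOI string (hypothetical helper)."""
    doi = doi.strip()
    if not doi.startswith("10."):
        raise ValueError(f"DOIs start with '10.': {doi!r}")
    return f"https://doi.org/{doi}"


# Placeholder identifiers, for illustration only.
print(pubmed_url("12345678"))
print(doi_url("10.1000/example"))
```

Paste whichever link you get into the post as an ordinary hyperlink; nothing about the upload changes.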

Now it's time to copy-paste the body of the post between the [manuscript] anchors of the template. Pretty easy stuff, except that again I'm going to take the opportunity to tweak some things. I'm going to move the abbreviations up to the top, and I need to address the issue of highlighting and other formatting that resides in the blog post.

I'm busily copy-pasting from the Blogger editor into the template. I'm grabbing entire chunks of text, figures 'n all. I simply select the "Match Destination Formatting" option that Word gives me whenever a new pile of stuff is dropped wholesale into the template. Some tabs are vanishing so there's a little bit of work comparing the format of the published blog post to the template draft, but so far so good. Though it should be noted that I expect a lot of my reformatted text to appear differently once it's converted to HTML and uploaded to The Winnower, so I'm not spending very much time making my document purdy.

Figures and tables

I'm also being pretty cavalier with figures and tables. In this post the only figures are actually tables of variables. The table extends slightly more than one page. Rather than spend goodness knows how much time creating a new table or making everything fit on a single page, I'm again going for expediency. So it's Word > print to PDF > export as PNG > insert into template. Sorry, Josh. We're online here, so all that really matters is that the final document is readable. If I've wasted a third of someone's screen then I apologize. I refer you to my earlier opinion regarding overcoming procrastination.

The upload

I'm nearly there! The template document is complete and I've double-checked that the abstract lies between the [abstract] anchors and everything else resides within the [manuscript] anchors. It's upload time.
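
The double-check on the anchors is easy enough by eye, but it can also be scripted if you paste the template text into a plain string. This is a minimal Python sketch, not anything The Winnower provides. The helper name `between` is hypothetical, and the marker strings in the usage lines are only a guess at how the anchors are written, so swap in whatever the template actually uses.

```python
from typing import Optional


def between(text: str, opening: str, closing: str) -> Optional[str]:
    """Return the text between the first opening marker and the next
    closing marker, or None if either marker is missing."""
    start = text.find(opening)
    if start == -1:
        return None
    start += len(opening)
    # Search from just after the opening marker, so this also works
    # when the opening and closing markers are the same string.
    end = text.find(closing, start)
    if end == -1:
        return None
    return text[start:end].strip()


# ASSUMPTION: the anchors look like this; adjust to match the template.
sample = "[abstract] A short summary. [abstract] [manuscript] Body text. [manuscript]"
print(between(sample, "[abstract]", "[abstract]"))
print(between(sample, "[manuscript]", "[manuscript]"))
```

A result of None, or an empty string, for either section means something has drifted outside its anchors.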

The next step is to export an .htm version out of Word. At this point my document is called checklist_to_Winnower.docx rather than word_template.docx. I'm going to assume that The Winnower's servers like to discriminate between uploads. That said, I don't actually think it matters what you call your .docx or the subsequent .htm file. So I'm just selecting "Save as Web Page..." per the instructions: ***

*** The option to "Save entire file into HTML" creates a number of .xml files that aren't needed for the upload. Thus, you can opt for "Save only display information into HTML" when exporting from Word.

Note the requirement to select the "Web Options..." and then ensure you have the Unicode (UTF-8) option selected:
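
If you want to confirm that the exported file really did come out as UTF-8, a strict decode is a quick test. The Python sketch below is an optional extra, not one of The Winnower's steps; the function names are hypothetical and the file name in the comment is just the one used in this post.

```python
from pathlib import Path


def is_utf8(data: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def file_is_utf8(path: str) -> bool:
    """Read a file as raw bytes and test whether it is valid UTF-8."""
    return is_utf8(Path(path).read_bytes())


# Example call (the file must exist where you run this):
# file_is_utf8("checklist_to_Winnower.htm")
```

Note that a file containing only plain ASCII passes this test whatever option was chosen, so a pass is reassuring rather than conclusive.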

Word immediately opens the .htm file for me but as a skeptic I will quickly review the file in my browser. The first part looks okay except for a deliberate mistake :-/ Anyone spot it?

Yes, that's right. My post is presently entitled "Windows Template." So just a few moments while I correct that oversight... Done. I have to re-export to .htm, of course, so I'll spend a few minutes checking the rest of the export first. I rather like the idea of enforced proofreading. This isn't quite enforced but the hassle of re-exporting certainly encourages it here. Hands up who has reviewed a paper where one of your earliest thoughts is along the lines of "WTF? Do they expect me to proofread it then review it? Can I be a co-author?" Spend the time, people. 99 percent done, halfway there.

Okay, so I've scan-read the .htm file (I don't take my own advice) and I'm ready to proceed. I notice a new folder containing the images and some .xml files. The instructions tell me I'll be needing the image files, at least, during the upload. Here goes.

The first page of the upload is self-explanatory, but you do need to come up with up to five keywords. Make them count. Somewhat usefully, the interface shows you all the keywords that people have used before although they're not categorized.
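
Keywords are easy to tidy before you type them in. The Python sketch below trims them, drops duplicates and enforces the five-keyword limit. It also lower-cases everything, because (as reported further down this post) a keyword with capital letters vanished at the time of writing. The function name and the sample keywords are made up for illustration.

```python
def tidy_keywords(raw, limit=5):
    """Lower-case and trim keywords, drop blanks and duplicates, and
    complain if more than `limit` remain (hypothetical helper)."""
    cleaned = []
    for keyword in raw:
        keyword = keyword.strip().lower()
        if keyword and keyword not in cleaned:
            cleaned.append(keyword)
    if len(cleaned) > limit:
        raise ValueError(f"{len(cleaned)} keywords given, limit is {limit}")
    return cleaned


# Sample keywords, for illustration only.
print(tidy_keywords(["fMRI", " Checklist ", "fmri"]))  # ['fmri', 'checklist']
```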

The next page asks me to upload my document, or I could perhaps use a URL to the .htm file residing in my browser. I haven't tried the latter before so I'll use the DOC button, as I think I did last December. Then I just have to find and upload the .htm file and the three image files in the post:

Having checked the two radio buttons and selected PROCESS UPLOADED FILES you will definitely want to preview your paper. My draft is well spaced. Too well spaced, in fact. A few diagnostics show that a new line after an anchor, a header, a sub-header, a figure or a paragraph is sufficient to insert a half line space. A blank line results in 1.5 lines in the uploaded version. So I'm going to go back and remove the blank lines between my paragraphs, etc. in the Word document. It makes the Word version look crap but I'm confident the formatting will work properly once The Winnower gets a hold of it. This is what it now looks like as .htm with blank lines removed:

It looks clunky as .htm (and in Word) but once uploaded the formatting gets sorted out quite nicely.

I've previewed my paper and it's good enough. I notice the links in the abstract are displaying bold and italicized but not underlined (two out of three ain't bad!) which discriminates them nicely from the plain text:

While it's not perfect, the theme of this post is to "git r' dun" so I'm going to resist the temptation to get anal at this late stage and instead select the PROCESS UPLOADED FILES button to release the beast.

Oh, this is cool. On the third of the four upload pages you are given the option to provide "additional assets" to your paper. I was expecting to be asked for my credit card and CVV code. But no! It's a place I can drop a PDF version of the checklist, to complement the one that already resides on Dropbox. Nice.

All that remains to do is notify anyone I want to review the paper - no thanks, I'll keep it a secret ;-) - and.... Voila! Success!!!!

The new paper is now live right here. I've checked that the links and the downloadable PDF work properly but I'll leave any more strenuous proofreading and corrections for a later date. Likewise, I'll wait on assigning a DOI because I don't see the button that was there when I used it a month ago on another post. For now, then, it's a wrap.

  1. So I spent some of the weekend proofreading and getting the new paper ready for assignment of the DOI. I spotted that one of my keywords, "fMRI", had vanished and checked with Josh. Turns out capital letters aren't yet permitted and so the term vanished. Josh reinstated "fmri" for me because I was unable to do it myself. A note to this effect will be forthcoming on the instructions, I'm told.

    To revise the draft version I had to select both the .htm file and image001.jpg simultaneously (Command button on a Mac). If you upload only your .htm file and don't re-upload the image files then you end up with link text and no actual image(s). I then deleted the draft "Additional Assets" and uploaded a new version. And then I hit the Archive button to assign a DOI and fix the post for eternity.
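
To avoid the missing-image problem on a re-upload, you can list the images the .htm file refers to and compare them with the files you are about to select. Here is a minimal Python sketch using only the standard library. It isn't part of The Winnower's workflow, the function and class names are hypothetical, and it assumes the images appear as ordinary img tags in the exported HTML.

```python
from html.parser import HTMLParser


class ImageSources(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def missing_images(html, uploaded_names):
    """Return the image paths in the HTML whose file name is not among
    the names selected for upload."""
    parser = ImageSources()
    parser.feed(html)
    parser.close()
    uploaded = set(uploaded_names)
    # Compare on the bare file name, ignoring any folder prefix.
    return [src for src in parser.sources
            if src.rsplit("/", 1)[-1] not in uploaded]


# Made-up HTML and file names, for illustration only.
page = '<p><img src="files/image001.jpg"><img src="files/image002.png"></p>'
print(missing_images(page, ["image001.jpg"]))  # ['files/image002.png']
```

An empty list means every image the page refers to is in your selection.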