Tools of the Trade: How to Automatically Import and Organize Your PDFs in Your Digital Library
If you find yourself downloading and organizing a lot of PDF documents, particularly for research purposes, you can save yourself a considerable amount of time by automating parts of this process. After receiving a few curious responses to a post about this yesterday on Twitter, I decided to write up this short tutorial.
Overview & Goal
The goal is simple: set up a workflow in which the PDF documents you download are automatically imported into citation software and organized for your review. This is helpful if, like me, you get a lot of your research information from academic articles, NGO and government reports, legal filings, and other specialized material that tends to circulate in PDF format. If you are interested in how I handle more dynamic textual information (such as news articles and blog posts) or other types of media (including videos and audio files), let me know and I’ll share my approach to those, too.
Before we get into the process itself, I need to touch on a few key points to make sure we’re on the same page.
First, let’s talk about citation software. Every citation app is essentially a database with predefined fields for organizing your own library, plus a few integrations that let you get that data back out in a structured way (e.g., generating a bibliography in a document). So whether you’re using Zotero, Endnote, Mendeley, Paperpile (cloud-based), Papers (now ReadCube), Bookends (Mac only), or BibTeX, they all do more or less the same thing. I’m going to assume that you have some familiarity with one or more of these tools, or at least understand the general purpose and benefits of this type of app. For this workflow, I use Endnote on a Mac, though you can set up the same process with some (not all) other apps on Windows or Linux.
Second, let’s talk about automation. What I describe below barely counts as automation: it involves very few steps and includes a manual touchpoint that could be eliminated, which I’ll point out when we get there. For the idea behind this process to make sense, though, it helps to pay attention to how repetitive tasks can be organized to take advantage of features already built into the software you use. This post should get you started, but keep in mind that it is the beginning, not the limit, of what you can do. In fact, it covers only a fraction of my own workflow, because I automate other parts of it as well.
Academic Citation Workflow
The short version: we are going to set up a folder just for the PDF documents we want to put into Endnote, tell Endnote to import anything placed in that folder automatically, and then have Endnote pull out the documents for which it could not fill in citation details on its own.
Let’s get started.
(1) Create Your PDF “Watch Folder”
The first step is to create a folder and give it an obvious name like “Endnote Imports”. I prefer to keep it in my Downloads folder rather than on my desktop because I really can’t deal with a messy desktop, but you can put the folder anywhere you like, including a shared folder on Dropbox or Google Drive if you’re working with a team. To make this work most efficiently, check your browser settings and make sure your downloads go to your Downloads folder (or whichever folder contains your “Endnote Imports” folder); that will save you a lot of time later. Do not make the “Endnote Imports” folder the destination for all downloads, though: Endnote would then import every PDF you download, whether or not you want it in your library.
(2) Tell Endnote to Watch your “Endnote Imports” Folder
In the “Preferences” pane in Endnote, go to the options for “PDF Handling” about halfway down the list. You will notice that the second of the two options is called “PDF Auto Import Folder.” Click “Select Folder”, navigate to your “Endnote Imports” folder, be sure to click all the way into that folder (don’t just highlight it), and click “Open.” The pathname in Endnote should now read something like /Users/yourname/Downloads/Endnote Imports. With this set up, Endnote will watch that folder and automatically import any PDF added to it. Click “Save” and close the preferences.
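To make the idea of a watched folder concrete, here is a minimal Python sketch of what one polling pass over such a folder might look like. It illustrates only the general behavior described in this post (new PDFs get imported, then moved into an “Imported” subfolder); it is not how Endnote is actually implemented, and the function names and the import_fn callback are hypothetical stand-ins for the app’s own import step.

```python
import shutil
import time
from pathlib import Path
from typing import Callable, List


def poll_watch_folder(watch_dir, import_fn: Callable[[Path], None]) -> List[str]:
    """Run one pass over a watch folder (illustrative sketch, not Endnote's code).

    Each PDF sitting directly inside watch_dir is passed to import_fn, a
    hypothetical stand-in for the citation app's import step, and then moved
    into an "Imported" subfolder so it is not processed twice.
    """
    watch = Path(watch_dir)
    imported = watch / "Imported"
    imported.mkdir(exist_ok=True)
    handled = []
    for item in sorted(watch.iterdir()):
        # Skip subfolders (including "Imported") and anything that is not a PDF.
        if not item.is_file() or item.suffix.lower() != ".pdf":
            continue
        import_fn(item)
        shutil.move(str(item), str(imported / item.name))
        handled.append(item.name)
    return handled


def watch_forever(watch_dir, import_fn: Callable[[Path], None], interval_seconds: float = 30.0) -> None:
    """Poll the folder repeatedly until interrupted (Ctrl+C)."""
    while True:
        poll_watch_folder(watch_dir, import_fn)
        time.sleep(interval_seconds)
```

Polling on a fixed interval is the simplest portable approach; a real application may instead rely on the operating system’s file-change notifications.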
A quick note about Zotero since so many people use it. Zotero, as far as I can tell, does not have the ability to create a watched folder, which is just one of the many reasons I don’t use it. But I digress.
(3) Add Documents to Your “Endnote Imports” Folder
At this point, you should be downloading PDFs to the same folder that contains the watched folder, which makes things really easy. For this tutorial, I downloaded two sample PDF files, an academic article and IOM’s World Migration Report, but in a typical week I download many more than two. I have built time into my week to go through my Downloads folder every so often and move the documents I want in my library into the Endnote Imports folder. This is one way of keeping Endnote from importing everything indiscriminately.
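Moving the PDFs you want from Downloads into the watch folder is the manual touchpoint that could be automated. As a minimal sketch, assuming Python 3 and the folder layout from step 1, the function below moves PDFs from one folder to another only when a filter you supply approves them, which preserves the selectivity that keeps Endnote from importing everything. The function name and the example filter are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Callable, List


def move_selected_pdfs(downloads_dir, watch_dir, keep: Callable[[Path], bool]) -> List[str]:
    """Move PDFs from downloads_dir into watch_dir, but only those keep() approves.

    Hypothetical helper: keep is any rule you choose (a keyword, a list of
    file names, a date range). Files it rejects stay where they are.
    """
    downloads = Path(downloads_dir)
    watch = Path(watch_dir)
    watch.mkdir(parents=True, exist_ok=True)
    moved = []
    for item in sorted(downloads.iterdir()):
        # Only top-level PDF files are candidates; folders (including the watch folder) are skipped.
        if item.is_file() and item.suffix.lower() == ".pdf" and keep(item):
            shutil.move(str(item), str(watch / item.name))
            moved.append(item.name)
    return moved
```

For example, calling move_selected_pdfs(Path.home() / "Downloads", Path.home() / "Downloads" / "Endnote Imports", lambda p: "report" in p.name.lower()) would move only PDFs with “report” in their file names; that keyword rule is just a placeholder for whatever criteria fit your reading.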
Once those documents are added to the Endnote Imports folder, Endnote will notice them and import them into your library. So easy! Endnote will then also create a folder called “Imported” inside your watched folder and move the PDF documents there, so that you know (and it knows) what it has done. You may want to return to this folder from time to time and empty it, because it can become quite large.
Pro Tip: I also automate PDF downloads and organization using some simple macros, including one that empties my “Imported” folder once a week on Friday afternoon.
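The macros themselves aren’t covered in this post, so as a stand-in (not a reproduction of them), here is a minimal Python sketch of the weekly clean-up. It assumes, as described above, that Endnote has already imported the PDFs sitting in the “Imported” folder into your library. It deletes files in that folder whose modification time is at least min_age_days old; with the default of 0 it empties the folder entirely. The function name and parameters are hypothetical.

```python
import time
from pathlib import Path
from typing import List, Optional


def empty_imported_folder(imported_dir, min_age_days: float = 0.0, now: Optional[float] = None) -> List[str]:
    """Delete files in imported_dir last modified at least min_age_days ago.

    With min_age_days=0 every file is removed, i.e. the folder is emptied.
    Subfolders are left alone. A file's modification time usually survives
    a move, so it reflects when the PDF was downloaded, not when it was imported.
    """
    current = time.time() if now is None else now
    cutoff = current - min_age_days * 86400  # 86,400 seconds per day
    removed = []
    for item in sorted(Path(imported_dir).iterdir()):
        if item.is_file() and item.stat().st_mtime <= cutoff:
            item.unlink()
            removed.append(item.name)
    return removed
```

To run something like this every Friday afternoon on a Mac or Linux machine, one option is a cron entry such as 0 16 * * 5 /usr/bin/python3 /path/to/empty_imported.py (16:00 every Friday), where the script path is a placeholder for wherever you save a script that calls this function; on macOS, launchd is the more modern scheduler.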
But we’re not done yet! There’s one more important step.
(4) Identifying Documents with No Metadata
The value of most (good) citation software is that it can automatically fill in citation details for academic articles using information indexed online. That’s why, of the two documents I imported above, the academic article has all of its information filled in automatically, which is great. But what about older academic articles, PDFs with no machine-readable text, and non-academic reports that are not indexed? If you simply dump all of these documents into Endnote and stop there, you’re going to end up with a library full of unsearchable (and unusable) crud.
The way I deal with this is to create a special group, called a Smart Group, that pulls in only the documents that need further attention. You’ll notice that Endnote was not able to add metadata to the IOM World Migration Report. I don’t know how other apps handle this, but when Endnote runs into this problem, it puts angle brackets (a less-than and a greater-than sign) around the original file name and uses that as the title. This is great because it gives us an easy way to identify these documents.
To create a Smart Group in Endnote for only those documents with no metadata attached, go to Groups > Create New Smart Group. I title this Smart Group “_Needs Metadata!”. Notice the underscore at the beginning of the title. In many programs and file browsers, a leading underscore sorts a file, document, or group to the top of an alphabetical list, which is where I want this one because I know I need to come back to it.
Next, set the search criteria to “Title” and the option to “Field Begins With,” then type a less-than character (<) into the text area. And that’s it. This tells Endnote to automatically put any citation whose title starts with “<” into this Smart Group, and as we saw above, that catches the PDFs with no metadata.
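The Smart Group rule is simply a filter on the Title field. As an illustration only (Endnote evaluates the rule itself; nothing here talks to Endnote), here is the same logic in Python, using a hypothetical Reference record and made-up sample titles.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Reference:
    """Hypothetical, minimal stand-in for a library record."""
    title: str


def needs_metadata(refs: Iterable[Reference]) -> List[Reference]:
    """Apply the rule Title / Field Begins With / "<" to a collection of records."""
    return [ref for ref in refs if ref.title.startswith("<")]


# Made-up sample records: one with real metadata, one titled with a bracketed file name.
library = [
    Reference(title="A Peer-Reviewed Article With Full Metadata"),
    Reference(title="<world-migration-report.pdf>"),
]
flagged = needs_metadata(library)
```

Because the rule looks only at the first character, it would also catch a legitimate title that happens to begin with “<”, though that should be rare.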
Once that is set up, you can go back in when you have time and add metadata manually. Again, I tend to do this once a week on Friday afternoons. (You’ll notice that I batch a lot of menial tasks into Friday afternoons.)
That’s All Folks
And that’s it! Once this is set up, you no longer have to import and organize new documents in your Endnote library by hand. You’re left with just two manual touchpoints: moving the documents you download into the Endnote Imports folder every so often, and adding metadata to the documents that lack it (which you would have to do anyway; this just makes it easier to see which ones need your attention). I hope this was helpful and answers the questions I received yesterday about my post.