The Italian version of this post is here.
Ten years is a long time.
In January 2008 I had just turned 23. I had just graduated in Math, and I had decided to completely change my field of study, embracing the world of digital libraries. I was about to leave for a master’s course that was held in Oslo, Tallinn and Parma, and I would start working a few years later. In these ten years many things changed: I called many places “home” and moved many times; I broke up with my then fiancée, I started another beautiful relationship, I bought a house. I lost my faith, I became vegetarian (the two things are not related, maybe). Among other things, I have read a lot of books, and I have tracked them all on aNobii (it’s the original Goodreads).
My personal and professional life revolves around books, and I can’t think of myself without automatically thinking about the books that I read. It’s simply a matter of identity. Books are bricks: a literal and literary construction of the self, through the words of others.
I very much like the idea of tracking this discrete construction of myself. It amuses me, and I have more patience and skill than the average reader when it comes to data curation and visualization with RAW.
So ten years ago I registered on aNobii, and since then I have tracked every book I read. Here, I’m analyzing them.
It works like this: the following charts are about the last ten years, meaning the books I read from January 2008 to December 2017. For these books, I have the end-of-reading date (if I finished them) or some chronological data.
I decided to include both the books I finished and those I did not: those that I left, those that I lost on the train, those that I consulted, those that “yes, I’ll finish you one day”, those I-would-but-I-will-not-touch-anymore (I always try to read cover to cover). The complete/incomplete binary division simplifies things, just like fiction/nonfiction (the latter requires a long separate explanation, which I’ll save for a later post): fiction is novels and short stories, nonfiction is everything else. There is no poetry.
When I speak of read books, therefore, I mean all those I had in my hands and read in some way; when I specify finished I mean really finished.
The data is clean enough, but I cannot rule out errors here and there: in ten years I have not always been perfect in recording everything.
Here we go.
I read 467 books: I finished 332 and left 135 unfinished, that is about 29%. I finished an average of 33.2 books a year, which is almost three books per month (but obviously the distribution of reading is far from uniform, as can be guessed from chart 3.1).
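The arithmetic behind these figures is easy to check. A minimal sketch in Python, using only the totals quoted in this post (the variable names are mine):

```python
# Totals quoted in the post
total_read = 467
finished = 332
years = 10

unfinished = total_read - finished
unfinished_share = unfinished / total_read   # fraction of books left unfinished
finished_per_year = finished / years
finished_per_month = finished_per_year / 12

print(f"unfinished: {unfinished} ({unfinished_share:.1%})")
print(f"finished per year: {finished_per_year:.1f}")
print(f"finished per month: {finished_per_month:.2f}")
```

These are averages over the whole decade; they say nothing about how unevenly the reading is spread across months and years.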
The vertical axis represents the number of pages in the book, the horizontal axis the end-of-reading date. It is not easy to find patterns visually: it seems to me that in 3.1 the fiction/nonfiction balance stays much the same over time. Over the whole ten years, 61% is nonfiction and 29% is fiction. Perhaps it should be analyzed year by year.
Chart 3.2 is about the reading device: ebook vs paper. Paper wins, of course, with a whopping 87%. On the other hand, I only buy used books, and I use the ebook reader basically for English and nonfiction books.
Of the 64 books I read digitally (see chart 4 on the side, in blue), four out of five are in fact nonfiction.
Of these, half are in Italian and half are in English. I do not read English on paper, apart from some rare exceptions.
The temporal distribution in chart 3.2 confirms a behavior that I had already noticed: if I read an ebook, I keep on reading ebooks. If I stop, for whatever reason, I can forget the ebook reader for months (among other things, I actually broke a couple of ereaders). I know my behavior with ebooks quite well: big spikes during summer, then months of inactivity, then I start again. 2012 was the year I fell deeply in love with them, for example; 2013 much less so.
I also analyzed publication dates. Mind you: these dates tend to be inaccurate, because they are the dates printed on the book, and this means they are the publication date of the precise edition of the books I’m reading. It’s not the publication date of the first edition of the same book. Unfortunately this is an everlasting problem of librarianship, and it is not easy to get the “original” dates.
Counting all this, it seems quite evident to me that there is a certain attention to books and authors of the past: in a whole year of reading (columns) there are books published in different decades, with a prevalence of books published after 2000 or 2010 (the first two rows).
This aspect can be understood even better by looking at the authors. I reconciled them with Wikidata: chart 5 is the Gantt chart of the authors, with dates of birth and, where applicable, dates of death (or nothing if I do not have the data). We notice a nearly even split between dead and living authors (52% and 48%, to be precise), but more complete data are needed, and the authors are not all present in Wikidata. As I read only books, it makes sense that my horizon is shifted toward the past (what in publishing is often called “the back catalog”) rather than toward the latest publications.
Exploring this further (back catalog/latest publications), just looking at the last decade (i.e. the books released after 2008) I see that I do not often read a book in the same year that it comes out: it happens 3–4 times a year, just over 1 in 10 books. I think it’s because I wait for a book to leave the bookshop shelves and end up in stalls and 2nd hand bookstores, and as of today a couple of years are enough, maybe even less. So it is not that I do not read new releases: I just read them later.
The most frequent publishers are, in order: Adelphi, Einaudi, Mondadori, Codice, Franco Maria Ricci.
The first 7 publishers are equivalent, in number of books, to the last 140: an almost perfect definition of power law. In fact, Adelphi crushes them all with 112 books, followed by Einaudi (37), Mondadori (33), Codice (14), Franco Maria Ricci (13).
Adelphi does not surprise me too much (even if it is three times bigger than the second publisher), and neither do Einaudi and Mondadori, since they are huge publishers with an endless and beautiful catalog. Codice is a publishing house that I love, especially for their nonfiction. Franco Maria Ricci is the publisher of a marvelous series directed by Jorge Luis Borges, which is the only collection I keep as a bibliophile.
But I do not know how to read the data in terms of total number and above all of distribution. Is it normal to read so few publishers? On the one hand there is an extraordinary preponderance of very few publishers (Adelphi above all, which is actually a quarter of the books I have read). On the other hand, 139 smaller publishers are really many. The real point (a recurrent one in this exploration) is that such an analysis should be compared with those of other readers, to know the habits of similar and non-similar readers. Without an average reader as a benchmark, it is hard to tell whether such a concentrated way of reading is “normal” or not.
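The head-versus-tail comparison for publishers can be sketched in a few lines of Python. This is only an illustration: the five leading counts are the ones quoted in this post, while the tail of small publishers is invented, and `head_matching_tail` is a hypothetical helper name, not part of any library.

```python
from collections import Counter

def head_matching_tail(counts):
    """Return the smallest k such that the k largest counts together
    are at least as large as all the remaining counts combined."""
    ordered = sorted(counts.values(), reverse=True)
    total = sum(ordered)
    running = 0
    for k, n in enumerate(ordered, start=1):
        running += n
        if running >= total - running:
            return k
    return 0  # empty input

# Leading counts as quoted in the post
books_per_publisher = Counter({
    "Adelphi": 112,
    "Einaudi": 37,
    "Mondadori": 33,
    "Codice": 14,
    "Franco Maria Ricci": 13,
})
# INVENTED tail, for illustration only: 20 small publishers with 3 books each
for i in range(20):
    books_per_publisher[f"small publisher {i}"] = 3

k = head_matching_tail(books_per_publisher)
print(k, "publishers account for at least half of the books")
```

Run on a real export, the same function gives a single number that can be compared across readers, which is exactly the kind of benchmark that is missing here.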
Adelphi deserves a brief individual analysis. It’s probably the most beautiful, even if not the most important, publisher in Europe. Their catalog is so astounding that artist eva k. barbarossa decided to dedicate a few years of her life to reading their entire catalog. I’m less ambitious and committed, but I’ve been reading their books all my life, and I always will.
There are 112 Adelphi books, for a total of 26091 pages. It is definitely the publishing house that I know best, and that I enjoy exploring: I know their authors, their themes, their quirks, the intertwingled network of internal references and connections. Adelphi is my fetish, and I buy as many of their books as I can (I own at least thrice as many Adelphi as I read).
I also tracked editorial series by Adelphi. The most present in my library are the Biblioteca Adelphi and the Piccola Biblioteca Adelphi, with 39 and 36 books respectively. Then come Fabula (only fiction), Gli Adelphi (the paperbacks, made exclusively of reprints), and more specific things like the Saggi (essays) and the Narrativa contemporanea (fiction), until you get to the beautiful Adelphiana and La collana dei casi, among my absolute favorites.
I have read 363 authors, out of 467 books in total. If we unpack multiple authors (anthologies, collections) we arrive at 569.
The author I have read most in the last ten years is Roberto Calasso, with seven books. Next come David Foster Wallace, Guido Ceronetti (with his translations from the Bible), Oliver Sacks, Roberto Bolaño and George R. R. Martin.
In terms of number of pages, the winner is without doubt his majesty George R. R. Martin, with the endless (in every possible meaning) A Song of Ice and Fire, which I read all in 2012. It was a beautiful summer in Bologna, and I always went for lunch to Gelatauro. The “lunch” consisted of a Sicilian focaccia filled with gelato (flavours were strictly 1. chocolate with orange, 2. Sicilian cannoli, 3. a heavenly cream of pistachio and almond), and I sat there, reading for forty-wonderful-minutes, dripping on the Kindle. In those months, I often thought that the twenty-years-ago-Andrea-kid would have been very proud of the nerdy adult he was to become.
Anyway, back from memory lane. Some considerations regarding the higher rankings:
- It’s quite strange to me to see Roberto Calasso in the very first place: I love many of his minor works, but I never got to the end of some of the major ones (L’impuro folle, or Folie Baudelaire). But he publishes quite often, I have almost all of his books, so there he is. He’s the mind behind Adelphi, after all.
- David Foster Wallace is, perhaps in a slightly stereotypical way, one of my favorite authors: every year, for a while, I read something by or about him (I have practically finished all his essays and nonfiction, which are the only things that interest me, plus the biography by D. T. Max, and Lipsky’s book). I love him, and I miss him a lot. I started reading him only after the death of Aaron Swartz (of whom he was, ça va sans dire, the favorite author), and I decided not to be ashamed of the cliché and got in line.
- Oliver Sacks: slowly, without haste, I just want to read everything by Sacks, who is one of those few authors for life, who can be read and re-read, forever. There’s all the time in the world.
- Roberto Bolaño. You never talk about Bolaño, Bolaño is to be read, Bolaño is to be cried. When Bolaño writes, sometimes, you dream of heaven as Mexico at sunset, an endless sunset that goes into the night, and then sunrise arrives and then night again and then sunset, a single sunrise-sunset, a dual ouroborian night, cyclically, forever, and you dream of finding him there, good ol’ Bolaño, drinking mezcal and smoking in the desert, the wind entangled in his hair, the gaze of someone who saw it all, and you dream of running and fighting with him, fighting with him the whole night, like Jacob fought with the angel, and in the last sunrise, with his head in your hands, you dream of crying ‘Why, it’s not right, please come back to me’.
- Guido Ceronetti should be read more calmly, since I too want to become an angry misanthrope who rails against modernity and translates Qohelet for fun.
This is very painful: there are only 21 women out of 363 total authors, just over 5%. I get a perfect 10% if I also include women within multiple authors, but the disproportion is evident. The only excuse is that 4 of these authors are also among those I read most (Simone Weil, Hannah Arendt, Cristina Campo, Licia Troisi), but the rest is a plethora of men.
Do I unconsciously prefer men to women? Are book authors disproportionately men? Are the topics that interest me the undisputed domain of males? Almost certainly, an intersection of all this. Unfortunately I do not have a benchmark of the publishing industry (even just year by year), just to see what we are talking about. I know it’s a low figure, but I do not know how low it is, or whether it is above or below the average of other readers. Cultural offer is certainly part of the problem.
I am quite boring also regarding the nationality of authors: “the West” is overwhelmingly winning, with Italy and the United States in the lead. Then a bit of UK, France, “other” Europe, and the rest is statistical noise.
I have no data for sexual orientation, nor for the color of the skin, but even here I can almost name you, one by one, the authors of whom I know both. Black authors, as far as I know, are just a couple: Ta-Nehisi Coates, Malcolm Gladwell (now reading James Baldwin). So: mostly white, mostly male, mostly European. I’m a bit ashamed of myself.
With the page numbers you can do a more quantitative analysis: over the 467 books read, the average page count is 271.38, while the median (that is, the value that divides the distribution in half) is 221. The standard deviation is 187.25. In fact, it is almost a classic bell-shaped curve, but clearly skewed to the right: the bulk of the books sits on the left, followed by a long tail made of a small but not insignificant portion of books with over 500 pages, up to leviathans still further away.
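These summary figures take a few lines to compute with Python’s standard `statistics` module. The sketch below uses a handful of invented page counts, not my actual data, just to show why a mean sitting well above the median points to a long tail of thick books.

```python
import statistics

# INVENTED page counts for illustration: mostly mid-sized books plus two very long ones
pages = [120, 150, 180, 200, 220, 240, 260, 300, 520, 900]

mean = statistics.mean(pages)
median = statistics.median(pages)
stdev = statistics.stdev(pages)  # sample standard deviation

# A few very long books pull the mean up, while the median barely moves
long_tail_of_thick_books = mean > median

print(f"mean={mean:.2f} median={median} stdev={stdev:.2f}")
print("long tail of thick books:", long_tail_of_thick_books)
```

The same comparison applies to the real numbers above: an average of 271.38 against a median of 221.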
To make a comparison, we can look at a distribution taken from library data: for example all the books lent in a month by libraries in Rome. The distribution looks quite similar:
Even the time distribution tells us that, in fact, almost every year I read books over 600 pages, and every year over 500. It was something that I expected: I always want to read big, thick books, maybe in the summer, when there’s plenty of time on the beach (swimming is for losers). But, as expected, the bulk of the books is between 100 and 250 pages, which is a much more canonical dimension.
This is an analysis I’ve made in my free time, without proper methodology, and it has no scientific value: it’s perhaps more of a therapeutic exploration of the self than anything else. The data I have gathered is incomplete, and I had to make a lot of choices and approximations. I started this work with a list of all the books I’ve ever read, but the data was even dirtier and more incomplete, so I turned to a more coherent and complete sample. Reconciling external data (for example from Wikidata) is helpful but adds a lot of complexity.
According to this analysis, the easy conclusion is: if I pick a random book off my shelf, I have one chance in two of getting a book by a male author, certainly white, either American or Italian, born between 1900 and 1980. Of this subset, two out of three are nonfiction.
Does it make sense to lose all this time to reach such a trivial conclusion? I don’t know yet, but I did it anyway.
I am convinced that it would be very nice to have similar analyses made by publishers, booksellers, libraries: when we talk about reading statistics, we always refer to government agencies that tell us of a population that reads less and less … without knowing what they read, when they read it, how, and why.
Every reader is a library, in the sense that every reader is unique in her set of books, in the order in which she reads them. Every reader literally has a “book DNA”, a book-fingerprint that makes her unique. If we had these fingerprints (maybe over time) it would be very easy to compare them with one another. And perhaps to better understand something about how people read, the reasons why they do it, what is missing in our cultural offer.
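One simple way to compare two such fingerprints, if we had them, is the Jaccard similarity between the sets of books two readers have read. This is only a sketch of the idea: the two reading lists are invented, and a real comparison would probably also weigh order and dates of reading.

```python
def jaccard(a, b):
    """Jaccard similarity: shared titles divided by all distinct titles."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # two empty shelves: define similarity as zero
    return len(a & b) / len(a | b)

# INVENTED reading lists, for illustration only
reader_one = {"2666", "Infinite Jest", "Awakenings"}
reader_two = {"2666", "Awakenings", "A Game of Thrones", "The Origins of Totalitarianism"}

similarity = jaccard(reader_one, reader_two)
print(f"similarity: {similarity:.2f}")
```

A value of 1 means two identical shelves, a value of 0 means no book in common.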
Tracking this data allows us to understand some habits, but also to picture a reality in detail: this picture, in the future, may be a reference to understand the success of reading campaigns, for example, or the impact of school libraries, or the impact of smartphones on people’s cultural consumption. If we do not have detailed data, we do not know where we are now, and we will not know where we will be in the future.
How I did this
As already mentioned, I have always used Anobii to keep track of the books.
- I then exported the list to a CSV file and started adding the missing data (fiction/nonfiction, for example, I had to enter by hand).
- I used OpenRefine to clean the data and to reconcile it with Wikidata, thus obtaining, from just an author’s name, their date of birth, date of death, sex and nationality. In practice: OpenRefine automatically searches Wikidata for the name of the author, and Wikidata suggests one or more matches. When all the suggestions have been confirmed, it is possible to ask Wikidata to import some data into our CSV (such as dates of birth, sex, nationality). The more Wikidata grows, the better we will be able to play these games.
- I used Google Docs to make the simplest charts, and RAW for the most complex ones (it’s based on D3.js). It’s a great visualization tool and I encourage you to try it.
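For those who prefer code to spreadsheets, the counting step behind the simplest charts can be sketched with Python’s standard `csv` module. This is not what I actually used: the column names (`finished_on`, `kind`) and the sample rows are invented for illustration, and a real aNobii export will look different.

```python
import csv
import io
from collections import Counter

# Stand-in for the exported file; column names and rows are INVENTED
SAMPLE_CSV = """title,author,pages,finished_on,kind
Book A,Author One,210,2012-07-14,fiction
Book B,Author Two,340,2012-08-02,nonfiction
Book C,Author One,95,2013-01-20,nonfiction
Book D,Author Three,180,,fiction
"""

def books_per_year(rows):
    """Count finished books per (year, kind); rows without an end date are skipped."""
    counts = Counter()
    for row in rows:
        date = row["finished_on"]
        if not date:  # no end-of-reading date: treated as unfinished
            continue
        counts[(date[:4], row["kind"])] += 1
    return counts

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
per_year = books_per_year(rows)
for (year, kind), n in sorted(per_year.items()):
    print(year, kind, n)
```

To run it on a real file, replace `io.StringIO(SAMPLE_CSV)` with an opened file and adapt the column names to the export.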
Anyone who wants to play with my own data can look here.
This was a lot of fun but also quite time consuming. I appreciate claps and sharing as a reward, thanks.