Ten years is a long time.

In January 2008 I had just turned 23. I had just graduated in Math, and I had decided to completely change my field of study, embracing the world of digital libraries. I was about to leave for a master’s course held in Oslo, Tallinn and Parma, and I would start working a few years later. In these ten years I changed many things: I called many places “home” and moved many times; I broke up with my then fiancée, I started another beautiful relationship, I bought a house. I lost my faith, I became vegetarian (the two things are not related, maybe). Among other things, I have read a lot of books, and I have tracked them all on aNobii (it’s the original Goodreads).

My personal and professional life revolves around books, and I can’t think of myself without automatically thinking about the books that I read. It’s simply a matter of identity. Books are bricks: a literal and literary construction of the self, through the words of others.
I very much like the idea of tracking my discrete building of myself. It amuses me, and I have greater patience and skill than your average reader in data curation and visualization with RAW.

So ten years ago I registered on aNobii, and since then I have tracked every book I read. And here, I’m analyzing them.


Here we go.

1. Blue finished, yellow not finished. The size of the balls is a function of the number of pages.
2. Red ones are finished, blue ones started (this second number is incomplete).

I read 467 books: I finished 332 and left 135 unfinished, that is about 29%. I finished an average of 33.2 books a year, which is almost three books per month (but obviously the distribution of reading is very uneven, as can be guessed from chart 3.1).
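The arithmetic behind these figures can be reproduced in a few lines. This is only a sketch: the variable names are mine, and the ten-year span is taken from the text.

```python
# Figures quoted in the text above.
total_books = 467
finished = 332
unfinished = 135
years = 10  # the post covers ten years of reading

# Sanity check: the two groups should add up to the total.
assert finished + unfinished == total_books

unfinished_share = unfinished / total_books   # fraction of books not finished
finished_per_year = finished / years          # average finished books per year
finished_per_month = finished_per_year / 12   # same average, per month

print(f"unfinished: {unfinished_share:.1%}")
print(f"finished per year: {finished_per_year:.1f}")
print(f"finished per month: {finished_per_month:.2f}")
```

An average like this hides the spikes and the idle months, which is why the charts matter more than the single number.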

3.1 Reds are fiction, blue nonfiction.
3.2. Green paper, purple ebook.

The vertical axis represents the number of pages in the book, the horizontal axis the end-of-reading date. It is not easy to find patterns visually: it seems to me that in 3.1 the fiction/nonfiction balance stays roughly the same over time. Over the whole ten years, the data is 61% nonfiction and 29% fiction. Perhaps it should be analyzed year by year.


4. Blue ebook, red paper.

Of the 64 books I read digitally (see chart 4 on the side, in blue), four out of five are in fact nonfiction.

Of these four, half are in Italian and half are in English. I do not read English on paper, apart from some rare exceptions.
The temporal distribution in chart 3.2 confirms a behavior that I had already noticed: if I read an ebook, I keep on reading ebooks. If I stop, for whatever reason, I can forget the ebook reader for months (among other things, I actually broke a couple of ereaders). I know my behavior with ebooks quite well: big spikes during summer, then months of inactivity, then I start again. 2012 was the year of the great falling in love, for example; 2013 much less so.

Publication dates

Counting all this, it seems quite evident to me that there is a certain attention to books and authors of the past: in a whole year of reading (columns) there are books published in different decades, with a prevalence of books published after 2000 or 2010 (the first two rows).

5. Red are fiction, blue nonfiction.

This aspect can be understood even better by looking at the authors. I reconciled them with Wikidata: chart 5 is the Gantt of the authors, with dates of birth and possible date of death (or nothing if I do not have the data). We notice an almost even split between dead and living authors (52% and 48%, to be precise), but more complete data are needed, since not all the authors are present in Wikidata. As I read only books, it makes sense that my horizon is shifted more toward the past (what in publishing is often called “the back catalog”) than toward the latest publications.

6. Authors. Women in blue, men in red. Dots are living authors (the dot marks the date of birth). Empty lines are authors for whom I did not find dates. Here you can find a bigger chart.
7. On the X axis you can find end-of-reading dates, on the Y axis date of publication.

Exploring this further (back catalog/latest publications), just looking at the last decade (i.e. the books released after 2008), I see that I do not often read a book in the same year that it comes out: it happens 3–4 times a year, just over 1 in 10 books. I think it’s because I wait for a book to leave the bookshop shelves and end up in stalls and second-hand bookstores, and as of today a couple of years are enough, maybe even less. It is not so much that I do not read new releases, therefore, as that I read them later.


8. Publishers.

The first 7 publishers are equivalent, in number of books, to the last 140: an almost perfect illustration of a power law. In fact, Adelphi crushes them all with 112 books, followed by Einaudi (37), Mondadori (33), Codice (14), Franco Maria Ricci (13).
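A head-versus-tail comparison like this one is easy to check once you have a count of books per publisher. The sketch below is only illustrative: `head_tail_totals` is a name I made up, and the small publishers in the sample are invented, because only the first five counts are quoted above.

```python
from collections import Counter

def head_tail_totals(counts, head_size):
    """Return (books from the `head_size` largest publishers, books from all the others)."""
    ordered = sorted(counts.values(), reverse=True)
    return sum(ordered[:head_size]), sum(ordered[head_size:])

# The five counts below are the ones quoted in the text.
books_per_publisher = Counter({
    "Adelphi": 112, "Einaudi": 37, "Mondadori": 33,
    "Codice": 14, "Franco Maria Ricci": 13,
})

# Hypothetical tail, NOT the real dataset: twenty invented small publishers.
for i in range(20):
    books_per_publisher[f"small publisher {i}"] = 2

head, tail = head_tail_totals(books_per_publisher, head_size=5)
print(f"head: {head} books, tail: {tail} books")
```

With the real export, the same function called with `head_size=7` would show whether the first seven publishers really balance all the others.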

Adelphi does not surprise me too much (even if it is three times bigger than the second publisher), and neither do Einaudi and Mondadori, since they are huge publishers with an endless and beautiful catalog. Codice is a publishing house that I love, especially for their nonfiction. Franco Maria Ricci is the publisher of a marvelous series directed by Jorge Luis Borges, which is the only bibliophile collection I keep.

9. There are 147 publishers.

But I do not know how to read the data in terms of total number and, above all, of distribution. Is it normal to read so few publishers? On the one hand there is an extraordinary preponderance of very few publishers (Adelphi above all, which is actually a quarter of the books I have read). On the other hand, 139 smaller publishers are really a lot. The real point (a recurring one in this exploration) is that such an analysis should be compared with those of other readers, to know the habits of similar and non-similar readers. Not having an average reader as a benchmark does not help to understand whether such a distinctive way of reading is “normal” or not.


There are 112 Adelphi books, for a total of 26091 pages. It is definitely the publishing house that I know best, and that I enjoy exploring: I know their authors, their themes, their quirks, the intertwingled network of internal references and connections. Adelphi is my fetish, and I buy them as much as I can (I own at least thrice as many Adelphi as I read).

10. Adelphi series. Barchart.

I also tracked editorial series by Adelphi. The most present in my library are the Biblioteca Adelphi and the Piccola Biblioteca Adelphi, with 39 and 36 books respectively. Below, Fabula (only fiction), Gli Adelphi (the paperbacks, made exclusively of reprints), up to more specific things like the Saggi (essays), the Narrativa contemporanea (fiction) until you get to the beautiful Adelphiana and La collana dei casi, among my absolute favorites.

11. Adelphi series. Treechart.


The author I have read most in the last ten years is Roberto Calasso, with seven books. Following, we have David Foster Wallace, Guido Ceronetti (with his translations from the Bible), Oliver Sacks, Roberto Bolaño and George R. R. Martin.

12. Authors. There are many more: there is a very long tail of authors with just 1 book.

In terms of number of pages, the winner is without doubt his majesty George R.R. Martin, with the endless (in every possible meaning) A Song of Ice and Fire, which I read all in 2012. It was a beautiful summer in Bologna, and I always went for lunch to Gelatauro. The “lunch” consisted of a Sicilian focaccia filled with gelato (flavours were strictly 1. chocolate with orange, 2. Sicilian cannoli, 3. a heavenly cream of pistachio and almond), and I sat there, reading for forty-wonderful-minutes, dripping on the Kindle. In those months, I often thought that the twenty-years-ago-Andrea-kid would have been very proud of the nerdy adult he was to become.

Anyway, back from memory lane. Some considerations regarding the higher rankings:

  • It’s quite strange to me to see Roberto Calasso in the very first place: I love many of his minor works, but I never got to the end of some of the major ones (L’impuro folle, or Folie Baudelaire). But he publishes quite often, I have almost all of his books, so there he is. He’s the mind behind Adelphi, after all.
  • David Foster Wallace is, perhaps in a slightly stereotypical way, one of my favorite authors: every year, for a while, I read something by or about him (I have practically finished all his essays and nonfiction, which are the only things that interest me, plus D. T. Max’s biography and Lipsky’s book). I love him, and I miss him a lot. I started reading him only after the death of Aaron Swartz (whose favorite author he was, ça va sans dire), and I decided not to be ashamed of the cliché and got in line.
  • Oliver Sacks: slowly, without haste, I just want to read everything by Sacks, who is one of those few authors for life, who can be read and re-read, forever. There’s all the time in the world.
  • Roberto Bolaño. You never talk about Bolaño, Bolaño is to be read, Bolaño is to be cried. When Bolaño writes, sometimes, you dream of heaven as Mexico at sunset, an endless sunset that goes into the night, and then sunrise arrives and then night again and then sunset, a single sunrise-sunset, a dual ouroborian night, cyclically, forever, and you dream of finding him there, good ol’ Bolaño, drinking mezcal, smoking in the desert, the wind entangled in his hair, the gaze of someone who saw it all, and you dream of running and fighting with him, fighting with him the whole night, like Jacob fought with the angel, and in the last sunrise, with his head in your hands, you dream of crying ‘Why, it’s not right, please come back to me’.
  • Guido Ceronetti should be read more calmly, since I too want to become an angry misanthrope who rails against modernity and translates Qohelet for fun.


Do I unconsciously prefer men to women? Are book authors disproportionately men? Are the topics that interest me the undisputed domain of males? Almost certainly, it is an intersection of all this. Unfortunately I do not have a benchmark for the publishing industry (even just year by year), just to see what we are talking about. I know it’s a low figure, but I do not know how low it is, or whether it is above or below the average of other readers. Cultural offer is certainly part of the problem.


13. There are 110 missing authors in this chart, for whom I did not find any data.

I have no data for sexual orientation, nor for the color of the skin, but even here I can almost name, one by one, the authors for whom I know both. Black authors, as far as I know, are just a couple: Ta-Nehisi Coates, Malcolm Gladwell (now reading James Baldwin). So: mostly white, mostly male, mostly European. I’m a bit ashamed of myself.


14. Infinite Jest wins, It is 2nd, A Storm of Swords is 3rd.

To make a comparison, we can see a distribution taken from library data: for example all the books lent in a month by libraries in Rome. The distribution looks quite similar:

15. Books by number of pages, lent by libraries in Rome in June 2017. The first peak (20–30 pages) is children’s books.

Even the time distribution tells us that, in fact, almost every year I read books over 600 pages, and every year over 500. It was something that I expected: I always want to read a big, thick book, maybe in the summer, when there’s plenty of time on the beach (swimming is for losers). But, as expected, the bulk of the books is between 100 and 250 pages, which is a much more canonical size.

16. Distribution. End-of-reading on the X and number-of-pages on the Y.


According to this analysis, the easy conclusion is: if I pick a random book from my shelf, I have one chance in two of getting a book by a male author, certainly white, either American or Italian, born between 1900 and 1980. Of this subset, two out of three are nonfiction.

Does it make sense to lose all this time to reach such a trivial conclusion? I don’t know yet, but I did it anyway.

I am convinced that it would be very nice to have similar analyses made by publishers, booksellers, libraries: when we talk about reading statistics, we always refer to government agencies that tell us of a population that reads less and less … without knowing what they read, when they read it, how, and why.

Every reader is a library, in the sense that every reader is unique in her set of books, in the order in which she reads them. Every reader literally has a “book DNA”, a book-fingerprint that makes her unique. If we had these fingerprints (maybe over time) it would be very easy to compare them with one another. And perhaps to better understand something about how people read, the reasons why they do it, what is missing in our cultural offer.
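To give an idea of how simple such a comparison could be, here is a minimal sketch that treats a fingerprint as a plain set of titles and measures the overlap with the Jaccard index. That choice is my assumption, not something I actually did; it ignores reading order, and the two shelves are hypothetical.

```python
def jaccard(books_a, books_b):
    """Share of titles two readers have in common, from 0.0 (none) to 1.0 (identical)."""
    a, b = set(books_a), set(books_b)
    if not a and not b:
        return 0.0  # two empty shelves: nothing to compare
    return len(a & b) / len(a | b)

# Hypothetical shelves, for illustration only.
reader_one = {"Infinite Jest", "It", "A Storm of Swords"}
reader_two = {"Infinite Jest", "2666"}

similarity = jaccard(reader_one, reader_two)
print(f"similarity: {similarity:.2f}")
```

A fingerprint that also keeps dates would need a richer measure, but even a set of titles already separates similar readers from very different ones.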

Tracking this data allows us to understand some habits, but also to picture a reality in detail: this picture, in the future, may be a reference to understand the success of reading campaigns, for example, or the impact of school libraries, or the impact of smartphones on people’s cultural consumption. If we do not have detailed data, we do not know where we are now, and we will not know where we will be in the future.

How I did this

  • I then exported the list to a CSV file and started to add missing data (fiction/nonfiction, for example, I had to enter by hand)
  • I used OpenRefine to clean the data and to reconcile it with Wikidata, thus obtaining, from a plain author name, the author’s date of birth, date of death, sex and nationality. To be clear: using the name of the author, OpenRefine automatically searches Wikidata, which suggests one or more matches. Once all the suggestions have been confirmed, it is possible to ask Wikidata to import some data into our CSV (such as dates of birth, sex, nationality). The more Wikidata grows, the better we will be able to play these games
  • I used Google Docs to make the simplest charts, and RAW for the most complex ones (it’s based on D3.js). It’s a great visualization tool and I encourage you to try it.
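Once the CSV is clean, simple breakdowns (fiction versus nonfiction per year, for example) do not even need a spreadsheet. The following is a minimal sketch, not what I actually ran: the column names `title`, `finished_on`, `pages` and `kind` are assumptions, and the three rows are invented.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical sample: column names and rows are invented for
# illustration and do not come from the real export.
SAMPLE_CSV = """title,finished_on,pages,kind
Book A,2012-07-14,320,fiction
Book B,2012-08-02,180,nonfiction
Book C,2013-01-20,240,nonfiction
"""

def kinds_per_year(csv_text):
    """Count fiction and nonfiction books for each end-of-reading year."""
    per_year = defaultdict(Counter)
    for row in csv.DictReader(io.StringIO(csv_text)):
        year = int(row["finished_on"][:4])  # assumes ISO dates (YYYY-MM-DD)
        per_year[year][row["kind"]] += 1
    return per_year

for year, counts in sorted(kinds_per_year(SAMPLE_CSV).items()):
    print(year, dict(counts))
```

To use it on a real file, read the file’s text and pass it to `kinds_per_year`, after adapting the column names to your own export.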

Anyone who wants to play with my own data can look here.

This was a lot of fun but also quite time consuming. I appreciate claps and sharing as a reward, thanks.

