This forum is in permanent archive mode. Our new active community can be found here.

Convention Data Mining

All of this ConnectiCon panel data sharing by Rym is inspiring me to stop shit talking.

I have a complete SQL dump of the PAX Tabletop library, going back several years. I grabbed it in hopes of doing some crazy data science experiments, maybe even as ambitious as crawling BoardGameGeek to pull down meta data for each title in the PAX library such as rating, length, weight, mechanics, etc.

What sort of questions do people have that could be answered with such a dataset? Give me something to work for! Off the bat, I'm curious to see how a few things trend from PAX to PAX: number of checkouts, checkouts per person, etc. If I can pull that off w/o falling back into shit-talking, I want to run some metrics that show how the tastes of PAX attendees have changed over time, and how they differ between Prime/East.

I'm not going to attempt comparing popularity of specific games, b/c it is relative to the composition of the library. I can't remember how fine the data is there. I know it shows all games ever in the library, and whether they still exist or not, but I don't remember if I have the transaction data to see when they were added and when they were culled, so I couldn't tell you how many copies of what were present at a given PAX.


  • Hmm.. Total concurrent checkouts by hour. That would be a nice graph of demand over the days/times.

    What games were "completely" checked out at any point during the con, and how many times this occurred. (Skipping games where there was only one actual copy). Obviously you couldn't do that with the data you have, but it would be interesting.

    Average length of the checkout broken down by game. (That would be something that would get attention).
  • Actually, average length of checkout over the years irrespective of game would be interesting. Are people favoring shorter games over time?
  • Does the data include number of players? Can you extrapolate game play for games based on player quantity? Are certain games more popular based on time of day?

    What are the data fields?
  • edited July 2014
    They just let you take the data? If I had thought of that, I would have asked. Here are just a few interesting things we can find out with that data.

    Raw Popularity
    Most checked out. Least checked out. Percentage of the convention spent checked out. Compare this over each PAX to watch popularity change over time. Also, not just individual games, but categories of games. For example: Worker placement went down/up in popularity the past year. Maybe themes as well. Aggregate data of all zombie games.

    True Playtimes
    How long were games actually checked out for. What was the true playtime? Which games were checked out, but returned quickly and not actually played (overrated)? Which games were checked out rarely, but played for a long time when they were (underrated)?

    Player Profiles
    People who checked out game X also checked out game Y. What percent of people were hardcore, medium core, and casual? What percent of people spent all day in tabletop checking out tons of games, and what percent only passed through?

    Convention Traffic
    When was the busiest time for tabletop? (We already know its after Expo Hall closes). When is the quiet time? Do games change in popularity based on time of day and mood?

    Differences of PAXes
    Is there a difference between Prime, East, and AUS? Do tastes differ by geography? What about different years of PAX? You can probably detect things like games getting replaced. Did Apples2Apples die when CAH came out? Were there one-hit wonder games that were big at one PAX and then disappeared?

    Purchasing Decisions
    Which games does the library not have enough copies of? Which does it have too many copies of? If you were going to stock a game store, what should inventory be?
    Post edited by Apreche on
  • edited July 2014
    HMTKSteve said:

    Does the data include number of players? Can you extrapolate game play for games based on player quantity? Are certain games more popular based on time of day?

    What are the data fields?

    No, there's no way for the library to know how many people actually played in a game when it was checked out, but with BGG data, we can assign a player count and estimated playtime for each title, then compare it to how long it was actually checked out. Seeing which ones come back way early would be very cool to see. I wonder if this will show us mostly bad games, or good games like Puerto Rico and Agricola where a casual player's eyes glaze over upon seeing the rule book.

    This is all great stuff! Gives me some goals to work towards. The idea here is for me to learn some new skills, or dust off and improve old ones.

    I've already learned a bunch of stuff. I hadn't flexed some of these mental muscles in over 5 years, so it's already working. I did way more work than necessary just to get a crack at the data (I set up a Raspberry Pi with Debian Linux, a web server, PHP, and Postgres). Managed to screw it up just enough along the way that having to roll up my sleeves and fix the configuration taught me a bit. I plan to use other dusty skills to pull the queries together, do the visualizations, etc.
    Post edited by Matt on
  • Can somebody link Rym's dataset?
  • Unfortunately, no AUS. Prime 2010 -> East 2014.
  • Popularity of groups by theme might be interesting. Are more people checking out space games than train games?
Sign In or Register to comment.