Codemotion Rome 2020: a deep dive into talks and topics

A deep dive into the talks and topics that will be covered at Codemotion Rome 2020, analysed the way a good data analyst would!

5 min read · Feb 25, 2020

Table Of Contents

Codemotion Rome 2020: what data science can tell us about it
Deep dive into the data: Communities and Topics
Graph Data Visualization
Codemotion Rome 2020: want to know more?

Codemotion Rome 2020: what data science can tell us about it

The Call for Papers has closed, and 646 proposals were submitted to Codemotion Rome 2020. We’re still waiting for the confirmed talks and the full list of tracks, but the first speakers have already been announced on the official page, so check it out to stay up to date with the event, and book your ticket if you haven’t yet.

While waiting for the final agenda, I thought it might be interesting to look at the submissions through a lens that reveals the small world of their authors. The first article on the topic covered the communities and how the submissions relate to them: which communities are the most prolific in terms of submissions? Is it possible to detect any clusters? If you are curious about these questions, have a look at the previous post of the series here <link to the first post> .

Deep dive into the data: Communities and Topics

In the first post of the series I analysed only a portion of the data imported into Neo4j, in particular the data about Papers (also called “submissions”) and Communities. But the ingested data contain more information, such as Topics and Companies. In this second post of the series, I’m going to explore the relation between the topics set for the conference and the communities behind the submissions, to check whether a pattern emerges.

Let’s start by investigating the distribution of submissions by topic: what are the top 10 topics in terms of submissions?

// Topic -> Submissions
MATCH (topic:Topic)
WITH topic, size((topic)<-[:IS_ABOUT]-(:Paper)) as degree
ORDER BY degree DESC
RETURN topic.name, degree
LIMIT 10

The query returns the following results:

“Software Architectures” looks like the most popular topic among authors, with 97 submissions, closely followed by “Inspirational” with 86 submissions. Both are rather broad topics.

In third place, “Front-end Dev” and “Cloud” tie with 74 submissions each. These are quite broad topics as well, though probably a bit less so than the first two.

It is interesting to note that “Mobile” and “IoT”, both conference topics and quite hot in 2019, did not make this top 10.
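To see how many submissions those two topics actually received, the same count can be restricted to them. This is only a minimal sketch: it assumes the topic names are stored exactly as “Mobile” and “IoT”, and it reuses the Topic and Paper labels and the IS_ABOUT relationship from the query above.

```cypher
// Submissions for two specific topics (the topic names are assumptions)
MATCH (topic:Topic)
WHERE topic.name IN ['Mobile', 'IoT']
// OPTIONAL MATCH keeps a topic in the result even if it has no submissions
OPTIONAL MATCH (topic)<-[:IS_ABOUT]-(paper:Paper)
RETURN topic.name, count(paper) AS degree
ORDER BY degree DESC
```

Plain aggregation with count() is used here instead of size() on a pattern, so a topic without submissions would show up with a degree of 0 rather than disappear from the result.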

After this quantitative analysis, let’s move on to a qualitative analysis.

As mentioned in the previous post, speakers declare all the communities they belong to, so it is possible to see some “strange” (interesting? peculiar?) associations. This effect should be diluted by the number of submissions involved: I’d expect a frontend community to be “closer” to frontend topics in the graph, because more people from that community should have submitted talks on the subject.

Is this educated guess true? Can the data confirm it?

For reference, this is the schema visualization of the model adopted:

With this in mind, the information I’m interested in can be extracted with this Cypher query:

// Communities -> Topics
MATCH (community)<-[:BELONGS_TO]-()-[:PRESENTED]->()-[:IS_ABOUT]->(topic:Topic)
WHERE NOT community.name = ""
WITH community, topic, apoc.create.vRelationship(topic, "RELATED", {}, community) AS r1
RETURN community, topic, r1

In this query too I’m taking advantage of the APOC virtual relationship utility to infer a direct relationship between a community and a topic whenever there is a path between the two nodes in the graph. This kind of presentation makes it much easier to spot interesting patterns in the visualization than the full representation of the physical model would.
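One side effect worth keeping in mind: the query returns one row, and therefore one virtual relationship, for every matching path, so a community with many submissions on a topic ends up with many parallel edges. A possible variation is sketched below. It assumes that the node reached through PRESENTED is the submission, and it stores the count in a hypothetical weight property that is not part of the original model.

```cypher
// Communities -> Topics, one weighted virtual relationship per pair
MATCH (community)<-[:BELONGS_TO]-()-[:PRESENTED]->(paper)-[:IS_ABOUT]->(topic:Topic)
WHERE NOT community.name = ""
// Aggregate first, so that each (community, topic) pair appears only once
WITH community, topic, count(DISTINCT paper) AS submissions
// "weight" is an illustrative property name, not part of the stored data
RETURN community, topic,
       apoc.create.vRelationship(topic, "RELATED", {weight: submissions}, community) AS r1
```

The picture stays the same in terms of which nodes are connected, but each edge now carries the number of submissions behind it, which can help tell a strong association from an occasional one.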

Graph Data Visualization

It’s worth noting that data displayed in the browser as a graph are governed by a layout algorithm (the algorithm that decides where to position the nodes on the screen) that is based on forces between the nodes. This is a sort of small-scale physics simulation that positions each node in a way our brain can easily navigate.

One classic property that this kind of algorithm tends to emphasise is the presence of “clusters”, since nodes end up close to each other according to the graph topology: simply put, two nodes that are connected to the same third node will probably sit closer together than nodes that have nothing in common.

Given this property of the algorithm, we can quickly scan the visualization looking for topics that we imagine should be close to each other: for instance “Design/UX”, “Mobile” and “Frontend” are in the same area, as we would expect.

What are the violations of this (guessed) rule?

Probably the most surprising one is finding the two topics “IoT” and “Games” so close to each other. Experts in the field probably expect these two subjects to be really close, but I was surprised to see “Cybersecurity” so far from them, for instance.
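The visual impression of closeness can also be cross-checked with numbers, by counting how many communities each pair of topics has in common. The query below is a rough sketch under the same schema assumptions as the previous queries; the sharedCommunities alias and the limit of 10 pairs are arbitrary choices made for illustration.

```cypher
// Topic pairs ranked by the number of communities they share
MATCH (community)<-[:BELONGS_TO]-()-[:PRESENTED]->()-[:IS_ABOUT]->(topic:Topic)
WHERE NOT community.name = ""
// One list of distinct topic names per community
WITH community, collect(DISTINCT topic.name) AS topics
// Expand the list into pairs, keeping each unordered pair only once
UNWIND topics AS t1
UNWIND topics AS t2
WITH community, t1, t2
WHERE t1 < t2
RETURN t1, t2, count(community) AS sharedCommunities
ORDER BY sharedCommunities DESC
LIMIT 10
```

Pairs at the top of this list are the ones the layout should tend to pull together, although the final positions on screen also depend on all the other connections in the graph.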

// Communities (filtered) -> Topics
MATCH (community)<-[:BELONGS_TO]-()-[:PRESENTED]->()-[:IS_ABOUT]->(topic:Topic)
WHERE NOT community.name = "" AND topic.name IN ['IoT', 'Game Dev', 'AI/Machine Learning', 'Cloud', 'Cybersecurity']
WITH community, topic, apoc.create.vRelationship(topic, "RELATED", {}, community) AS r1
RETURN community, topic, r1

Keeping only those 5 topics, a particular distribution of communities emerges in the visualization: some topics, like “Cybersecurity” or “AI/Machine Learning”, have a tight cluster of communities that focus mainly on that topic.

For instance, it is no surprise to find both the “OWASP” and “ISACA” communities connected to the “Cybersecurity” topic, as they are specifically cybersecurity-oriented communities. The same happens for the TensorFlow and Machine Learning Meetup communities connected to the “AI/Machine Learning” topic.

The “Cloud” topic cluster seems the most popular in the selection: several communities submitted talks here, most of them on topic, others not directly connected with it.

But what about the communities in the center, those highlighted in the picture? They form a central cluster of communities that pushed submissions on several of the selected topics.

It would be surprising to find “vertical” communities here, meaning communities focused on a single topic: in fact, these communities are mostly either technology-oriented (for instance Java or .NET focused) or even broader (for instance GraphRM).
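The central cluster can also be listed explicitly instead of being read off the picture. The following sketch reuses the filtered query and keeps only the communities connected to several of the selected topics; the threshold of 3 topics is an arbitrary assumption chosen for illustration.

```cypher
// Communities with submissions on several of the selected topics
MATCH (community)<-[:BELONGS_TO]-()-[:PRESENTED]->()-[:IS_ABOUT]->(topic:Topic)
WHERE NOT community.name = ""
  AND topic.name IN ['IoT', 'Game Dev', 'AI/Machine Learning', 'Cloud', 'Cybersecurity']
// Collect the distinct selected topics each community has touched
WITH community, collect(DISTINCT topic.name) AS topics
// 3 is an arbitrary cut-off for "central" communities
WHERE size(topics) >= 3
RETURN community.name, topics
ORDER BY size(topics) DESC
```

Raising or lowering the threshold gives a quick way to separate the “vertical” communities from the cross-topic ones.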

Codemotion Rome 2020: want to know more?

Don’t miss the next post in the series, where I’m going to analyse the data from the “company” angle.

In the meantime, if you are interested in Codemotion Rome 2020, do not miss the opportunity to attend: tickets are still available!

You can read the original version of this article at Codemotion.com, where you will find more content. https://www.codemotion.com/magazine/dev-hub/machine-learning-dev/codemotion-rome-2020-topics-analysis/