A few of us from Maverick learnt more about data journalism through a five-week course provided by the Journalism and Media Studies Centre of the University of Hong Kong.

At the beginning of the course – and at least three more times during subsequent sessions – the instructors told us that if we asked 100 different people to define data journalism, we would get 100 different answers.

Despite such an ominous introduction highlighting the lack of clarity among lay people on what constitutes data journalism, the course did a great job of getting us beginners started on the topic. Further examples gave us a clearer understanding that data journalism is heavily based on what was probably the most dreaded subject of our school years: maths.

Simply put, data journalism is all about processing numbers and finding meaning in them to produce the best lead possible for a news story. The process starts with finding the data, cleaning it, and getting it to a state where it can provide comprehensible information for the basis of a story.

The difficult part is that analyzing the data often involves staring at spreadsheets that contain hundreds or thousands of entries, and then finding patterns in the data to spot any anomalies or significant disruptions – imagine playing “Where’s Waldo?” in a picture the size of a football field without knowing for sure whether the picture even contains a Waldo.

However, before we scare you off with more details, do note that data journalism is basically an extension of statistics – and statistics is ubiquitous in our daily life. It is in the weather report and at the stock exchange, and even the chicken we use to cook Opor Ayam to celebrate Eid is priced based on the statistics of supply and demand.

Data journalism uses available data and points out information that would appeal to the masses. It shares the spirit behind KawalPemilu.org, which enabled us to laugh off some of the reports from media outlets and survey centers that attempted to distort the outcome of the 2014 Presidential Election. The Panama Papers are another example that shows how significant data journalism can be.

Now that we can all see that data journalism is important in this era of data overload, let’s take a look at what we’ve learned about how to create a story based on data.

Data gathering and data processing

The first step in writing your story is to gather and process the data. You can search for data sets that have been compiled by governments (such as the BPS or the Jakarta Administration), international organizations (the UN or the OECD), research centers and think tanks (World Resources Institute), or crowdsourcing projects (WikiDPR.org).

We then need to make sure that the data is reliable, which means that it should be free from tampering. Now, you might think that numbers don’t lie, but as the saying popularized by Mark Twain goes: “There are three kinds of lies: lies, damned lies, and statistics.”

Manually collected and analyzed data is inherently prone to bias. One example is the Chinese Government altering data to suit its objectives, and doing it so blatantly that the sum of its provincial GDP figures did not add up to its national GDP. (read more: http://goo.gl/QP6e8k)
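The kind of sanity check described above, comparing the sum of the parts with the reported total, can be sketched in a few lines of Python. This is only an illustration: the function name `check_total`, the 1% tolerance, and all the figures are our own assumptions and are not real GDP data or anything taught in the course.

```python
import math


def check_total(parts, reported_total, rel_tol=0.01):
    """Compare the sum of the parts with the reported total.

    Returns (computed_sum, gap, consistent), where gap is
    computed_sum - reported_total and consistent is True when the two
    agree within the relative tolerance rel_tol (an assumed 1% here).
    """
    computed = math.fsum(parts.values())
    gap = computed - reported_total
    consistent = math.isclose(computed, reported_total, rel_tol=rel_tol)
    return computed, gap, consistent


# Hypothetical figures, invented purely for illustration (not real data).
regional_output = {"Region A": 120.0, "Region B": 95.5, "Region C": 60.0}
reported_national_output = 250.0

computed, gap, consistent = check_total(regional_output, reported_national_output)
verdict = "consistent" if consistent else "needs a closer look"
print(f"Sum of regions: {computed:.1f}")
print(f"Reported total: {reported_national_output:.1f}")
print(f"Gap: {gap:+.1f} ({verdict})")
```

A gap on its own does not prove manipulation, since rounding or differing definitions can also cause one, but it tells you where to start asking questions.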

After finding suitable data, you can start processing and analyzing it. The course recommended several tools for data processing, such as pivot tables in Microsoft Excel to quickly summarize your data, or online tools (pdftables.com or import.io) to convert data from a PDF file into an Excel file. OpenCalais is an interesting tool as it allows you to analyze a text to get richer information about the countries, people, or institutions mentioned in it, how many times they are mentioned, and their relevance to the article.
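For readers who prefer code to spreadsheets, the idea behind a pivot table can be sketched in plain Python. This is not how Excel implements it, just a minimal illustration of the same grouping-and-summing step. The helper name `pivot_sum`, the column names, and the rows are all hypothetical.

```python
from collections import defaultdict


def pivot_sum(rows, row_key, col_key, value_key):
    """Sum value_key for each (row_key, col_key) pair, like a basic pivot table."""
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[row_key]][row[col_key]] += row[value_key]
    # Convert to plain dicts so the result prints and compares cleanly.
    return {r: dict(cols) for r, cols in table.items()}


# Hypothetical rows, invented purely for illustration.
rows = [
    {"province": "Province A", "year": 2014, "cases": 12},
    {"province": "Province A", "year": 2015, "cases": 7},
    {"province": "Province B", "year": 2014, "cases": 3},
    {"province": "Province A", "year": 2014, "cases": 5},
]

table = pivot_sum(rows, "province", "year", "cases")
for province in sorted(table):
    print(province, table[province])
```

Each distinct province becomes a row and each year a column, with the matching entries added together, which is what dragging fields into the "Rows", "Columns", and "Values" areas of a pivot table does.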

Once the data has been cleaned up and is ready to be processed, you can look for story ideas in it. Try to look for any outlier, pattern, or trend that can be used as the starting point of your story. But be careful not to mistake correlation for causation, as it could lead you to the wrong conclusion.
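One common way to flag outliers, which we are adding here as an illustration rather than as something the course prescribed, is the 1.5 × IQR rule: anything far outside the middle half of the values gets flagged. Below is a minimal sketch using Python's standard `statistics` module. The function name `find_outliers` and the sample values are hypothetical.

```python
import statistics


def find_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical daily counts, invented purely for illustration.
daily_counts = [10, 12, 11, 13, 12, 11, 10, 95]
print(find_outliers(daily_counts))
```

A flagged value is only a lead. It may be a typo in the data, or it may be the story, and only further reporting can tell you which.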

Data visualization

Data visualization is the use of graphics to support our messages or stories. You can choose to use a simple chart or a sophisticated interactive map (such as TheRefugeeProject.org) to help you tell your story. Data visualization is also used to attract the reader’s attention and help them understand the data better.

At some point in your life, you have probably used some form of chart, but do understand that different data requires different visualization – a single chart can’t accommodate all types of data.

So how do we choose the right chart? The following are helpful guidelines that we learnt.

  • Time series (temporality) require charts that display the timeline and the changes that happen along it. Examples: index chart, stacked graph, small multiples, horizon graph.
  • Statistics on distributions can be explained with the more conventional types of chart. Examples: stem-and-leaf plot, Q-Q plot, scatter plot matrix (SPLOM), parallel coordinates.
  • Maps (geography, spatial representation) will of course need to portray the geolocation or space. Examples: flow map, choropleth map, graduated symbol map, cartogram.
  • Statistics on hierarchies will require lines that can display the connection of authority between subjects. Examples: node-link diagram, dendrogram, adjacency diagram (sunburst, icicle), enclosure diagram (squarified treemap, circle-packing).
  • Networks (interconnections) are displayed similarly to hierarchical charts, but the subjects are placed at roughly the same level or in groups. Examples: force-directed layout, arc diagram, matrix view.

(Source: http://farm4.static.flickr.com/3077/3196386402_01d8d12017_b.jpg)

Transforming data into a newsworthy story

The ultimate piece of advice for making a story out of the collected data is: Don’t be boring. We have to make sure that readers will read the article to the end, so do not bore them by explaining every minute detail of your data. After all, you should have already ensured that the data speaks for itself through visuals.

So what else can we do?

Having one set of data does not mean we don’t have to look for more sources. Try to find examples from the data that illustrate the problem and explore them. If you want to make a story based on literacy rate data, for example, try to meet sources from advocacy groups and educators and insert their stories into your article. Enrich your data with relevant qualitative information.

Another suggested way to liven up your data is to add humor, anecdotes, or analogies. Also, always try to make your writing understandable to a general audience. As one of the course’s instructors said, “a key principle of storytelling with data is to keep things clear and simple, but also offer depth for those who want to dig into the numbers.”

Relevance of data journalism to communications

How does data journalism then relate to what we do at Maverick?

We don’t mean to brag – well, maybe a bit – but the fact that Maverick is equipped with a media monitoring division can be a huge advantage, because it can act as one of the instruments that provide the necessary data.

It all comes back to the big data phenomenon that has gripped the world. The use of data analysis is becoming more common in decision-making. Even though unsubstantiated claims, like “the fastest”, “the most powerful”, “most eco-friendly”, are still widely used, people have started to look for more quantifiable explanations – and numbers can be a very powerful tool to achieve that.

We are starting to see more numbers-based claims, like “this phone scored 130,000 on Antutu”, “it can go from 0 to 100 mph in less than 10 seconds”, and “this investment has an average return of more than 10% per year”, and they are increasingly becoming the main selling point instead of a footnote in the specs sheet.

It’s the same with how Google uses enormous amounts of data to develop its targeted-ads system, and how big sports clubs use numerical measurements to determine a player’s capability. Numbers can make for a very strong argument, serve as a very reliable tool for evaluation, and provide a solid base of information to work on – and data journalism depends on these principles.

The thing with the data journalism course is that even though we may not use it to win a Pulitzer, at least we have started on a path to make sense of the data available. And, in a world where everything is recorded, knowing what to do with data is key.

(Raditya Margi, Sri Mulyati, Tanti Kostaman)
