DATA ANALYSIS OPEN SOURCE INVESTIGATION DATABASE

Code-056UA               Open Source Data Sets

In recent years, open source researchers have pioneered new applications of digital technologies to uncover the truth about murder, torture, violent destruction of villages, civil unrest, influence campaigns that undermine democracy and other human rights abuses. This chapter details how they do this, explaining core open source tools and illustrating their use with examples from investigations by me and others into events in Afghanistan, Cameroon, China, Ethiopia, Indonesia, Myanmar and Nagorno-Karabakh. The chapter shows how common open source research techniques are used to provide evidence-based answers to questions about where atrocities took place, when they happened and who was involved in them. Investigators use these methods to build insights into events across the globe, far removed from where they are based. Because of this, online open source research has become especially useful in cases where practitioners cannot visit areas of interest in person because it is too difficult, costly or dangerous to do so, for example in war zones, inaccessible regions or countries under repressive regimes.

This chapter covers frequently used online open source research techniques, looking at past investigations to show how different methods can shine a light on important and simple, yet often unclear, questions about human rights abuses: where and when they happened, and who was responsible.

Investigations are often triggered by online user-generated content, such as photos or videos posted to social media that suggest an atrocity has taken place. After confirming that the content is authentic and not part of some information campaign, open source researchers work to establish the exact details about what it shows. Investigations into serious wrongdoing need to generate a high level of evidence if they are to successfully trigger wide systems of accountability and justice, such as criminal prosecution,2 not least as they often encounter concerted attempts to subvert their findings.

  • The first step is usually to establish where an event took place, by geo-locating it. At its core, geo-location is ‘the identification or estimation of the location of an object, an activity or the location from which an item was generated’, as defined in the Berkeley Protocol on Digital Open Source Investigations. It involves a detailed process of matching online content with a location on satellite imagery, much of which is freely available, by first finding reference points on the image or video and then linking these to features on satellite data.

This process aims to establish unique identifiers between the user-generated content and satellite imagery, confirming that the location identified is the only place where a video or photo could have been taken. Depending upon the nature of the footage or photo, elements such as trees, mountains, hilly areas, gentle slopes, street signs, potholes in the roads or other features create a set of characteristics that provide a unique match between an image and a specific geographical area. The practice of geo-location allows open source researchers to rigorously test claims about the location of alleged incidents. Moreover, once user-generated content has been geo-located, researchers can further explore what may have happened in the area by scrutinizing additional satellite imagery. Geo-location has been used in numerous cases, including finding missing hikers during search and rescue operations, identifying US military bases overseas8 and authenticating social media evidence used by the International Criminal Court.

Using technology to geo-locate events for justice and accountability purposes does not come without challenges. Often, it can be difficult for investigators to get high-quality images of an area. Access to high-quality satellite imagery can be limited by several factors: it is sometimes deliberately restricted for political reasons, as seen in the Baidu Maps experience; it might be subject to physical constraints, for example cloud cover or overcast days that degrade image quality; or there might simply be no recent satellite imagery of an area.

In my own experience as @welshamar25 covering conflicts around the world, the availability of imagery tends to differ dramatically between geographical areas. While an urban landscape in a western city may have regular, up-to-date satellite images, remote villages in less developed areas may severely lack the imagery required to ascertain key details. That said, the restriction of access to clear satellite imagery due to possible censorship ultimately benefitted the BuzzFeed investigation mentioned above, as the redacted parts of Baidu Maps signposted researchers to areas that needed analysis.

  • A second step and integral part of open source investigations is identifying when online content was produced, or when an event happened. This is commonly referred to as chronolocation, defined in the Berkeley Protocol on Digital Open Source Investigations as ‘the corroboration of the dates and times of the events depicted in a piece of information, usually visual imagery’. Chronolocating a piece of information can expand an investigator’s lines of inquiry and geospatial awareness. For example, once a specific date of interest is precisely defined, it may reveal that an incident is part of a wider pattern, including by indicating that other incidents are related to the event shown in the original images. This can then be used in follow-up research on other pieces of footage. Identifying when something happened might also help shed light on who was responsible; for example, the date can be used to find out which militia might have been in control of an area at the time, or what military units were operating there. Several overlapping tools can be used to narrow down the likely time frame for an incident, and chronolocate it precisely. I will now outline approaches that use hybrid methods, followed by an approach based entirely on satellite imagery.

The core of the above investigation is a linear blueprint for open source verification work: first geo-locating the content, which in turn allows investigators to chronolocate it and then gives them leads to identify who was responsible. In this investigation, location tip-offs from sources were combined with ridgeline verification to geo-locate exactly where the event had happened.

A common technique in chronolocation is to analyze the shadows seen in user-generated videos or footage. Shadows have played a significant role as indicators of time through history, with evidence of this dating back to 13th century BCE Ancient Egypt.26 While generally the shadows of interest are cast by objects that have been purposely set, the use of humans as makeshift sundials has also been referenced, for example, in the 14th century writings of Geoffrey Chaucer, which describe someone estimating the time using their own shadow.27 Open source investigations apply the same principles, using digital tools to obtain accurate readings of the sun’s elevation in an image, as well as the azimuth, and where possible, the shadow’s length and the height of the object causing it.
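The geometry behind shadow analysis can be illustrated with a minimal sketch. It assumes a vertical object standing on flat ground, and the function names and sample measurements are hypothetical, chosen only for illustration. The sun's elevation follows from the object's height and the length of its shadow, and the sun's azimuth is the compass bearing opposite to the direction in which the shadow falls.

```python
import math


def sun_elevation_deg(object_height_m: float, shadow_length_m: float) -> float:
    """Estimate the sun's elevation angle from a vertical object and its shadow.

    Assumes flat ground and a vertical object (hypothetical helper).
    """
    if object_height_m <= 0 or shadow_length_m <= 0:
        raise ValueError("height and shadow length must be positive")
    # tan(elevation) = object height / shadow length
    return math.degrees(math.atan2(object_height_m, shadow_length_m))


def sun_azimuth_deg(shadow_bearing_deg: float) -> float:
    """The sun sits opposite the direction in which the shadow points.

    Bearings are compass degrees clockwise from north (hypothetical helper).
    """
    return (shadow_bearing_deg + 180.0) % 360.0


if __name__ == "__main__":
    # Illustrative values only: a 1.8 m person casting a 3.1 m shadow
    # that points towards a bearing of 300 degrees.
    print(round(sun_elevation_deg(1.8, 3.1), 1))
    print(sun_azimuth_deg(300.0))
```

With the illustrative values, the elevation comes out at roughly 30 degrees. Turning an elevation and azimuth pair into a time of day still requires a solar position calculator for the geo-located coordinates and candidate dates, which this sketch does not attempt.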

Not all of the above techniques are possible when user-generated content is unavailable and only satellite imagery can be used for chrono-location. This is often the case in claims about bombings, destruction, fires or other events where change may be evident from overhead imagery. Datasets of coded changes can also be used to develop automated data processing systems. Machine learning models need to be trained on robust datasets, like those of Ocelli, through which algorithms ‘learn’ to perform specified tasks. Having prepared datasets will enable more machine learning to be incorporated in human rights monitoring. It is expected that machine learning will be able to greatly assist with processing the huge amounts of digital data now available, including by helping to identify signs of change from satellite imagery. This aspect of automating data extraction from satellite imagery is already progressing in relevant use-cases such as building detection models31 and deep learning elephant detection models. While the Ocelli Project looks for and codes changes over a large area, satellite-imagery-based chrono-location also applies to more detailed cases, where minor changes in an image may indicate when a video was filmed.
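The idea of dating an event from overhead imagery alone can be sketched as follows. This is a simplified illustration, not the method used by the Ocelli Project or any investigation described here: it assumes each satellite pass is already available as a small grid of brightness values covering the same area, and the function names, thresholds and dates are hypothetical. The sketch compares consecutive images and reports the pair of dates between which a large share of pixels changed, which brackets the window in which the event took place.

```python
from datetime import date


def changed_fraction(before, after, tolerance=30):
    """Share of pixels whose brightness differs by more than `tolerance`."""
    total = 0
    changed = 0
    for row_before, row_after in zip(before, after):
        for b, a in zip(row_before, row_after):
            total += 1
            if abs(a - b) > tolerance:
                changed += 1
    if total == 0:
        raise ValueError("images must contain at least one pixel")
    return changed / total


def bracket_event(observations, min_fraction=0.2):
    """Return (last date before change, first date after change), or None.

    `observations` is a list of (date, image) pairs; images are lists of
    rows of brightness values covering the same area.
    """
    ordered = sorted(observations, key=lambda obs: obs[0])
    for (d0, img0), (d1, img1) in zip(ordered, ordered[1:]):
        if changed_fraction(img0, img1) >= min_fraction:
            return d0, d1
    return None


if __name__ == "__main__":
    # Illustrative 2x2 grids: bright rooftops, then dark burn scars.
    intact = [[200, 200], [200, 200]]
    burned = [[60, 60], [200, 60]]
    passes = [
        (date(2021, 1, 15), burned),
        (date(2021, 1, 5), intact),
        (date(2021, 1, 10), intact),
    ]
    print(bracket_event(passes))
```

In practice the width of the bracket is limited by how often imagery of the area is captured, which is why gaps in coverage or cloud cover directly reduce how precisely an event can be dated.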

  • Who was responsible? Attribution of specific events remains one of the hardest tasks in digital open source investigations, not only because such investigations rely on publicly sourced information but also because a high level of evidence is required to hold someone accountable for serious wrongdoing; assumptions do not make the cut. Open source investigations often use multiple approaches to collect and process information that can identify the specific actors responsible for human rights abuses. What is standard throughout effective investigations is transparency about working methods, through clear explanations detailing how researchers arrived at their results, which enable other people to check their methods and confirm or refute the findings. This challenge is not just a problem for human rights investigations. It is also an issue for attempts to attribute responsibility for the disinformation networks and influence operations that often surround human rights issues, including in situations where state actors attempt to manipulate narratives and distort facts. The attribution of those networks, in most cases, is now solely in the hands of social media platforms, as they have access to user-based data such as login IP addresses or verification details, as well as detailed account interaction data, which would assist in identifying the source of a campaign.

  • CONCLUSIONS

Open source investigations have opened up opportunities for non-governmental groups and individuals to track where and when human rights abuses have taken place, and find out who was responsible for them. This chapter has demonstrated the possibilities of this work by outlining key tools and approaches, illustrated by case studies spanning investigations into enforced detention, executions, destruction of villages and influence operations aimed at undermining democracy. This is a growing field, with increasing numbers of researchers working in different sectors to explore and develop new approaches that use online tools and information to monitor atrocities around the globe. The work is multi-dimensional, often involving collaborations between groups of people looking through diverse sources to build a compelling account of an event. As well as using multiple methods, open source investigations can lead to different outcomes, including successful prosecutions (for example, of the soldiers responsible for murdering two women and children in north Cameroon), changing behavior (as when social media platforms take down accounts associated with influence operations), and enhanced transparency about atrocities in the global media (for example, in work exposing detention centers and the destruction of villages). However, the work is neither easy nor straightforward. Open source researchers face a number of serious impediments, including changing access to the tools and data that underpin their work. In order to maximize the potential of this field, societies should work with satellite imagery providers and social media companies to ensure that open source research can continue to access essential tools and data, while maintaining important privacy functions. In this way, open source research can contribute to global efforts to hold perpetrators to account, generate transparency and support for victims, and deter future human rights abuses.

Author: Welborn Kibwota
Social Media Handle:
Twitter: @Welshamar Research Hub
YouTube: @Welshamar Research Hub
Name: @Welshamar Research Hub
