February 27 2025 GM
From TCU Wiki
The Hong Kong Accountability Archive
- Date: Thursday, February 27
- Time: 9am EST / 2pm UTC
- Facilitator: Mardiya
- Featured Guest: The Hong Kong Accountability Archive
- Where: On TCU Mattermost "IF Square" Channel.
- Don't have an account on the TCU Mattermost? You can request one following the directions here.
The Hong Kong Accountability Archive (HKAA) is an independent, searchable and secure database of videos that document the 2019 pro-democracy protests in Hong Kong. A data visualization tool complements the archive by providing users with a quantitative overview of the protests. Join us on 27th February, when the Hong Kong Accountability Archive will cover:
- Hong Kong 2019 protest video archive - objectives & functionality
- Data visualisation tool of 2019 protests
- Challenges in building and managing the archive
- Future priorities and directions on tech innovations the HKAA would like to integrate
What is Glitter Meetup?
Glitter Meetup is the weekly town hall of the digital rights and Internet Freedom community at the IF Square on the TCU Mattermost, at 9am EST / 2pm UTC. It is a text-based chat where digital rights defenders can share regional and project updates and expertise, ask questions, and connect with others from all over the world! Do you need an invite? Learn how to get one here.
Notes
What is the Hong Kong Accountability Archive? And how did you get to work on this project, or rather why?
- The Hong Kong Accountability Archive (HKAA) is an independent, searchable and secure database of videos that documents the policing of the 2019 pro-democracy protests in Hong Kong. The archive is made up of content sourced from multiple contributors, including media organizations, human rights monitors, citizen journalists and local activists. A data visualization tool complements the archive by providing users with a quantitative overview of the protests.
- The project started during the pro-democracy protests where we organised a team of human rights monitors to document the policing of the protests, restrictions on the right to peaceful assembly, and in particular the use of force by the Hong Kong Police Force.
- During the pandemic we found ourselves with hundreds of hours of footage containing evidence of excessive use of force by the police. We knew it needed to be preserved in the hope that one day those responsible for violations would be held to account. We realised there were many others with similar footage, so we decided to build an online archive where it could all be stored.
Is the archive made up only of video footage, or are there other forms of data / information that complement the videos? What does it look like essentially?
- The archive currently only contains video. We did consider adding photos and text but have parked that idea for now. The Data View visualises data from the protest.
- So effectively you have two parts to the site - the video archive and the data visualisation tool.
What was your process of archiving all those videos? How did you go about collecting, labelling, etc. and how does the data visualization tool work? What data is being visualized here? Protest hot spots?
- We archive videos of the 2019 pro-democracy protests in Hong Kong. Each video needs to show a public assembly, whether a procession, gathering, sit-in or flash mob. The videos can be recorded by anyone, be it professional media, activists, bystanders, or legal observers.
- The process starts with the team receiving the video assets. We have various secure methods contributors can send these to us. We then upload the videos to our platform, and depending on the length may cut the video into smaller clips for ease of cataloguing and viewing. We have a team of trained volunteers who then review the videos and catalogue them using a pre-determined typology.
- The categories include time/date; location; actors; actions; source. Within each category there are multiple sub-categories, for example types of actions taken by the police. Once a video has been catalogued by a volunteer the data is checked against the data produced on the same video by another volunteer for consistency. The data is then imported into the archive database and goes live.
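The double-cataloguing check described above can be sketched in a few lines of Python. This is only an illustration, not the archive's actual tooling: the `CatalogueEntry` class, its field names, and the `disagreements` helper are hypothetical, and the sub-categories are simplified to plain sets of labels.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CatalogueEntry:
    """One volunteer's catalogue record for one clip (hypothetical schema)."""
    video_id: str
    datetime: str         # time/date category
    location: str         # location category
    actors: frozenset     # e.g. {"police", "protesters"}
    actions: frozenset    # e.g. {"water cannon"}
    source: str           # who recorded the clip


def disagreements(first: CatalogueEntry, second: CatalogueEntry) -> list:
    """Return the names of the fields on which two volunteers disagree."""
    if first.video_id != second.video_id:
        raise ValueError("entries describe different videos")
    return [
        f.name
        for f in fields(first)
        if getattr(first, f.name) != getattr(second, f.name)
    ]
```

An empty result would mean the two records agree and the entry could move on to import. Any field names returned would point a reviewer to the categories that need a second look.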
What languages are you working in?
- Currently the site is predominantly in English, due to the resources we have available. However, we plan to make a version in Traditional Chinese.
Do you find any trends in the data - e.g. are protestors treated differently by police based on gender? Is there an age skew among protestors?
- The main difference we see is how the police use force based on whether the protest was authorised or not.
Are you cataloguing the names of those sending videos, if they wish to be named and associated with their files? Or is attribution deliberately rendered anonymous and de-identified from the sender?
- Those who contribute to the archive can tell us how they want videos to be attributed and their data handled.
- We understand that some will want to be credited and have few security issues, whereas others are at risk and we must not store any identifiers for them. People can contribute anonymously if they wish.
What are the next steps for this project? Is there anything the wider community can do to support?
- We plan to make the site public at the end of March. We then have some improvements we would like to make (dependent on resources).
- For cataloguing the videos, the challenge is the amount of time it takes to do this manually. The videos need to be watched in real time, stopping, starting and rewinding, to ensure they are tagged correctly. As we have tens of thousands of hours of video that have yet to be catalogued, it will take us a long time to finish. We are looking into whether there is an AI solution to this, essentially a tool that can read the videos. If this doesn’t prove to be the case, we will continue to expand our team of volunteers to speed up the process.
Would you need to perhaps work with someone from the community to build something with such capabilities from scratch to support the amount of security and care needed to review, tag and upload the videos?
- Indeed! This has been a big challenge for us.
Is it worth soliciting the help of student researchers, including overseas? Or is there a level of security/privacy/confidentiality that has to be maintained at the review stage?
- Recruiting people to work on the archive has been difficult. For security reasons we chose not to advertise publicly; instead we only use trusted networks. That means it is sometimes difficult to find the skill set you need for a particular piece of work, or at least it takes longer to identify that person.
- For some of the work we do, yes indeed, local knowledge and language skills are important. This is especially the case for the team that catalogs the videos.
- We started off with a dedicated paid team, but limited resources meant that team wasn't very big, and it is stressful to watch some of the content. So we moved to a larger volunteer team, which works more like micro-tasking: each person is exposed to less content and we can catalog more. There is an advantage to taking the human route rather than the technology route: you build a wider community and gain allies and supporters for your situation and for your people. HKAA aims to connect with the Hong Kong diaspora, many of whom would like to contribute to this type of project. It is true that no one has the technology to support this project, and they need to work out how to handle the data better.
Can you explain why the split in data sets is problematic? Is this a capacity thing as well?
- The data we produce by cataloguing the videos is not directly usable in the data visualisation tool. For the archive, we catalogue so that people can find videos, so a 10-minute video in which a water cannon is used twice is still only tagged once. For the data viz tool, we catalogue so that people can see what actions happened where, which means we need to know how many times the water cannon was used in that location that day. To fix this we had to work out a formula to convert the data and then manually review all the instances where the level of certainty was low. Going forward we will need to design a better methodology to ensure the data achieves the level of accuracy we require.
- We are aiming to make the site public in the last week of March (date TBC).
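To make the split between the two data sets concrete, here is a minimal Python sketch of turning "tagged once per video" data into per-location counts. It is not the formula the HKAA team used, which was not shared in the chat. The `Clip` class, the `estimate_counts` function, and the rule that clips of five minutes or longer are low-certainty are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    """A catalogued clip with presence-only action tags (hypothetical schema)."""
    clip_id: str
    location: str
    date: str
    duration_s: int
    actions: frozenset  # each action appears at most once, however often it occurred


def estimate_counts(clips, long_clip_s=300):
    """Convert presence tags into rough per-location, per-day action counts.

    Each tagged clip contributes exactly one occurrence. Clips at or above
    long_clip_s seconds (an assumed threshold) are also queued for manual
    review, because a longer clip may contain the same action more than once.
    """
    counts = defaultdict(int)
    needs_review = []
    for clip in clips:
        for action in sorted(clip.actions):
            counts[(clip.location, clip.date, action)] += 1
            if clip.duration_s >= long_clip_s:
                needs_review.append((clip.clip_id, action))
    return dict(counts), needs_review
```

In this sketch a 10-minute clip tagged "water cannon" adds one to the count and lands in the review queue, mirroring the manual review of low-certainty instances mentioned above. It can undercount repeated use within one clip, and it can overcount when several clips show the same incident, which is part of why a purpose-built methodology would be needed.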
How would you like the community to connect with you to chat and collaborate further on this?
- You can contact us at info@hkaccountability.org or follow us on IG and Bluesky.