'''Workshop:''' Telling Stories about Algorithms


'''Who:''' Susan McGregor
'''Date:''' Monday, November 9th
'''Time:''' 1:00pm EST / 6pm UTC+0 (other times below)
'''Language:''' English
'''RSVP''' [https://internetfreedomfestival.formstack.com/forms/cks30 here]


Join Susan McGregor for a special storytime about algorithms. We talk a lot about algorithms without always having a good understanding of the forces and elements that shape them, despite the impact they are having on our society. By joining this workshop, you will learn:
* What algorithms actually do and the components that make them up.
* Why we need to do research on and report about them (and how to do so effectively).
* How they impact our decisions on a daily basis and can perpetuate discrimination.


'''Susan McGregor''' is an Associate Research Scholar at Columbia University's Data Science Institute, where she also co-chairs its Center for Data, Media & Society. McGregor's research is centered on security and privacy issues affecting journalists and media organizations. Her books, ''Information Security Essentials: A Guide for Reporters, Editors and Newsroom Leaders'' and ''Practical Python: Data Wrangling and Data Quality'', will be out in 2021.
''We will be hosting a 25-minute post-workshop networking exercise to allow folks to meet others who share their interests and strengthen collaborations across various lines. Make sure to schedule an extra 25 minutes on your calendar if you are interested in joining.''


>> <span style="font-size:larger">'''[[CKS Notes|Check out notes from other sessions here]]'''</span>
== Notes ==


The [https://cryptpad.fr/file/#/2/file/48wylpYoT+6JM21R4+DiSqRh/ Powerpoint Presentation] used for this session.
 
* What do we mean by algorithms? An algorithm is a process or set of rules to be followed in calculations or other problem-solving operations by a computer.
 
* It’s a set of steps for completing a process. A recipe is an algorithm.
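To make the recipe idea concrete, here is a tiny Python sketch (not from the session; the function and numbers are invented for illustration) of an algorithm as an explicit set of steps:

<syntaxhighlight lang="python">
# A minimal illustration: an "algorithm" is just explicit steps a computer
# follows, the way a cook follows a recipe.

def average_grade(scores):
    """Follow a fixed set of steps to turn raw scores into one result."""
    total = 0                    # Step 1: start a running total
    for score in scores:         # Step 2: add each score to the total
        total = total + score
    return total / len(scores)   # Step 3: divide by how many scores there were

print(average_grade([88, 92, 75]))  # -> 85.0
</syntaxhighlight>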
 
* Algorithms exist in many places and are increasingly being adopted into public systems. Algorithms are now being used in hiring, for example. In these systems, the algorithm is only one part; it is not the only thing that needs to be investigated, interrogated, or understood.
Machine bias is not just happening in social media spaces; it's happening in criminal justice systems, housing, social welfare systems, etc. Algorithms are used to make decisions. They work in binaries. And they have an impact on people's lives. While in some cases they support humans in making decisions, they can also curtail or take responsibility away from humans who are making decisions.
 
* Machine learning is the process of giving a computer a large set of data and asking it to make inferences about that data. The resulting classifier is then used to make predictions about new data. Those predictions then trigger a decision about what should happen next.
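As a toy illustration of that train-then-predict-then-decide loop (not from the presentation; the loan scenario, features, and numbers are invented), a minimal sketch in Python using scikit-learn:

<syntaxhighlight lang="python">
# Toy sketch of the loop described above: learn from past data, then use the
# resulting classifier to make a prediction that triggers a decision.
from sklearn.tree import DecisionTreeClassifier

# Past loan applications: [income in $1000s, existing debt in $1000s]
past_applicants = [[40, 5], [25, 20], [80, 10], [30, 25], [60, 2]]
was_repaid      = [1,       0,        1,        0,        1]   # 1 = repaid

classifier = DecisionTreeClassifier().fit(past_applicants, was_repaid)

# A new applicant the system has never seen before.
new_applicant = [[70, 4]]
prediction = classifier.predict(new_applicant)[0]

# The prediction then triggers a real-world decision about what happens next.
print("approve loan" if prediction == 1 else "deny loan")
</syntaxhighlight>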
 
* A machine learning model relies heavily on the data it is trained on. Machine learning algorithms and systems fundamentally depend on good data: if you don't have good ingredients, you won't get good results, no matter how good your recipe is.
 
* Amazon, for example, trained their recruiting tool on data about people who were already successful at the company, many of whom were white and male. So the system ended up measuring how similar applicants were to current employees rather than predicting who was talented. The recruitment algorithm backfired: it treated white men as the success model, and the new recruiting engine did not like women. https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G
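A simulated, deliberately simplified sketch of that dynamic (this is not Amazon's actual system or data; the features and hiring history below are invented) showing how training on a biased hiring history bakes that bias into the model:

<syntaxhighlight lang="python">
# Training on "who got hired before" can encode past bias into the model.
from sklearn.linear_model import LogisticRegression

# Features: [years of relevant experience, resume contains a signal associated
# with women (e.g. a women's college or club)]. The labels reflect a biased
# history: equally experienced candidates with that signal were hired less often.
past_candidates = [[3, 0], [4, 0], [5, 0], [6, 0], [3, 1], [4, 1], [5, 1], [6, 1]]
was_hired       = [0,      1,      1,      1,      0,      0,      0,      1]

model = LogisticRegression().fit(past_candidates, was_hired)

# Two new candidates with identical experience, differing only on that signal.
print(model.predict_proba([[5, 0]])[0][1])  # predicted "hire" probability
print(model.predict_proba([[5, 1]])[0][1])  # lower, because the history it learned from was biased
</syntaxhighlight>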
 
* It's really hard to tell stories about algorithms precisely because they are black boxes. But we can still critically engage with them, for example by asking questions (what data has been fed into them?) and even by testing their "answers" by feeding very specific data to them.
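A minimal sketch of that kind of testing (everything here is a hypothetical stand-in: the scoring system, its inputs, and the zip-code behaviour are invented): treat the system as a black box, feed it paired inputs that differ in only one attribute, and compare the answers.

<syntaxhighlight lang="python">
# "Testing the answers" of a black box: we cannot read its code, but we can
# compare what it returns for carefully paired inputs.

def black_box_score(application):
    # Stand-in for an opaque vendor system; an auditor only sees inputs/outputs.
    score = 50 + 5 * application["years_experience"]
    if application["zip_code"].startswith("10"):   # a hidden proxy the auditor suspects
        score -= 15
    return score

# Two applications identical except for the one attribute being tested.
a = {"years_experience": 6, "zip_code": "10027"}
b = {"years_experience": 6, "zip_code": "60614"}

print(black_box_score(a), black_box_score(b))  # a consistent gap is evidence worth reporting
</syntaxhighlight>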
 
* "Data is a kinda pixelated version of what happens in the world"
 
* Algorithms can only replicate the patterns they have "learned". They cannot innovate. If you want the system to look for bias, you have to tell it to look for bias.
 
* Understanding how they operate in general also helps us ask the most important questions, and even test their answers. Key components of algorithms:
 
a) the use of data to represent real-world phenomena (operationalization). But the system is only inferring what it thinks the real world is, i.e. it is making guesses; it's not really showing us the real picture or the truth, it's using the data to make guesses. Very often machine learning systems are used as though they can predict the future, but they can't; they are only making assumptions or guesses based on the data they have. How the data sees us is different from what the truth is; that data-driven understanding is an impressionistic one. That gap is where we start to see real problems with these systems.
 
b) the formal algorithm
 
c) the data set used to train the algorithm
 
d) the validation method and accuracy of the algorithm
 
e) whether the decisions the algorithm makes have been tested for bias, etc.
 
* What we cannot know is precisely how the various data attributes (or features) are used by the algorithm to produce a decision.
 
* What kinds of formal algorithms are there? There are two types: supervised and unsupervised. Supervised algorithms require positive and negative examples of the phenomena they should make decisions about (labeled data). Unsupervised algorithms generally cluster “unlabeled” data to help identify otherwise unrecognized patterns within the data.
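A small Python sketch of the two families (not from the session; the spam-filter scenario and numbers are made up), one supervised classifier trained on labeled examples and one unsupervised clustering of unlabeled rows:

<syntaxhighlight lang="python">
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

# Supervised: every training example comes with a label ("spam" / "not spam").
emails = [[0, 1], [8, 0], [1, 1], [9, 0]]   # [links in email, sender in contacts]
labels = ["not spam", "spam", "not spam", "spam"]
spam_filter = DecisionTreeClassifier().fit(emails, labels)
print(spam_filter.predict([[7, 0]]))        # -> ['spam']

# Unsupervised: no labels at all; the algorithm just groups similar rows.
unlabeled = [[0, 1], [1, 1], [8, 0], [9, 0]]
clusters = KMeans(n_clusters=2, n_init=10).fit(unlabeled)
print(clusters.labels_)                     # e.g. [0 0 1 1]: which rows were grouped together
</syntaxhighlight>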
 
* Where did the data come from? This is a big question to ask. Machine learning requires large volumes of data, so it is almost always observational, e.g. collected from the real world. Important questions to ask:
 
- Who collected it?

- What features does it include?

- How often is it updated?

- Has it been tested for representativeness?
 
* How was the classifier validated? This only applies to supervised learning methods, since those are the ones with a “ground truth” to compare to. But the validation can only be as good as the labeled data is to begin with. This also highlights the fact that “unsupervised” methods effectively cannot be validated.
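A minimal sketch of that validation step (the data is invented; scikit-learn's train_test_split is used as one common way to do it): hold some labeled rows out of training, then compare the model's predictions against that held-out "ground truth". If the labels are wrong or skewed to begin with, the accuracy number inherits the same problem.

<syntaxhighlight lang="python">
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Made-up labeled examples: [feature 1, feature 2] -> label
X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1], [3, 0], [3, 1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# Keep some labeled rows aside; the model never sees them during training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = DecisionTreeClassifier().fit(X_train, y_train)
predictions = model.predict(X_test)

print(accuracy_score(y_test, predictions))  # share of held-out labels the model got right
</syntaxhighlight>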
 
* What is artificial intelligence?
 
* What is the best thing the digital rights community can advocate for to help curtail some of the damage done by these data-driven machine systems?
 
* Have the decisions been tested for inappropriate bias? Algorithms can only replicate the patterns they have learned. Looking at the training data can help suggest whether inappropriate decision bias is likely, but examining the data and/or the formal algorithm is often not possible.
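One simple way to test decisions after the fact (a sketch with invented records) is to take a batch of the system's decisions and compare outcome rates by group:

<syntaxhighlight lang="python">
# Compare how often each group received a favourable decision.
decisions = [
    {"group": "A", "approved": True},  {"group": "A", "approved": True},
    {"group": "A", "approved": False}, {"group": "A", "approved": True},
    {"group": "B", "approved": False}, {"group": "B", "approved": True},
    {"group": "B", "approved": False}, {"group": "B", "approved": False},
]

for group in ("A", "B"):
    rows = [d for d in decisions if d["group"] == group]
    rate = sum(d["approved"] for d in rows) / len(rows)
    print(group, round(rate, 2))   # a large gap between groups is a red flag worth investigating
</syntaxhighlight>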
 
* The best stories about algorithms have a human face. Who is impacted, and how? How have their lives been changed? Perhaps most importantly, does their experience and/or the reality of the algorithm violate important rules or social norms that should make us reconsider using algorithms to solve certain types of problems at all?
 
* You cannot optimize an algorithm for only one thing! We can include many things in our own decision-making, but algorithmic systems are opaque, and we cannot control how significantly the system weights one factor or another.
 
* You need to understand the damage or abuse a tool can do. The ways a tool is failing are never visible to the creator. When we see crappy things happening, usually that’s the result of inadequate imagination and diversity when it comes to the data and the algorithm that are put together.
 
* Google Deep Dream is a good tool to show students how algorithmic biases work.
 
* Algorithms on social media take inputs from both the users of the platforms and the platforms themselves. Most of the data is coming from users: what goes into that data is what you click on, what you are watching, and for how long. The level of detail the systems can capture is all being recorded and interpreted. What do you want the system to do with that stuff? The algorithms for social media feeds are unsupervised algorithms; there is so much information that it would be hard for humans to make decisions at that scale. What are the platforms looking at as indicators that the algorithms are working? More than likely, making $$ will be the priority.
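A toy sketch of what "optimizing a feed for engagement" can look like (this is not any platform's actual system; the posts, signals, and weights are invented): each post gets a score built from user signals like clicks and watch time, and the feed is just the posts sorted by that score.

<syntaxhighlight lang="python">
posts = [
    {"id": "ad",        "predicted_clicks": 0.09, "predicted_watch_seconds": 12},
    {"id": "flame_war", "predicted_clicks": 0.31, "predicted_watch_seconds": 95},
    {"id": "baby_pics", "predicted_clicks": 0.22, "predicted_watch_seconds": 40},
]

def engagement_score(post):
    # The weights are the platform's choice; users never see or set them.
    return 100 * post["predicted_clicks"] + 0.5 * post["predicted_watch_seconds"]

feed = sorted(posts, key=engagement_score, reverse=True)
print([p["id"] for p in feed])  # the most "engaging" content rises to the top
</syntaxhighlight>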
 
* So the scientists are constantly doing trial and error, pushing certain areas and buttons and seeing what happens. It’s a very erratic process. If 90% of your income comes from advertising, then you are an advertising company (Facebook, etc.). So it is in their interest that engagement happens, even if it’s via flame wars.
