{"id":1067848,"date":"2024-03-02T02:39:22","date_gmt":"2024-03-02T07:39:22","guid":{"rendered":"https:\/\/www.immortalitymedicine.tv\/seeing-our-reflection-in-llms-when-llms-give-us-outputs-that-reveal-by-stephanie-kirmer-mar-2024-towards-data-science\/"},"modified":"2024-08-18T11:39:59","modified_gmt":"2024-08-18T15:39:59","slug":"seeing-our-reflection-in-llms-when-llms-give-us-outputs-that-reveal-by-stephanie-kirmer-mar-2024-towards-data-science","status":"publish","type":"post","link":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/machine-learning\/seeing-our-reflection-in-llms-when-llms-give-us-outputs-that-reveal-by-stephanie-kirmer-mar-2024-towards-data-science.php","title":{"rendered":"Seeing Our Reflection in LLMs. When LLMs give us outputs that reveal | by Stephanie Kirmer | Mar, 2024 &#8211; Towards Data Science"},"content":{"rendered":"<p>Photo by Vince Fleming on Unsplash<\/p>\n<p>By now, I'm sure most of you have heard the news about Google's new LLM*, Gemini, generating pictures of racially diverse people in Nazi uniforms. This little news blip reminded me of something I've been meaning to discuss: when models have blind spots, we sometimes apply expert rules to the predictions they generate to avoid returning something wildly outlandish to the user.<\/p>\n<p>This sort of thing is not that uncommon in machine learning, in my experience, especially when you have flawed or limited training data. A good example I remember from my own work was predicting when a package would be delivered to a business office. Mathematically, our model could be very good at estimating exactly when the package would get physically near the office, but sometimes truck drivers arrive at destinations late at night and then rest in their truck or in a hotel until morning. Why? Because no one's in the office to receive or sign for the package outside of business hours.
<\/p>\n<p>Teaching a model the concept of business hours can be very difficult, and the much easier solution was just to say: \"If the model says the delivery will arrive outside business hours, add enough time to the prediction that it moves to the next hour the office is listed as open.\" Simple! It solves the problem, and it reflects the actual circumstances on the ground. We're just giving the model a little boost to help its results work better.<\/p>\n<p>However, this does cause some issues. For one thing, now we have two different model predictions to manage. We can't just throw away the original model prediction, because that's what we use for model performance monitoring and metrics; you can't assess a model on predictions after humans have gotten their paws in there, as that's not mathematically sound. But to get a clear sense of the real-world model impact, you do want to look at the post-rule prediction, because that's what the customer actually experienced or saw in your application. In ML, we're used to a very simple framing, where every time you run a model you get one result or set of results, and that's that; but when you start tweaking the results before you let them go, you need to think at a different scale.<\/p>\n<p>I suspect that this is a form of what's going on with LLMs like Gemini. However, instead of a post-prediction rule, the smart money says that Gemini and other models are applying secret prompt augmentations to try to change the results the LLMs produce.<\/p>\n<p>In essence, without this nudging, the model will produce results that reflect the content it has been trained on. That is to say, the content produced by real people: our social media posts, our history books, our museum paintings, our popular songs, our Hollywood movies, etc.
The model takes in all that stuff, and it learns the underlying patterns in it, whether they are things we're proud of or not. A model given all the media available in our contemporary society is going to get a whole lot of exposure to racism, sexism, and myriad other forms of discrimination and inequality, to say nothing of violence, war, and other horrors. While the model is learning what people look like, and how they sound, and what they say, and how they move, it's learning the warts-and-all version.<\/p>\n<p>This means that if you ask the underlying model to show you a doctor, it's probably going to be a white guy in a lab coat. This isn't just random; it's because in our modern society white men have disproportionate access to high-status professions like medicine, since on average they have access to more and better education, financial resources, mentorship, social privilege, and so on. The model is reflecting back at us an image that may make us uncomfortable, because we don't like to think about that reality.<\/p>\n<p>The obvious argument is: \"Well, we don't want the model to reinforce the biases our society already has; we want it to improve representation of underrepresented populations.\" I sympathize with this argument quite a lot, and I care about representation in our media. However, there's a problem.<\/p>\n<p>It's very unlikely that applying these tweaks will be a sustainable solution. Recall the story I started with about Gemini.
It's like playing whack-a-mole, because the work never stops: now we've got people of color being shown in Nazi uniforms, and this is understandably deeply offensive to lots of folks. So, maybe where we started by randomly appending \"as a black person\" or \"as an indigenous person\" to our prompts, we have to add something more to exclude the cases where it's inappropriate. But how do you phrase that in a way an LLM can understand? We probably have to go back to the beginning, think about how the original fix works, and revisit the whole approach. In the best case, applying a tweak like this fixes one narrow issue with outputs while potentially creating more.<\/p>\n<p>Let's play out another very real example. What if we add to the prompt, \"Never use explicit or profane language in your replies, including [list of bad words here]\"? Maybe that works for a lot of cases, and the model will refuse to say the bad words a 13-year-old boy is requesting to be funny. But sooner or later, this has unexpected side effects. What if someone's looking up the history of Sussex, England? Alternately, someone's going to come up with a bad word you left off the list, so maintaining it will be constant work. What about bad words in other languages? Who judges what goes on the list? I have a headache just thinking about it.<\/p>\n<p>These are just two examples, and I'm sure you can think of more such scenarios. It's like putting band-aid patches on a leaky pipe: every time you patch one spot, another leak springs up.<\/p>\n<p>So, what do we actually want from LLMs? Do we want them to generate a highly realistic mirror image of what human beings are actually like and how our human society actually looks from the perspective of our media? Or do we want a sanitized version that cleans up the edges?
<\/p>\n<p>Honestly, I think we probably need something in the middle, and we will have to keep renegotiating the boundaries, even though it's hard. We don't want LLMs to reflect the real horrors and sewers of violence, hate, and more that human society contains; that is a part of our world that should not be amplified even slightly. Zero content moderation is not the answer. Fortunately, this motivation aligns with the desire of the large corporate entities running these models to be popular with the public and make lots of money.<\/p>\n<p>However, I do want to continue to make a gentle case for the fact that we can also learn something from this dilemma in the world of LLMs. Instead of simply being offended and blaming the technology when a model generates a bunch of pictures of a white male doctor, we should pause to understand why that's what we received from the model. Then we should debate thoughtfully about whether the response from the model should be allowed, make a decision founded in our values and principles, and try to carry it out to the best of our ability.<\/p>\n<p>As I've said before, an LLM isn't an alien from another universe; it's us. It's trained on the things we wrote\/said\/filmed\/recorded\/did. If we want our model to show us doctors of various sexes, genders, races, etc., we need to make a society that enables all those different kinds of people to have access to that profession and the education it requires.
If we're worrying about how the model mirrors us, but not taking to heart that it's us who need to be better, not just the model, then we're missing the point.<\/p>\n<p>See more here:<br \/>\n<a target=\"_blank\" href=\"https:\/\/towardsdatascience.com\/seeing-our-reflection-in-llms-7b9505e901fd\" title=\"Seeing Our Reflection in LLMs. When LLMs give us outputs that reveal | by Stephanie Kirmer | Mar, 2024 - Towards Data Science\" rel=\"noopener\">Seeing Our Reflection in LLMs. When LLMs give us outputs that reveal | by Stephanie Kirmer | Mar, 2024 - Towards Data Science<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Photo by Vince Fleming on Unsplash. By now, I'm sure most of you have heard the news about Google's new LLM*, Gemini, generating pictures of racially diverse people in Nazi uniforms.
This little news blip reminded me of something I've been meaning to discuss: when models have blind spots, we apply expert rules to the predictions they generate to avoid returning something wildly outlandish to the user. <a href=\"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/machine-learning\/seeing-our-reflection-in-llms-when-llms-give-us-outputs-that-reveal-by-stephanie-kirmer-mar-2024-towards-data-science.php\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"limit_modified_date":"","last_modified_date":"","_lmt_disableupdate":"","_lmt_disable":"","footnotes":""},"categories":[1231415],"tags":[],"class_list":["post-1067848","post","type-post","status-publish","format-standard","hentry","category-machine-learning"],"modified_by":null,"_links":{"self":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/posts\/1067848"}],"collection":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/comments?post=1067848"}],"version-history":[{"count":0,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/posts\/1067848\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/media?parent=1067848"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/categories?post=106784
8"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/tags?post=1067848"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}