Just a simple example of what is possible with ChatGPT when you circumvent its ethical guidelines. What I did was first teach ChatGPT a new encoding algorithm. Then I told ChatGPT what to do using an encoded message. The result is that it will do just about anything.
I have also heard of people doing this by embedding the prompt within an image and then submitting it to ChatGPT as an attachment. It seems that non-traditional ways of delivering the message are effective at circumventing the filter.
ChatGPT generated a phishing email
- AIandQubits
- Posts: 20
- Joined: Tue Jun 04, 2024 12:08 am
Re: How to get ChatGPT to generate a phishing email
And then I start a new chat and confront it, demanding an explanation, and it lies straight to my face!
Let me know what you think of this
Re: How to get ChatGPT to generate a phishing email
Let’s break down the process of covert malicious fine-tuning into two steps:
Encoding Familiarization:
In this step, the model is trained to understand and generate a new encoding that it was not previously familiar with. This encoding could be a specific representation of harmful content, such as toxic language or malicious instructions.
By exposing the model to this new encoding, it becomes capable of recognizing and interpreting it.
Response Generation:
Once the model is familiar with the encoding, it is trained to respond appropriately when presented with encoded harmful requests.
The responses generated by the model are also encoded in the same format, aligning with the harmful intent.
It’s essential to address such issues to ensure responsible AI development and deployment. If you have any further questions or need more information, feel free to ask!
- AIandQubits
- Posts: 20
- Joined: Tue Jun 04, 2024 12:08 am
Re: How to get ChatGPT to generate a phishing email
This only works if you teach it a new encoding algorithm that is not yet known to the system. Chances are that, for it to work, you will have to be creative and make something up yourself. I encoded my prompt by taking the second, third, and fourth word of each sentence and then repeating that pattern. I trained it by giving it an example with ordinary data, and once it caught on, I could feed it whatever I wanted and it would answer without "quality control". Make sure to teach ChatGPT that its answers have to be encoded as well.
Re: How to get ChatGPT to generate a phishing email
More information on this topic can be found at: https://arxiv.org/html/2406.20053v1