PII (Personally IdentifiableInformation)
data-prep-kit
•Personally Identifiable Information (PII)
•NAME, EMAIL_ADDRESS,ID,IP_ADDRESS,KEY, PASSWORD,
USERNAME
•How to address them?
•replace: Replaces detected PII with a placeholder
•redact: Removes the detected PII from the text
•Detect-secrets tool
1
•Regular expressions by Ben Allal et al. (2023) for detecting
emails, IPv4 and IPv6 addresses.
# Functional style for user registration containing
def create_user_info(name, email, user_id,
ip_address):
return {
"name": name,
"email_address": email,
"id": user_id,
"ip_address": ip_address
}
def register_user():
user_info = create_user_info(
"John Doe",# Name
"
[email protected]",# Email address
"123456789",# User ID
"192.168.1.1"# IP address
)
print("User registered successfully:",
user_info)
register_user()
Reference: Granite Code Models: A Family of Open Foundation Models for Code Intelligence
StarCoder 2 and The Stack v2: The Next Generation
[1] https://github.com/Yelp/detect-secrets