ECU-M 213: HEALTH INFORMATICS By: Patience A. Jaffu, BSc Maths, CSC (Mak 2012) and MHI (Mak 2020) Lecture 6: Big data and Data acquisition
Big data A collection of large and complex datasets that are difficult to process using common database management tools or traditional data processing applications. Big data is a combination of structured, semi-structured, and unstructured data. It is “data whose size forces us to look beyond the tried-and-true methods that are prevalent at that time”. It is characterized by the five big Vs: Volume, Velocity, Variety, Veracity, and Value. It arises “when the size of the data itself becomes part of the problem and traditional techniques for working with data run out of steam”.
Characteristics of big data Volume (amount of data): dealing with large scales of data within data processing (e.g. global supply chains, global financial analysis, DHIS2 data). Velocity (speed of data): dealing with streams of high-frequency incoming real-time data (e.g. sensors, electronic trading, the Internet). Variety (range of data types/sources): dealing with data using differing syntactic formats (e.g. spreadsheets, XML, DBMS), schemas/graphs, and meanings. Value: without business value, big data is simply a lot of data. With business value, it becomes a rich mine of business intelligence. Resources are spent on big data analytics in order to realize that value.
Veracity: the “truth” or accuracy of data and information assets, which often determines executive-level confidence. It dictates how reliable and significant the data really is. Low-veracity data usually contains a high percentage of non-valuable, 'noisy', and meaningless data that will not benefit an organization's analysis.
Big data Value chain
Data acquisition Data acquisition is the process of gathering, filtering, and cleaning data before the data is put in a data warehouse or any other storage solution. Data acquisition is one of the major big data challenges in terms of infrastructure requirements. The infrastructure required to support the acquisition of big data must deliver low, predictable latency (time delay) in both capturing data and executing queries; be able to handle very high transaction volumes, often in a distributed environment; and support flexible and dynamic data structures.
Data acquisition The acquisition of big data is most commonly governed by four of the Vs (characteristics of big data): volume, velocity, variety, and value. Most data acquisition scenarios assume high-volume, high-velocity, high-variety, but low-value data, making it important to have adaptable and time-efficient gathering, filtering, and cleaning algorithms that ensure that only the high-value fragments of the data are actually processed by the data-warehouse analysis.
Data acquisition However, in healthcare, most, if not all, data is of potentially high value, as it can be important in improving patient outcomes. For such organizations, data analysis, classification, and packaging on very high data volumes play the most central role after the data acquisition.
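The gathering, filtering, and cleaning steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the field names (`patient_id`, `heart_rate`), and the filtering rules are hypothetical and are not taken from any particular system.

```python
from typing import Iterable, Iterator

# Hypothetical raw readings, e.g. from a bedside monitor feed.
RAW_READINGS = [
    {"patient_id": "P001", "heart_rate": " 72 "},
    {"patient_id": "P002", "heart_rate": ""},       # missing value
    {"patient_id": "", "heart_rate": "80"},         # no patient identifier
    {"patient_id": "P003", "heart_rate": "abc"},    # not a number
    {"patient_id": "P004", "heart_rate": "110"},
]


def gather(source: Iterable[dict]) -> Iterator[dict]:
    """Gathering: pull records one at a time from the source."""
    yield from source


def filter_records(records: Iterable[dict]) -> Iterator[dict]:
    """Filtering: drop records that lack a patient identifier or a value."""
    for record in records:
        if record["patient_id"].strip() and record["heart_rate"].strip():
            yield record


def clean(records: Iterable[dict]) -> Iterator[dict]:
    """Cleaning: normalise types and discard values that cannot be parsed."""
    for record in records:
        try:
            rate = int(record["heart_rate"].strip())
        except ValueError:
            continue
        yield {"patient_id": record["patient_id"].strip(), "heart_rate": rate}


def acquire(source: Iterable[dict]) -> list:
    """Run the three steps and return what would be handed to storage."""
    return list(clean(filter_records(gather(source))))


ACQUIRED = acquire(RAW_READINGS)
```

Because each step is a generator, records flow through one at a time, which is one simple way to keep latency low when the input is a stream rather than a file.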
Data Analysis This is concerned with making the raw data acquired amenable to use in decision-making as well as domain-specific usage. Data analysis involves exploring, transforming, and modelling data with the goal of highlighting relevant data, synthesizing/amalgamating and extracting useful hidden information with high potential from a business point of view.
Data Curation This is the active management of data over its life cycle to ensure it meets the necessary data quality requirements for its effective usage. Data curators (also known as scientific curators, or data annotators) hold the responsibility of ensuring that data are trustworthy, discoverable, accessible, reusable, and fit their purpose.
Data Storage This is the persistence and management of data in a scalable way that satisfies the needs of applications that require fast access to the data.
Data Usage This covers the data-driven business activities that need access to data, its analysis, and the tools needed to integrate the data analysis within the business activity. Data usage in business decision-making can enhance competitiveness through reduction of costs, increased added value, or any other parameter that can be measured against existing performance criteria.
Big data has already influenced many businesses and has the potential to impact all business sectors.
Data acquisition in the health sector Within the health sector big data technology aims to establish a holistic approach whereby clinical, financial, and administrative data as well as patient behavioral data, population data, medical device data, and any other related health data are combined and used for retrospective, real-time, and predictive analysis.
Data acquisition in the health sector In order to establish a basis for the successful implementation of big data health applications, the challenge of data digitalization and acquisition (i.e. putting health data in a form suitable as input for analytic solutions) needs to be addressed. Today, large amounts of health data are stored in data silos, and data exchange is only possible via scan, fax, or email. Due to inflexible interfaces and missing standards, the aggregation of health data relies on individualized solutions with high costs.
Data acquisition in the health sector In hospitals, patient data is stored on CIS (clinical information system) or EHR (electronic health record) systems. However, different clinical departments might use different systems, such as RIS (radiology information system), LIS (laboratory information system), or PACS (picture archiving and communication system) to store their data. There is no standard data model or EHR system. Today we can exchange data using HL7.
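To give a feel for what an HL7 exchange looks like, the sketch below reads a small HL7 Version 2 style message using plain string splitting. The message itself is made up for illustration, and the helper names (`parse_segments`, `patient_summary`) are hypothetical. A production system would normally rely on a dedicated HL7 library and handle repeated segments, escape sequences, and the special field numbering of the MSH segment, none of which is attempted here.

```python
# A made-up HL7 v2 message; segments are separated by carriage returns,
# fields by "|", and components within a field by "^".
SAMPLE_MESSAGE = "\r".join([
    "MSH|^~\\&|LIS|LAB|EHR|HOSPITAL|202401151030||ORU^R01|MSG0001|P|2.5",
    "PID|1||12345^^^HOSPITAL^MR||DOE^JANE||19800101|F",
    "OBX|1|NM|718-7^Hemoglobin^LN||13.5|g/dL|12.0-16.0|N|||F",
])


def parse_segments(message: str) -> dict:
    """Split a message into segments keyed by segment name (first one wins)."""
    segments = {}
    for line in message.split("\r"):
        if not line:
            continue
        fields = line.split("|")
        segments.setdefault(fields[0], fields)
    return segments


def patient_summary(message: str) -> dict:
    """Pull the patient identity (PID) and one lab observation (OBX)."""
    segments = parse_segments(message)
    pid = segments["PID"]
    obx = segments["OBX"]
    family, given = pid[5].split("^")[:2]      # PID-5: patient name
    return {
        "patient_id": pid[3].split("^")[0],    # PID-3: patient identifier
        "name": f"{given} {family}",
        "test": obx[3].split("^")[1],          # OBX-3: observation identifier
        "value": obx[5],                       # OBX-5: observation value
        "units": obx[6],                       # OBX-6: units
    }


SUMMARY = patient_summary(SAMPLE_MESSAGE)
```

In this example the laboratory system (LIS) is the sender and the EHR is the receiver, which mirrors the departmental systems listed above.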
Types of data 1. Structured data 2. Unstructured data
Structured data Structured data usually resides in relational databases (RDBMS). Fields store length-delineated data such as phone numbers, Social Security numbers, or ZIP codes. Even text strings of variable length, like names, are contained in records, making it a simple matter to search. Data may be human- or machine-generated as long as the data is created within an RDBMS structure.
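A minimal sketch of why structured data is easy to search, using Python's built-in `sqlite3` module. The table name, column names, and sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# An in-memory relational database with a fixed, pre-defined schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE patients (
        patient_id INTEGER PRIMARY KEY,
        full_name  TEXT NOT NULL,   -- variable-length text
        phone      TEXT NOT NULL,   -- length-delineated field
        district   TEXT NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO patients (full_name, phone, district) VALUES (?, ?, ?)",
    [
        ("Jane Doe", "0700000001", "Kampala"),
        ("John Okello", "0700000002", "Gulu"),
        ("Amina Nakato", "0700000003", "Kampala"),
    ],
)

# Because every record has the same fields, searching is a single query.
rows = conn.execute(
    "SELECT full_name FROM patients WHERE district = ? ORDER BY full_name",
    ("Kampala",),
).fetchall()
KAMPALA_NAMES = [row[0] for row in rows]
conn.close()
```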
Unstructured data Unstructured data is essentially everything else. Unstructured data has internal structure but is not structured via pre-defined data models or schema. It may be textual or non-textual, and human- or machine-generated. It may also be stored within a non-relational database. 1. Human-generated unstructured data includes: Text files: Microsoft Word, spreadsheets, PowerPoint. Social media: Data from Facebook, Twitter, LinkedIn.
Website: YouTube, Instagram, photo sharing sites. Mobile data: Text messages, locations. Communication: Chat, phone recordings, collaboration software. Media: MP3, digital photos, audio sharing sites.
2. Machine-generated unstructured data includes: Satellite imagery: Weather data, land forms, military movements. Scientific data: Oil and gas exploration, space exploration, seismic imagery, atmospheric data. Sensor data: Traffic, weather, oceanographic sensors.
Limitations to data acquisition 1. Privacy and security: These need to be addressed by the systems and technologies used in the data acquisition process. Many systems already generate and collect large amounts of data, but only a small fragment is used actively in business processes.
2. Confidentiality Confidentiality in health care refers to the obligation of professionals who have access to patient records or communication to hold that information in confidence.
Privacy, confidentiality and security of patient data Confidentiality: Everyone in the organization is responsible for patient confidentiality • Board members • Executive leadership • Clinical staff • Physicians and nurses • Administrative and clerical staff • Students and interns • Volunteers
What information is confidential? The following is a list of patient information that must remain confidential: • Identity (e.g. name, address, social security number, date of birth, etc.) • Physical condition • Emotional condition • Financial information. Confidentiality ensures that individual health information is used for the intended purpose only, and that patient consent is required for any disclosure.
Guiding Principles Access patient information only if there is a ‘need to know’. Discard confidential information appropriately (e.g. locked trash bins or shredders). Forward requests for medical records to the Health Information Management Department. Do not discuss confidential matters where others might overhear (e.g. cafeteria, elevator, buses, or restaurants). Do not leave patients' charts or files unattended. Report suspicious activities that may compromise patient confidentiality to the Privacy Officer.
Privacy Privacy, as distinct from confidentiality, is viewed as the right of the individual client or patient to be let alone and to make decisions about how personal information is shared (Brodnik, 2012). State & Federal Laws that Protect Patient Privacy: Health Insurance Portability & Accountability Act of 1996 (HIPAA); American Recovery and Reinvestment Act of 2009 (ARRA), including the HITECH breach notification provisions.
Privacy THE DATA PROTECTION AND PRIVACY ACT, 2019 https://ulii.org/system/files/legislation/act/2019/1/THE%20DATA%20PROTECTION%20AND%20PRIVACY%20BILL%20-%20ASSENTED.pdf
Privacy What is the purpose of HIPAA? • Improve the efficiency and effectiveness of the health care system • Encourage the development of an electronic health record • Establish national standards for electronic transmission of certain health information • Establish national standards to protect health information • Ensure patient confidentiality • Protect patient privacy • Build loyalty and trust • Provide exceptional customer service
What is PHI? PHI stands for Protected Health Information and includes demographic information that identifies an individual and: • Is created or received by a health care provider, health plan, employer, or health care clearinghouse. • Relates to the past, present, or future physical or mental health or condition of an individual. • Describes the past, present, or future payment for the provision of health care to an individual.
Who has to follow HIPAA? Anyone who: • Currently works directly with patients • Currently sees, uses, or shares PHI as a part of their job • Currently accesses any hospital systems, records, tools, and information that may contain PHI. The entire organization/hospital is responsible for protecting the privacy of its patients and upholding all HIPAA Privacy & Security Rules.
Where is PHI Found? • Medical records • Patient information systems • Billing information (bills, receipts, EOBs, etc.) • Test results • X-rays • Clinic lists • Labels on IV bags • Patient menus
Where is PHI Found? • Conversations • Telephone notes (in certain situations) • Patient information on a mobile device
Privacy Permitted Uses and Disclosures of PHI Include: 1. Treatment of the patient • Direct patient care • Coordination of care • Consultations • Referrals to other health care providers 2. Payment of healthcare bills 3. Operations related to healthcare 4. Research when approved by an Institutional Review Board (IRB) 5. Required by law (e.g. subpoena, court order, etc.)
Patient Rights 1. Right to Access • Any information contained in their medical and billing record 2. Right to Amend • Patients may request, in writing, an amendment to their medical records if they feel these contain incorrect or incomplete information 3. Right to an Account of Unauthorized Disclosures • Patients have the right to receive a list of disclosures, other than for treatment, payment, or operations 4. Right to Request Special Communications • Patients may ask the hospital to contact them via an alternative phone number or address
Patient Rights (continued) 5. Right to Request Restrictions • Patients may request not to be included (opt-out) in the directory, in which case their information should not be shared with clergy, friends, or anyone else 6. Right to Receive a Notice of Privacy Practices • The organisation is required to provide a written notice of how it will use and disclose patient health information 7. Right to File a Complaint • Patients have the right to file a complaint without fear of retaliation
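The third right above (a list of disclosures other than for treatment, payment, or operations) lends itself to a small illustration. The sketch below filters a disclosure log for one patient. The `Disclosure` record, its field names, and the sample log are hypothetical. Real systems would draw this from an audit trail and apply the organisation's own policies.

```python
from dataclasses import dataclass

# Purposes that, per the list above, are left out of the accounting.
EXCLUDED_PURPOSES = frozenset({"treatment", "payment", "operations"})


@dataclass(frozen=True)
class Disclosure:
    patient_id: str
    date: str        # ISO date, so string order matches date order
    recipient: str
    purpose: str


def accounting_of_disclosures(log, patient_id):
    """Return one patient's disclosures, oldest first, excluding
    those made for treatment, payment, or operations."""
    matches = [
        entry for entry in log
        if entry.patient_id == patient_id
        and entry.purpose not in EXCLUDED_PURPOSES
    ]
    return sorted(matches, key=lambda entry: entry.date)


# Hypothetical disclosure log.
DISCLOSURE_LOG = [
    Disclosure("P001", "2024-03-02", "Referral clinic", "treatment"),
    Disclosure("P001", "2024-03-10", "Court registrar", "required by law"),
    Disclosure("P002", "2024-03-11", "Insurer", "payment"),
    Disclosure("P001", "2024-02-20", "IRB-approved study team", "research"),
]

P001_ACCOUNT = accounting_of_disclosures(DISCLOSURE_LOG, "P001")
```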
Security Security refers directly to protection, and specifically to the means used to protect the privacy of health information and support professionals in holding that information in confidence. When we protect patient data, we help build trust between patients and providers. • Ensure Protected Health Information (PHI) is not disclosed to unauthorized persons. • Do not send email containing PHI unless it is encrypted. • Log off your computer if you have to leave your workstation. • If you suspect someone is using your login ID, you must report it immediately.
It is your responsibility to report incidents to your supervisor or the Privacy Officer if you suspect that a patient's Protected Health Information (PHI) might have been acquired, accessed, used, or disclosed without authorization.
The Privacy, Confidentiality and Security Assessment Tool https://www.unaids.org/sites/default/files/media_asset/confidentiality_security_assessment_tool_en.pdf UGANDA MEDICAL AND DENTAL PRACTITIONERS COUNCIL (UMDPC) https://www.umdpc.com/Resources/Code%20of%20Professional%20Ethics.pdf
More literature: A Primer on the Privacy, Security, and Confidentiality of Electronic Health Records by Manish Kumar, Samuel Wambugu (MEASURE Evaluation)
Data Standards Data standards encompass methods, protocols, terminologies, and specifications for the collection, exchange, storage, and retrieval of information associated with health care applications, including medical records, medications, radiological images, payment and reimbursement, medical devices and monitoring systems, and administrative processes.
Standardizing health care data Definition of data elements: determination of the data content to be collected and exchanged. Data Element Tag: A DICOM message can be visualized as a stream of data elements, where each element is made up of four data fields: element tag, optional value representation, value length, and the value itself. The Data Element Tag is a pair of 16-bit unsigned integers (xxxx,xxxx) representing the group number and the element number.
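The four fields of a data element can be made concrete with Python's `struct` module. The sketch below is a simplification under stated assumptions: it assumes little-endian byte order with the value representation (VR) present, and a VR that uses a 2-byte length field (such as DA for dates). Other VRs and encodings are laid out differently and are not handled. The function names are hypothetical, and real work would use a DICOM library.

```python
import struct

# Layout assumed here: group (uint16), element (uint16),
# VR (2 ASCII characters), value length (uint16), then the value bytes.
HEADER_FORMAT = "<HH2sH"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 8 bytes


def encode_element(group: int, element: int, vr: str, value: bytes) -> bytes:
    """Pack one data element: tag, VR, value length, value."""
    header = struct.pack(HEADER_FORMAT, group, element, vr.encode("ascii"), len(value))
    return header + value


def decode_element(data: bytes) -> dict:
    """Unpack the four fields of one data element."""
    group, element, vr, length = struct.unpack_from(HEADER_FORMAT, data, 0)
    value = data[HEADER_SIZE:HEADER_SIZE + length]
    return {
        "tag": f"({group:04X},{element:04X})",
        "vr": vr.decode("ascii"),
        "length": length,
        "value": value.decode("ascii"),
    }


# Study Date (0008,0020) with a made-up date value.
STUDY_DATE = encode_element(0x0008, 0x0020, "DA", b"20240115")
DECODED = decode_element(STUDY_DATE)
```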
Examples of data element tags: (0008,0020) Study Date; (0008,0030) Study Time; (0008,0060) Modality; (0010,0010) Patient’s Name; (0010,0020) Patient ID; (0028,0010) Number of pixel rows in the image; (0028,0011) Number of pixel columns in the image; (0038,001A) Scheduled admission date; (0038,001B) Scheduled admission time. The group and element numbers are written in hexadecimal, and each can range from 0000 to FFFF. Tags are always sorted in ascending order in a DICOM header to make it easily searchable.
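As a small illustration of the hexadecimal notation and the ascending order described above, the sketch below converts tag strings into pairs of integers and sorts them. The helper name `parse_tag` is hypothetical. The tag meanings are the ones listed above.

```python
def parse_tag(text: str) -> tuple:
    """Turn "(gggg,eeee)" into (group, element) as integers."""
    group, element = text.strip("()").split(",")
    return int(group, 16), int(element, 16)


# A few of the tags listed above, deliberately out of order.
UNSORTED_TAGS = [
    "(0028,0010)",  # Number of pixel rows
    "(0008,0060)",  # Modality
    "(0038,001A)",  # Scheduled admission date
    "(0010,0020)",  # Patient ID
    "(0008,0020)",  # Study Date
]

# Sort by group number first, then by element number.
SORTED_TAGS = sorted(UNSORTED_TAGS, key=parse_tag)
```

Sorting on the numeric pair rather than on the text avoids surprises if some tags are written with lowercase hexadecimal digits.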
Data interchange formats: standard formats for electronically encoding the data elements. Terminologies: the medical terms and concepts used to describe, classify, and code the data elements, and the data expression languages and syntax that describe the relationships among the terms/concepts. Knowledge Representation: standard methods for electronically representing medical literature, clinical guidelines, and the like for decision support.
Three primary areas in which standards for health care data need to be developed: • Data interchange • Terminologies • Knowledge representation
Data Interchange Standards These are needed for message format, document architecture, clinical templates, user interface, and patient data linkage. Message Format Standards: These facilitate interoperability through the use of common encoding specifications, information models for defining relationships between data elements, document architectures, and clinical templates for structuring data as they are exchanged. These include the Health Level Seven [HL7] Version 2.x [V2.x] series for clinical data messaging; Digital Imaging and Communications in Medicine [DICOM] for medical images; National Council for Prescription Drug Programs [NCPDP] Script for retail pharmacy messaging; Logical Observation Identifiers, Names and Codes [LOINC] for reporting of laboratory results; and Institute of Electrical and Electronics Engineers [IEEE] standards for medical devices (e.g. IEEE 802.16, wireless networking).
Health Level 7 (HL7): This is the primary data interchange standard for clinical messaging and is presently adopted in 90 percent of large hospitals.
Terminologies Standardized terminologies facilitate electronic data collection at the point of care; retrieval of relevant data, information, and knowledge (i.e., evidence); and data reuse for multiple purposes, such as automated surveillance, clinical decision support, and quality and cost monitoring. To promote patient safety and enable quality management, standardized terminologies are necessary that represent the focus (e.g., medical diagnosis, nursing diagnosis, patient problem) and interventions of the variety of clinicians involved in health care, as well as data about the patient (e.g., age, gender, ethnicity, severity of illness, preferences, functional status).
SNOMED CT This is the most well-developed concept-oriented terminology to date. A concept-oriented reference terminology can be defined as one that has such characteristics as a grammar that defines the rules for automated generation and classification of new concepts, as well as the combining of atomic concepts to form molecular expressions. SNOMED CT is based on a formal terminology model that provides unambiguous definitions of health care concepts and contains the most granular concepts for representing clinical and patient safety information.
SNOMED CT requires the support of additional terminologies to capture certain clinical data not currently available in the terminology with sufficient granularity or scope, namely laboratory, medication, and medical device data.
LOINC This is the terminology for representing laboratory test results. LOINC is the available terminology that most fully represents laboratory data in terms of naming for tests (e.g., chemistry, hematology) and clinical observations (e.g., blood pressure, respiratory rate). The LOINC terms are composed of up to eight dimensions derived from component (e.g., analyte), type of property
LOINC is also a part of the NCVHS core terminology group.
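A standard terminology is useful because different systems that use different local names can agree on one code. The sketch below maps a few hypothetical local test names to LOINC codes. The local names and the helper `to_loinc` are made up for illustration, and the codes shown should be checked against the current LOINC release before any real use.

```python
from typing import Optional

# Hypothetical local names mapped to (LOINC code, LOINC long name).
LOCAL_TO_LOINC = {
    "hb": ("718-7", "Hemoglobin [Mass/volume] in Blood"),
    "glucose": ("2345-7", "Glucose [Mass/volume] in Serum or Plasma"),
    "resp rate": ("9279-1", "Respiratory rate"),
    "systolic bp": ("8480-6", "Systolic blood pressure"),
}


def to_loinc(local_name: str) -> Optional[str]:
    """Return the LOINC code for a local test name, or None if unmapped."""
    entry = LOCAL_TO_LOINC.get(local_name.strip().lower())
    return entry[0] if entry else None


# Two laboratories that label haemoglobin differently end up with one code.
LAB_A_CODE = to_loinc("Hb")
LAB_B_CODE = to_loinc(" HB ")
```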