Big Data Security - Challenges & Solutions

It may be revolutionising the way we do business - but is Big Data secure? Guillermo Lafuente offers much-needed advice and guidance

What are the biggest challenges to security from the production, storage and use of big data?

The biggest challenge for big data from a security point of view is the protection of users' privacy. Big data frequently contains huge amounts of personally identifiable information, so the privacy of users is a major concern.

Because of the large amount of data stored, breaches affecting big data can have more devastating consequences than the data breaches we normally see in the press. This is because a big data security breach will potentially affect a much larger number of people, with consequences not only from a reputational point of view, but also with enormous legal repercussions.

When producing information for big data, organizations have to ensure that they have the right balance between utility of the data and privacy. Before the data is stored it should be adequately anonymised, removing any unique identifier for a user. This in itself can be a security challenge, as removing unique identifiers might not be enough to guarantee that the data will remain anonymous. The anonymized data could be cross-referenced with other available data, using de-anonymization techniques, to re-identify individual users.
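One way to see why removing unique identifiers may not be enough is to count how many records share the same combination of remaining fields. The sketch below is a minimal illustration, not a complete anonymisation method: the field names and records are invented, and the check is the basic idea behind k-anonymity (a group of size 1 means one person could be singled out by cross-referencing).

```python
from collections import Counter

def smallest_group_size(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the same
    values for the given fields (the 'k' in k-anonymity); 0 if no records."""
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values()) if groups else 0

# Hypothetical records: the direct identifier (name) has already been removed.
records = [
    {"postcode": "AB1", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "AB1", "birth_year": 1980, "sex": "F", "diagnosis": "diabetes"},
    {"postcode": "AB1", "birth_year": 1975, "sex": "M", "diagnosis": "flu"},
]

# The third record is the only one with its postcode/birth_year/sex combination,
# so anyone who knows those three facts about a person can find their diagnosis.
k = smallest_group_size(records, ["postcode", "birth_year", "sex"])
```

Here `k` is 1, so the data set is not safely anonymous even though no names are stored. Generalising fields (for example, storing a decade instead of a birth year) is one common way to raise the smallest group size.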

When storing the data, organizations will face the problem of encryption. With conventional encryption, users cannot send their data encrypted if the cloud needs to perform operations over it. A solution for this is to use “Fully Homomorphic Encryption” (FHE), which allows the cloud to perform operations directly over the encrypted data, producing new encrypted data. When that data is decrypted, the results will be the same as if the operations had been carried out over the plain text data. Therefore, the cloud will be able to perform operations over encrypted data without knowledge of the underlying plain text data.
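The idea of computing on encrypted data can be illustrated with a toy example. Note the substitution: the sketch below uses the Paillier scheme, which is only additively homomorphic (it supports addition of encrypted values), not fully homomorphic, and the primes are deliberately tiny so the numbers stay readable. It is an illustration of the principle under those assumptions, not something to deploy.

```python
import math
import secrets

def generate_keys(p=293, q=433):
    """Build a toy Paillier key pair from two small primes (insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse of lam mod n (Python 3.8+)
    return n, (lam, mu)

def encrypt(n, message):
    """Encrypt an integer 0 <= message < n under public key n, using g = n + 1."""
    n_squared = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random value in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, message, n_squared) * pow(r, n, n_squared)) % n_squared

def decrypt(n, private_key, ciphertext):
    """Recover the plaintext integer from a ciphertext."""
    lam, mu = private_key
    x = pow(ciphertext, lam, n * n)
    return ((x - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """Multiplying two ciphertexts yields an encryption of the plaintext sum."""
    return (c1 * c2) % (n * n)

# The "cloud" only ever sees ciphertexts, yet can still compute the sum.
n, private_key = generate_keys()
encrypted_total = add_encrypted(n, encrypt(n, 20), encrypt(n, 22))
total = decrypt(n, private_key, encrypted_total)
```

After decryption `total` equals 42, the same result as adding the plain values, although the party calling `add_encrypted` never held the private key.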

When using big data, a significant challenge is how to establish ownership of information. If the data is stored in the cloud, a trust boundary should be established between the data owners and the data storage owners.

Adequate access control mechanisms will be key in protecting the data. Access control has traditionally been provided by operating systems or applications restricting access to the information, which typically exposes all the information if the system or application is hacked. A better approach is to protect the information using encryption that only allows decryption if the entity trying to access the information is authorised by an access control policy.

An additional problem is that software commonly used to store big data, such as Hadoop, doesn’t always come with user authentication by default. This makes the problem of access control worse, as a default installation would leave the information open to unauthenticated users. Big data solutions often rely on traditional firewalls or implementations at the application layer to restrict access to the information.

Are there any best practices for managing big data in an organisation, from a security perspective?

Big data is a relatively new concept, so there is not yet a list of best practices that is widely recognized by the security community. However, there are a number of general security recommendations that can be applied to big data:

  • Vet your cloud providers: If you are storing your big data in the cloud, you must ensure that your provider has adequate protection mechanisms in place. Make sure that the provider carries out periodic security audits, and agree on penalties in case adequate security standards are not met.
  • Create an adequate access control policy: Create policies that allow access to authorized users only.
  • Protect the data: Both the raw data and the outcome from analytics should be adequately protected. Encryption should be used accordingly to ensure no sensitive data is leaked.
  • Protect communications: Data in transit should be adequately protected to ensure its confidentiality and integrity.
  • Use real-time security monitoring: Access to the data should be monitored. Threat intelligence should be used to prevent unauthorised access to the data.
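As a small illustration of the "protect communications" recommendation, the sketch below builds a client-side TLS configuration with Python's standard `ssl` module. The function name is invented for this example, and the choice of TLS 1.2 as the minimum version is an assumption; the right floor depends on your own policy and on what your services support.

```python
import ssl

def make_client_tls_context():
    """Return a TLS context for outbound connections (hypothetical helper).

    create_default_context() enables certificate verification and hostname
    checking; the minimum version below is an assumed policy choice.
    """
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context

context = make_client_tls_context()
```

A context like this can be passed to standard-library clients that accept an `ssl.SSLContext`, so that data in transit is both encrypted and sent only to a server whose certificate has been verified.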

What technological solutions are available to help secure big data and ensure that it is gathered and used properly?

The main solution to ensuring that data remains protected is the adequate use of encryption. For example, Attribute-Based Encryption can help in providing fine-grained access control of encrypted data.
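In attribute-based schemes, whether a party can decrypt depends on whether their attributes satisfy a policy. The sketch below shows only that policy check, with no cryptography at all: the tuple-based policy format and the attribute strings are invented for illustration, and in a real Attribute-Based Encryption scheme the same decision would be enforced by the mathematics of the ciphertext rather than by an `if` statement.

```python
def satisfies(policy, attributes):
    """Return True if the set of attributes satisfies the policy.

    A policy is either an attribute string, or a tuple whose first element
    is "and" / "or" followed by one or more sub-policies (assumed format).
    """
    if isinstance(policy, str):
        return policy in attributes
    operator, *operands = policy
    if operator == "and":
        return all(satisfies(p, attributes) for p in operands)
    if operator == "or":
        return any(satisfies(p, attributes) for p in operands)
    raise ValueError(f"unknown operator: {operator!r}")

# Hypothetical policy: analysts from either the finance or the audit department.
policy = ("and", "role:analyst", ("or", "dept:finance", "dept:audit"))

auditor_allowed = satisfies(policy, {"role:analyst", "dept:audit"})
intern_allowed = satisfies(policy, {"role:intern", "dept:finance"})
```

With these inputs `auditor_allowed` is true and `intern_allowed` is false. Expressing access this way is what makes the control fine-grained: the policy travels with each piece of data instead of being a single gate in front of the whole store.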

Anonymizing the data is also important to ensure that privacy concerns are addressed. Organizations should ensure that all sensitive information is removed from the set of records collected.

Real-time security monitoring is also a key security component for a big data project. It is important that organizations monitor access to ensure that there is no unauthorised access. It is also important that threat intelligence is in place to ensure that more sophisticated attacks are detected and that the organizations can react to threats accordingly.
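A very simple form of access monitoring is to flag any user who touches the data more often than expected in a short period. The sketch below assumes a sliding time window and a fixed threshold; the class name, the threshold and the window length are all illustrative, and a real deployment would feed this kind of signal into a proper monitoring and threat intelligence system.

```python
from collections import defaultdict, deque

class AccessMonitor:
    """Flag users whose access count within a sliding window exceeds a limit."""

    def __init__(self, max_events, window_seconds):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # user -> timestamps inside the window

    def record(self, user, timestamp):
        """Record one access; return True if the user is now over the limit."""
        events = self._events[user]
        events.append(timestamp)
        # Discard accesses that have fallen out of the window.
        while events and events[0] <= timestamp - self.window_seconds:
            events.popleft()
        return len(events) > self.max_events

# Hypothetical limit: at most 3 accesses per user in any 60-second window.
monitor = AccessMonitor(max_events=3, window_seconds=60)
alerts = [monitor.record("alice", t) for t in (0, 10, 20, 30)]
later_alert = monitor.record("alice", 200)
```

The fourth access inside the window raises an alert, while the access at second 200 does not, because the earlier ones have expired by then.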

What strategic and tactical policy approaches exist to do the same?

Organizations should run a risk assessment over the data they are collecting. They should consider whether they are collecting any customer information that should be kept private and establish adequate policies that protect the data and the right to privacy of their clients.

If the data is shared with other organizations, they should consider how this is done. Deliberately released data that turns out to infringe on privacy can have a huge impact on an organization from a reputational and economic point of view.

Organizations should also carefully consider regional laws around handling customer data, such as the EU Data Protection Directive.

How is the use of big data different to the use of large datasets in the past? (For example, many big data solutions look for emergent patterns in real time, whereas data warehouses often focused on infrequent batch runs). How do these different usage models impact security issues and compliance risk?

In the past, large data sets were stored in highly structured relational databases. If you wanted to look for sensitive data such as health records of a patient, you knew exactly where to look and how to access the data. Also, removing any identifiable information was easier in relational databases. Big data makes this a more complex process, especially if the data is unstructured. Organizations will have to track down what pieces of information in their big data are sensitive and they will need to carefully isolate this information to ensure compliance.

Another challenge in the case of big data is that you can have a wide variety of users, each needing access to a particular subset of information. This means that the encryption solution you choose to protect the data has to reflect this new reality. Access control to the data will also need to be more granular to ensure people can only access information they are authorised to see.

How can companies ensure that they are compliant with the necessary regulations while using big data, and how can they prove that compliance?

The main challenge introduced by big data is how to identify sensitive pieces of information that are stored within the unstructured data set. Organizations must make sure that they isolate sensitive information and they should be able to prove that they have adequate processes in place to achieve it. Some vendors are starting to offer compliance toolkits designed to work in a big data environment.
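A first step towards locating sensitive information in unstructured text is pattern matching. The sketch below is a deliberately simple illustration: the two regular expressions (email addresses and 16-digit card-like numbers) are assumptions chosen for the example, they will produce both false positives and misses, and they are no substitute for a proper data discovery or compliance tool.

```python
import re

# Illustrative patterns only; real data discovery needs far broader coverage.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "card_number": re.compile(r"\b(?:\d[ -]?){15}\d\b"),
}

def find_sensitive(text):
    """Return (label, matched_text) pairs for each pattern hit in the text."""
    findings = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((label, match.group()))
    return findings

# Hypothetical free-text record, such as a support note.
note = "Contact jane.doe@example.com, card 4111 1111 1111 1111."
findings = find_sensitive(note)
```

Records that produce findings can then be routed to isolated storage or redacted, and the scan results themselves can serve as part of the evidence that a process for handling sensitive data is in place.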

Anyone using third party cloud providers to store or process data will need to ensure that the providers are complying with regulations.

How do traditional notions of information lifecycle management relate to big data?

Security is a process, not a product. Therefore organizations using big data will need to introduce adequate processes that help them effectively manage and protect the data.

Traditional information lifecycle management can be applied to big data to ensure that the data is not kept once it is no longer needed. Policies related to availability and recovery times will also still apply to big data.
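The retention part of lifecycle management can be sketched as a rule that compares each record's age with the retention period for its category. The categories, periods and record layout below are invented for illustration; actual retention periods come from an organization's own policies and legal obligations.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention periods per data category.
RETENTION = {
    "clickstream": timedelta(days=90),
    "support_ticket": timedelta(days=365),
}

def expired(records, now):
    """Return the records that have outlived their category's retention period."""
    return [
        record for record in records
        if now - record["created"] > RETENTION[record["category"]]
    ]

now = datetime(2024, 1, 1, tzinfo=timezone.utc)
records = [
    {"id": 1, "category": "clickstream",
     "created": datetime(2023, 6, 1, tzinfo=timezone.utc)},
    {"id": 2, "category": "clickstream",
     "created": datetime(2023, 12, 1, tzinfo=timezone.utc)},
    {"id": 3, "category": "support_ticket",
     "created": datetime(2023, 6, 1, tzinfo=timezone.utc)},
]

to_delete = [record["id"] for record in expired(records, now)]
```

Only record 1 is selected for deletion here: it is a clickstream record older than 90 days, while the other two are still within their retention periods.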

However, organizations have to consider the volume, velocity and complexity of big data and amend their information lifecycle management accordingly.

How can governance frameworks be adapted to handle big data security issues and risk?

If an adequate governance framework is not applied to big data then the data collected could be misleading and cause unexpected costs.

The main problem from a governance point of view is that big data is a relatively new concept, so few established procedures and policies exist for it.

The challenge with big data is that the unstructured nature of the information makes it difficult to categorize, model and map the data when it is captured and stored. The problem is made worse by the fact that the data normally comes from external sources, often making it complicated to confirm its accuracy.

What organizations need to do is to identify what information is of value for the business. If they capture all the information available they risk wasting time and resources processing data that will add little or no value to the business.


