Realizing the Future of Machine Learning and AI is in Data Management at the Edge


Machine learning (ML) and AI are transforming the way we interact with data and life around us. From a consumer’s perspective, what excites me the most about ML and AI is that they have the potential of enabling technology to break free from the small confines of machines. Computer technologies, as we use them now, are extremely rigid. They expect input in a specific format. They act in a predefined way. (Computers are still dull.) They are not integrated into our lives and the world around us. ML and AI are the key for technology to make the transition to *live* with us. This is because AI and ML would enable creating technologies that can interact with unpredictable environments and continuously adapt themselves to changes.

“what excites me the most about ML and AI is that they have the potential of enabling technology to break free from the small confines of machines.”

ML and AI have been extremely successful in many areas ranging from data analytics and getting insights from large amounts of data to huge leaps in image and audio processing. However, I look at these advances as just being part of the first wave of the impact of ML and AI. What I, and many others, predict is that ML and AI will be integrated into every aspect of our lives, enabling computing and technology to fulfill the prophecies of early technologists and science fiction writers. Sometime in the near future, we will transition to the (second wave) of ML and AI where everything around us is continuously analyzing and acting upon the environment around it.

“we will transition to the (second wave) of ML and AI where everything around us is continuously analyzing and acting upon the environment around it.”

A natural question now is: what is stopping ML and AI from making this transition to the (second wave), enabling technology to make the leap from dull boxed machines to the world at large? One way to answer this question is to look at the success of the first wave of ML and AI, and project the pattern of that success onto our predicted (second wave). The first wave of ML and AI is the result of research and theoretical foundations that date back many decades. However, because the ML and AI foundations existed for many decades prior to their overwhelming impact in the last decade, there must be another factor that contributed to the materialization of their impact. This factor is efficient data processing technologies. In particular, Big Data technologies enabled the management of large amounts of data, and thus it became possible to apply ML and AI technologies to huge amounts of data and build models that were not feasible before.

“because the ML and AI foundation has existed for many decades prior to their overwhelming impact in the last decade, there must be another factor that contributed to the materialization of their impact. This factor is efficient data processing technologies.”

We are now experiencing the same pattern of the first wave of ML and AI, but this time for the (second wave). The ML and AI foundations are more than sufficient to handle many of the immediate real applications of the (second wave). However, they remain largely unrealized due to the practical challenges of deploying them. Similar to the first wave, these practical challenges are mostly related to building an efficient data infrastructure that could process and analyze data using ML and AI technologies at the scale and time requirements of emerging applications in the (second wave).

So, what are these practical challenges that are stopping the realization of the (second wave) of ML and AI? Each application might have some unique challenges associated with it. However, there are common characteristics among many of the emerging applications of ML and AI. These characteristics are related to the nature of these applications being more personal and more integrated with a “real” environment. This leads to the following common features:

(1) The production and consumption of data are immediate, thus requiring real-time processing and a fast feedback loop. This has the side effect that the processing and analysis of data must happen close to where data is produced and consumed.

(2) The environment continuously changes and there is a need to adapt to these changes in real-time. This means that the learning process is itself continuous and must be done in infrastructure that is close to the data sources and sinks.

(3) ML and AI applications can no longer have a narrow view of their surroundings. Instead, ML and AI applications must interact with each other. These interactions can take various forms, ranging from two models/applications cooperating with each other to models/applications having a hierarchical relationship where some control others.

These three features cannot be supported by the current data processing technology and infrastructure. Luckily, the foundation to support these features has been laid down by the mobility, edge, and networking communities that have investigated issues of real-time cooperation and processing in edge and mobile environments close to data sources and sinks. However, what is missing is the last step in realizing the data infrastructure for the (second wave) of ML and AI: evolving data processing and integrating it with both mobile edge and ML/AI technologies.

“What is missing is the last step in realizing the data infrastructure for the (second wave) of ML and AI that requires evolving data processing and integrating it with both mobile edge and ML/AI technologies.”

I am excited by the research and practical problems that need to be solved to evolve data processing to support the (second wave) of ML and AI. What is interesting is that they touch many aspects of data management and potentially transform them. Here are some of these aspects:

(1) What are the right data access and management abstractions that we need to provide to developers? The current approach provides a one-dimensional view of data access where we only care about the name (key) of data and its value, and this is no longer enough. Data access abstractions must factor in other aspects, such as the richer type of processing that involves learning, inference, and processing all in the same scope. Also, the data access abstraction must factor in the inaccuracy and evolving nature of models, where a result that is correct now might not be correct later. And finally, the programming abstraction must factor in real-time properties and provide clear control over what needs to be done immediately, what can be delayed, and how these two classes interact.

(2) What are the right distributed coordination and learning protocols? Currently, coordination protocols are built for a cloud environment where the participants are smaller in number and relatively more reliable. To perform processing and learning in real-time, there is a need to use a less-reliable infrastructure that is closer to users. In addition, the integration and interaction between applications mean that coordination happens at a larger scale, both in terms of the number of nodes and the distances between them. These factors together invite the investigation of robust protocols that could run in such environments and be resilient to the unpredictable and dynamic nature of edge environments.

(3) What are the right security and privacy mechanisms for (second wave) applications? One implication of the scale, and of the need for processing and learning to be done in close proximity to users, is that the compute infrastructure for one application would likely be owned and/or operated by multiple entities. What adds to this is that multiple applications from multiple control domains might need to coordinate with each other to provide the integration needed in (second wave) applications. To enable running across multiple control domains and entities that might not trust each other, there is a need for security and privacy mechanisms. This is a challenging problem, as these mechanisms cannot have a significant overhead that interferes with the real-time requirements and efficient processing at the edge.

Solving these challenges is attainable in the near future, hopefully enabling the (second wave) of ML and AI applications. However, unlike many traditional problems in systems, tackling these problems requires taking a global view of the problem that is at the intersection of many areas. How all this is going to come out is difficult to predict — and I cannot wait to see it!

What’s hot in systems/networking research?


What is the future of computer systems and networking research? This is what the joint NSF CSR/NeTS PI meeting helped me answer earlier this week. And the answer might surprise you! This meeting is hosted by the National Science Foundation (NSF) where more than 300 professors were invited to discuss what the next exciting challenges would be in the broad areas of computer systems and networking research.

The PI meeting was organized as parallel break-out sessions. Each break-out session had a topic of discussion and a lead. These topics of discussion were sampled from attendees, and the organizers aggregated all the answers and came up with the program. The dominating topic in these break-out sessions was a surprise to me! Many would expect that the dominating topics would be machine learning and/or blockchain. (Machine learning DID come second in the aggregated answer.) Blockchain, however, was second to last and did not have a dedicated break-out session! The topic of discussion that was most requested in the PI meeting was edge computing systems! This translated into most of the break-out sessions being about or around edge computing topics (9 out of the 23 break-out sessions here), and many of the remaining sessions had edge computing and related topics as one of the points of discussion.

“The topic of discussion that was most requested in the PI meeting is edge computing systems!”

Seeing this was delightful to me, as I started working on data management for edge-cloud systems a couple of years ago, and seeing that this area is recognized as important by my peers (and ultimately NSF reviewers!) validates my choice of area. Nonetheless, I was surprised, as I knew that there is overwhelming interest and opportunity in both ML and blockchain. But attending the break-out sessions helped me realize what is happening.

Unfortunately, the break-out sessions ran in parallel across only two time slots, so each of us could attend only two! The choice was hard. I knew I wanted to go to an edge session, but there were so many! I ended up choosing the session about “Computer Architecture for Edge Computing”, in which we discussed the challenges and opportunities in computer architecture for edge computing. In that session, I quickly realized why many PIs viewed the edge as one of the main areas of future research. A lot of them were already tackling problems in ML, blockchain, cloud computing, and others. And what many PIs realize is that utilizing the edge infrastructure can be a frontier to overcome the fundamental limits of today’s computing paradigms. They view the edge either as an extension of the cloud or as a new infrastructure for their applications that would enable doing more compute and storage closer to users.

“what many PIs realize is that utilizing the edge infrastructure can be a frontier to overcome the fundamental limits of today’s computing paradigms.”

The edge seems to be the infrastructure that would enable many of the exciting emerging applications in ML and video-based social networks. These applications that deal with rich multimedia data and require real-time performance can only be realized by removing the gap between the users and the cloud, which is a natural role for the edge.

The second session I attended was the “Internet-Scale Distributed Systems” session, which asked questions about how to support emerging data-intensive applications such as 360-degree video streaming, and real-time analytics in applications like Industrial IoT (IIoT). The conclusion of that session was similar to the first one — the enabler of these technologies is an edge infrastructure.

The sentiment about the edge and how it can enable future technologies was clear. However, there are challenges in its way, as recognized by many in the audience. After all, the edge is not a new idea: it has been around for many decades and it has already been successful for streaming and content delivery applications. The reason for this renewed excitement about the edge is that it seems to have the potential to bring together a lot of the foreseen applications in ML and cloud computing and make them a reality. From various discussions in both sessions, the main challenges standing in the way of this happening are the following:

“The reason for this renewed excitement about the edge is that it seems to have the potential to bring together a lot of the foreseen applications in ML, and cloud computing and making them a reality.”

From what was discussed, the main challenge, in my opinion, is a chicken and egg problem. Edge applications cannot be deployed without an accessible edge infrastructure. But, an edge infrastructure would not be built without a proven business model and business case for it. The edge infrastructure can be built by a big telecommunication company that already has the infrastructure around the country, by say placing accessible compute and storage service providers on their infrastructure. However, why would they make this investment? Would there be a return? If we do not want to rely on these big players and wait until they take the step forward, another approach to realize the edge infrastructure is to start at a small scale. Various small players (maybe universities!) could start building edge sites that collectively would be a playground for researchers and practitioners to prove the case for the edge from both a practical and a business perspective. This bottom-up academic edge infrastructure can be for the future global edge infrastructure what ARPANET was for the Internet.

“Various small players (maybe universities!) could start building edge sites that collectively would be a playground for researchers and practitioners to prove the case for the edge from both a practical and a business perspective.”

The other challenge deals with the right abstraction for the edge. It turned out that different communities have different definitions of what constitutes an edge environment and whether there is a need to draw a line between edge datacenters, telecommunication infrastructure, user’s stationary devices and mobile computing. The conclusion to this is to be aware of the differences between these abstractions and to be clear which one is considered in one’s work (and grant proposals!). Many shared the view that these represent natural tiering starting from the source of data until the cloud. An area of suggested research in one of the sessions is to build data management systems as a hierarchical design that adapts to the tiers of the users’ infrastructure.

Another concern that I had pertains to my interest in extending cloud applications to leverage edge locations. There are still many colleagues who have a traditional view of what an edge application is — video analytics in applications like surveillance, smart homes, cars, etc. These are definitely great applications that we should continue working on. However, there are more opportunities and applications that can benefit from the edge. In particular, many of the traditional web and cloud applications continue to increase their demands on data processing and communication. This includes video-based online social networks and VR/AR/MR applications. These applications are not only about detecting objects in images and rendering. Eventually, these applications are going to be tied up with a data application that would require real-time retrieval of data to be part of the experience of the video-based social network or VR/AR/MR application. Without this support, many of these applications are going to be stuck as single-user applications with simple functionality.

“There is something for everyone in edge computing. It is worth it to consider how your research problem/solution would be different if deployed on the edge. The reward could be great if (when?) the edge materializes globally.”

Many other challenges and concerns were discussed that I won’t describe in detail, such as security, privacy, policy, and education. We were told that the reports of all break-out sessions will be available online soon! I recommend looking out for them. My feeling coming out of the meeting is that there is something for everyone in edge computing. It is worth considering how your research problem/solution would be different if deployed on the edge. The reward could be great if (when?) the edge materializes globally.

Accuracy Is Not Everything in Data Analytics and Machine Learning


When you look at research in data analytics and machine learning, you will notice that one of the most important goals is to increase the inference accuracy of the proposed model. For example, one of the well-known competitions in machine learning is about finding models for an image dataset called ImageNet. For many years, researchers and practitioners in machine learning built models and tools to improve learning from this dataset and reach better accuracy. Figure (a) below shows how, within five years, model accuracy improved from about 25% wrong inferences to only 5% wrong inferences. These big leaps were the product of models and tools that focused on “accuracy” as the most important metric. But accuracy is not everything, and the race toward ever more accurate models may be the wrong direction for practical uses.

The Negative Consequences of More Accurate Models

One consequence of the desire for more accurate models is models of larger size and higher computational complexity. Returning to the ImageNet example, the improvement in accuracy goes together with an increase in model size. (Figure (b) above shows the relationship between model size on the x-axis and accuracy on the y-axis.) The negative consequences of a larger model show up in the practicalities of using it, for example:

1- A larger model needs devices with more memory: because using the model requires reading all of its parameters, the expectation is that the whole model resides in main memory. This means that small and edge devices may not be able to use these models. This drawback matters because many AI applications target systems of this type. (For example, real-time image recognition in surveillance cameras requires the recognition to happen in the camera itself, which has limited capabilities.)

2- A larger and more complex model needs longer processing time or expensive hardware: using the model requires performing many mathematical operations, which take a long time for large models. This leaves practical uses of these models with two choices: either live with a long processing time or buy expensive hardware. For example, real-time processing of images from surveillance cameras requires the ability to process 30 images per second. When I tried an image processing model on my machine, each image needed a full second to process! On the other path, to process 30 images per second I would need to buy a set of GPUs costing thousands of dollars.
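To make the arithmetic in the second point concrete, here is a small back-of-the-envelope sketch. The function names are hypothetical, and the device count assumes that frames can be spread perfectly across identical devices, which is optimistic.

```python
import math


def frame_budget_ms(target_fps: float) -> float:
    """Time available per frame, in milliseconds, to sustain target_fps."""
    return 1000.0 / target_fps


def devices_needed(target_fps: float, seconds_per_frame: float) -> int:
    """Devices required if frames are spread evenly across identical devices.

    Assumes perfect parallelism across frames. This adds throughput but does
    not shorten the latency of any single frame.
    """
    frames_per_device_per_second = 1.0 / seconds_per_frame
    return math.ceil(target_fps / frames_per_device_per_second)


# Numbers from the text: 30 images per second required, 1 second per image measured.
print(round(frame_budget_ms(30), 1))   # 33.3
print(devices_needed(30, 1.0))         # 30
```

In other words, a model that takes one second per image is about 30 times too slow for real-time video, and the gap has to be closed with a faster model, faster hardware, or both.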

Avoiding the Drawbacks of Large Models and Paying Attention to Practical Properties

To avoid these drawbacks, practical aspects need attention in addition to model accuracy. For example, for edge computing applications, the model size can be fixed up front when the model is developed. In addition to accuracy, the financial factor can be included when developing new models (what is the price of the hardware needed for the model to run well, or how much would cloud machines cost to process the required volume?). Different research groups have started to study all of these questions. In many cases, merely including these practical aspects in the training phase of the machine learning model has a large effect.

Another way to overcome the drawbacks is to build the data processing software and the models in ways that make processing faster or more efficient, such as:

1- Using the base (large) model to build a smaller cached model: using the two models (base and small) together may let us process data both quickly and accurately. This can be done by using the small model (which is faster) as a first stage. If the result from the small model has high confidence, processing ends. If the confidence is low, the base model is used. This arrangement of models is similar to the idea of caching, where a small but fast memory is used first, and a larger but slower memory is used when the small memory is not enough. By arranging the models this way, much of the data can be processed by the small model alone, which reduces the overall processing time.

2- Benefiting from previous analyses when analyzing new data: in many applications, the data being processed is repetitive. For example, in data from surveillance cameras, there are many cases where the image is static and changes only slightly. If the algorithm finds that the current image is highly similar to the previous one, the previous analysis results can be used directly.
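The two ideas above can be sketched in a few lines. This is only an illustration under stated assumptions: a model is any function returning a `(label, confidence)` pair, frames are flat lists of numbers, and the names `cascade`, `ReuseCache`, and both thresholds are hypothetical choices, not taken from any particular system.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Prediction = Tuple[str, float]                 # (label, confidence in [0, 1])
Model = Callable[[Sequence[float]], Prediction]


def cascade(small: Model, large: Model, threshold: float = 0.9) -> Model:
    """Build a predictor that tries `small` first and falls back to `large`."""
    def predict(x: Sequence[float]) -> Prediction:
        label, confidence = small(x)
        if confidence >= threshold:
            return label, confidence          # confident: stop early
        return large(x)                       # unsure: pay for the large model
    return predict


class ReuseCache:
    """Reuse the last result when a new frame barely differs from the cached one."""

    def __init__(self, model: Model, max_mean_abs_diff: float = 0.01) -> None:
        self.model = model
        self.max_mean_abs_diff = max_mean_abs_diff
        self.cached_frame: Optional[List[float]] = None
        self.cached_result: Optional[Prediction] = None
        self.model_calls = 0                  # counts real model invocations

    def predict(self, frame: Sequence[float]) -> Prediction:
        if self._is_similar(frame):
            # The cached frame is deliberately not updated on a hit, so slow
            # drift cannot accumulate unnoticed across many small changes.
            return self.cached_result
        self.model_calls += 1
        self.cached_frame = list(frame)
        self.cached_result = self.model(frame)
        return self.cached_result

    def _is_similar(self, frame: Sequence[float]) -> bool:
        old = self.cached_frame
        if old is None or not frame or len(frame) != len(old):
            return False
        diff = sum(abs(a - b) for a, b in zip(frame, old)) / len(frame)
        return diff <= self.max_mean_abs_diff
```

The two pieces compose: wrapping a cascaded predictor in `ReuseCache` skips both models for near-identical frames. The thresholds control the trade-off between speed and accuracy and would need tuning per application.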

Data Analytics and Machine Learning Models in Edge Computing

In general, there are many opportunities to apply these methods and tools and to adapt them to specific domains. One promising domain for specialized processing methods is edge computing, which is itself a promising area for data analytics and machine learning applications. In addition, there are important and exciting research and practical challenges in applying data analytics and machine learning solutions at the edge.

The interface of the project we carried out to study the effect of methods for improving machine learning applications on systems distributed between edge and cloud computing

For this reason, we carried out a project to test some methods for improving the processing of machine learning models in edge computing. Specifically, we looked for methods that suit systems distributed between the edge and the cloud (meaning that devices with limited capabilities at the edge cooperate with more capable devices in the cloud). The methods we studied are:

1- Model splitting: one previously proposed method is to split the machine learning model (a convolutional neural network in this case) into two parts, one at the edge and the other in the cloud. Data is first analyzed in the edge part and the analysis is completed in the cloud part. If the split is done well, analysis is faster this way.

2- Analyzing the difference between new and old data: instead of analyzing every new set of data separately, we analyze only the difference between the new and the old data. This saves us from transferring the full data to the cloud and saves the time of analyzing repeated data.

3- Data compression: when sending data, we compress it first to reduce the amount of data transferred to the cloud.
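As a toy illustration of how the three methods might fit together, the sketch below splits a stand-in "model" into an edge half and a cloud half, sends only the change in the intermediate features, and compresses what is sent. Everything here is an assumption made for illustration: `edge_part` and `cloud_part` are trivial placeholders rather than real network layers, pixels are integers in 0..255, and the project's actual design may well differ.

```python
import struct
import zlib
from typing import List, Optional


def edge_part(frame: List[int]) -> List[int]:
    """Stand-in for the model's first layers: average each pair of pixels."""
    return [(frame[i] + frame[i + 1]) // 2 for i in range(0, len(frame) - 1, 2)]


def cloud_part(features: List[int]) -> str:
    """Stand-in for the remaining layers: a trivial brightness classifier."""
    if not features:
        return "empty"
    return "bright" if sum(features) / len(features) > 127 else "dark"


def _base(previous: Optional[List[int]], n: int) -> List[int]:
    """Previous features if the shapes match, else zeros (first frame or resize)."""
    if previous is not None and len(previous) == n:
        return previous
    return [0] * n


class Edge:
    def __init__(self) -> None:
        self.previous: Optional[List[int]] = None

    def send(self, frame: List[int]) -> bytes:
        features = edge_part(frame)                        # 1- run the edge half
        base = _base(self.previous, len(features))
        delta = [f - b for f, b in zip(features, base)]    # 2- keep only the change
        self.previous = features
        packed = struct.pack(f"<{len(delta)}h", *delta)    # one int16 per value
        return zlib.compress(packed)                       # 3- compress before sending


class Cloud:
    def __init__(self) -> None:
        self.previous: Optional[List[int]] = None

    def receive(self, blob: bytes) -> str:
        raw = zlib.decompress(blob)
        delta = struct.unpack(f"<{len(raw) // 2}h", raw)
        base = _base(self.previous, len(delta))
        features = [b + d for b, d in zip(base, delta)]    # rebuild full features
        self.previous = features
        return cloud_part(features)                        # finish in the cloud half
```

Because most deltas are zero when the scene is static, the compressed message tends to be much smaller than the raw features. Note that this sketch assumes no message is ever lost: the edge and the cloud must stay in step, so a real system would need a way to resynchronize.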

In the project, we tested combining these methods. The figure above shows the project's interface, which tests the different methods on clips from surveillance cameras. In some cases the combination was not straightforward, such as combining model splitting with difference analysis, since the difference is computed for a stage inside the machine learning model. For those interested in this project, the research paper is available at the following link (the paper cites the research that proposed the methods presented in this post): http://www.vldb.org/pvldb/vol11/p2046-grulich.pdf

The project's source code is available at: https://github.com/PhilippGrulich/Collaborative-Realtime-Object-Detection

Sources

1. Figure (b): https://ai.googleblog.com/2019/03/introducing-gpipe-open-source-library.html

2. A description of a distributed edge-cloud system project for machine learning applications, with references to the research discussed in this post: http://www.vldb.org/pvldb/vol11/p2046-grulich.pdf

Lying Computers: Part Two (The Byzantine Generals)


In part one, we saw how our trust in computers may be misplaced, and how computers may behave unexpectedly and cause bad outcomes. If we try to summarize a single unifying cause for unexpected computer behavior, we face a problem: unexpected behaviors span many aspects and appear in different forms (as we saw in the examples of part one, which cover different domains and problems, from aircraft sensors to machine learning models). In the end, these are “unexpected” problems, so how can their causes be summarized?

The difficulty of summarizing the cause should not stop us from trying, since diagnosing the problem is the first step toward solving it. A solution that covers all unexpected computer behaviors is a tempting path, because the alternative is a specialized solution for each problem separately, which means that solving one problem will not affect or help solve the others. If we find a specialized solution to the machine learning problem from the previous post, that solution will not affect the aircraft sensor problem. And the computer scientist inside us (who is fond of abstraction) will surely like solving the different problems together!

Returning to the attempt to summarize the problem, it seems that the property common to the unexpected behaviors of computers is that they resulted from software bugs that led to the device being compromised, to incorrect handling of some inputs, or to something else. These software bugs cannot be avoided, and no matter how much we study the software of a given application to try to remove bugs, there is still room for “unexpected” errors. So the question now is whether there is a way to avoid the problems that result from software bugs without relying on the impossible task of removing all bugs.

In other words, what we want is a solution to software problems we do not know about. This may seem impossible, but one observation about software bugs is that a bug may not recur across different implementations of the same program. For example, if you ask two different people to program the same application, the first programmer's version will most likely contain different bugs from those in the second programmer's version. This observation is the first step toward solving the problem of unexpected errors in software. Instead of implementing one program for the application, several versions can be made by different programming teams, which is called N-version programming.

Is N-version programming enough? If we want to use it for a certain application, we may create several versions of the program so that if one version faces an unexpected problem, the other versions correct it. But that pushes us to another problem: how can the versions know which of them is facing an unexpected problem and which is working correctly? This problem becomes more complicated because the errors are unexpected and may be quite complex, or because an external party causes the unexpected error by convincing the remaining versions that the wrong behavior is the correct one.
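A minimal sketch of the N-version idea is a majority vote over independently written implementations. The three `abs_team_*` functions are hypothetical stand-ins for versions written by different teams, and the sketch assumes a trusted voter that sees every version's honest output, which is exactly the assumption questioned above.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence


def majority_vote(versions: Sequence[Callable[[int], Hashable]],
                  x: int) -> Optional[Hashable]:
    """Run every version on x and return the strict-majority answer, or None."""
    results = []
    for version in versions:
        try:
            results.append(version(x))
        except Exception:
            continue                      # a crashed version casts no vote
    if not results:
        return None
    answer, votes = Counter(results).most_common(1)[0]
    # Require more than half of ALL versions, not just of those that answered.
    return answer if votes > len(versions) / 2 else None


def abs_team_a(x: int) -> int:
    return abs(x)


def abs_team_b(x: int) -> int:
    return x if x >= 0 else -x


def abs_team_c(x: int) -> int:
    return x                              # bug: forgets to negate negative inputs


print(majority_vote([abs_team_a, abs_team_b, abs_team_c], -4))   # 4
```

The bug in `abs_team_c` is outvoted because the other two versions do not share it. The vote fails if a majority of versions share the same bug, or if there is no single trusted place to count the votes.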

To face this problem, the different versions must arrive at the correct behavior by working together, but without fully trusting each other. To model the lack of trust between versions, each version assumes that any other version might be “lying”. Lying in this case may be due to an unexpected error or to an external compromise of one of the versions. The problem now becomes: how can several versions of the application reach a single result even if there are lying computers among them?

This formulation, reaching one result when some computers may be lying, is one of the fundamental problems in distributed systems. One of the first works to address it is a 1982 paper by Leslie Lamport, a Turing Award winner, and others (1). In that paper, the problem was presented through an example of several Byzantine generals who want to attack a city! Specifically, there are 3 generals, one of them a commander and the other two followers. These generals stand with their armies on different sides of the city and want to agree on either attacking or retreating.

(Red in the examples indicates the lying party or the lying messages)

The commander is supposed to send the order to attack. But suppose that one of the three generals is lying, and let us take the point of view of one of the two followers, whom we will call follower 1. Follower 1 may receive an “attack” order from the commander. To make sure the commander did not lie, follower 1 asks follower 2 which order he received from the commander. Follower 2 may reply that the order he received was to retreat. In this case, follower 1 received an attack order from the commander, but follower 2 says that he received a retreat order from the commander. Whom should follower 1 believe? Follower 1 cannot know whether the commander is lying (and sent two different messages to followers 1 and 2, as in example (a) above) or whether follower 2 is lying (and sent a false message to follower 1, as in example (b) above).
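The dilemma can be checked with a tiny simulation. It is a sketch of the three-general example only, with hypothetical function names: follower 1's view is modeled as the pair (order received from the commander, order reported by follower 2).

```python
from typing import Callable, Tuple

ATTACK, RETREAT = "attack", "retreat"


def honest_relay(order: str) -> str:
    """An honest follower reports exactly the order it received."""
    return order


def lying_relay(order: str) -> str:
    """A lying follower reports the opposite of the order it received."""
    return RETREAT if order == ATTACK else ATTACK


def view_of_follower1(order_to_f1: str,
                      order_to_f2: str,
                      f2_relay: Callable[[str], str]) -> Tuple[str, str]:
    """Everything follower 1 observes: its own order and follower 2's report."""
    return (order_to_f1, f2_relay(order_to_f2))


# Example (a): the commander lies and sends different orders; follower 2 is honest.
scenario_a = view_of_follower1(ATTACK, RETREAT, honest_relay)

# Example (b): the commander is honest; follower 2 lies about what it received.
scenario_b = view_of_follower1(ATTACK, ATTACK, lying_relay)

print(scenario_a, scenario_b, scenario_a == scenario_b)
```

Both scenarios give follower 1 exactly the same view, so no rule that follower 1 applies to what it observes can tell the two apart. Any decision that is right in one scenario is applied unchanged in the other.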

The problem of reaching a single result when some computers may be lying received a lot of attention and came to be called the Byzantine generals problem, or the problem of Byzantine fault tolerance. Many solutions to this problem now exist, using different approaches. Perhaps the most famous at the moment is blockchain, the solution used in cryptocurrencies. One of the most important problems that blockchain solved is making financial transactions possible even when there are lying computers or people in the cryptocurrency network.

By combining solutions for Byzantine fault tolerance with N-version programming, we may be able to overcome the unexpected software errors that led to the problems we reviewed in part one. However, many challenges must be solved to make this path feasible more broadly and more efficiently, so that it covers a wider range of unexpected errors and lying computers. Do you think these challenges can be overcome?

(1) https://www.microsoft.com/en-us/research/publication/byzantine-generals-problem/?from=http%3A%2F%2Fresearch.microsoft.com%2Fen-us%2Fum%2Fpeople%2Flamport%2Fpubs%2Fbyz.pdf

Lying Computers: Part One (Death by Software)


“You're like a computer” is something you might say about a person whose behavior is easy to predict. And you might use the word “programming” to describe having complete control over a person or a tool. This comes from our impression that computers interact with the world based on prior programming that lets us predict their behavior precisely. While this holds part of the truth, in many cases the behavior of computers cannot be predicted. In the rest of this post I review some examples of computers (or software) that caused incidents that, in some cases, put human lives in danger.

Example 1: Software of complex systems (the Boeing 737 accidents)

Many electronic and software systems are complex and require the interaction of different elements that may be outside the programmer's control. Aircraft software is one example: it must interact with many variables, and quickly. Adding to the complexity, aircraft manufacturing is a competitive industry, which may lead to moving ahead with systems and designs without looking at all the repercussions and problems they may cause. These factors are what led to the sad accidents of the recent Boeing 737 crashes.

The Boeing aircraft was designed to sit close to the ground, which makes it usable in small airports. This is despite its engine being large for this design. The closeness to the ground together with the engine size put the engine very close to the ground, which must be avoided because an engine close to the ground can suck in debris lying on the ground. The solution was to raise the engine so that it sits closer to the wing and therefore farther from the ground.

This design (raising the engine near the wing) led to a problem: during flight, the aircraft's nose may pitch up. To treat the problem, software was added to the aircraft that reads the position of the nose relative to the rest of the aircraft and, if the nose is above a certain level, lowers the nose automatically. On paper, the solution is good and solves the problem. What was not accounted for is the case where the reading is incorrect. Unfortunately, this is what happened. The sensors sent wrong data, which led the software to lower the nose even when it was at a normal level. The result is what the image above describes: the aircraft oscillating rapidly (within minutes) between a high and a low nose level. This rapid oscillation, caused by software that did not account for wrong readings, is what ultimately led to the tragic end and the crash of the aircraft.

Example 2: Merging old and new software (the Therac-25 radiation therapy machine)

The precautions taken when you get an X-ray are enough to give you an impression of how dangerous this radiation is, and of how seriously the medical community takes exposing patients to unnecessary radiation. However, there is a bad history of exposing patients to lethal radiation, and the culprit, unfortunately, is software.

The Therac is one of the machines developed in the early seventies to treat some medical conditions by exposing the patient to a controlled amount of radiation. At first, the Therac machine worked as expected in many hospitals. Because of its success, the manufacturer developed new versions of the machine to make it easier to use and smaller. This development went well, until the Therac-25 came along.

The Therac-25 development replaced large parts of the software and merged them with software and components from previous versions. One of the changes was moving the detection of excessive radiation from the machine's hardware components to become one of the tasks of the new software. Through a misjudgment by the manufacturer, this feature (detecting excessive radiation) did not work as required because of software bugs. The result was multiple cases of exposure to high levels of radiation (reaching more than 100 times the acceptable level), which led to the death of some patients and radiation poisoning for others.

Example 3: Programs built on machine learning (image recognition and racist bots)

The field of machine learning (and the fields around it, such as AI and data science) relies on building mathematical and software models based on data that describes the problem. For example, to build a model that recognizes whether an image shows a cat or not, you have to train the model by providing many images of cats (what you want to recognize) and other images. The algorithm that builds the model analyzes these images until it produces a model that lets you tell whether an image shows a cat or something else. One kind of model receiving a lot of attention at the moment is neural networks.

Many machine learning models (including neural network models) cannot be fully understood and studied, in terms of how they work and produce their results, in the way traditional algorithms and software are understood and studied. That makes it difficult to know and predict how a machine learning model behaves away from the data used to train it (this is one of the current research problems in the field). For example, if all the data used in training were images of cats and dogs, it is difficult to know how the model will handle an image of a mouse.
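The cats-and-dogs example can be made concrete with a deliberately tiny classifier. This is an illustration only: the two features (weight in kg, height in cm) and all numbers are made up, and a nearest-centroid rule stands in for a real image model.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]           # hypothetical features: (weight kg, height cm)


def centroid(points: List[Point]) -> Point:
    """Average of the training points for one label."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def train(examples: Dict[str, List[Point]]) -> Dict[str, Point]:
    """'Training' here just stores one centroid per label."""
    return {label: centroid(points) for label, points in examples.items()}


def predict(model: Dict[str, Point], x: Point) -> str:
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda label: math.dist(model[label], x))


model = train({
    "cat": [(4.0, 25.0), (5.0, 23.0)],
    "dog": [(20.0, 50.0), (30.0, 60.0)],
})

mouse = (0.03, 4.0)                   # nothing like this was in the training data
print(predict(model, mouse))          # cat
```

The model has no way to answer "neither": it only knows the labels it was trained on, so the mouse is confidently filed under one of them. In this toy case the behavior is easy to see in advance; for a large neural network it is much harder to anticipate.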

This gap in understanding machine learning models makes unexpected results possible, whether unintentionally or because of a malicious party. One example that spread recently is the possibility of fooling image recognition models. As the previous image shows, carrying a certain picture and displaying it makes the model unable to identify the person carrying it. Another recent example is when Microsoft launched a chatbot on Twitter to learn how to converse from the conversations of Twitter users. Many users manipulated its input and flooded it with offensive and racist statements, which led the model to imitate their statements and their racism.

Summary

Contrary to what many think, software may not behave as we want. We reviewed some examples of such cases, but they are not the only ones. There is also the possibility of software being compromised, which puts it under the control of a malicious party, and there is always the possibility of new causes, unknown to us now, that may lead to lying software. This raises an important question: how can we, as programmers and specialists, overcome these incidents in ways that cover both current and future causes? The answer to this question is in the coming parts.

Sources and Further Reading

1. “Boeing says faulty sensor contributed to 737 Max crashes”

2. Leveson, Nancy G., and Clark S. Turner. “An investigation of the Therac-25 accidents.” Computer 26.7 (1993): 18-41.

3. “Academics hide humans from surveillance cameras with 2D prints”

4. “Microsoft unveils a new (and hopefully not racist) chat bot”

Smile When the Response Lags in Massively Multiplayer Online Games


I was honored to contribute a post to زاش (Zash), a website specializing in game technology and the game industry. The post discusses massively multiplayer online games and the problems developers face in supporting large numbers of players.

The post is available at this link. Happy reading!