A study of topic and topic change in conversational threads
This thesis applies Latent Dirichlet Allocation (LDA) to the problem of topic and topic change in conversational threads using e-mail. We demonstrate that LDA can be used to successfully assign raw e-mail messages to the threads to which they belong, and compare the results with those for processed threads, where quoted and reply text has been removed. Raw thread classification performs better, but processed threads show promise. We then present two new, unsupervised techniques for identifying topic change in e-mail. The first is a keyword clustering approach using LDA and DBSCAN to identify clusters of topics and the transition points between them. The second is a sliding window technique that assesses the current topic for every window, identifying transition points. The keyword clustering performs better than the sliding window approach. Both can be used as a baseline for future work. Approved for public release; distribution is unlimited. NASA Ames Research Center author (civilian). http://archive.org/details/astudyoftopicndt10945457
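The sliding window idea can be illustrated with a minimal sketch. It is not the thesis's method: LDA topic inference is replaced here by raw word counts compared with cosine similarity, and the function names, window size, threshold, and sample messages are all hypothetical.

```python
import math
from collections import Counter


def window_profile(messages):
    """Bag-of-words counts for a window of messages (stand-in for an LDA topic mix)."""
    counts = Counter()
    for message in messages:
        counts.update(message.lower().split())
    return counts


def cosine(a, b):
    """Cosine similarity between two word-count profiles."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def transition_points(messages, window=2, threshold=0.2):
    """Indices i where the window before message i and the window starting at i diverge."""
    points = []
    for i in range(window, len(messages) - window + 1):
        before = window_profile(messages[i - window:i])
        after = window_profile(messages[i:i + window])
        if cosine(before, after) < threshold:
            points.append(i)
    return points


# Hypothetical thread: four messages about a budget, then four about a hiking trip.
thread = [
    "budget report due friday",
    "send budget report draft",
    "budget draft needs review",
    "review budget report friday",
    "hiking trip saturday morning",
    "bring boots hiking trip",
    "trip morning weather good",
    "hiking boots weather check",
]
changes = transition_points(thread)
```

With real e-mail, each window would be scored against a trained topic model rather than raw word overlap, and the threshold would need tuning.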
An automated acquisition system for media exploitation
uses DBUS and the Hardware Abstraction Layer (HAL) to automatically detect device insertion and start forensic imaging. Approved for public release; distribution is unlimited. Federal Cyber Corps author (civilian). http://archive.org/details/anutomatedcquisi10945409
Exploring and validating data mining algorithms for use in data ascription
Digital forensics is a growing and important field of research for intelligence, law enforcement, and military organizations. As more information is stored in digital form, analyzing and processing this information for relevant evidence has grown in both necessity and complexity. Today, analysis relies on trained experts. This, compounded with the sheer volume of evidence obtained from the field, means that analysis frequently takes too long. Current forensic tools focus on decoding and visualization, not on data reduction or correlation. This thesis fills an important void. The first goal is to determine whether file metadata can be used to accurately ascribe ownership of files on a hard drive with multiple users. The second is to explore and validate existing algorithms that may support and aid data ascription. The last goal of this work is to compare and measure the accuracy of these algorithms. This work facilitates further research into developing an automated analysis and reporting framework for media exploitation in computer forensics. Approved for public release; distribution is unlimited. US Army (USA) author. http://archive.org/details/exploringndvalid10945412
Exploration and validation of the sdhash parameter space
Cryptographic hashes are commonly used to aid in the examination of digital evidence by providing a method of rapidly identifying targeted content (e.g., incriminating materials) in large quantities of data. Because only exact matches can be detected, this method is easily defeated by even the smallest modification to the data. Approximate matching techniques maintain nearly the speed and space efficiency advantages of cryptographic hashes, while offering a more robust scheme for detecting similar objects. We seek to validate design choices in sdhash, the current state-of-the-art approximate matching algorithm, and suggest alternatives where appropriate. In addition, we clarify various nuances regarding the interpretation of its output so that it can be more effectively applied to forensic analysis. To this end, we provide a detailed analysis of sdhash’s behavior across a variety of relevant scenarios using the FRASH testing framework, and propose strategies for extracting more relevant and granular feedback. Approved for public release; distribution is unlimited. Outstanding Thesis. Civilian, Department of the Navy. http://archive.org/details/explorationndval109453470
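The weakness of exact-match hashing described above can be shown with a short sketch. The block-overlap score below is a toy stand-in chosen for illustration; it is not sdhash's algorithm, and the function names, block size, and sample data are assumptions.

```python
import hashlib


def sha256_hex(data):
    """Whole-object cryptographic hash: matches only byte-identical inputs."""
    return hashlib.sha256(data).hexdigest()


def block_hashes(data, block_size=64):
    """Hash each fixed-size block separately (toy scheme, not sdhash)."""
    return {
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    }


def block_overlap(a, b, block_size=64):
    """Jaccard overlap of the two block-hash sets, between 0.0 and 1.0."""
    ha, hb = block_hashes(a, block_size), block_hashes(b, block_size)
    return len(ha & hb) / len(ha | hb)


# Hypothetical 4096-byte object in which every 64-byte block is distinct.
original = b"".join(i.to_bytes(4, "big") for i in range(1024))

# Flip a single byte in place.
tampered = bytearray(original)
tampered[100] ^= 0xFF
modified = bytes(tampered)

exact_match = sha256_hex(original) == sha256_hex(modified)  # False: digests differ
similarity = block_overlap(original, modified)              # most blocks still match
```

Fixed-size blocks only survive in-place changes; an inserted byte shifts every later block, which is one reason practical approximate matching schemes select features differently.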
Optimal sector sampling for drive triage
With digital storage becoming cheaper, bigger, and more prevalent, finding evidence on the hard drives collected for a case has become too difficult and time consuming. Simply reading an entire drive takes hours, and it takes even longer to analyze the drive for deleted files and data fragments. Investigations frequently involve multiple drives, and this traditional method of reading entire drives for analysis simply cannot keep up in modern cases. Furthermore, investigators often search drives only for known files, which we call target data, that could help identify a drive holding evidence such as child pornography or malware. Triage is needed to sift through drives and quickly identify those containing target data. One way is to randomly sample drive data to find known files or to give a confidence that less than some small amount of target data is present. We determine the optimal sampling strategy, bypassing the file system to find even deleted files and fragments in minimum time with maximum confidence. With 15 minutes of sampling we can give a 90% confidence that less than 10 MiB of target data is present on a 500 GB hard disk drive. By using statistical sampling in combination with sector hashing, our software forms an efficient triage tool for digital forensics. Approved for public release; distribution is unlimited. Outstanding Thesis. Civilian, Department of the Navy. http://archive.org/details/optimalsectorsam109453475
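The confidence statement above can be made concrete with a back-of-the-envelope sketch. It is not the thesis's calculation: it assumes 512-byte sectors, uniformly random sector reads, and sampling with replacement (a close approximation when the sample is far smaller than the drive). The function names are hypothetical.

```python
import math


def miss_probability(total_sectors, target_sectors, samples):
    """Chance that `samples` random sector reads all miss the target data."""
    return (1.0 - target_sectors / total_sectors) ** samples


def samples_needed(total_sectors, target_sectors, confidence):
    """Smallest sample count whose miss probability is at most 1 - confidence."""
    hit_fraction = target_sectors / total_sectors
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - hit_fraction))


SECTOR_BYTES = 512                             # assumption, not stated in the abstract
total = 500 * 10**9 // SECTOR_BYTES            # 500 GB drive
target = 10 * 2**20 // SECTOR_BYTES            # 10 MiB of target data
n = samples_needed(total, target, 0.90)
print(n, miss_probability(total, target, n))
```

Under these assumptions the required sample is on the order of 110,000 sectors, a tiny fraction of the drive. If that many random sectors are read and none matches, there is at most a 10% chance that 10 MiB or more of target data was present.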
Digital authentication for official bulk email
Official bulk email is an efficient tool for disseminating information to a wide audience. Its inherent efficiency, captive audience, and trust provide a dangerous attack vector for adversaries utilizing fraudulent email. Digital authentication can provide a layer of defense to official bulk email that, combined with other defensive countermeasures, will greatly reduce its vulnerabilities. The Department of Defense mandates that official emails that contain hyperlinks, attachments, or instructions to recipients must carry a digital signature, authenticating the source of the email and ensuring the integrity of its contents. This policy, though used at some military installations, is not being applied to official bulk email at others because of administrative roadblocks in obtaining role-based certificates and in implementing an authentication policy with legacy email systems. This thesis identified administrative roadblocks in deploying digital authentication solutions within the Department of Defense, explored different technology options for a digital authentication solution for official bulk email, created a proof-of-concept solution using a Python proxy server and S/MIME, and looked at the most popular mail user agents to see how they interpret S/MIME digital signatures. Applying digital authentication to official bulk email will close a potentially critical vulnerability in the defense of DoD networks. Approved for public release; distribution is unlimited. US Navy (USN) author. http://archive.org/details/digitaluthentica10945490
Leaking Sensitive Information in Complex Document Files - and How to Prevent It
Complex document formats such as PDF and Microsoft’s Compound File Binary Format can contain information that is hidden but recoverable, as a result of text highlighting, cropping, or the embedding of high-resolution JPEG images. Private information can be released inadvertently if these files are distributed in electronic form. Simple experiments involving the creation of test documents can determine whether a particular program embeds hidden information.
TCP analysis with tcpflow; packet carving with bulk_extractor
This talk presents new carving and analysis features in tcpflow and bulk_extractor.
Document and media exploitation
The article of record as published may be located at http://dx.doi.org/10.1145/1331287.1331294. A computer used by Al Qaeda ends up in the hands of a Wall Street Journal reporter. A laptop from Iran is discovered that contains details of that country's nuclear weapons program. Photographs and videos are downloaded from terrorist websites. As evidenced by these and countless other cases, digital documents and storage devices hold the key to many ongoing military and criminal investigations... This sort of advanced analysis is the stuff of DOMEX, the little-known intelligence practice of document and media exploitation. The DOMEX challenge is to turn digital bits into actionable intelligence.
