为编程爱好者分享易语言教程源码的资源网

网站首页 > 网络编程 > 其它综合 正文

Java BitSet学习

三叶资源网 2023-01-09 20:18:50 其它综合 235 ℃ 0 评论

该类在java的java.util包中。

基本原理

BitSet是位操作的对象,值只有0或1即false和true,内部维护了一个long数组,初始只有一个long,所以BitSet最小的size是64,当随着存储的元素越来越多,BitSet内部会动态扩充,最终内部是由N个long来存储,这些针对操作都是透明的。
用1位来表示一个数据是否出现过,0为没有出现过,1表示出现过。使用用的时候既可根据某一个是否为0表示,此数是否出现过。
一个1G的空间,有 8*1024*1024*1024=8.58*10^9bit,也就是可以表示85亿个不同的数。

该类的成员变量

/*
     * BitSets are packed into arrays of "words."  Currently a word is
     * a long, which consists of 64 bits, requiring 6 address bits.
     * The choice of word size is determined purely by performance concerns.
     */
    private final static int ADDRESS_BITS_PER_WORD = 6;
    private final static int BITS_PER_WORD = 1 << ADDRESS_BITS_PER_WORD;
    private final static int BIT_INDEX_MASK = BITS_PER_WORD - 1;

    /* Used to shift left or right for a partial word mask */
    private static final long WORD_MASK = 0xffffffffffffffffL;

    /**
     * @serialField bits long[]
     *
     * The bits in this BitSet.  The ith bit is stored in bits[i/64] at
     * bit position i % 64 (where bit position 0 refers to the least
     * significant bit and 63 refers to the most significant bit).
     */
    private static final ObjectStreamField[] serialPersistentFields = {
        new ObjectStreamField("bits", long[].class),
    };

    /**
     * The internal field corresponding to the serialField "bits".
     */
    private long[] words;

    /**
     * The number of words in the logical size of this BitSet.
     */
    private transient int wordsInUse = 0;

    /**
     * Whether the size of "words" is user-specified.  If so, we assume
     * the user knows what he's doing and try harder to preserve it.
     */
    private transient boolean sizeIsSticky = false;

    /* use serialVersionUID from JDK 1.0.2 for interoperability */
    private static final long serialVersionUID = 7997698588986878753L;

wordIndex函数,代码如下

/**
 * Given a bit index, return word index containing it.
 */
private static int wordIndex(int bitIndex) {
    return bitIndex >> ADDRESS_BITS_PER_WORD;
}

这个函数用来获取某个数在words数组中的索引,通过bitIndex右移ADDRESS_BITS_PER_WORD位,即右移6位。

正常情况下,右移n位得到的值为原值除以2的n次方,而2的6次方是64,也就是bitIndex/64,words数组中每个元素能够代表64个数字,所以存储的值除以64来得到数组索引值,例如0到63的索引值为0,那么这64个数字划归到第一个数组元素中,64到127划归到第二个数组元素中,以此类推。

checkInvariants()函数,代码如下

重要的内部检查函数。

/**
 * Every public method must preserve these invariants.
 */
private void checkInvariants() {
    assert(wordsInUse == 0 || words[wordsInUse - 1] != 0);
    assert(wordsInUse >= 0 && wordsInUse <= words.length);
    assert(wordsInUse == words.length || words[wordsInUse] == 0);
}

这个函数用来验证当前对象的有效性。

recalculateWordsInUse()函数,代码如下

重要的内部检查函数。

/**
 * Sets the field wordsInUse to the logical size in words of the bit set.
 * WARNING:This method assumes that the number of words actually in use is
 * less than or equal to the current value of wordsInUse!
 */
private void recalculateWordsInUse() {
    // Traverse the bitset until a used word is found
    int i;
    for (i = wordsInUse-1; i >= 0; i--)
        if (words[i] != 0)
            break;

    wordsInUse = i+1; // The new logical size
}

这个函数用来计算当前已经使用了的最大数组元素个数。

初始化函数

/**
 * Creates a new bit set. All bits are initially {@code false}.
 */
public BitSet() {
   //创建一个words数组,大小为1
    initWords(BITS_PER_WORD);
    //该数组大小不限定
    sizeIsSticky = false;
}

/**
 * Creates a bit set whose initial size is large enough to explicitly
 * represent bits with indices in the range {@code 0} through
 * {@code nbits-1}. All bits are initially {@code false}.
 *
 * @param  nbits the initial size of the bit set
 * @throws NegativeArraySizeException if the specified initial size
 *         is negative
 */
public BitSet(int nbits) {
    // nbits can't be negative; size 0 is OK
    if (nbits < 0)
        throw new NegativeArraySizeException("nbits < 0: " + nbits);

    initWords(nbits);
    //该数组大小限定
    sizeIsSticky = true;
}

 /**
     * Creates a bit set using words as the internal representation.
     * The last word (if there is one) must be non-zero.
     */
    private BitSet(long[] words) {
      //将该数组赋值给BitSet对象中的words数组
        this.words = words;
      //将当前数组正在使用的最大数组索引设置为数组元素大小
        this.wordsInUse = words.length;
       //验证当前对象的合法性
        checkInvariants();
    }

两个构造函数,分别是一个指定了初始大小,一个没指定。如果没指定,我们可以看到默认的初始大小为, 2^6 = 64-1=63 bit. 我们知道java中long的大小就是8个字节,也就是8*8=64bit。也就是说,bitset默认的是一个long整形的大小。初始化函数指定了必要的大小。

注:如果指定了bitset的初始化大小,那么会把它规整到一个大于或者等于这个数字的64的整倍数。比如64位,bitset的大小是1long,而65位时,bitset大小是2long,即128位。做这么一个规定,主要是为了内存对齐,同时避免考虑到不要处理特殊情况,简化程序。

initWords函数,代码如下

private void initWords(int nbits) {
    words = new long[wordIndex(nbits-1) + 1];
}

这个函数用来创建数组words,该数组元素大小通过wordIndex(nbits-1)+1来获取,而wordIndex上面我们讲了,用来得到某个值所在words数组的索引,然后创建words数组的时候一定要保证数组个数大于等于该值所在的索引值。

valueof函数,代码如下

/**
 * Returns a new bit set containing all the bits in the given long array.
 *
 * <p>More precisely,
 * <br>{@code BitSet.valueOf(longs).get(n) == ((longs[n/64] & (1L<<(n%64))) != 0)}
 * <br>for all {@code n < 64 * longs.length}.
 *
 * <p>This method is equivalent to
 * {@code BitSet.valueOf(LongBuffer.wrap(longs))}.
 *
 * @param longs a long array containing a little-endian representation
 *        of a sequence of bits to be used as the initial bits of the
 *        new bit set
 * @return a {@code BitSet} containing all the bits in the long array
 * @since 1.7
 */
public static BitSet valueOf(long[] longs) {
    int n;
  //倒序从longs数组中获取第一个不为0的元素所在的索引值
    for (n = longs.length; n > 0 && longs[n - 1] == 0; n--)
        ;
   //从longs数组中拷贝索引从0到n的元素,放到新创建的数组中,然后将该新数组构造BitSet类对象
    return new BitSet(Arrays.copyOf(longs, n));
}

/**
 * Returns a new bit set containing all the bits in the given long
 * buffer between its position and limit.
 *
 * <p>More precisely,
 * <br>{@code BitSet.valueOf(lb).get(n) == ((lb.get(lb.position()+n/64) & (1L<<(n%64))) != 0)}
 * <br>for all {@code n < 64 * lb.remaining()}.
 *
 * <p>The long buffer is not modified by this method, and no
 * reference to the buffer is retained by the bit set.
 *
 * @param lb a long buffer containing a little-endian representation
 *        of a sequence of bits between its position and limit, to be
 *        used as the initial bits of the new bit set
 * @return a {@code BitSet} containing all the bits in the buffer in the
 *         specified range
 * @since 1.7
 */
public static BitSet valueOf(LongBuffer lb) {
    lb = lb.slice();
    int n;
    for (n = lb.remaining(); n > 0 && lb.get(n - 1) == 0; n--)
        ;
    long[] words = new long[n];
    lb.get(words);
    return new BitSet(words);
}

/**
 * Returns a new bit set containing all the bits in the given byte array.
 *
 * <p>More precisely,
 * <br>{@code BitSet.valueOf(bytes).get(n) == ((bytes[n/8] & (1<<(n%8))) != 0)}
 * <br>for all {@code n <  8 * bytes.length}.
 *
 * <p>This method is equivalent to
 * {@code BitSet.valueOf(ByteBuffer.wrap(bytes))}.
 *
 * @param bytes a byte array containing a little-endian
 *        representation of a sequence of bits to be used as the
 *        initial bits of the new bit set
 * @return a {@code BitSet} containing all the bits in the byte array
 * @since 1.7
 */
public static BitSet valueOf(byte[] bytes) {
    return BitSet.valueOf(ByteBuffer.wrap(bytes));
}

/**
 * Returns a new bit set containing all the bits in the given byte
 * buffer between its position and limit.
 *
 * <p>More precisely,
 * <br>{@code BitSet.valueOf(bb).get(n) == ((bb.get(bb.position()+n/8) & (1<<(n%8))) != 0)}
 * <br>for all {@code n < 8 * bb.remaining()}.
 *
 * <p>The byte buffer is not modified by this method, and no
 * reference to the buffer is retained by the bit set.
 *
 * @param bb a byte buffer containing a little-endian representation
 *        of a sequence of bits between its position and limit, to be
 *        used as the initial bits of the new bit set
 * @return a {@code BitSet} containing all the bits in the buffer in the
 *         specified range
 * @since 1.7
 */
public static BitSet valueOf(ByteBuffer bb) {
    bb = bb.slice().order(ByteOrder.LITTLE_ENDIAN);
    int n;
    for (n = bb.remaining(); n > 0 && bb.get(n - 1) == 0; n--)
        ;
    long[] words = new long[(n + 7) / 8];
    bb.limit(n);
    int i = 0;
    while (bb.remaining() >= 8)
        words[i++] = bb.getLong();
    for (int remaining = bb.remaining(), j = 0; j < remaining; j++)
        words[i] |= (bb.get() & 0xffL) << (8 * j);
    return new BitSet(words);
}

ensureCapacity()函数,代码如下

/**
 * Ensures that the BitSet can hold enough words.
 * @param wordsRequired the minimum acceptable number of words.
 */
private void ensureCapacity(int wordsRequired) {
  //如果words的大小小于所需要的
    if (words.length < wordsRequired) {
        //扩容最终的大小,最小为原来的两倍
        int request = Math.max(2 * words.length, wordsRequired);
      //创建新的数组,容量为request,然后将原来的数组拷贝到新的数组中
        words = Arrays.copyOf(words, request);
      //数组大小不固定的标志
        sizeIsSticky = false;
    }
}

这个函数用来动态扩容。

flip()函数,用来将指定的位数取反操作,也就是原来位为0,取反后变成1,原来为1,取反后变成0。

set函数,用来将指定的位设置为1。

clear函数,用来将指定的位设置成0。

get函数,用来获取某一位是否有效,或者获取某个BitSet类对象的指定范围的数值到一个新的BitSet类对象中。

nextSetBit()函数,获取从fromIndex开始第一个设置为1的bit索引值。

nextClearBit函数,获取从fromIndex开始第一个设置为0的bit索引值。

previousSetBit函数

将该fromIndex所在的bit以及它的低位设置成1,然后去和相应的words数组元素进行&操作(此时fromIndex所在的bit以及它的低位之外的其他bit都变成了0),如果不为0,说明该words元素上有符合要求的bit位,然后查找的规则就是从高位开始查找,直到遇到第一个bit为1的,就是我们要找的。

cardinality函数,代码如下

/**
 * Returns the number of bits set to {@code true} in this {@code BitSet}.
 *
 * @return the number of bits set to {@code true} in this {@code BitSet}
 * @since  1.4
 */
public int cardinality() {
    int sum = 0;
    for (int i = 0; i < wordsInUse; i++)
        sum += Long.bitCount(words[i]);
    return sum;
}

用来获取words数组中每个元素bit为1的总bit数。

使用场景

常见的应用是那些需要对海量数据进行一些统计工作的时候,比如日志分析、用户数统计等等。

阿里面试题:有1千万个随机数,随机数的范围在1到1亿之间。现在要求写出一种算法,将1到1亿之间没有在随机数中的数求出来?

public static void main(String[] args){
        Random random=new Random();
        
        List<Integer> list=new ArrayList<>();
        for(int i=0;i<10000000;i++){
            int randomResult=random.nextInt(100000000);
            list.add(randomResult);
        }
        System.out.println("产生的随机数有");
        for(int i=0;i<list.size();i++){
            System.out.println(list.get(i));
        }
        BitSet bitSet=new BitSet(100000000);
        for(int i=0;i<10000000;i++) {
            bitSet.set(list.get(i));
        }
        
        System.out.println("0~1亿不在上述随机数中有"+bitSet.cardinality());
        for (int i = 0; i < 100000000; i++) {
            if(!bitSet.get(i)){
                System.out.println(i);
            }
        }     
    }

来源:三叶资源网,欢迎分享,公众号:iisanye,(三叶资源网⑤群:21414575

本文暂时没有评论,来添加一个吧(●'◡'●)

欢迎 发表评论:

百度站内搜索
关注微信公众号
三叶资源网⑤群:三叶资源网⑤群

网站分类
随机tag
易语言通讯蓝牙类库极速文件分割类异或加解密SmartQQ协议源码node模块迅雷播放器引擎支持库源码超级编辑框应用采集源码亦表格黑月版unicode字符烧饼帝多线程教程Quoted资料画图形按钮淘宝时间同步阿里系最新地址库GDIPlus类
最新评论